|
Post by toughdiamond on Apr 17, 2021 18:09:48 GMT
Force lower case if needs be but the files “should” be the same case and perhaps need reported if not. Having looked again at the "buggy" results of SORT, it turns out they're not buggy at all - just that the command ignores case, which doesn't matter if I use your method for finding uniques. It tripped me up at the time because back then I was still trying to streamline my long method of comparing each filename in one list with each filename in the other, so I was trying to split the alphabetized list into shorter sections based on (the ASCII code of) the leftmost character. At that point, the prospect of adapting your code instead began to look rather less daunting. It's hard to know how long it'll be before I've got it ready to do a test drive, but it's looking good so far. I'll be back.
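As a rough illustration of the case issue (a Python sketch rather than JB code, with made-up file names): a sort that compares character codes and a sort that ignores case can put the same names in different orders. For an ordered comparison, both lists need to be normalised the same way first, for example by forcing lower case.

```python
# Hypothetical file names, chosen only to show the ordering difference.
names = ["beta.txt", "Alpha.txt", "alpha.TXT", "Beta.doc"]

# Plain sort compares character codes, so upper-case letters come first.
by_code = sorted(names)

# Case-insensitive sort: compares lower-cased keys but keeps the original spelling.
by_folded = sorted(names, key=str.lower)

# Forcing lower case before sorting, so later comparisons ignore case too.
normalised = sorted(name.lower() for name in names)

print(by_code)
print(by_folded)
print(normalised)
```

If the two lists are sorted one way and compared another, the ordered walk can miss matches, which is presumably why the results looked buggy at first.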
|
|
|
Post by B+ on Apr 17, 2021 20:39:00 GMT
"Sigh... my life is complete, code that neither Anatoly or B+ can better" (Rod)

Sorry, you shouldn't have bragged. Your life is not complete: your code is buggy under certain circumstances that you would have quickly found had you tested more border conditions. I looked at all the comparing your code was doing and was sure it was slower than mine, so I tried to put your code in my test environment, which I had set up for 2 lists of 10 items. The environment now works for any number of items (see below), but 10 is enough to begin to show problems. I have been trying fix after fix to get your code to work in my test environment, and it won't fix easily that I can see. I never got it to work on every test: it gave either an out of bounds error or an incomplete listing. Take a look:

' Ladder B+ versus Rod Time Test
' B+ 2021-04-17
' I had to modify my version to base 1 arrays to match Rod's code and then to be more generic with nItems instead of 10
' I gave up trying to get Rod's code to work under my test conditions

nItems = 10 'per list
dim list1$(nItems), list2$(nItems)
i1 = 1 : i2 = 1 : ia = 1 ' modify to base 1 arrays indexes to each list and all$ = where index

'create lists use same list for testing this is same for both methods
for i = 1 to nItems
    list1$(i) = RndStr$()
    list2$(i) = RndStr$()
    'print list1$(i), list2$(i) 'check raw OK
next
'test extreme case when both lists end on same items and list2$ has it a number of times
list1$(nItems) = "FFF"
list2$(nItems) = "FFF"
list2$(nItems-1) = "FFF"
list2$(nItems-2) = "FFF"

sort list1$(), 1, nItems
sort list2$(), 1, nItems
if nItems <= 1000 then
    for i = 1 to nItems
        print list1$(i), list2$(i) 'check sort OK
    next
end if
tryRod = 1

' reset arrays and index for each method
dim all$(2*nItems), where(2*nItems) ' where 1 for list 1, 2 for list 2, 3 for both
i1 = 1 : i2 = 1 : ia = 1
s=time$("ms")

if tryRod goto [Rod]
print " B+ method =============================================="
while i1 <= nItems and i2 <= nItems ' mod for < 10 to <= nItems
    if list1$(i1) < list2$(i2) then
        all$(ia) = list1$(i1)
        where(ia) = 1
        ia = ia + 1
        i1 = i1 + 1
    else
        if list1$(i1) > list2$(i2) then
            all$(ia) = list2$(i2)
            where(ia) = 2
            ia = ia + 1
            i2 = i2 + 1
        else
            all$(ia) = list2$(i2)
            where(ia) = 3
            ia = ia + 1
            i1 = i1 + 1
            i2 = i2 + 1
        end if
    end if
wend
if i1 > nItems then 'finish out list 2
    for i = i2 to nItems ' mod to nItems
        all$(ia) = list2$(i)
        where(ia) = 2
        ia = ia + 1
    next
else 'finish out list 1 but possible to end with = items so check i1
    if i1 <= nItems then
        for i = i1 to nItems ' mod to nItems
            all$(ia) = list1$(i)
            where(ia) = 1
            ia = ia + 1
        next
    end if
end if
' B+ method end
if tryRod = 0 then goto [Finish]

[Rod]
print " Rod method ================================== "
while i1<=nItems or i2<=nItems 'this needed = otherwise it wouldn't finish
    'if i1 > nItems or i2 > nItems then exit while ' fix Rod's code
    while list1$(i1)<list2$(i2) and i1<=nItems
        'print "Unique to 1 ";list1$(i1,0)
        all$(ia) = list1$(i1)
        where(ia) = 1
        ia = ia + 1
        'print ia
        i1=i1+1
        'if i1 > nItems then exit while ' fix Rod's code 'maybe not needed
    wend
    'if i1 > nItems or i2 > nItems then exit while ' fix Rod's code
    while list1$(i1)=list2$(i2) and i1<=nItems and i2<=nItems
        ' Rod was not outputting anything
        all$(ia) = list2$(i2)
        where(ia) = 3
        ia = ia + 1
        'print ia
        i1=i1+1
        i2=i2+1
        'if i1 > nItems or i2 > nItems then exit while ' fix Rod's code needed!!
    wend
    'if i1 > nItems or i2 > nItems then exit while ' fix Rod's code 'maybe not needed
    while list1$(i1)>list2$(i2) and i2<=nItems
        'print "Unique to 2 ";list2$(i2,0)
        all$(ia) = list2$(i2)
        where(ia) = 2
        ia = ia + 1
        'print ia
        i2=i2+1
        'if i2 > nItems then exit while ' fix Rod's code 'needed!! occasionally
    wend
wend

[Finish]
print "Completed in ms ";time$("ms")-s
tot = 0 'reinit
' show results This is same for both tests
print "index", "list 1", "list 2", "item", "source (3 = in both)"
for i = 1 to ia - 1 'modify to base 1 arrays
    if i <= nItems then ' mod to nItems
        if nItems <= 1000 then print i, list1$(i), list2$(i), all$(i), where(i)
    else
        if nItems <= 1000 then print i, "***", "***", all$(i), where(i)
    end if
    tot = tot + where(i)
next
print "All present and accounted for ";3 * nItems;"?= ";tot

Function RndStr$() 'limit possibilities to increase duplicate
    rl = Int(Rnd(0) * 3) + 1
    b$ = ""
    For j = 1 To rl
        b$ = b$ + Mid$("ABCDEF", Int(Rnd(0) * 6) + 1, 1)
    Next
    RndStr$ = b$
End Function
|
|
|
Post by tsh73 on Apr 18, 2021 6:16:30 GMT
I think the out-of-bounds error is caused by the compound condition in the WHILE:
while list1$(i1)=list2$(i2) and i1<=nItems and i2<=nItems
If you write something like that in C
while (i1<=nItems && i2<=nItems && list1$(i1)==list2$(i2))
it will stop evaluating the condition right after the first FALSE, never touching list1$(i1) out of bounds (it is designed and documented to work this way). But JB evaluates the whole expression, so even when i1<=nItems is false, list1$(i1) is still accessed in the condition, and that raises the out-of-bounds error.
In this particular case, dimming the arrays one item larger will circumvent it:
dim list1$(nItems+1), list2$(nItems+1)
Not sure if it could be a general fix.
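The difference can be sketched in Python, which short-circuits `and` the way C short-circuits `&&`. This is only an illustration, not JB code: the function names are made up, the lists are 0-based, and the "eager" version imitates full evaluation by computing both operands before combining them.

```python
def count_equal_eager(list1, list2, n_items):
    """Imitate a language that evaluates every operand of the condition."""
    i = 0
    while True:
        in_bounds = i < n_items
        same = list1[i] == list2[i]  # touched even when in_bounds is False
        if not (in_bounds and same):
            break
        i += 1
    return i


def count_equal_short_circuit(list1, list2, n_items):
    """The bounds test comes first, so the element test is skipped when it fails."""
    i = 0
    while i < n_items and list1[i] == list2[i]:
        i += 1
    return i


a = ["a", "b", "c"]
b = ["a", "b", "c"]
print(count_equal_short_circuit(a, b, 3))
# One spare slot per list, like dimming the arrays one item larger.
print(count_equal_eager(a + [""], b + [""], 3))
```

Calling `count_equal_eager(a, b, 3)` without the spare slot raises `IndexError`, the Python counterpart of the out-of-bounds error. The padded call works because the out-of-range read lands on the spare element and the bounds test then stops the loop.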
|
|
|
Post by toughdiamond on Apr 18, 2021 7:05:25 GMT
Watching......
|
|
|
Post by Rod on Apr 18, 2021 8:44:35 GMT
Glad we have such good coders on board. I think my error was in not catching any trailing files. So that needs another pair of loops for each array. Now when the comparison runs out of pairs it will list any trailing files. B+ will need to confirm if that was the error he was seeing.
Whether this is faster or slower matters little compared with the timing of the original strategy. I remember competing in a sorting challenge on the forum here. I was fairly proud of a thirty second bubble sort. Welopez smashed the time with a quicksort routine that took about 20ms; I had never seen such code before. It's all been done before somewhere by someone.
These are Anatoly's changes to my code, tweaked a little to test more boundary conditions.
n1=7
n2=6
dim info1$(10,10)
dim info2$(10,10)
'files "c:\basic\a scratchpad", info1$()
'files "e:\basic\a scratchpad", info2$()
info1$(0,0)=str$(n1)
info2$(0,0)=str$(n2)
for i = 1 to n1
    info1$(i,0)=word$("0 1 2 3 5 6 8",i)
next
for i = 1 to n2
    info2$(i,0)=word$("1 2 4 5 6 7",i)
next

no1=val(info1$(0,0))
no2=val(info2$(0,0))
print "number of files found in 1 :";no1
print "number of files found in 2 :";no2
i1=1
i2=1
while i1<no1 or i2<no2
    while info1$(i1,0)<info2$(i2,0) and i1<=no1
        print "Unique to 1 ";info1$(i1,0)
        i1=i1+1
    wend
    while info1$(i1,0)=info2$(i2,0) and i1<=no1 and i2<=no2
        i1=i1+1
        i2=i2+1
    wend
    while info1$(i1,0)>info2$(i2,0) and i2<=no2
        print "Unique to 2 ";info2$(i2,0)
        i2=i2+1
    wend
wend
while i1<=no1
    print "Unique to 1 ";info1$(i1,0)
    i1=i1+1
wend
while i2<=no2
    print "Unique to 2 ";info2$(i2,0)
    i2=i2+1
wend

end
|
|
|
Post by tsh73 on Apr 18, 2021 11:32:01 GMT
A general solution for a compound WHILE condition (while the index does not exceed the bound and some condition on array(index) holds): convert
while info1$(i1,0)=info2$(i2,0) and i1<=no1 and i2<=no2
    i1=i1+1
    i2=i2+1
wend
to
while i1<=no1 and i2<=no2
    if not(info1$(i1,0)=info2$(i2,0)) then exit while
    i1=i1+1
    i2=i2+1
wend
That is, keep only the "index does not exceed bound" test in the WHILE condition and move the other part of the condition inside, as IF NOT(condition) THEN EXIT WHILE.
As for the extra loop Rod just added to process the "tail": likely this could not be avoided?
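On the tail question, here is a minimal Python sketch of the ladder comparison (not JB code; `ladder_diff` and its variable names are made up, and the lists are 0-based). The loop condition only checks the indexes and the element comparison happens inside the loop. The main loop stops as soon as either list runs out, so whatever is left in the other list has nothing to pair with and has to be collected in a separate step.

```python
def ladder_diff(list1, list2):
    """Return (unique_to_1, unique_to_2) for two lists sorted the same way."""
    unique1, unique2 = [], []
    i1 = i2 = 0
    # Only the bounds are tested here; elements are compared inside the loop.
    while i1 < len(list1) and i2 < len(list2):
        if list1[i1] < list2[i2]:
            unique1.append(list1[i1])
            i1 += 1
        elif list1[i1] > list2[i2]:
            unique2.append(list2[i2])
            i2 += 1
        else:  # same item in both lists: step both indexes
            i1 += 1
            i2 += 1
    # Tail: at most one of these slices is non-empty.
    unique1.extend(list1[i1:])
    unique2.extend(list2[i2:])
    return unique1, unique2


# Rod's test lists from earlier in the thread.
only1, only2 = ladder_diff("0 1 2 3 5 6 8".split(), "1 2 4 5 6 7".split())
print(only1, only2)
```

With these lists the trailing "8" is only picked up by the tail step, which suggests that some form of tail handling is needed whichever way the main loop is written.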
|
|
|
Post by Rod on Apr 18, 2021 13:18:04 GMT
I put in more printing to let you see where the comparisons are made. Seems to work well, I have tried a good few combinations but perhaps I am not seeing the error.
n1=7
n2=7
dim info1$(10,10)
dim info2$(10,10)
'files "c:\basic\a scratchpad", info1$()
'files "e:\basic\a scratchpad", info2$()
info1$(0,0)=str$(n1)
info2$(0,0)=str$(n2)
for i = 1 to n1
    info1$(i,0)=word$("1 2 3 4 5 6 7",i)
next
for i = 1 to n2
    info2$(i,0)=word$("1 2 3 4 5 6 7",i)
next

no1=val(info1$(0,0))
no2=val(info2$(0,0))
print "number of files found in 1 :";no1
print "number of files found in 2 :";no2
i1=1
i2=1
while i1<no1 or i2<no2
    while info1$(i1,0)<info2$(i2,0) and i1<=no1
        print "Upper loop, Unique to 1 ";info1$(i1,0)
        i1=i1+1
    wend
    while info1$(i1,0)=info2$(i2,0) and i1<=no1 and i2<=no2
        print "Equal loop"
        i1=i1+1
        i2=i2+1
    wend
    while info1$(i1,0)>info2$(i2,0) and i2<=no2
        print "Lower loop,Unique to 2 ";info2$(i2,0)
        i2=i2+1
    wend
wend
while i1<=no1
    print "Tail 1 loop Unique to 1 ";info1$(i1,0)
    i1=i1+1
wend
while i2<=no2
    print "Tail 2 loop Unique to 2 ";info2$(i2,0)
    i2=i2+1
wend

end
|
|
|
Post by Rod on Apr 18, 2021 14:23:12 GMT
@ B+ The <= is taken care of in the inner loops, the outer loop is just a control that ensures the looping stops.
|
|
|
Post by B+ on Apr 18, 2021 15:59:33 GMT
My hypothesis is that Rod's original code did not encounter duplicates within either file list, though duplicates can exist between lists. That is extremely likely to happen with file lists from one drive, i.e. if files had the same name they would have to be in different folders, so they would be unique by path.
This is about an hour after I posted my experiments above. So right after supper I started a No Duplicates in list test code and ran into the Weird Unfinished For loop I posted here in Discussion Board. I think without duplicates the code would run into fewer out of bounds errors and fewer incomplete listings. A duplicate at the end of both lists would end correctly, noting it is in both files, I think.
I think I added the 2 ='s in this line (the outer WHILE loop):
while i1 <= nItems and i2 <= nItems ' mod for < 10 to <= nItems
in attempts to get the contents of both lists to complete for a full listing of each list. If that loop was only there to stop the looping, then it can likely be set back to < without =, especially once I get No Duplications in each list fixed up.
|
|
|
Post by tsh73 on Apr 18, 2021 18:07:23 GMT
I tried <= in outer loop but "tail" was not processed, so extra loop after main is still needed.
|
|
|
Post by B+ on Apr 18, 2021 18:42:51 GMT
Yes! I now have Rod's code working in my test environment with these mods commented here:
[Rod]
print " Rod method ================================== "
' MOD 0: No Duplicates in either list but Duplicates allowed between lists
' MOD 1: Dim lists 1 more than nItems to remove out of bounds errors
' MOD 2: use <= nItems instead of just < nItems to get complete accounting of both lists
while i1 <= nItems or i2 <= nItems ' add = to both tests so that we can get complete listings,
    while list1$(i1)<list2$(i2) and i1<=nItems
        'print "Unique to 1 ";list1$(i1,0)
        all$(ia) = list1$(i1)
        where(ia) = 1
        ia = ia + 1
        i1=i1+1
    wend
    while list1$(i1)=list2$(i2) and i1<=nItems and i2<=nItems
        ' Rod was not outputting anything when item in both lists
        all$(ia) = list2$(i2)
        where(ia) = 3
        ia = ia + 1
        i1=i1+1
        i2=i2+1
    wend
    while list1$(i1)>list2$(i2) and i2<=nItems
        'print "Unique to 2 ";list2$(i2,0)
        all$(ia) = list2$(i2)
        where(ia) = 2
        ia = ia + 1
        i2=i2+1
    wend
wend
Preliminary tests suggest it is going to take a lot of items on the lists to see any time difference between B+'s code and Rod's, if there is any. So, as I said before, they are basically the same, the caveat being that we are assuming non-duplicated lists. After all, I did get the "ladder method" idea from Rod and dropped the Binary search idea like a hot potato! ;-))
|
|
|
Post by Rod on Apr 18, 2021 18:43:24 GMT
edit: Cross posted with B+, off to see what he has cooked up.
So, the outer loop < is deliberate. The original code, still unchanged, handles matched files, unique files and duplicate files in either list. It handles duplicates in that it will match pairs till it gets one or either unique. So if there are three rod.dat files, one in list one and two in list two it will list the third one as unique to list two.
However duplicate file names in the same directory is kinda extreme.
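As I read the description above, the pairing rule amounts to a multiset difference: each name in one list cancels at most one copy of the same name in the other, and whatever is left over is reported as unique. A small Python sketch of that reading (not the JB code; `unmatched` is a made-up helper using the standard `collections.Counter`):

```python
from collections import Counter


def unmatched(list1, list2):
    """Names left over after pairing off equal names one-for-one."""
    c1, c2 = Counter(list1), Counter(list2)
    # Counter subtraction keeps only positive counts, i.e. the surplus copies.
    only1 = sorted((c1 - c2).elements())
    only2 = sorted((c2 - c1).elements())
    return only1, only2


# One rod.dat in list one, two in list two: the spare copy is unique to list two.
print(unmatched(["rod.dat"], ["rod.dat", "rod.dat"]))
```

This is only a cross-check on small test data. It does not need sorted input, but it also gives up the single ordered pass that makes the ladder approach cheap.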
I did forget the trailing files but the final two loops catch any tail end differences.
If there is still something I am missing please demonstrate by altering the numeric listings in the most recent code.
I am sure B+ has a point about his alternative method. Will be good to see it working.
|
|
|
Post by toughdiamond on Apr 19, 2021 7:18:28 GMT
Looking forward to one or other of the methods getting the green light for me to try grafting onto my new program. Do you think it's ready for me to try as is? I don't mind if it's not perfect yet, and it'd be good to try it in a real-life situation with my big volumes, just that I'm not quite sure which block of code to copy and paste in - is there a "final form" yet or am I jumping the gun?
|
|
|
Post by Rod on Apr 19, 2021 10:46:22 GMT
Often it is worthwhile dummying up some processing if you have concerns about limits or lengths of time. So this code builds two 80000 file arrays, messes them up a little and then hunts for differences. It gets the job done in 700ms.
How fast do you need it to be? Even if B+ gets it to run in half the time, does it now matter? Which set of code have you understood best? That's more like the deciding factor: if you can better understand one set then use it. They both solve the problem.
no1=80000
no2=80000
dim info1$(80001,10)
dim info2$(80001,10)

'build some dummy file lists
t=time$("ms")
for i = 1 to 80000
    info1$(i,0)="file";str$(i)
    info2$(i,0)="file";str$(i)
next

'make a few files unique
for n= 1 to 5
    i=int(rnd(0)*80000)
    info1$(i,0)=info1$(i,0)+"0"
    i=int(rnd(0)*80000)
    info2$(i,0)=info2$(i,0)+"0"
next

print "built two file arrays of 80000 files in ";time$("ms")-t;"ms"
t=time$("ms")

print "number of files found in 1 :";no1
print "number of files found in 2 :";no2
i1=1
i2=1
while i1<no1 or i2<no2
    while info1$(i1,0)<info2$(i2,0) and i1<=no1
        print "Upper loop,Unique to 1 ";info1$(i1,0)
        i1=i1+1
    wend
    while info1$(i1,0)=info2$(i2,0) and i1<=no1 and i2<=no2
        i1=i1+1
        i2=i2+1
    wend
    while info1$(i1,0)>info2$(i2,0) and i2<=no2
        print "Lower loop,Unique to 2 ";info2$(i2,0)
        i2=i2+1
    wend
wend
while i1<=no1
    print "Tail 1 loop Unique to 1 ";info1$(i1,0)
    i1=i1+1
wend
while i2<=no2
    print "Tail 2 loop Unique to 2 ";info2$(i2,0)
    i2=i2+1
wend
print "matched 80000 files in ";time$("ms")-t;"ms"
end
|
|
|
Post by toughdiamond on Apr 19, 2021 17:33:53 GMT
"How fast do you need it to be? Even if B+ gets it to run in half the time, does it now matter? Which set of code have you understood best? That's more like the deciding factor: if you can better understand one set then use it. They both solve the problem." (Rod)

I'm happy with any speed that doesn't take hours to run. Both sets of code are about equal when it comes to my understanding them, i.e. still as clear as mud for the most part, but if I can get one to work then that might help me to figure out what goes on. I was about to try adapting one set of code to fit my program yesterday but was stopped by the realisation that I'm not quite sure what all$() and nItems are, and I don't know what value to give nItems (sum of both lists?). I'm also wary of declaring any more large arrays than absolutely necessary, because I already have plans to use more than the two that hold the filenames of the two volumes (pathnames, dates and sizes), and I fear I might end up exceeding JB's 256 MB memory limit. Anyway, the code in your last post doesn't seem to contain the array or variable that stopped me before, so I'll see if I can wire it up to my program. For filename sources I've stuck to my old way of using DOS to make directory dumps as text files, which provides a convenient way of testing any uniques-finding code: I just make 2 copies of the same directory dump and then alter lines in one or both of them, then run the code to see if it finds out what I did.
|
|