|
Post by toughdiamond on Apr 15, 2021 23:33:12 GMT
> Oh good, with a built-in Files command (I didn't know) and I forgot about the new Sort command, also built in, that should help tremendously with speed (that surely must work normally when comparing strings).
Yes, I've just tried the SORT command and it's very quick indeed. Well, the SORT command doesn't seem so hard to figure out, but if there's a command-line switch that will order the whole shebang rather than just per directory, that might be easier. Either way it should be pretty quick to run, because that's not the process that's up against the steep rise (roughly with the square of the number of files) in the number of times it has to run.
> You get 50% usage on the CPU because you are using a single core, 50% of the dual-core capacity. But you can't use both cores with Just BASIC, or many other programs.
And even if I could use both cores, I'd be wary of running the CPU so hard unless I knew it wouldn't be for very long.
Thanks for the info about FILES. I'm still struggling to get used to that command. But the SORT command proved so easy and lightning-fast that I'm hoping it will be all I need. So, once I've got the program to put the filenames in alphabetical order, what's the most efficient way of making use of that and speeding up the search for matches that's so critical to the total running time? I was thinking one way might be to create an index, perhaps pointing to where the filenames beginning with "a", "b", "c", etc. start, and to get the match-search routine to search only the elements that begin with the same letter as the filename it's looking for, if that makes any sense. I'm starting to see how converting everything to the same case would help, for one thing. But I suspect that if I plough ahead I'd be reinventing the wheel. What would be the conventional way to search efficiently through an ordered list?
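For the closing question, the conventional technique for finding one name in a sorted list is binary search: compare against the middle element and throw away the half that can't contain the name, so even a list of a million names needs only about 20 comparisons per lookup. The first-letter index described above is a coarser version of the same idea. The sketch below is in Python rather than Just BASIC, uses the standard `bisect` module, and assumes both the list and the name being looked up have already been folded to one case; the function name `contains` is just for illustration.

```python
from bisect import bisect_left

def contains(sorted_names, name):
    """Return True if name is in sorted_names, using binary search.

    Assumes sorted_names is sorted with the same ordering Python's < uses
    and that the caller has already normalized case.
    """
    i = bisect_left(sorted_names, name)  # first position where name could sit
    return i < len(sorted_names) and sorted_names[i] == name

names = sorted(n.lower() for n in ["Readme.txt", "alpha.doc", "Beta.jpg"])
print(names)                        # ['alpha.doc', 'beta.jpg', 'readme.txt']
print(contains(names, "beta.jpg"))  # True
print(contains(names, "gamma.png")) # False
```

Each lookup costs on the order of log2(n) comparisons instead of n, so checking every name in one list against the other drops from roughly n × n comparisons to roughly n × log2(n). When both lists are sorted, the step-through-both-lists approach discussed later in the thread needs even fewer.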
|
|
|
Post by toughdiamond on Apr 16, 2021 0:22:55 GMT
> Rod's code ought to cut the amount of comparing by vast amounts. I would guess around 2 * the number of files.
You're very likely correct, but I don't understand how it works yet, so if I try to adapt it to fit my own program I think I'll probably just make a mess of it. Some of the programming I've seen since finding this place the other day is like nothing I've ever seen before. Very impressive. I'm a slow learner, though. I'll keep looking at that code, and maybe the penny will drop eventually. Meanwhile, an efficient method for searching the ordered list of filenames I have would probably be enough to do the trick.
|
|
|
Post by B+ on Apr 16, 2021 4:07:03 GMT
For understanding I will attempt to describe it:
To start, you have two sorted lists and an index for each that starts at the first item in each: list1() and list2(), with index1 and index2 tracking where you are in the two lists.
You look at the first item in the first list, list1(index1), and compare it to the first item in the second list, list2(index2).
It will be less than (<), equal to (=), or greater than (>) the item in the second list.
If less than, it is unique: the second list hasn't got it (list 2 is sorted and its current item is already bigger, so the item can't turn up later), so note that and shimmy up one on the first list, index1 = index1 + 1. Go to compare again.
If it is equal to the item in list2(index2), note the duplicate and shimmy both up one notch (increase each index by 1) to compare the new items in each list. Go to compare again.
If it (the item in list 1) is greater than the item in list 2, note the item in list 2 as unique, shimmy list 2 up one notch, and go to compare again.
Keep this up until you get to the end of one of the lists; then everything left in the remaining list is unique to that list.
Test the idea out with two small lists of 10 items and debug until you see it working, then change to big numbers and lists.
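The steps above can be sketched in Python (rather than Just BASIC) to show why the work stays small: every pass through the loop moves at least one index forward, so the number of comparisons can never exceed len(list1) + len(list2), which fits the "around 2 * the number of files" guess. The function name `ladder_merge` and the step counter are illustrative additions, and the sketch assumes both lists are already sorted with the same ordering Python's `<` uses.

```python
def ladder_merge(list1, list2):
    """Walk two sorted lists in step, one three-way comparison per pass.

    Returns (only_in_1, only_in_2, in_both, steps); steps counts passes
    through the main loop.
    """
    i1 = i2 = 0
    only1, only2, both = [], [], []
    steps = 0
    while i1 < len(list1) and i2 < len(list2):
        steps += 1
        a, b = list1[i1], list2[i2]
        if a < b:        # list2 is past a already, so a is unique to list1
            only1.append(a)
            i1 += 1
        elif a > b:      # list1 is past b already, so b is unique to list2
            only2.append(b)
            i2 += 1
        else:            # same item in both lists: move both indexes on
            both.append(a)
            i1 += 1
            i2 += 1
    # One list has run out; whatever is left in the other is unique to it.
    only1.extend(list1[i1:])
    only2.extend(list2[i2:])
    return only1, only2, both, steps

print(ladder_merge(["a", "b", "d", "f"], ["a", "c", "d", "e"]))
# (['b', 'f'], ['c', 'e'], ['a', 'd'], 5)
```

Trying it on the two small lists first, as suggested, makes it easy to check each result by hand before switching to real file lists.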
|
|
|
Post by B+ on Apr 16, 2021 4:08:09 GMT
Funny how after I describe something I have to code it to see that it works as expected.
It does here:
' Ladder (my name for this method) B+ 2021-04-16
dim list1$(9), list2$(9), all$(19), where(19) ' where 1 for list 1, 2 for list 2, 3 for both
i1 = 0 : i2 = 0 : ia = 0 ' <<<<<<< these are indexes they track where we are in each array
'create lists (the lists don't have to be equal in length, you have to know number in each though.)
for i = 0 to 9
    list1$(i) = RndStr$()
    list2$(i) = RndStr$()
    'print list1$(i), list2$(i) 'check raw OK
next
' list1$(9) = "FFF" 'test extreme
' list2$(9) = "FFF"
sort list1$(), 0, 9
sort list2$(), 0, 9
'for i = 0 to 9
'print list1$(i), list2$(i) 'check sort OK
'next

while i1 < 10 and i2 < 10
    if list1$(i1) < list2$(i2) then
        all$(ia) = list1$(i1)
        where(ia) = 1
        ia = ia + 1
        i1 = i1 + 1
    else
        if list1$(i1) > list2$(i2) then
            all$(ia) = list2$(i2)
            where(ia) = 2
            ia = ia + 1
            i2 = i2 + 1
        else
            all$(ia) = list2$(i2)
            where(ia) = 3
            ia = ia + 1
            i1 = i1 + 1
            i2 = i2 + 1
        end if
    end if
wend
if i1 > 9 then 'finish out list 2
    for i = i2 to 9
        all$(ia) = list2$(i)
        where(ia) = 2
        ia = ia + 1
    next
else 'finish out list 1 but possible to end with = items so check i1
    if i1 < 10 then
        for i = i1 to 9
            all$(ia) = list1$(i)
            where(ia) = 1
            ia = ia + 1
        next
    end if
end if
' show results
print "index", "list 1", "list 2", "item", "source (3 = in both)"
for i = 0 to ia - 1
    if i < 10 then
        print i, list1$(i), list2$(i), all$(i), where(i)
    else
        print i, "***", "***", all$(i), where(i)
    end if
    tot = tot + where(i)
next
print "All present and accounted for 30 ?= ";tot

Function RndStr$() 'limit possibilities to increase duplicates
    rl = Int(Rnd(0) * 3) + 1
    b$ = ""
    For j = 1 To rl
        b$ = b$ + Mid$("ABCDEF", Int(Rnd(0) * 6) + 1, 1)
    Next
    RndStr$ = b$
End Function
Sample Output:
index   list 1  list 2  item    source (3 = in both)
0       AAC     BBA     AAC     1
1       AB      BE      AB      1
2       AC      DBD     AC      1
3       BBA     ECC     BBA     3
4       BBC     F       BBC     1
5       BCE     F       BCE     1
6       C       FB      BE      2
7       CE      FD      C       1
8       CFD     FE      CE      1
9       FB      FFB     CFD     1
10      ***     ***     DBD     2
11      ***     ***     ECC     2
12      ***     ***     F       2
13      ***     ***     F       2
14      ***     ***     FB      3
15      ***     ***     FD      2
16      ***     ***     FE      2
17      ***     ***     FFB     2
All present and accounted for 30 ?= 30
|
|
|
Post by Rod on Apr 16, 2021 6:21:23 GMT
The very important point about my code is that the files are compared once only. So it will process the task in milliseconds not days. Please cast aside your doubts and experiment.
|
|
|
Post by toughdiamond on Apr 16, 2021 6:24:24 GMT
> For understanding I will attempt to describe...
Thanks. That's starting to make a little more sense, though being unfamiliar with WHILE...WEND it's an uphill struggle. I see there are three of those "in series," nested within a fourth one, in this code. I understand in principle what WHILE...WEND does: anything between WHILE and WEND only gets done while the condition is true. I guess it must implicitly act as a loop, as there's nothing else in the code that tells it to loop, and clearly it wouldn't get far if it didn't. I've been using various dialects of BASIC for around 40 years and somehow managed to dodge having to fathom that. I guess the earlier dialects just didn't have it, so I got into the habit of using other ways of achieving the same result and never saw the point of finding out about it. Like I said, I'm self-taught, and I've mostly only learned a new command when the program I was writing couldn't be made to work any other way. As a slow learner, it'll probably be a few days before I can interpret the thing with any confidence. Yes, hands-on experience with it ought to move things forward; I never could fathom much unless I did it myself. As for the SORT command, I've reproduced the bug: it doesn't completely separate lower case from upper case, but that can be put right with UPPER$ or LOWER$, if I don't find the right switch for the directory dump first... or if the DIR command in the code mentioned above doesn't do a better job than SORT.
|
|
|
Post by toughdiamond on Apr 16, 2021 6:42:34 GMT
> The very important point about my code is that the files are compared once only. So it will process the task in milliseconds not days. Please cast aside your doubts and experiment.
I'll do that. Any doubt on my part is about my own abilities, not yours. For example, the idea that it's possible to do the job in only one pass rather than gazillions astonishes me. That isn't to say I don't believe it; I just can't see how it could possibly work, even though it does. Like quantum mechanics.
|
|
|
Post by Rod on Apr 16, 2021 6:48:59 GMT
While / Wend is just another way to loop. For / Next could do the job, but While / Wend suits the task and breaks out of the loop cleanly at the top. Just BASIC SORTs in dictionary order. The point that B+ makes is that dictionary order prioritises capital letters: Ab sorts before ab, and it is not an ASCII sort. But don't get hung up on that; the sort is fast and powerful. Force lower case if need be, but the files "should" be the same case and perhaps need to be reported if not.
Also, the FILES command gets the folders and files for the target directory. So it is relatively easy to use my sample code iteratively against a list of directories. Please spend a little time with both the FILES and SORT commands. Your time will not be wasted.
Reread the post I linked about FILES to see the structure of the result array. It looks more complex than it is. The directories found are listed in the array immediately after the files found. So it is easy to get a list of the directories found and use each one in a new FILES command to get its files and folders.
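Whichever case rule is chosen, the comparison inside the loop has to use exactly the same ordering that produced the sorted lists; if the sort folds case but `<` compares raw character codes (or the other way round), an index can step past a name that the other list does contain. The Python sketch below contrasts a plain character-code sort with a case-folded one. The helper `fold_key` is hypothetical and only approximates a dictionary order; it is not a claim about exactly how Just BASIC's SORT breaks ties.

```python
def fold_key(name):
    # Compare ignoring case first; break ties with the original string so
    # that "Ab" and "ab" still come out in a fixed, repeatable order.
    return (name.lower(), name)

names = ["ab", "B", "Ab", "a"]
print(sorted(names))                # character-code order: ['Ab', 'B', 'a', 'ab']
print(sorted(names, key=fold_key))  # case-folded order:    ['a', 'Ab', 'ab', 'B']
```

In a merge, the same key would then be used for every comparison, e.g. `fold_key(a) < fold_key(b)` rather than `a < b`. If names that differ only in case should count as the same file, the simpler route is to lower-case every name before both sorting and comparing, as suggested above.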
|
|
|
Post by toughdiamond on Apr 16, 2021 7:01:35 GMT
> Funny how after I describe something I have to code it to see that it works as expected...
So is that essentially the same as Rod's code, apart from the fact that yours generates its own test data arrays rather than taking them in from a real-life folder?
|
|
|
Post by B+ on Apr 16, 2021 15:57:17 GMT
> Funny how after I describe something I have to code it to see that it works as expected... So is that essentially the same as Rod's code, apart from the fact that yours generates its own test data arrays rather than taking them in from a real-life folder?
I would say correct. I have one master decision maker to do the compares and skip the three inner While... Wend loops, but it's the same difference, I think, and with three fewer While... Wend loops it might be less scary and easier to follow. The main While checks that you are still within the bounds of your arrays with the two indexes, lest you error out by calling an index outside the array's range. On a technical note, While... Wend is a much faster loop than For... Next and comparable to Do... Loop with Boolean expressions (expressions that evaluate to True|False or 0|1).
I am certain there are command switch options for the Windows dir command that will put the list of the whole drive, or of a folder and all its subfolders and files, into one file; the files would likely be fully pathed. But as Rod says, practice with JB's Sort and Files won't be time wasted.
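The idea of getting a whole folder tree into one list can be sketched without relying on any particular `dir` switch: walk each tree, record every file's path relative to the top folder, and sort the result, so a single pass of the two-list comparison covers everything at once. The Python sketch below uses the standard `os.walk`. The function name `tree_listing` is hypothetical, and using relative paths (so two trees on different drives line up) plus lower-casing (which assumes a case-insensitive file system such as Windows) are assumptions.

```python
import os

def tree_listing(root):
    """Return a sorted, lower-cased list of file paths relative to root.

    Paths use "/" as the separator so listings built on different
    systems compare equal.
    """
    result = []
    for folder, _subfolders, files in os.walk(root):
        rel_folder = os.path.relpath(folder, root)
        for name in files:
            rel = name if rel_folder == "." else os.path.join(rel_folder, name)
            result.append(rel.replace(os.sep, "/").lower())
    result.sort()  # one sort over the whole tree, not one per directory
    return result

if __name__ == "__main__":
    for path in tree_listing("."):
        print(path)
```

Two such listings, one per tree, can then be fed straight into the step-through-both-lists comparison described earlier in the thread.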
|
|
|
Post by Rod on Apr 16, 2021 18:29:22 GMT
You start with one list being the lead list. While the lead list is ahead of the second list, you have an extra file; print it. Print files till you are back in sync. While you are in sync, or if you were in sync anyway, just move on till you are out of sync. When you are out of sync you will be either ahead of or behind the lead list. Either way, one of the While Wend loops will pick up and progress the comparison.
So you are either ahead, in sync, or behind. You need three loops to catch each condition and one overarching loop to move to completion.
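Rod's structure of three inner loops inside one overarching loop can be sketched in Python as follows. The function name `three_loop_compare` is hypothetical, and one detail is a deliberate assumption of this sketch: each inner loop also checks whether the other list has already run out, so names at the very end of either list (for example 3 and 4 in `1 2 3` vs `1 2 4`) are still reported and the outer loop always finishes.

```python
def three_loop_compare(list1, list2):
    """Compare two sorted lists: one outer loop, three inner loops.

    Returns (only_in_1, only_in_2). Bounds checks come first in each
    condition so an exhausted list is never indexed.
    """
    i1 = i2 = 0
    n1, n2 = len(list1), len(list2)
    only1, only2 = [], []
    while i1 < n1 or i2 < n2:
        # list1's current name sorts first (or list2 is used up): extras in list1
        while i1 < n1 and (i2 >= n2 or list1[i1] < list2[i2]):
            only1.append(list1[i1])
            i1 += 1
        # in sync: skip over names present in both lists
        while i1 < n1 and i2 < n2 and list1[i1] == list2[i2]:
            i1 += 1
            i2 += 1
        # list2's current name sorts first (or list1 is used up): extras in list2
        while i2 < n2 and (i1 >= n1 or list1[i1] > list2[i2]):
            only2.append(list2[i2])
            i2 += 1
    return only1, only2

print(three_loop_compare("1 2 3 5 6".split(), "1 2 4 5 6".split()))  # (['3'], ['4'])
print(three_loop_compare("1 2 3".split(), "1 2 4".split()))          # (['3'], ['4'])
```

When both lists still have names left, exactly one of <, = or > holds, so one inner loop always makes progress; once a list is used up, the remaining loop drains the other.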
|
|
|
Post by toughdiamond on Apr 16, 2021 19:18:06 GMT
> You start with one list being the lead list. ... You need three loops to catch each condition and one overarching loop to move to completion.
Ah, that explains how your algorithm would be much more efficient. Ingenious! Does it still figure it out correctly if, for example, there's one missing file in the first list followed immediately by one missing file in the second? The sync would look like this (the integers represent filenames):
1 2 3 5 6
1 2 4 5 6
I'll wager you've thought of that already, but I'm not yet at the point where I can verify it by looking at the code. I should crack on and test it myself, but I'm still at the first step of deciding whether to use the command-line DIR or JB's FILES to create the lists. So many new ideas all at once that my head spins, but I'll get there.
|
|
|
Post by tsh73 on Apr 16, 2021 19:58:13 GMT
Indeed it is
files found in 1 :5
files found in 2 :5
(1,1)
(2,2)
(3,3)
Unique to 1 3
(4,3)
Unique to 2 4
(4,4)
(5,5)
(6,6)
'1 2 3 5 6
'1 2 4 5 6
n=5
dim info1$(10,10)
dim info2$(10,10)
'files "c:\basic\a scratchpad", info1$()
'files "e:\basic\a scratchpad", info2$()
info1$(0,0)=str$(n)
info2$(0,0)=str$(n)
for i = 1 to n
    info1$(i,0)=word$("1 2 3 5 6",i)
next
for i = 1 to n
    info2$(i,0)=word$("1 2 4 5 6",i)
next

no1=val(info1$(0,0))
no2=val(info2$(0,0))
print "files found in 1 :";no1
print "files found in 2 :";no2
i1=1
i2=1
print "(";i1;",";i2;")"
'keep going until both lists are used up (<= so the last names are not skipped)
while i1<=no1 or i2<=no2
    'extra names in list 1, including any left over once list 2 is used up
    while i1<=no1 and (i2>no2 or info1$(i1,0)<info2$(i2,0))
        print "Unique to 1 ";info1$(i1,0)
        i1=i1+1
        print "(";i1;",";i2;")"
    wend
    'in sync: names present in both lists
    while i1<=no1 and i2<=no2 and info1$(i1,0)=info2$(i2,0)
        i1=i1+1
        i2=i2+1
        print "(";i1;",";i2;")"
    wend
    'extra names in list 2, including any left over once list 1 is used up
    while i2<=no2 and (i1>no1 or info1$(i1,0)>info2$(i2,0))
        print "Unique to 2 ";info2$(i2,0)
        i2=i2+1
        print "(";i1;",";i2;")"
    wend
wend
end
|
|
|
Post by Rod on Apr 16, 2021 20:03:06 GMT
Sigh... my life is complete, code that neither Anatoly nor B+ can better.
|
|
|
Post by toughdiamond on Apr 16, 2021 20:19:55 GMT
> Indeed it is
> files found in 1 :5 / files found in 2 :5 / (1,1) (2,2) (3,3) Unique to 1 3 (4,3) Unique to 2 4 (4,4) (5,5) (6,6)
Perfect! In that case, I don't see how it could ever fail.
|
|