Any tips on speeding up a JustBASIC program?

Post by toughdiamond on Apr 17, 2021 18:09:48 GMT

Apr 16, 2021 6:48:59 GMT Rod said:

Force lower case if need be, but the files “should” be the same case, and perhaps they need to be reported if not.

Having looked again at the "buggy" results of SORT, it turns out they're not buggy at all - just that the command ignores case, which doesn't matter if I use your method for finding uniques. It tripped me up at the time because back then I was still trying to streamline my long method of comparing each filename in one list with each filename in the other, so I was trying to split the alphabetized list into shorter sections based on (the ASCII code of) the leftmost character. At that point, the prospect of adapting your code instead began to look rather less daunting.
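Rod's suggestion to force one case matters for the comparisons as well as for SORT. This is a sketch in Python rather than JustBASIC, for illustration only. It assumes that the sort ignores case, as found above, while the `<` and `>` tests compare character codes, which puts upper-case letters before lower-case ones. When the sort order and the comparisons disagree, a name that is in both lists can be reported as unique. Comparing lowercased copies keeps the two in step. The names `ladder`, `list1` and `list2` are made up for the example.

```python
def ladder(a, b, key):
    """Walk two already-sorted lists, comparing key(item).
    Returns (names only in a, names only in b)."""
    i = j = 0
    only_a, only_b = [], []
    while i < len(a) and j < len(b):
        ka, kb = key(a[i]), key(b[j])
        if ka < kb:
            only_a.append(a[i])
            i += 1
        elif ka > kb:
            only_b.append(b[j])
            j += 1
        else:                        # same name in both lists
            i += 1
            j += 1
    return only_a + a[i:], only_b + b[j:]

list1 = ["apple", "Banana"]
list2 = ["Banana"]

# Both lists sorted ignoring case, like the SORT behaviour described above
s1 = sorted(list1, key=str.lower)    # ['apple', 'Banana']
s2 = sorted(list2, key=str.lower)    # ['Banana']

# Case-sensitive comparison: "a" (97) > "B" (66), so the walk goes wrong
print(ladder(s1, s2, key=lambda s: s))   # (['apple', 'Banana'], ['Banana'])

# Comparing lowercased copies agrees with the sort order
print(ladder(s1, s2, key=str.lower))     # (['apple'], [])
```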

It's hard to know how long it'll be before I've got it ready to do a test drive, but it's looking good so far. I'll be back.

Post by B+ on Apr 17, 2021 20:39:00 GMT

Apr 16, 2021 20:03:06 GMT Rod said:

Sigh... my life is complete, code that neither Anatoly nor B+ can better

Rod, sorry, but you shouldn't have bragged: your life is not complete, and your code is buggy under certain circumstances that you would have found quickly had you tested more boundary conditions.

I looked at all the comparing your code was doing and was sure it was slower than mine, so I tried to put your code into the test environment I had set up for 2 lists of 10 items (it now works for any number of items, see below), but 10 is enough to begin to show problems. I have been trying fix after fix to get it to work in my test environment, and it won't fix easily as far as I can see. I never got it to work on every test: it gave either an out-of-bounds error or an incomplete listing.

Take a look:


' Ladder B+ versus Rod Time Test 'B+ 2021-04-17
' I had to modify my version to base 1 arrays to match Rod's code, and then make it more generic with nItems instead of 10

' I gave up trying to get Rod's code to work under my test conditions

nItems = 10  'per list
dim list1$(nItems), list2$(nItems)
i1 = 1 : i2 = 1 : ia = 1 ' base 1 array indexes: i1 into list1$, i2 into list2$, ia into all$ and where

'create the lists; this setup is the same for both methods
for i = 1 to nItems
    list1$(i) = RndStr$()
    list2$(i) = RndStr$()
    'print list1$(i), list2$(i) 'check raw OK
next
'test an extreme case: both lists end on the same item, and list2$ has it several times
list1$(nItems) = "FFF"
list2$(nItems) = "FFF"
list2$(nItems-1) = "FFF"
list2$(nItems-2) = "FFF"

sort list1$(), 1, nItems
sort list2$(), 1, nItems
if nItems <= 1000 then
for i = 1 to nItems
    print list1$(i), list2$(i) 'check sort OK
next
end if
tryRod = 1


' reset arrays and index for each method
dim all$(2*nItems), where(2*nItems)  ' where 1 for list 1, 2 for list 2, 3 for both
i1 = 1 : i2 = 1 : ia = 1
s=time$("ms")

if tryRod goto [Rod]
print " B+ method =============================================="
while i1 <= nItems and i2 <= nItems  ' mod for < 10 to <= nItems
    if list1$(i1) < list2$(i2) then
        all$(ia) = list1$(i1)
        where(ia) = 1
        ia = ia + 1
        i1 = i1 + 1
    else
        if list1$(i1) > list2$(i2) then
            all$(ia) = list2$(i2)
            where(ia) = 2
            ia = ia + 1
            i2 = i2 + 1
        else
            all$(ia) = list2$(i2)
            where(ia) = 3
            ia = ia + 1
            i1 = i1 + 1
            i2 = i2 + 1
        end if
    end if
wend
if i1 > nItems then 'finish out list 2
    for i = i2 to nItems ' mod to nItems
        all$(ia) = list2$(i)
        where(ia) = 2
        ia = ia + 1
    next
else 'finish out list 1 but possible to end with = items so check i1
    if i1 <= nItems then
        for i = i1 to nItems ' mod to nItems
            all$(ia) = list1$(i)
            where(ia) = 1
            ia = ia + 1
        next
    end if
end if
' B+ method end
if tryRod = 0 then goto [Finish]

[Rod]
print " Rod method ================================== "
while i1<=nItems or i2<=nItems  'this needed = otherwise it wouldn't finish
    'if i1 > nItems or i2 > nItems then exit while  ' fix Rod's code
    while list1$(i1)<list2$(i2) and i1<=nItems
        'print "Unique to 1 ";list1$(i1,0)
        all$(ia) = list1$(i1)
        where(ia) = 1
        ia = ia + 1
        'print ia
        i1=i1+1
        'if i1 > nItems then exit while  ' fix Rod's code 'maybe not needed
    wend
    'if i1 > nItems or i2 > nItems then exit while  ' fix Rod's code
    while list1$(i1)=list2$(i2) and i1<=nItems and i2<=nItems
        ' Rod was not outputting anything
        all$(ia) = list2$(i2)
        where(ia) = 3
        ia = ia + 1
        'print ia
        i1=i1+1
        i2=i2+1
        'if i1 > nItems or i2 > nItems then exit while  ' fix Rod's code  needed!!
    wend
    'if i1 > nItems or i2 > nItems then exit while  ' fix Rod's code  'maybe not needed
    while list1$(i1)>list2$(i2) and i2<=nItems
        'print "Unique to 2 ";list2$(i2,0)
        all$(ia) = list2$(i2)
        where(ia) = 2
        ia = ia + 1
        'print ia
        i2=i2+1
        'if i2 > nItems then exit while ' fix Rod's code 'needed!! occasionally
    wend
wend


[Finish]
print "Completed in ms ";time$("ms")-s
tot = 0 'reinit
' show results  This is same for both tests
print "index", "list 1", "list 2", "item", "source (3 = in both)"
for i = 1 to ia - 1 'modify to base 1 arrays
    if i <= nItems then ' mod to nItems
        if nItems <= 1000 then print i, list1$(i), list2$(i), all$(i), where(i)
    else
        if nItems <= 1000 then print i, "***", "***", all$(i), where(i)
    end if
    tot = tot + where(i)
next
print "All present and accounted for ";3 * nItems;"?= ";tot


Function RndStr$() 'limit possibilities to increase duplicates
    rl = Int(Rnd(0) * 3) + 1
    b$ = ""
    For j = 1 To rl
        b$ = b$ + Mid$("ABCDEF", Int(Rnd(0) * 6) + 1, 1)
    Next
    RndStr$ = b$
End Function
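Why the final check works: every item is counted exactly once. An item found only in list 1 adds 1 to tot, an item found only in list 2 adds 2, and a matched pair adds 3 while using up one item from each list. Call those counts c1, c2 and c3. Then c1 + c3 = nItems and c2 + c3 = nItems, so tot = c1 + 2*c2 + 3*c3 = (c1 + c3) + 2*(c2 + c3) = 3*nItems. Below is a minimal Python sketch of the B+ method together with the same check. It uses 0-based lists, and the function names are made up for this illustration.

```python
import random

def ladder_sources(list1, list2):
    """B+-style ladder over two sorted lists.
    Returns (item, source) pairs: 1 = only list 1, 2 = only list 2, 3 = both."""
    i1 = i2 = 0
    out = []
    while i1 < len(list1) and i2 < len(list2):
        if list1[i1] < list2[i2]:
            out.append((list1[i1], 1))
            i1 += 1
        elif list1[i1] > list2[i2]:
            out.append((list2[i2], 2))
            i2 += 1
        else:                                  # same item in both lists
            out.append((list2[i2], 3))
            i1 += 1
            i2 += 1
    out.extend((x, 1) for x in list1[i1:])     # finish out list 1
    out.extend((x, 2) for x in list2[i2:])     # finish out list 2
    return out

def rnd_str(rng):
    """Like RndStr$(): 1 to 3 letters from A-F, so duplicates are common."""
    return "".join(rng.choice("ABCDEF") for _ in range(rng.randint(1, 3)))

rng = random.Random(2021)
n_items = 10
list1 = sorted(rnd_str(rng) for _ in range(n_items))
list2 = sorted(rnd_str(rng) for _ in range(n_items))
result = ladder_sources(list1, list2)
total = sum(src for _, src in result)
print("All present and accounted for", 3 * n_items, "?=", total)   # ... 30 ?= 30
```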

Last Edit: Apr 17, 2021 20:48:42 GMT by B+

Post by tsh73 on Apr 18, 2021 6:16:30 GMT

I think the out-of-bounds error comes from the compound condition in the WHILE:
while list1$(i1)=list2$(i2) and i1<=nItems and i2<=nItems
If you write something like that in C, with the bounds tests first,
while (i1 <= nItems && i2 <= nItems && strcmp(list1[i1], list2[i2]) == 0)
it will stop evaluating the condition just after the first FALSE, never touching list1[i1] out of bounds.
(C is designed and documented to work this way.)
But JB evaluates the whole expression,
so even when i1<=nItems is false, list1$(i1) is still accessed in the condition, and the program stops with an out-of-bounds error.

In this particular case, dimming the arrays one item larger will circumvent it:
dim list1$(nItems+1), list2$(nItems+1)

I'm not sure it could be a general fix.
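The difference can be reproduced in Python, whose `and` short-circuits in the same way as C's `&&`. In the sketch below, `eager_check` is a hypothetical stand-in for a language that evaluates the whole condition, as described for JB above. It computes both operands before combining them, so the list is read even when the index is already past the count `n`. Padding the list with one spare empty string, which mirrors the `dim ...(nItems+1)` workaround, makes the eager version safe.

```python
def short_circuit_check(items, i, n, target):
    # Bounds test first: `and` stops at the first false operand,
    # so items[i] is never read when i >= n.
    return i < n and items[i] == target

def eager_check(items, i, n, target):
    # Hypothetical model of full evaluation: both operands are computed
    # before being combined, so items[i] is read even when i >= n.
    in_bounds = i < n
    matches = items[i] == target     # IndexError if i is past the end of items
    return in_bounds and matches

names = ["A", "B"]                   # n = 2, valid indexes 0 and 1
print(short_circuit_check(names, 2, 2, "B"))   # False
try:
    eager_check(names, 2, 2, "B")
except IndexError:
    print("eager evaluation: index out of range")

padded = names + [""]                # one spare element, like dim nItems+1
print(eager_check(padded, 2, 2, "B"))          # False
```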

Last Edit: Apr 18, 2021 6:17:38 GMT by tsh73

If you like piece of my code, go ahead and use it.
I had my share of fun creating it - now it's free.

Post by toughdiamond on Apr 18, 2021 7:05:25 GMT

Watching......

Post by Rod on Apr 18, 2021 8:44:35 GMT

Glad we have such good coders on board. I think my error was not catching any trailing files, so that needs another pair of loops, one for each array. Now, when the comparison runs out of pairs, it will list any trailing files. B+ will need to confirm whether that was the error he was seeing.

Whether this is faster or slower matters little compared with the timing of the original strategy. I remember competing in a sorting challenge on the forum here. I was fairly proud of a thirty-second bubble sort. Welopez smashed the time with a quicksort routine that took about 20 ms; I had never seen such code before. It's all been done before somewhere by someone.

These are Anatoly's changes to my code, tweaked a little to test more boundary conditions.


n1=7
n2=6
dim info1$(10,10)
dim info2$(10,10)
'files "c:\basic\a scratchpad", info1$()
'files "e:\basic\a scratchpad", info2$()
info1$(0,0)=str$(n1)
info2$(0,0)=str$(n2)
for i = 1 to n1
    info1$(i,0)=word$("0 1 2 3 5 6 8",i)
next
for i = 1 to n2
    info2$(i,0)=word$("1 2 4 5 6 7",i)
next

no1=val(info1$(0,0))
no2=val(info2$(0,0))
print "number of files found in 1 :";no1
print "number of files found in 2 :";no2
i1=1
i2=1
while i1<no1 or i2<no2
    while info1$(i1,0)<info2$(i2,0) and i1<=no1
        print "Unique to 1 ";info1$(i1,0)
        i1=i1+1
    wend
    while info1$(i1,0)=info2$(i2,0) and i1<=no1 and i2<=no2
        i1=i1+1
        i2=i2+1
    wend
    while info1$(i1,0)>info2$(i2,0) and i2<=no2
        print "Unique to 2 ";info2$(i2,0)
        i2=i2+1
    wend
wend
while i1<=no1
    print "Unique to 1 ";info1$(i1,0)
    i1=i1+1
wend
while i2<=no2
    print "Unique to 2 ";info2$(i2,0)
    i2=i2+1
wend

end

Post by tsh73 on Apr 18, 2021 11:32:01 GMT

A general solution for a compound WHILE condition
(while the index does not exceed its bound and some condition on array(index) holds):
convert

    while info1$(i1,0)=info2$(i2,0) and i1<=no1 and i2<=no2
        i1=i1+1
        i2=i2+1
    wend

to

    while i1<=no1 and i2<=no2
        if not(info1$(i1,0)=info2$(i2,0)) then exit while
        i1=i1+1
        i2=i2+1
    wend

that is, keep only the "index does not exceed its bound" test in the WHILE condition
and move the other part of the condition inside as
IF NOT(condition) THEN EXIT WHILE.
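For comparison, here is the same shape in Python. This is a sketch, not JustBASIC, and the function name is made up. Python's `and` would short-circuit anyway, so the point is only the structure: the `while` tests nothing but the bounds, and the element comparison sits inside the body, where it can only run with valid indexes.

```python
def skip_matches(list1, list2, i1, i2):
    """Advance past a run of equal items, following tsh73's rewrite:
    the loop condition checks only the bounds, the comparison is inside."""
    n1, n2 = len(list1), len(list2)
    while i1 < n1 and i2 < n2:
        if not (list1[i1] == list2[i2]):
            break                    # plays the role of EXIT WHILE
        i1 += 1
        i2 += 1
    return i1, i2

print(skip_matches(["1", "2", "3"], ["1", "2", "4"], 0, 0))  # (2, 2)
print(skip_matches(["1", "2"], ["1", "2"], 0, 0))            # (2, 2)
```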

As for the extra loops Rod just added to process the "tail", perhaps they cannot be avoided?

Post by Rod on Apr 18, 2021 13:18:04 GMT

I put in more printing to let you see where the comparisons are made. It seems to work well; I have tried a good few combinations, but perhaps I am not seeing the error.

n1=7
n2=7
dim info1$(10,10)
dim info2$(10,10)
'files "c:\basic\a scratchpad", info1$()
'files "e:\basic\a scratchpad", info2$()
info1$(0,0)=str$(n1)
info2$(0,0)=str$(n2)
for i = 1 to n1
    info1$(i,0)=word$("1 2 3 4 5 6 7",i)
next
for i = 1 to n2
    info2$(i,0)=word$("1 2 3 4 5 6 7",i)
next

no1=val(info1$(0,0))
no2=val(info2$(0,0))
print "number of files found in 1 :";no1
print "number of files found in 2 :";no2
i1=1
i2=1
while i1<no1 or i2<no2
    while info1$(i1,0)<info2$(i2,0) and i1<=no1
        print "Upper loop, Unique to 1 ";info1$(i1,0)
        i1=i1+1
    wend
    while info1$(i1,0)=info2$(i2,0) and i1<=no1 and i2<=no2
        print "Equal loop"
        i1=i1+1
        i2=i2+1
    wend
    while info1$(i1,0)>info2$(i2,0) and i2<=no2
        print "Lower loop,Unique to 2 ";info2$(i2,0)
        i2=i2+1
    wend
wend
while i1<=no1
    print "Tail 1 loop Unique to 1 ";info1$(i1,0)
    i1=i1+1
wend
while i2<=no2
    print "Tail 2 loop Unique to 2 ";info2$(i2,0)
    i2=i2+1
wend

end

Post by Rod on Apr 18, 2021 14:23:12 GMT

@ B+ The <= is taken care of in the inner loops; the outer loop is just a control that ensures the looping stops.

Post by B+ on Apr 18, 2021 15:59:33 GMT

My hypothesis is that Rod's original code never met duplicates within either file list, though duplicates can exist between the lists. That is very likely with file lists from one drive: files with the same name would have to be in different folders, so they would be unique by path.

This occurred to me about an hour after I posted my experiments above, so right after supper I started on a no-duplicates-in-a-list test and ran into the weird unfinished FOR loop I posted about here on the Discussion Board. I think that without duplicates the code would hit fewer out-of-bounds errors and fewer incomplete listings. A duplicate at the end of both lists would, I think, finish correctly, noted as being in both files.

I think I added the two ='s in this line (the outer WHILE loop):
while i1 <= nItems and i2 <= nItems ' mod for < 10 to <= nItems
in an attempt to get the contents of both lists to complete, for a full listing of each list. If that loop is only there to stop the looping, then it can likely be set back to < without the =, especially once I have the no-duplicates-in-each-list version fixed up.

Post by tsh73 on Apr 18, 2021 18:07:23 GMT

I tried <= in the outer loop, but the "tail" was not processed, so the extra loops after the main one are still needed.

Post by B+ on Apr 18, 2021 18:42:51 GMT

Yes! I now have Rod's code working in my test environment with these mods commented here:

[Rod]
print " Rod method ================================== "
' MOD 0: No Duplicates in either list but Duplicates allowed between lists
' MOD 1: Dim lists 1 more than nItems to remove out of bounds errors
' MOD 2: use <= nItems instead of just < nItems to get complete accounting of both lists
while i1 <= nItems or i2 <= nItems  ' add = to both tests so that we can get complete listings,
    while list1$(i1)<list2$(i2) and i1<=nItems
        'print "Unique to 1 ";list1$(i1,0)
        all$(ia) = list1$(i1)
        where(ia) = 1
        ia = ia + 1
        i1=i1+1
    wend
    while list1$(i1)=list2$(i2) and i1<=nItems and i2<=nItems
        ' Rod was not outputting anything when item in both lists
        all$(ia) = list2$(i2)
        where(ia) = 3
        ia = ia + 1
        i1=i1+1
        i2=i2+1
    wend
    while list1$(i1)>list2$(i2) and i2<=nItems
        'print "Unique to 2 ";list2$(i2,0)
        all$(ia) = list2$(i2)
        where(ia) = 2
        ia = ia + 1
        i2=i2+1
    wend
wend

Preliminary tests suggest it is going to take a lot of items in the lists to see any time difference between the B+ code and Rod's, if there is any. So, as I said before, they are basically the same, with the caveat that we are assuming lists without duplicates. After all, I did get the "ladder method" idea from Rod and dropped the binary search idea like a hot potato! ;-))

Last Edit: Apr 18, 2021 18:44:07 GMT by B+

Post by Rod on Apr 18, 2021 18:43:24 GMT

edit: Cross-posted with B+; off to see what he has cooked up.

So, the outer loop < is deliberate. The original code, still unchanged, handles matched files, unique files and duplicate files in either list. It handles duplicates by matching them in pairs until only one side has copies left, and those are listed as unique. So if there are three rod.dat files, one in list one and two in list two, it will list the third one as unique to list two.
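Rod's rule for duplicates (match them in pairs, and whatever is left over is unique) gives the same answer as a multiset difference. That suggests one way to cross-check a ladder implementation, sketched here in Python with `collections.Counter`. This is an illustration, not code from the thread, and `ladder_uniques` is a made-up name.

```python
from collections import Counter

def ladder_uniques(list1, list2):
    """Ladder over two sorted lists; equal names are matched in pairs."""
    i1 = i2 = 0
    only1, only2 = [], []
    while i1 < len(list1) and i2 < len(list2):
        if list1[i1] < list2[i2]:
            only1.append(list1[i1])
            i1 += 1
        elif list1[i1] > list2[i2]:
            only2.append(list2[i2])
            i2 += 1
        else:                        # one copy from each list is paired off
            i1 += 1
            i2 += 1
    return only1 + list1[i1:], only2 + list2[i2:]

list1 = sorted(["a.txt", "rod.dat"])
list2 = sorted(["rod.dat", "rod.dat", "z.txt"])
only1, only2 = ladder_uniques(list1, list2)
print(only1, only2)                  # ['a.txt'] ['rod.dat', 'z.txt']

# Counter subtraction keeps only positive counts, i.e. the unpaired copies
print(sorted((Counter(list2) - Counter(list1)).elements()))   # ['rod.dat', 'z.txt']
```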

However duplicate file names in the same directory is kinda extreme.

I did forget the trailing files but the final two loops catch any tail end differences.

If there is still something I am missing please demonstrate by altering the numeric listings in the most recent code.
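Taking up that invitation, one way to probe the outer loop is a Python model of the code above. This is a sketch, not the JustBASIC itself. The lists are 1-based and padded with an empty string to stand in for the extra dimmed element. The three inner loops and the two tail loops are copied as posted, and the outer test can be swapped. The step limit is an addition for this sketch, ample for these tiny lists, and stands in for "the program hangs". If the model is faithful, Rod's own test data works. However, when one list still has two or more names left after the other is used up, none of the inner loops can move and `i1<no1 or i2<no2` stays true. Requiring both lists to still have items (`i1<=no1 and i2<=no2`) and leaving the rest to the tail loops appears to avoid that.

```python
def rod_ladder(list1, list2, outer, max_steps=10_000):
    """Model of Rod's loop structure on 1-based lists (index 0 unused).
    outer(i1, i2, no1, no2) is the outer WHILE test.
    Returns (only1, only2), or None if the step limit is reached."""
    no1, no2 = len(list1), len(list2)
    a = [""] + list1 + [""]          # spare "" at the end, like dimming one extra
    b = [""] + list2 + [""]
    i1 = i2 = 1
    only1, only2 = [], []
    steps = 0
    while outer(i1, i2, no1, no2):
        steps += 1
        if steps > max_steps:
            return None              # treat as "never finishes"
        while a[i1] < b[i2] and i1 <= no1:
            only1.append(a[i1])
            i1 += 1
        while a[i1] == b[i2] and i1 <= no1 and i2 <= no2:
            i1 += 1
            i2 += 1
        while a[i1] > b[i2] and i2 <= no2:
            only2.append(b[i2])
            i2 += 1
    while i1 <= no1:                 # tail loops
        only1.append(a[i1])
        i1 += 1
    while i2 <= no2:
        only2.append(b[i2])
        i2 += 1
    return only1, only2

def original(i1, i2, no1, no2):      # outer test as posted
    return i1 < no1 or i2 < no2

def both_left(i1, i2, no1, no2):     # alternative: both lists still have items
    return i1 <= no1 and i2 <= no2

print(rod_ladder(list("0123568"), list("124567"), original))  # (['0', '3', '8'], ['4', '7'])
print(rod_ladder(list("189"), list("1"), original))           # None
print(rod_ladder(list("189"), list("1"), both_left))          # (['8', '9'], [])
```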

I am sure B+ has a point about his alternative method. Will be good to see it working.

Last Edit: Apr 18, 2021 18:44:45 GMT by Rod

Post by toughdiamond on Apr 19, 2021 7:18:28 GMT

Looking forward to one or other of the methods getting the green light for me to try grafting onto my new program. Do you think it's ready for me to try as is? I don't mind if it's not perfect yet, and it'd be good to try it in a real-life situation with my big volumes, just that I'm not quite sure which block of code to copy and paste in - is there a "final form" yet or am I jumping the gun?

Post by Rod on Apr 19, 2021 10:46:22 GMT

Often it is worthwhile dummying up some processing if you have concerns about limits or lengths of time. So this code builds two 80000-file arrays, messes them up a little, and then hunts for differences. It gets the job done in 700 ms.

How fast do you need it to be? Even if B+ gets it to run in half the time, does it matter now? Which set of code have you understood best? That's more likely the deciding factor: if you can better understand one set, then use it. They both solve the problem.
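Why the ladder copes with 80000 files: every step moves at least one of the two indexes forward, so two sorted lists of n1 and n2 names need at most n1 + n2 steps. Comparing each filename in one list with each filename in the other takes n1 * n2 comparisons instead, which is 6.4 billion for two lists of 80000. The Python sketch below counts the steps. It sorts the names first, because the ladder relies on both lists being in the same order, and as strings "file10" sorts before "file9". The function name and test data are illustrative only.

```python
def ladder_steps(list1, list2):
    """Count the comparisons a ladder walk makes over two sorted lists."""
    i1 = i2 = steps = 0
    while i1 < len(list1) and i2 < len(list2):
        steps += 1
        if list1[i1] < list2[i2]:
            i1 += 1
        elif list1[i1] > list2[i2]:
            i2 += 1
        else:
            i1 += 1
            i2 += 1
    return steps

n = 80000
names1 = sorted("file" + str(i) for i in range(1, n + 1))
names2 = list(names1)
names2[123] += "0"          # make one name differ between the lists
names2.sort()

steps = ladder_steps(names1, names2)
print("ladder steps:", steps, "<=", 2 * n)
print("all-pairs comparisons:", n * n)      # 6400000000
```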

no1=80000
no2=80000
dim info1$(80001,10)
dim info2$(80001,10)

'build some dummy file lists
t=time$("ms")
for i = 1 to 80000
    info1$(i,0)="file";str$(i)
    info2$(i,0)="file";str$(i)
next

'make a few files unique (indexes 1 to 80000; element 0 is not part of the list)
for n = 1 to 5
    i=int(rnd(0)*80000)+1
    info1$(i,0)=info1$(i,0)+"0"
    i=int(rnd(0)*80000)+1
    info2$(i,0)=info2$(i,0)+"0"
next

print "built two file arrays of 80000 files in ";time$("ms")-t;"ms"
t=time$("ms")

print "number of files found in 1 :";no1
print "number of files found in 2 :";no2
i1=1
i2=1
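' stop as soon as either list runs out; the tail loops below list whatever is left
' (with "i1<no1 or i2<no2", two or more leftovers in one list leave no inner loop able to advance)
while i1<=no1 and i2<=no2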
    while info1$(i1,0)<info2$(i2,0) and i1<=no1
        print "Upper loop,Unique to 1 ";info1$(i1,0)
        i1=i1+1
    wend
    while info1$(i1,0)=info2$(i2,0) and i1<=no1 and i2<=no2
        i1=i1+1
        i2=i2+1
    wend
    while info1$(i1,0)>info2$(i2,0) and i2<=no2
        print "Lower loop,Unique to 2 ";info2$(i2,0)
        i2=i2+1
    wend
wend
while i1<=no1
    print "Tail 1 loop Unique to 1 ";info1$(i1,0)
    i1=i1+1
wend
while i2<=no2
    print "Tail 2 loop Unique to 2 ";info2$(i2,0)
    i2=i2+1
wend
print "matched 80000 files in ";time$("ms")-t;"ms"
end

Post by toughdiamond on Apr 19, 2021 17:33:53 GMT

Apr 19, 2021 10:46:22 GMT Rod said:

How fast do you need it to be? Even if B+ gets it to run in half the time, does it matter now? Which set of code have you understood best? That's more likely the deciding factor: if you can better understand one set, then use it. They both solve the problem.

I'm happy with any speed that doesn't take hours to run. Both sets of code are about equal when it comes to my understanding them, i.e. still as clear as mud for the most part, but if I can get one to work, that might help me figure out what goes on. I was about to try adapting one set of code to fit my program yesterday, but was stopped by the realisation that I'm not quite sure what all$() and nItems are or what value to give nItems (the sum of both lists?). I'm also wary of declaring any more large arrays than absolutely necessary, because I already plan to use more than the two that hold the filenames of the two volumes (pathnames, dates and sizes), and I fear I might end up exceeding JB's 256 MB memory limit.
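On the questions above: in B+'s test program, nItems is the number of items in each list (both test lists have the same length). all$() and where() exist only to collect the merged result for the printout, which is why they are dimmed 2*nItems. If all that's needed is the list of uniques, they can be reported as they are found, with a separate count for each list (no1 and no2 in Rod's version). Below is a Python sketch of that idea using a generator. The function name is illustrative.

```python
def find_uniques(list1, list2):
    """Yield (name, 1) for names only in list1 and (name, 2) for names only
    in list2. Both lists must be sorted the same way; their lengths may
    differ. No merged array is built."""
    n1, n2 = len(list1), len(list2)      # separate counts, like no1 and no2
    i1 = i2 = 0
    while i1 < n1 and i2 < n2:
        if list1[i1] < list2[i2]:
            yield list1[i1], 1
            i1 += 1
        elif list1[i1] > list2[i2]:
            yield list2[i2], 2
            i2 += 1
        else:                             # in both lists: report nothing
            i1 += 1
            i2 += 1
    for k in range(i1, n1):               # tail of list 1
        yield list1[k], 1
    for k in range(i2, n2):               # tail of list 2
        yield list2[k], 2

for name, source in find_uniques(["a", "b", "d"], ["b", "c", "d", "e", "f"]):
    print("Unique to", source, name)
```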

Anyway, the code in your last post doesn't seem to contain the array or variable that stopped me before, so I'll see if I can wire it up to my program. For filename sources I've stuck to my old way of using DOS to make directory dumps as text files, which provides a convenient way of testing any uniques-finding code: I just make two copies of the same directory dump, alter lines in one or both of them, and then run the code to see if it finds what I did.
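The copy-and-alter test described above can also be automated and repeated many times. The idea is to rename a few entries in a copy of a list, then check that the ladder reports exactly the names that a set difference gives. This is a Python sketch. The names (`file00000.txt` and so on) and the renaming rule are invented for the test. The set comparison is a fair reference only because the test names are all distinct.

```python
import random

def ladder_uniques(list1, list2):
    """Ladder over two sorted lists; returns (only in list1, only in list2)."""
    i1 = i2 = 0
    only1, only2 = [], []
    while i1 < len(list1) and i2 < len(list2):
        if list1[i1] < list2[i2]:
            only1.append(list1[i1])
            i1 += 1
        elif list1[i1] > list2[i2]:
            only2.append(list2[i2])
            i2 += 1
        else:
            i1 += 1
            i2 += 1
    return only1 + list1[i1:], only2 + list2[i2:]

def random_trial(rng, size=200, changes=5):
    """Copy a list of distinct names, rename a few entries in the copy,
    and check that the ladder finds exactly the changed names."""
    names = ["file%05d.txt" % k for k in range(size)]   # hypothetical dump
    copy = list(names)
    for k in rng.sample(range(size), changes):
        copy[k] = "new_" + copy[k]                        # rename one entry
    list1, list2 = sorted(names), sorted(copy)
    only1, only2 = ladder_uniques(list1, list2)
    return (set(only1) == set(list1) - set(list2)
            and set(only2) == set(list2) - set(list1))

rng = random.Random(1)
print(all(random_trial(rng) for _ in range(100)))   # True
```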
