Post by B+ on Apr 19, 2021 17:47:44 GMT
toughdiamond, Rod is right, don't wait; go with the one you understand the best, unless you want an excuse not to do anything ;-)) The differences are minutiae compared to the time savings of using one or the other. It's good practice just to try one, and if you get stuck, try the other. Practice, practice, practice; the best learning comes from that.

A question comes to mind about the memory limits of JB when considering a test involving 250,000 items in one list. Needing 2 arrays for the lists, plus 2 arrays at twice one list's size, takes me past 1.5 million items to store. Each item would be 9 bytes in length (a perm of the 9 digits). Will JB let me do that many items?

Gotta say I am balking at the idea of doing 250,000 per list, because it will take so much longer to set up the test lists than to run the code I am timing to process the lists 2 different ways. It takes so long to get 10,000 non-duplicates into 2 lists that I'd have to come up with an alternate way to create non-duplicate lists. Already I know the time difference, if any, is insignificant compared to the massive time savings over the original method, or even a binary search of sorted lists. This "ladder" method is actually what I have used to merge sorted files from 10 disk files.
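Something like this quick probe would answer the JB memory question one way or the other; a minimal sketch, untested, with the count simply matching the numbers above:

    nTest = 1500000 ' 2 list arrays plus 2 double-size arrays' worth of items
    dim big$(nTest)
    for i = 1 to nTest
        big$(i) = "123456789" ' stand-in for a 9-digit perm
        if i mod 100000 = 0 then print i; " strings stored so far..."
    next i
    print "Stored all "; nTest; " items without complaint."

If that dims and fills without an out-of-memory error, the 250,000-per-list test is at least feasible.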
Post by B+ on Apr 19, 2021 18:00:42 GMT
toughdiamond, I see we cross-posted; good that you are trying your own experiments. all$() and where() were arrays used to store the results of testing every item in each list. After processing through the code, all$() will list in alpha order every single item in both lists, and where() will say 1 if the item is only in list 1, 2 if only in list 2, and 3 if in both lists. The same index i in all$(i) matches where(i), giving the item name and the location of each item. I hope that clears up that part of things.
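To illustrate with a made-up pair of 3-item sorted lists, say list 1 holds APE, BEE, COW and list 2 holds BEE, COW, DOG; the merge would leave:

    i     all$(i)   where(i)
    1     APE       1   (only in list 1)
    2     BEE       3   (in both)
    3     COW       3   (in both)
    4     DOG       2   (only in list 2)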
Post by B+ on Apr 19, 2021 20:38:07 GMT
It's just a few ms, but B+'s method consistently edges out Rod's at 1000 per list, and at 10,000 per list (over 20 mins to create 2 non-duplicate lists!) it's about 10 times a few ms to process.
' Ladder No Duplicates B+ versus Rod Time Test
' B+ 2021-04-17
' I had to modify my version to base 1 arrays to match Rod's code,
' and then to be more generic with nItems instead of 10.
' I gave up trying to get Rod's code to work under my test conditions.

global nItems ' RndStr$() is adjusted according to the number of items we want in a non-duplicated listing of strings
nItems = 1000 ' per list
dim list1$(nItems + 1), list2$(nItems + 1)
i1 = 1 : i2 = 1 : ia = 1 ' modified to base 1: indexes into each list, and the shared all$()/where() index

' create the lists; the same lists are used for testing both methods
for i = 1 to nItems ' NO duplicates in a list
    scan
[doagain]
    test$ = RndStr$()
    'print "test$ = "; test$
    if i > 1 then
        j = 1
        while j < i
            scan
            if test$ = list1$(j) then goto [doagain]
            j = j + 1
        wend
    end if
    list1$(i) = test$
[testAgain2]
    test2$ = RndStr$()
    'print "test2$ = "; test2$
    if i > 1 then
        j = 1
        while j < i
            scan
            if test2$ = list2$(j) then goto [testAgain2]
            j = j + 1
        wend
    end if
    list2$(i) = test2$
    if nItems <= 100 then
        print i, list1$(i), list2$(i) ' check raw OK
    else
        if i mod 500 = 0 then
            print "Creating Non Duplicate Lists Progress: "; i; " of "; nItems; " "; time$()
        end if
    end if
next i

' test the extreme case when both lists end on the same item and list2$ has it a number of times
list1$(nItems) = "ZZZZ"
list2$(nItems) = "ZZZZ"

sort list1$(), 1, nItems
sort list2$(), 1, nItems
if nItems <= 100 then
    print "Sorted ======================================"
    for i = 1 to nItems
        print i, list1$(i), list2$(i) ' check sort OK
    next
end if
tryRod = 0

' reset arrays and index for each method
dim all$(2 * nItems), where(2 * nItems) ' where(): 1 for list 1, 2 for list 2, 3 for both
i1 = 1 : i2 = 1 : ia = 1
s = time$("ms")
if tryRod then goto [Rod]

print " B+ method =============================================="
while i1 <= nItems and i2 <= nItems ' mod from < 10 to <= nItems
    scan
    if list1$(i1) < list2$(i2) then
        all$(ia) = list1$(i1)
        where(ia) = 1
        ia = ia + 1
        i1 = i1 + 1
    else
        if list1$(i1) > list2$(i2) then
            all$(ia) = list2$(i2)
            where(ia) = 2
            ia = ia + 1
            i2 = i2 + 1
        else
            all$(ia) = list2$(i2)
            where(ia) = 3
            ia = ia + 1
            i1 = i1 + 1
            i2 = i2 + 1
        end if
    end if
wend
if i1 > nItems then ' finish out list 2
    for i = i2 to nItems ' mod to nItems
        all$(ia) = list2$(i)
        where(ia) = 2
        ia = ia + 1
    next
else ' finish out list 1, but it's possible to end with equal items, so check i1
    if i1 <= nItems then
        for i = i1 to nItems ' mod to nItems
            all$(ia) = list1$(i)
            where(ia) = 1
            ia = ia + 1
        next
    end if
end if ' B+ method end
print "Completed in ms "; time$("ms") - s
tot = 0 ' reinit
' show the results; this is the same for both tests
print "index", "list 1", "list 2", "item", "source (3 = in both)"
for i = 1 to ia - 1 ' modified to base 1 arrays
    scan
    if i <= nItems then ' mod to nItems
        if nItems <= 100 then print i, list1$(i), list2$(i), all$(i), where(i)
    else
        if nItems <= 100 then print i, "***", "***", all$(i), where(i)
    end if
    tot = tot + where(i)
next
print "All present and accounted for "; 3 * nItems; "?= "; tot

' reset arrays and index for each method
dim all$(2 * nItems), where(2 * nItems) ' where(): 1 for list 1, 2 for list 2, 3 for both
i1 = 1 : i2 = 1 : ia = 1
s = time$("ms")

[Rod]
print " Rod method ================================== "
' MOD 0: no duplicates in either list, but duplicates allowed between lists
' MOD 1: dim the lists 1 more than nItems to remove out-of-bounds errors
' MOD 2: use <= nItems instead of just < nItems to get a complete accounting of both lists
while i1 <= nItems or i2 <= nItems ' add = to both tests so that we can get complete listings
    while list1$(i1) < list2$(i2) and i1 <= nItems
        scan
        'print "Unique to 1 "; list1$(i1, 0)
        all$(ia) = list1$(i1)
        where(ia) = 1
        ia = ia + 1
        i1 = i1 + 1
    wend
    while list1$(i1) = list2$(i2) and i1 <= nItems and i2 <= nItems
        scan
        ' Rod was not outputting anything when an item was in both lists
        all$(ia) = list2$(i2)
        where(ia) = 3
        ia = ia + 1
        i1 = i1 + 1
        i2 = i2 + 1
    wend
    while list1$(i1) > list2$(i2) and i2 <= nItems
        scan
        'print "Unique to 2 "; list2$(i2, 0)
        all$(ia) = list2$(i2)
        where(ia) = 2
        ia = ia + 1
        i2 = i2 + 1
    wend
wend

[Finish]
print "Completed in ms "; time$("ms") - s
tot = 0 ' reinit
' show the results; this is the same for both tests
print "index", "list 1", "list 2", "item", "source (3 = in both)"
for i = 1 to ia - 1 ' modified to base 1 arrays
    scan
    if i <= nItems then ' mod to nItems
        if nItems <= 100 then print i, list1$(i), list2$(i), all$(i), where(i)
    else
        if nItems <= 100 then print i, "***", "***", all$(i), where(i)
    end if
    tot = tot + where(i)
next
print "All present and accounted for "; 3 * nItems; "?= "; tot

Function RndStr$()
    ' pick a number of perms to assure a lot of overlap in the lists,
    ' but not so restricted that it takes forever (or never) to make up a non-duplicate list
    select case
        case nItems <= 10 ' 4 letters 2 at a time gives 16 perms
            For k = 1 To 2
                b$ = b$ + Mid$("ABCD", Int(Rnd(0) * 4) + 1, 1)
            Next
            RndStr$ = b$
        case nItems <= 100 ' needs 12 letters 2 at a time = 144 perms
            For k = 1 To 2
                b$ = b$ + Mid$("ABCDEFGHIJKL", Int(Rnd(0) * 12) + 1, 1)
            Next
            RndStr$ = b$
        case nItems <= 1000 ' needs 12^3 = 1728 perms
            For k = 1 To 3
                b$ = b$ + Mid$("ABCDEFGHIJKL", Int(Rnd(0) * 12) + 1, 1)
            Next
            RndStr$ = b$
        case nItems <= 10000 ' needs 25 letters: 25^3 = 15,625 perms
            For k = 1 To 3
                b$ = b$ + Mid$("ABCDEFGHIJKLMNOPQRSTUVWXYZ", Int(Rnd(0) * 25) + 1, 1)
            Next
            RndStr$ = b$
        case nItems <= 100000 ' 20^4 = 160,000 perms
            For k = 1 To 4
                b$ = b$ + Mid$("ABCDEFGHIJKLMNOPQRSTUVWXYZ", Int(Rnd(0) * 20) + 1, 1)
            Next
            RndStr$ = b$
        case nItems > 100000
            Print "Too many items, you don't want to wait that long! Ending program."
            end
    end select
End Function
This is consistent with the hypothesis that there are a few more compares in Rod's code, which just slightly increases the time. At this rate, 100,000 items per list will still show a difference of less than 1 sec.
Either method is 100% better than the original processing of the files, or than a binary search to find duplicates between lists.
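For reference, the binary-search alternative I mean looks like this; just a sketch, assuming list2$() is already sorted from 1 to n and leaning on JB arrays being visible inside functions:

    ' returns 1 if target$ is found in sorted list2$(1..n), else 0
    function inList(target$, n)
        lo = 1 : hi = n
        inList = 0
        while lo <= hi
            middle = int((lo + hi) / 2)
            select case
                case list2$(middle) = target$
                    inList = 1
                    lo = hi + 1 ' found it, force the loop to end
                case list2$(middle) < target$
                    lo = middle + 1
                case else
                    hi = middle - 1
            end select
        wend
    end function

Each of the nItems lookups costs roughly log2(nItems) compares, which is why even this loses to the ladder, which walks each list just once.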
Post by toughdiamond on Apr 19, 2021 22:01:32 GMT
Thanks for your input, gentlemen. Luckily my "real-world" directory dumps provide an easy way for me to test the code with a large list of filenames. Currently I'm messing with a "modest" 16,000-file directory dump, and if that works then I'll try a real biggie.

I'm doing a lot of wondering about the maximum number of filenames the final program will have to deal with, and whether or not there's any real threat to the method being able to cope with the worst possible case. Unfortunately a very large drive can in theory get filled with gazillions of very small files, so it's hard to put an upper limit on the challenge it could in theory present one day. Assuming my average file size stays the same as it currently is, a full 2TB drive might contain about 125,000 files, which on average would need about 380K, or 760K for the 2 volumes. Trouble is, it would be very handy to include the file size and date, which might double that, and also handy to have an array to hold the path of every file, which might double it again. So it might need a bit of thought at some stage about whether to keep some of the data (e.g. the paths) on the disk rather than trying to hold everything in arrays at the same time. It'll add to the processing time, but I don't think it'll make a fatal difference as long as I'm careful what I tell the program to do.

Anyway, the next step for me is to pick one of these 2 methods, transplant it into my program and try running it. That will tell me if I've wired it up right, at least.
Post by toughdiamond on Apr 20, 2021 3:47:31 GMT
Well, it didn't crash. But it printed the entire contents of the volume(s) as unique, when all but about 8 files had exact counterparts in both volumes. I thought I'd grafted the code in there correctly. All I did was to change the names of the string arrays to be the same as the ones I used in the part of my program that loads and parses the directory dumps, and the names of a couple of simple variables for the same reason. My parsing does what it's supposed to do, I'm pretty sure of that.
I'll check it carefully again, and if I still can't see anything wrong, I'll post the modified code along with an explanation of why I modified what I did.
Post by Rod on Apr 20, 2021 6:49:43 GMT
If you check the link posted about the Files command, you will see that it already contains other info about the files. That's why it is multi-dimensional. Since both lists are ordered, it is easy to break the task into manageable chunks. You might also do it directory by directory, iteratively. Files will help here too.
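Roughly how Files hands back that extra info; this is from memory, so check the help file for the exact column layout, and the path is just an example:

    dim info$(10, 10) ' Files redimensions this to fit, as I recall
    files "c:\myFolder\", info$()
    qtyFiles = val(info$(0, 0)) ' row 0 carries the counts
    for i = 1 to qtyFiles
        print info$(i, 0), info$(i, 1), info$(i, 2) ' name, size, date/time
    next i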
If you want help getting the code corrected, it would be best to print off the first few items in each list and let us see those. If they look similar, check that the len() is the same for each and that there are no extra spaces or hidden characters spoiling the compare.
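Something along these lines, side by side, whatever your two arrays happen to be called:

    for i = 1 to 5
        print i, list1$(i), len(list1$(i)), list2$(i), len(list2$(i))
    next i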
Post by toughdiamond on Apr 20, 2021 8:34:03 GMT
Thanks - the code I've been using is from here:
When I run that as a standalone program it works fine, but when I add (the relevant part of) it to my existing program (which imports the filenames and SORTs them, and also works fine), that's when it lists thousands of filenames which it thinks are unique, though they're not. I'll try a couple more things in the morning and post again with some filename data as you suggest, if it still won't work. I've double-checked the changes I made to the names of the variables and arrays (so that your code could pick up my arrays of filenames), but can't see anything wrong there.

I'm happier to work with DIR dumps rather than the FILES command for this, because the code I wrote to parse them works well and has one or two advantages over the alternative, if it can be made to play nicely with your code. But naturally, if there turns out to be an inherent reason why it can't, FILES will be the only way to go. It'll mean a learning curve and a rewrite, and will lose the handy option of plugging in the big HDs, grabbing the DIR dumps, copying them onto the computer's internal HD, and making alterations to the dump files to see if the program notices them correctly. It'd be nice not to have to keep plugging the USB drives into this laptop repeatedly to do FILES if there's much troubleshooting required when I'm testing it on those, but if that turns out to be the only way, that's what I'll need to do. I don't mind learning about the FILES command as such, just that it makes things rather more complicated at the moment, and it seems safer to stick with what I know for getting the data imported rather than try to work with a further new concept at the same time.

I've looked at the filenames and their lengths, and the ones I checked are identical. I'll post some examples if the program is still playing up after I've tried a couple more ideas on it, which I'll do after I've slept. It might be something very simple I've overlooked.
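For what it's worth, my import loop is essentially this shape (simplified, and the field positions are only illustrative; the real code is fussier about headers and paths):

    dim v$(20000, 3) ' 1 = full line, 2 = filename, 3 = path
    open "dirdump.txt" for input as #f
    n = 0
    while eof(#f) = 0
        line input #f, l$
        if len(l$) > 39 then ' skip header/footer lines
            n = n + 1
            v$(n, 1) = l$           ' full line: date, time, size, name
            v$(n, 2) = mid$(l$, 40) ' filename alone (column 40 is illustrative)
        end if
    wend
    close #f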
Post by Rod on Apr 20, 2021 11:12:37 GMT
This is how I would tweak the code to use two imported arrays. You may get an index-out-of-bounds error at the end; the Files command creates a larger array than the files it finds, so I did not need to bother being exact. Just dim your arrays one item larger than needed.
'The reason I push the Files command is because
'it fills an array instantly and tells us
'how many items have been found.

'If we are using a list, that list needs
'imported to an array, which takes time.

'We also need to know how many items
'are in the list; that can take time.

'However, let's assume we have two arrays and we know
'how many items are in each.

'file1$() = first list
'file2$() = second list
'no1 = number of items in list 1
'no2 = number of items in list 2

'we can sort the files instantly
sort file1$(), 1, no1
sort file2$(), 1, no2

'now we start the matching; this is an ordered ladder search.
'At any one point the lead file will be ahead of, equal to, or behind
'the sub file. Since they are both ordered we simply list the
'exception till the exception ceases, then loop back round to
'see what exception we find next. If we are lucky we may not
'even loop round, as we may fall into the next exception first.

'set the index of both files to 1, the first record
i1 = 1
i2 = 1
while i1 < no1 or i2 < no2
    'we are ahead of the sub
    while file1$(i1) < file2$(i2) and i1 <= no1
        print "Upper loop,Unique to 1 "; file1$(i1)
        i1 = i1 + 1
    wend
    'we are equal with the sub
    while file1$(i1) = file2$(i2) and i1 <= no1 and i2 <= no2
        i1 = i1 + 1
        i2 = i2 + 1
    wend
    'we are behind the sub
    while file1$(i1) > file2$(i2) and i2 <= no2
        print "Lower loop,Unique to 2 "; file2$(i2)
        i2 = i2 + 1
    wend
wend
while i1 <= no1
    print "Tail 1 loop Unique to 1 "; file1$(i1)
    i1 = i1 + 1
wend
while i2 <= no2
    print "Tail 2 loop Unique to 2 "; file2$(i2)
    i2 = i2 + 1
wend
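To dry-run it before wiring in your real lists, you could feed it a couple of hand-made arrays first (made-up names, obviously):

    dim file1$(4) ' one item larger than needed
    dim file2$(4)
    file1$(1) = "alpha.txt" : file1$(2) = "beta.txt" : file1$(3) = "delta.txt"
    file2$(1) = "beta.txt" : file2$(2) = "delta.txt" : file2$(3) = "gamma.txt"
    no1 = 3 : no2 = 3
    ' running the matching code above on these should print alpha.txt as
    ' unique to 1 (upper loop) and gamma.txt as unique to 2 (tail 2 loop)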
Post by toughdiamond on Apr 20, 2021 20:26:36 GMT
That makes sense to me, and as far as I can see that's what I did, except that to harmonise things I changed the variable and array names in your code to match mine rather than the other way round. Specifically, I have 2-dimensional arrays v$() and w$() that hold not only the filenames, but also the paths and the full-length lines (i.e. including the dates and sizes) from the directory dumps. The number of files is held in i and j respectively. The sorting is currently performed on the filenames alone.
Here's a sample of 6 of the sorted results from each volume, showing that those files are mostly identical and that the lengths aren't different:
FROM LIST 1
file no. 7500  length: 50   04/08/2019 16:27     2,764,854  D246IW-(2).bmp
file no. 7501  length: 54   26/05/2010 06:41     2,106,216  D3DCompiler_43.dll
file no. 7502  length: 48   26/05/2010 06:41     1,998,168  d3dx9_43.dll
file no. 7503  length: 52   12/12/2018 23:07       205,761  D440_(1)_eng.pdf
file no. 7504  length: 52   12/12/2018 23:46       211,559  D440_(2)_eng.pdf
file no. 7505  length: 52   12/12/2018 23:52       214,681  D440_(3)_eng.pdf
FROM LIST 2
file no. 7500  length: 54   26/05/2010 06:41     2,106,216  D3DCompiler_43.dll
file no. 7501  length: 48   26/05/2010 06:41     1,998,168  d3dx9_43.dll
file no. 7502  length: 52   12/12/2018 23:07       205,761  D440_(1)_eng.pdf
file no. 7503  length: 52   12/12/2018 23:46       211,559  D440_(2)_eng.pdf
file no. 7504  length: 52   12/12/2018 23:52       214,681  D440_(3)_eng.pdf
file no. 7505  length: 53   10/12/2018 17:33       150,741  D440_0818_eng.pdf

I added a counter to your code, to count the number of unique files it thinks exist, and it turns out that it doesn't think they're ALL unique, only 427. Just to see what would happen, I tried changing the code to work on the "full" filenames (including the dates and sizes), and that reported 4409 uniques. Non-unique entries can be seen in both reports:
(last few lines of uniques-finding routine - checking filenames only)
Upper loop,Unique to 1 Turkish.ini
Lower loop,Unique to 2 Turkish.ini
Lower loop,Unique to 2 Unrar.dll
Lower loop,Unique to 2 UNTiMCA2.BAS
Lower loop,Unique to 2 UNTiMCAL.BAS
Upper loop,Unique to 1 Unrar.dll
Upper loop,Unique to 1 UNTiMCA2.BAS
Upper loop,Unique to 1 UNTiMCAL.BAS
Lower loop,Unique to 2 Untitled2.jpg
Upper loop,Unique to 1 Untitled2.jpx
matched 16057 files in 2574ms
reported 427 unique filenames
(last few lines of uniques-finding routine - checking filenames with dates and sizes)
Lower loop,Unique to 2 15/08/2020 13:26    449  ytdlcopier.bat.SUPERCEDED
Lower loop,Unique to 2 18/03/2021 18:49    852  YTDLsimple.m2s
Upper loop,Unique to 1 27/02/2019 14:51  2,225  YTDL6c.m2s
Upper loop,Unique to 1 15/08/2020 13:26    449  ytdlcopier.bat.SUPERCEDED
Upper loop,Unique to 1 18/03/2021 18:49    852  YTDLsimple.m2s
Upper loop,Unique to 1 13/03/2019 09:48    609  YTLINKS.m2s
Upper loop,Unique to 1 12/03/2019 18:51    392  YTLINKS2.BAS
Upper loop,Unique to 1 08/05/2019 11:17  1,056  YTLINKS3.BAS
Lower loop,Unique to 2 13/03/2019 09:48    609  YTLINKS.m2s
Lower loop,Unique to 2 12/03/2019 18:51    392  YTLINKS2.BAS
Lower loop,Unique to 2 08/05/2019 11:17  1,056  YTLINKS3.BAS
matched 16057 files in 35350ms
reported 4409 unique filenames
Insights and suggestions welcome. Pending that, I'll try using the new code you posted, with the same adaptations to the variable and array names as I made before, just in case it makes a difference that throws any light on the problem.
I agree that the time taken to load my directory dumps would likely be longer than it would be using FILES, just that so far that time hasn't been excessive. If I switch to FILES, I suspect it'll get complicated to rewrite because FILES and my program both use a 2-dimensional array but in a different way. Though I don't doubt that it would be better in the end, and that it may be necessary well before that if we hit a brick wall with my current strategy.
Post by Rod on Apr 20, 2021 21:33:12 GMT
Can you post the first twenty entries from each list as they exist when ready to process? Only the file names are needed. Post your version of the matching routine too.
Post by toughdiamond on Apr 20, 2021 22:31:35 GMT
Certainly. Here are the filenames:
LIST 1:
'Four Candles'-qu9MptWyCB8.mp4
'Love Hurts' The Everly Brothers--5iJMfwwheY.mp3
'One Day More' of President Trump - Les Mis‚rables parody-E4aTjeCP0Lo.mp4
'Til I Kissed You _ Cathy's Clown ~~ Everly Brothers, Melbourne, 1989-yEp2dz5xvos.mp3
'When You're Gay' Song - The Armstrong and Miller Show - Series 2 Episode 6 Preview - BBC One-FK6BQA7dUDs.mp4
'You're My World' Cilla Black-e7-QBw862zk.mp3
--qnRSfcjQf.js.download
-COYwasVyyY.css
-CQcNBE1DAY.js.download
-hHOl1hwfuy.js.download
-ln06vWFVTB.css
-vI804b5LoF.css
-x490iycc78.css
!stDividedSound.wav
#ENSINANDO A USAR O MACRO RECORD PRO 2 NO MUAWAY-jl0A_uT2sfM.mp4
%filename%.accurip
%filename%.accurip
%filename%.accurip
(F) Local Disk.lnk
(The late) Phil Everly ~ sings ~ Let it Be Me ~-QAz2gLhdC80.mp3
LIST 2:
'Four Candles'-qu9MptWyCB8.mp4
'Love Hurts' The Everly Brothers--5iJMfwwheY.mp3
'One Day More' of President Trump - Les Mis‚rables parody-E4aTjeCP0Lo.mp4
'Til I Kissed You _ Cathy's Clown ~~ Everly Brothers, Melbourne, 1989-yEp2dz5xvos.mp3
'When You're Gay' Song - The Armstrong and Miller Show - Series 2 Episode 6 Preview - BBC One-FK6BQA7dUDs.mp4
'You're My World' Cilla Black-e7-QBw862zk.mp3
--qnRSfcjQf.js.download
-COYwasVyyY.css
-CQcNBE1DAY.js.download
-hHOl1hwfuy.js.download
-ln06vWFVTB.css
-vI804b5LoF.css
-x490iycc78.css
!stDividedSound.wav
#ENSINANDO A USAR O MACRO RECORD PRO 2 NO MUAWAY-jl0A_uT2sfM.mp4
%filename%.accurip
%filename%.accurip
%filename%.accurip
(F) Local Disk.lnk
(The late) Phil Everly ~ sings ~ Let it Be Me ~-QAz2gLhdC80.mp3
And here's the code I altered and ran:
no1 = i
no2 = j
t=time$("ms")
unq = 0
i1 = 1
i2 = 1
while i1 < i or i2 < j
    while v$(i1, 1) < w$(i2, 1) and i1 <= i
        print "Upper loop,Unique to 1 "; v$(i1, 1)
        unq = unq + 1
        i1 = i1 + 1
    wend
    while v$(i1, 1) = w$(i2, 1) and i1 <= i and i2 <= j
        i1 = i1 + 1
        i2 = i2 + 1
    wend
    while v$(i1, 1) > w$(i2, 1) and i2 <= j
        print "Lower loop,Unique to 2 "; w$(i2, 1)
        unq = unq + 1
        i2 = i2 + 1
    wend
wend
while i1 <= i
    print "Tail 1 loop Unique to 1 "; v$(i1, 1)
    unq = unq + 1
    i1 = i1 + 1
wend
while i2 <= j
    print "Tail 2 loop Unique to 2 "; w$(i2, 1)
    unq = unq + 1
    i2 = i2 + 1
wend
print "reported "; unq; " unique filenames"
end
That particular version works on the filename entries that have the dates and sizes, which are stored in v$(n,1) and w$(n,1). To run it on plain filenames, I just use v$(n,2) and w$(n,2) instead, because that's where they're stored. Paths are in (n,3).
Post by toughdiamond on Apr 21, 2021 6:17:50 GMT
I tried using the new code, but the result was much the same, only with just 315 detected "uniques" this time. Strange that the number would change at all. This time, instead of replacing the arrays and variables in the code with my own, I dimensioned file1$() etc., copied my filename lists into those, and ran the new code without altering it.
Grubbing around for a clue as to what's going wrong, I tried SORTing my own arrays as well as Rod's, and compared some of the contents. Subject to my confirming this, I think the results are in a different order in places, between my sorted array and Rod's, which rather surprises me. The code to do that is pretty simple and I can't see any mistake:
dim file1$(i + 1)
for k = 1 to i
    file1$(k) = v$(k, 2)
next k
dim file2$(j + 1)
for k = 1 to j
    file2$(k) = v$(k, 2)
next k
no1 = i
no2 = j
sort file1$(), 1, no1
sort file2$(), 1, no2
sort v$(), 1, i, 2
sort w$(), 1, j, 2
I allowed that extra element when I dimensioned the new arrays just to be on the safe side - probably unnecessary, but I can't see how it would have done any harm.
I guess there are 2 possible explanations:
1. I made a mistake in either the above code or in checking the results.
2. Something's not quite right with the SORT command.
Neither seems likely to me, but tomorrow I mean to try writing the unsorted and sorted lists to files in the hope of verifying my observations and perhaps finding a pattern, unless somebody can explain it in terms of a known problem with SORT or an error in the above code. Clearly the lists need to be reliably correct before the uniques-finding routine can be expected to work.
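If I do go that route, writing each list out should only take something like this per array (a sketch; the filename is arbitrary):

    open "sorted_list1.txt" for output as #f
    for k = 1 to no1
        print #f, file1$(k)
    next k
    close #f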
Post by Rod on Apr 21, 2021 6:43:43 GMT
Still need to look at all this, but a quick question: you say you swapped v$(n,1) to v$(n,2); did you also resort the array on column 2? The sort command takes an optional column parameter, so sort v$(), 1, no1, 2 would sort the array based on the second column. If no column number is specified, sort will use column 1, your dates and sizes.

I also have a question in my mind about how we sort all those non-alphanumeric characters, but I need to investigate that.
Post by toughdiamond on Apr 21, 2021 7:36:28 GMT
You say you swapped v$(n,1) to v$(n,2); did you also resort the array on column 2? The sort command takes an optional column parameter, so sort v$(), 1, no1, 2 would sort the array based on the second column. If no column number is specified, sort will use column 1, your dates and sizes.

Yes, I used this:

sort v$(), 1, i, 2
sort w$(), 1, j, 2

Hang on....... got it - there IS a typo in the code I used for the copying - there's a v$ in there that should have been a w$, simple as that. Profound apologies for not seeing that before posting about it. Anyway, embarrassing mistake corrected, and the sorted arrays now match each other properly at least. It hasn't helped the unique-finding routine though - it's now back to listing 427 uniques, the same number as for the previous incarnation that I ran before posting the 20 filenames.
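For the record, the corrected copy loop now reads:

    dim file2$(j + 1)
    for k = 1 to j
        file2$(k) = w$(k, 2) ' was v$(k, 2) - the culprit
    next k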
Post by Rod on Apr 21, 2021 8:14:34 GMT
I think the problem might be here
no1 = i
no2 = j
I think the main loops are ignored and it falls through to the tail loop which lists everything.
Set i to the number of items in the first file and j to the number in the second file.
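A quick sanity print just before the matching loops would confirm it (using your names for the counts and the filename column):

    print "i = "; i; "  j = "; j
    print "first of 1: "; v$(1, 2)
    print "first of 2: "; w$(1, 2)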
Still testing though.