|
Post by toughdiamond on Apr 15, 2021 0:16:52 GMT
I've written a rudimentary JB program that finds the names and paths of files that are unique to one or the other of two disk drives. Nothing fancy - it simply searches through two directory dumps line by line and reports anything that's in one but not the other. The feature is available in one or two proprietary "duplicate finders", but when I tried those, they gave too many false positives or proved too cumbersome to use for that purpose, so I thought I'd try writing one myself.
It works well enough, but if I try to use it to analyse the contents of large (i.e. 2TB) drives, which is the main purpose of the program, it takes many hours to complete the task. That's hardly surprising, because it takes each filename in turn from one directory listing and looks through the whole of the other directory listing, so if I have, say, 100,000 files on each drive it will do 100,000 x 100,000 = 10,000,000,000 comparisons. (Actually that's an oversimplification and in practice it doesn't have to do quite that many, but you get the idea - the running time grows with the square of the number of files on the drives, hence the problem.)
I think I've tweaked the program enough to get the running time down to something reasonable, i.e. a few hours for the number of files I'm likely to want to process, but I was wondering if anybody knows of any tips for improving the speed? Essentially all it does is read everything into an array and do "IF a$(n) = b$(o) THEN......" I found that splitting the directory listing into 2 parts and storing it in 2 separate arrays doubled the speed (because it only has to look at half the number of filenames per test), but I got into diminishing returns when I tried splitting it further, and 4 parts is as far as it seems worth going. I've also found that using separate single-dimension arrays for the parts - w$(n), x$(n), etc. - runs slightly faster than using one 2-dimensional array such as w$(n,1), w$(n,2) etc.
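For illustration, the heart of the slow part is something like this (a much simplified sketch with made-up names and data - the real program also carries dates and sizes along with each name, and splits the lists as described):

' simplified sketch of the brute-force comparison (made-up data)
nA = 3
nB = 3
dim a$(nA)
dim b$(nB)
a$(1) = "\music\song1.mp3"
a$(2) = "\music\song2.mp3"
a$(3) = "\music\song3.mp3"
b$(1) = "\music\song1.mp3"
b$(2) = "\music\song3.mp3"
b$(3) = "\music\song4.mp3"
for i = 1 to nA
    found = 0
    for j = 1 to nB               'worst case: every name checked against every name
        if a$(i) = b$(j) then
            found = 1
            exit for
        end if
    next j
    if found = 0 then print "Unique to drive A: "; a$(i)
next i
end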
So my question is, does anybody have any ideas how I might speed things up further?
Thanks in advance :-)
|
|
|
Post by xxgeek on Apr 15, 2021 1:04:39 GMT
You may get better performance depending on your "write caching" setting in Windows. A better explanation here - www.tenforums.com/tutorials/21904-enable-disable-disk-write-caching-windows-10-a.html
You "may" get better performance if you create the tkn file, run it instead of using the JB editor to run it, and give it a higher priority in Task Manager. Depends on your system settings and what else is running alongside your Just Basic program. If your system is a busy one you might get better performance, with fewer interruptions as it does its work, by using the higher priority.
If you have lots of RAM, another possible way is to create 2 ramdrives, copy files to each drive, then run your app on the files in the ramdrives, then copy the files back where they belong. Worth testing if you want speed. Using a batch file might help with the file copy to and from the ramdrives, or do that copy using JB code. I'll let the ace programmers help with adjusting any code that can speed the process up.
Edit - You should post your code. Others can see better what might be adjusted to help you out.
Sorry, I misunderstood you. Create a ramdrive, and put your dump files in it, then run your app on the dump files in the ramdrive. Should cut the time down considerably.
|
|
|
Post by B+ on Apr 15, 2021 1:42:10 GMT
If the files are in alpha order you can use a binary search for duplicates; that would cut the time considerably.
Warning: JB has an odd way of ordering strings! You might be better off setting up a hash table.
While on the subject of alpha order, I just thought of another time-consuming issue: case.
If case does not matter, it might benefit you to list the files in all caps, to save time converting or testing in upper (or lower) case - and that might save you from JB's goofiness too. 2 birds, one stone. A rough sketch of the binary search idea is below.
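Something along these lines - an untested sketch that assumes the names are already sorted ascending in a global array name$() with n entries, upper-cased as suggested:

' tiny demo of a binary search on a sorted, all-caps list (untested sketch)
n = 5
dim name$(n)
name$(1) = "ALPHA.TXT"
name$(2) = "BETA.TXT"
name$(3) = "DELTA.TXT"
name$(4) = "GAMMA.TXT"
name$(5) = "OMEGA.TXT"
print binFind("GAMMA.TXT", n)    'prints 4
print binFind("ZETA.TXT", n)     'prints 0 (not found)
end

' returns the index of target$ in name$(1..n), or 0 if it is not there
function binFind(target$, n)
    low = 1
    high = n
    binFind = 0
    while low <= high
        m = int((low + high) / 2)
        if name$(m) = target$ then
            binFind = m
            low = high + 1          'found it, stop the loop
        else
            if name$(m) < target$ then
                low = m + 1
            else
                high = m - 1
            end if
        end if
    wend
end function

With a sorted list you only do about log2(n) comparisons per lookup - around 17 for 100,000 names - instead of up to 100,000.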
|
|
|
Post by toughdiamond on Apr 15, 2021 7:57:18 GMT
Thanks - I'm using Windows 7 but the tutorial will likely give me an idea what "write caching" is, and hopefully it's possible with Win7 by similar methods. Certainly it's worth trying the conversion to .tkn. As for increasing the priority, there's one thing that might complicate matters: I noticed quite early on that the program uses a lot of CPU, and the fan speed increases very noticeably. Not being too sure of the wisdom of running the CPU hot for a number of hours (with regard to ageing my computer), I've taken to reducing the amount of CPU available to JB with a utility called BES, which keeps the core reasonably cool, but naturally also slows it down, by a factor of about 1.8 at the current settings. I figure it might be worth the speed loss, since in spite of it I seem to have got the likely processing time down to something approaching reasonable, which is why I'm now looking for ways of making the code as efficient as possible, to get it down further still.
. . . .Given that I'm limiting the CPU load, would giving the program a higher priority just conflict with my efforts to keep the core temperature down? There aren't a lot of resources used by other processes on this computer unless I run other programs, which naturally would be nice, but I don't know how they'd behave with my program running in the background. It's a 2.1 GHz AMD A4 3310MX APU, dual-core. I've got 8 GB of RAM but sadly I'm using 32-bit Win7, which can only use 2.48 GB of it.
. . . .On the ramdrive idea, I don't quite understand - the program loads the filenames into string arrays, and that's not what takes the bulk of the processing time. AFAIK that means the filenames are already in RAM before the hard work of searching them begins.
. . . .Yes, I should post the code, and I will. But I'd best tidy it up first, as currently it's not going to be easy to figure out what it's up to. I'm almost entirely self-taught so my programming style is sometimes unconventional, naive, and downright scruffy - I only discovered JB recently, after many years of (mis-)using GW-BASIC.
. . . .I probably didn't explain it very clearly before. See above, about the slow part of the program being the search part, which happens after the data has all been taken up into RAM.
|
|
|
Post by toughdiamond on Apr 15, 2021 8:54:17 GMT
If the files are in alpha order you can use a binary search for duplicates; that would cut the time considerably. Warning: JB has an odd way of ordering strings! You might be better off setting up a hash table.
The two directory dumps aren't in alpha order. So far, I create them with a DOS command, for example: dir /og /s >directorydump.txt
They're probably in alpha order per subfolder, but AFAIK there's no DOS switch to alphabetize the whole shebang as a single entity. I did try running a bubble sort in JB to order them, and it worked, but it took as long to run as the program itself, so overall it wasn't going to save any time - I'd been hoping that once it was alphabetized, the search itself could be made much quicker.
What is a hash table? I can't think of any instances where case would matter. So what could I do to take advantage of that - convert everything using upper$ (or lower$), then what?
|
|
|
Post by Rod on Apr 15, 2021 10:12:20 GMT
Untested, just my initial thoughts. Just BASIC has its own FILES command that fills an array instantly with your directory info. It also has a SORT command that can sort sections of that array (the files) instantly, so no bubble sort needed. Repeat for your other folder and you have two arrays in order. You should be able to jump through those rather than trawl through everything.
As to cache and priority I am less sure. I know you can up the priority, but if nothing else is running you have all the priority anyway. I think modern Windows pretty much caches all disc activity to RAM.
If I get some time I will try and code a test. But the sun is shining and I have model aeroplanes to fly....
This will give you an overview of the FILES command: libertybasiccom.proboards.com/thread/1505/files-command
|
|
|
Post by toughdiamond on Apr 15, 2021 10:15:59 GMT
OK, for what it's worth, here's the listing of the program. Hope my coding style isn't too nebulous to follow. Frankly it even confuses me when I try to fathom it. But it does work, apart from the speed problem.
It's the section starting with the label [LookForDuplication] that's by far the most in need of speeding up, because that's the section that gets called as many times as there are files in the directory dump, and the section itself has to look through a lot of array elements every time it's called. Everything else happens fairly promptly.
Uniques.bas.rtf (8.01 KB)
|
|
|
Post by honky on Apr 15, 2021 10:32:32 GMT
Wouldn't it be more relevant to compare selected directories rather than disks?
|
|
|
Post by toughdiamond on Apr 15, 2021 11:44:00 GMT
Untested, just my initial thoughts. Just BASIC has its own FILES command that fills an array instantly with your directory info. It also has a SORT command that can sort sections of that array (the files) instantly, so no bubble sort needed. Repeat for your other folder and you have two arrays in order. You should be able to jump through those rather than trawl through everything.
Thanks, I didn't know those commands. FILES sounds good, but probably of limited use for the speed of my program, because loading the files as such doesn't take long even with the repeated LINE INPUT# command it uses. The real "long haul" part is comparing each filename in a very long list with each filename in another very long list, which takes ages even when it's all in memory.
. . . .The SORT command, though, sounds like it could work wonders. If I can get the file list alphabetized then I should be able to write a routine to find matches a lot more efficiently. There's a complication - each entry in my array has the date and file size followed by the filename, so using the command directly would presumably only sort them by date. And I don't want to simply trim away the date and size, because the program currently gives me the added option of including those in the search for identical files (in order to identify unique files), which is useful (if a file has a different size but the same name, it's probably not the same file). I guess it will be a matter of putting the full entries into one column of a 2-dimensional array, trimming them, putting the trimmed versions into the second column, and running SORT on those, which hopefully will reorder both columns - there's a rough sketch of what I mean at the end of this post. If that works, then "all" I have left to figure out is exactly how to use it as a basis for a more efficient search.
. . . .Yes, it could be that the higher priority is only useful for making sure the program doesn't slow down if I run other things. When my program is the only thing running (apart from Windows and its associated processes), it typically takes half the available CPU. But I usually limit JB to about 36% to keep the CPU cool, so there's a lot of spare capacity for running other stuff, as long as Windows doesn't do anything silly.
Well, it looks like I'll be opting for the SORT option anyway, unless my reasoning turns out to be faulty (wouldn't be the first time). Enjoy your flight :-)
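Here's the kind of thing I have in mind - an untested sketch with made-up entries; it assumes JB's SORT can take a column number for a 2-dimensional array and keeps each row together when it sorts:

' rough sketch of the 2-column plan (untested; assumes SORT accepts a column number)
' entry$(i, 1) = full line (date, size, path+name), entry$(i, 2) = just the name
n = 3
dim entry$(n, 2)
entry$(1, 1) = "2021-01-02  123456  \music\zebra.mp3"
entry$(2, 1) = "2020-05-06  654321  \video\apple.avi"
entry$(3, 1) = "2019-07-08  111111  \pics\mango.jpg"
for i = 1 to n
    'keep only the part we want to sort on - here, whatever follows the last backslash
    p$ = entry$(i, 1)
    while instr(p$, "\") > 0
        p$ = mid$(p$, instr(p$, "\") + 1)
    wend
    entry$(i, 2) = upper$(p$)
next i
sort entry$(), 1, n, 2    'sort rows 1..n on column 2, hopefully dragging column 1 along
for i = 1 to n
    print entry$(i, 2); "  <-  "; entry$(i, 1)
next i
end

If SORT really does keep the columns together like that, the comparison routine can then walk the two sorted lists instead of scanning everything.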
|
|
|
Post by toughdiamond on Apr 15, 2021 12:28:51 GMT
Wouldn't it be more relevant to compare selected directories rather than disks?
Well, the main purpose of the program is this: I have a large collection of music, videos, pictures and other files, and until recently their final resting place was a 2TB USB drive. I began to realise that my growing and increasingly valuable (to me) collection was in danger of being irrevocably lost if the drive ever broke down, so I got a second drive and proceeded to copy the stuff onto it. For some of the files I was silly enough to use Windows drag and drop, and it came up with error messages about some of the combined filenames and pathnames being "too long to copy," but without telling me the full names and paths of the files it had skipped, and without even giving me the option to save a log. So naturally I needed to find out which files were present on one volume but not the other. There are many directories on the drives and I don't know which ones contained the files that hadn't been copied.
Another option would be to wipe the drive that doesn't contain all the files and then copy everything back onto it using something that works properly, like RoboCopy, but it's a lot of data and it would take a long time, especially as the drive has a rather slow write speed. I could also use RoboCopy with the /MIR switch to "sync" the drive to the original, but I'm not convinced it would be particularly quick and efficient. I've had strange results from RoboCopy where there ended up being more files on the copy than the original - that was on a 128 GB flash drive. I still don't know what the extra files are.
. . . .So I started looking around for a way of finding skipped and superfluous files, and nothing seems to have been written that will do it properly, so I thought I'd try writing something myself. It could prove useful for any situation where I (or the copying utilities) made a mess of duplicating a large volume.
|
|
|
Post by B+ on Apr 15, 2021 16:59:25 GMT
Oh good - a built-in FILES command (I didn't know), and I forgot about the new SORT command, also built in. That should help tremendously with speed (and SORT surely must work normally when comparing strings). Looks like the FILES command will set up a file name field by which you may sort. The question remains that when comparing strings yourself, JB has an odd way of ordering them. Rod wrote up a string-comparing routine some time ago, but the problem may be solved with all caps? You need to compare strings when doing a binary search for a duplicate name. I can dig out some code for that if no one else has it handy.
And using the old method of piping or redirecting the dir listing into a txt file, it could be done sorting by filename first - just look up the correct switches here: docs.microsoft.com/en-us/windows-server/administration/windows-commands/dir
Looks like you can customize the sorting order on multiple fields with the /o switch. I think I would try that first because you are almost there with that method already, so no learning curve for the JB FILES and SORT commands, and the Windows routines are likely to run fast! Get 2 sorted lists upon which to apply binary searches. Heck, Windows probably has apps or commands for comparing folders and drives.
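For example (if I'm reading the switch list right), something like dir /s /o:n > directorydump.txt should at least sort each folder's section by name, and /o:gn keeps the directories-grouped layout of the /og switch already in use - though either way it still sorts per folder, not the whole tree as one list.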
|
|
|
Post by Rod on Apr 15, 2021 17:35:44 GMT
You get 50% usage on the CPU because you are using a single core - 50% of the dual-core capacity. But you can't use both cores with Just BASIC, or with many other programs.
This uses the native FILES command. I did not need SORT since the directory listing is already in alpha order. If the file names differ in upper or lower case you can force a lower-case comparison with lower$().
I have no idea if my search routine catches everything or will work in a more fragmented environment, but it does find the missing files in my backup directory.
s=time$("ms")
dim info1$(10,10)
dim info2$(10,10)
files "c:\basic\a scratchpad", info1$()
files "e:\basic\a scratchpad", info2$()
no1=val(info1$(0,0))    'number of files found in each folder
no2=val(info2$(0,0))
print "files found in 1 :";no1
print "files found in 2 :";no2
'walk both alpha-ordered lists in step, advancing whichever index lags behind
i1=1
i2=1
while i1<no1 or i2<no2
    'names in list 1 that come before the current name in list 2 have no match
    while info1$(i1,0)<info2$(i2,0) and i1<=no1
        print "Unique to 1 ";info1$(i1,0)
        i1=i1+1
    wend
    'matching names - step past them in both lists
    while info1$(i1,0)=info2$(i2,0) and i1<=no1 and i2<=no2
        i1=i1+1
        i2=i2+1
    wend
    'names in list 2 that come before the current name in list 1 have no match
    while info1$(i1,0)>info2$(i2,0) and i2<=no2
        print "Unique to 2 ";info2$(i2,0)
        i2=i2+1
    wend
wend
print "Completed in ms ";time$("ms")-s
end
Output is
|
|
|
Post by tsh73 on Apr 15, 2021 18:01:02 GMT
Please tell, how many files have you got?
|
|
|
Post by toughdiamond on Apr 15, 2021 22:42:22 GMT
Please tell, how many files have you got?
Right now, 58,141 files. So that would require about 3,300,000,000 comparisons if the program checked each filename in one volume against each filename in the other, though I would expect it to be about half that in practice, because once it's found a match for a file, it can move on to the next one. I'm also thinking of future requirements. If I ever fill the drive and need to run the program again, I calculated that it might contain about 80,000 files, which would require almost twice as many comparisons - and it could be even more if they were smaller files. And they're already selling drives bigger than my 2TB drives. It's hard to know what the upper limit of the challenge for my program might be.
|
|
|
Post by B+ on Apr 15, 2021 23:23:42 GMT
Rod's code ought to cut the amount of comparing by a vast amount. I would guess it's around 2 * the number of files.
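To put rough numbers on that, from the figures above: the brute-force way is roughly 58,141 x 58,141, around 3.4 billion comparisons, while walking two sorted lists side by side as Rod's code does is roughly 58,141 + 58,141, about 116,000 comparisons - plus whatever the sorting itself costs, which the dir switches or JB's SORT would handle for you.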
|
|