|
Post by honky on Nov 4, 2023 13:47:46 GMT
Good morning. I have an array of strings, etatag$(), such as: "Tool_a nom_c" "Tool_b nom_d" "Tool_a nom_c" "Tool_b nom_d" "Tool_a nom_c" ... etc. I would like the counts of "Tool_a nom_c" and "Tool_b nom_d". I wrote the code below, but it doesn't work. How should I do it? Thank you.
dim comptea$(100)
for x=1 to netatag 'netatag=end of array
    mota$=word$(etatag$(x),1): motb$=word$(etatag$(x),2)
    c=0:comptea=0
    for y=netatag to 1 step-1
        if word$(etatag$(y),1)=mota$ and word$(etatag$(y),2)=motb$ then
            comptea=comptea+1: c=c+1: comptea$(c)=str$(comptea);" ";mota$;" for ";motb$
        end if
    next y
next x
|
|
|
Post by tsh73 on Nov 4, 2023 15:36:05 GMT
Hello. As the problem is stated now, you do not need to parse the lines into words at all. You just need to count the different lines. In JB we have the "sort" operator. With that, the easiest way is to sort and then count the number of duplicates.
Something along these lines:
read netatag
dim etatag$(netatag)
for x = 1 to netatag
    read a$: etatag$(x) = a$
next

'sort
sort etatag$(), 1, netatag
'then count duplicates
prev$="":count=0
for x = 1 to netatag
    if etatag$(x)=prev$ then
        count=count+1
    else 'first
        if x<>1 then print prev$, count
        count=1
        prev$=etatag$(x)
    end if
next
'last item
print prev$, count

end
data 5 'num of lines
data "Tool_a nom_c"
data "Tool_b nom_d"
data "Tool_a nom_c"
data "Tool_b nom_d"
data "Tool_a nom_c"
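For readers more at home outside JB, the same sort-then-count idea can be sketched in Python. This is only an illustration of the technique under my own naming: `count_sorted_runs` is a hypothetical helper, not something from the thread.

```python
def count_sorted_runs(lines):
    """Sort the lines, then count each run of equal neighbours in one pass."""
    results = []
    prev = None
    count = 0
    for line in sorted(lines):
        if line == prev:
            count += 1
        else:
            # A new value starts: report the run that just ended, if any.
            if prev is not None:
                results.append((prev, count))
            prev = line
            count = 1
    # Report the last run, which the loop never gets to flush.
    if prev is not None:
        results.append((prev, count))
    return results


etatag = ["Tool_a nom_c", "Tool_b nom_d", "Tool_a nom_c",
          "Tool_b nom_d", "Tool_a nom_c"]
for text, n in count_sorted_runs(etatag):
    print(text, n)
```

As in the JB version, the only state carried through the pass is the previous value and its running count, which is why the array has to be sorted first.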
|
|
|
Post by honky on Nov 4, 2023 16:46:27 GMT
Thank you very much, tsh73. I put it in my code.
|
|
|
Post by plus on Nov 7, 2023 16:04:33 GMT
I am kinda curious whether sorting is really necessary, as it would likely, theoretically, take longer to sort the array than to just run through it once counting duplicates. Even if you had a bunch of duplicates to test for, it would take less fooling around than sorting and then counting duplicates, because you only have to run through the array once.
b = b + ...
|
|
|
Post by tsh73 on Nov 7, 2023 18:08:54 GMT
How will you count duplicates in one pass over an unsorted array? You'll need some magic to keep the unique values and their counters. And implementing magic is costly.
|
|
|
Post by plus on Nov 7, 2023 18:57:10 GMT
Well, I am going by the OP's example with a few items, basically just counting how many of each item is in the list. Read it again and see if you find any word that mentions looking for "duplicates"! The OP is asking for counts of known items. It is like an H, T list of coin flips where you want the count of H and of T; everything over a count of one is a duplicate. Certainly no magic involved in this case!
BTW I can't make H or T of Honky's code. Why is the OP using Word$() when there is already an array of strings? And WTH is the <div> tag?
|
|
|
Post by tsh73 on Nov 7, 2023 19:19:25 GMT
Not sure, really. But for a small number of known items, well, yes: a single pass, checking which item each one is.
But the code posted takes item(x), splits it into words, then compares it to the split item(y). It looks to me like an all-to-all comparison, for each x, y. That will be n^2. And it will find all duplicates (if debugged, that is). Sorting is n log n ;), and counting in a sorted array is a single pass with no magic involved.
(And by magic I mean something like a Python dictionary. You run through the array in a single pass, but on each step you call the magic Python dictionary: it keeps the unique values and lets you index a counter by those values.)
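The dictionary "magic" mentioned here might look like the following in Python. It is a minimal sketch: `count_with_dict` is a name made up for illustration, and the data is the OP's sample list.

```python
def count_with_dict(lines):
    """Single pass: the dict keeps the unique values and their counters."""
    counts = {}
    for line in lines:
        # get() returns 0 the first time a value is seen.
        counts[line] = counts.get(line, 0) + 1
    return counts


etatag = ["Tool_a nom_c", "Tool_b nom_d", "Tool_a nom_c",
          "Tool_b nom_d", "Tool_a nom_c"]
for text, n in count_with_dict(etatag).items():
    print(text, n)
```

No sorting is needed, because the dictionary does the lookup of "have I seen this value before?" on each step.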
|
|
|
Post by plus on Nov 7, 2023 19:41:25 GMT
Here is what I mean:
'Count Items in a data listing bplus 2023-11-07
dim S$(100) ' allow up to 100 strings don't really need this but WTH?
dim Item$(100) ' allow up 100 items
dim counts(100) ' allow 100 counts
itemCount = 0 : dataI = 1 : items = 0

'restore [data2] ' <<<<<<<<<<<< uncomment for 2nd test

while d$ <> "EOD"
    read d$
    if d$ <> "EOD" then
        dataI = dataI + 1
        S$(dataI) = d$
        if items = 0 then
            items = 1
            Item$(1) = d$
            counts(1) = 1
        else
            found = 0
            for i = 1 to items
                if d$ = Item$(i) then counts(i) = counts(i) + 1 : found = 1 : exit for
            next
            if found = 0 then
                items = items + 1
                Item$(items) = d$
                counts(items) = 1
            end if
        end if
    end if
wend
for i = 1 to items
    print Item$(i);" Count is ";counts(i)
next

[data1]
data "Tool_a nom_c", "Tool_b nom_d", "Tool_a nom_c", "Tool_b nom_d", "Tool_a nom_c", "EOD"

[data2] 'test without quotes OK unless word is command eg on <<< interesting
data the, rain, in, Spain, falls, mainly, "on", the, plain, "EOD"
|
|
|
Post by plus on Nov 7, 2023 19:46:39 GMT
"(And by magic I mean something like a Python dictionary. You run through the array in a single pass, but on each step you call the magic Python dictionary: it keeps the unique values and lets you index a counter by those values.)"
Wow, I guess I am making magic in the post above!
|
|
|
Post by tsh73 on Nov 7, 2023 20:09:24 GMT
Yes you do. Programmers routinely do something from nothing. Isn't it magic?
|
|
|
Post by plus on Nov 7, 2023 20:36:27 GMT
"Yes you do. Programmers routinely do something from nothing. Isn't it magic?"
Indeed! That is why I keep coming back to it.
|
|
|
Post by tsh73 on Nov 8, 2023 19:23:19 GMT
I ran it against the timer. Really, the times depend on the total size (now set to 10000) and the number of different things (now set to 10). With these parameters, Bplus' code and mine run in about the same time (on my machine), and both are 2x slower than just checking against 10 known values. If I increase the number of unique values (not i mod 10 but i mod 20), Bplus' code starts running slower (obviously because of the inner loop "for i = 1 to items"). If I increase N 2x, Bplus' code runs 2x longer (linear), while sorting takes longer, making its total 3x. (And 4x leads to 4x for Bplus' code and 11x for sorting: sorting loses!!!) But of course it depends on the task at hand. At N=1000, every way is under 0.1 sec...

randomize .001
N=100000 'sort takes too long!!!
'N=40000
'N=20000
N=10000 'sort and Bplus' magic takes about same time
'N=1000
dim a$(N)
t0=time$("ms")

for i = 1 to N
    'a$(i) = "item";i
    a$(i) = "item";(i mod 10) '0..9
    'a$(i) = "item";(i mod 20) '0..19 - increase number of unique numbers
next

'1% of items is changed, still same items, but counter is different
for i=1 to N/100
    r=int(rnd(0)*N)+1
    j=int(rnd(0)*10)
    a$(r)="item";j
next
t1=time$("ms")

print "generating ";t1-t0

'count for single
countA=0
for i = 1 to N
    if a$(i) = "item0" then countA=countA+1
next
t2=time$("ms")

print "count 1 ";t2-t1
print countA

'count for 3
countA=0
countB=0
countC=0
for i = 1 to N
    if a$(i) = "item0" then countA=countA+1
    if a$(i) = "item1" then countB=countB+1
    if a$(i) = "item2" then countC=countC+1
next

t3=time$("ms")
print "count 3 ";t3-t2
print countA
print countB
print countC

'count for 10
count0=0
count1=0
count2=0
count3=0
count4=0
count5=0
count6=0
count7=0
count8=0
count9=0
for i = 1 to N
    if a$(i) = "item0" then count0=count0+1
    if a$(i) = "item1" then count1=count1+1
    if a$(i) = "item2" then count2=count2+1
    if a$(i) = "item3" then count3=count3+1
    if a$(i) = "item4" then count4=count4+1
    if a$(i) = "item5" then count5=count5+1
    if a$(i) = "item6" then count6=count6+1
    if a$(i) = "item7" then count7=count7+1
    if a$(i) = "item8" then count8=count8+1
    if a$(i) = "item9" then count9=count9+1
next

t4=time$("ms")
print "count 10 ";t4-t3
print count0
print count1
print count2
print count3
print count4
print count5
print count6
print count7
print count8
print count9

t7=time$("ms")
'Count Items in a data listing bplus 2023-11-07
dim S$(100) ' allow up to 100 strings don't really need this but WTH?
dim Item$(100) ' allow up 100 items
dim counts(100) ' allow 100 counts
itemCount = 0 : dataI = 1 : items = 0

for x = 1 to N
    d$=a$(x)
    'dataI = dataI + 1
    'S$(dataI) = d$
    if items = 0 then
        items = 1
        Item$(1) = d$
        counts(1) = 1
    else
        found = 0
        for i = 1 to items
            if d$ = Item$(i) then counts(i) = counts(i) + 1 : found = 1 : exit for
        next
        if found = 0 then
            items = items + 1
            Item$(items) = d$
            counts(items) = 1
        end if
    end if
next
for i = 1 to items
    print Item$(i);" Count is ";counts(i)
next
t8=time$("ms")
print "Bplus' magic ";t8-t7

'sort'n count ======================================
sort a$(), 1, N

t9=time$("ms")
print "sort ";t9-t8 '860, faster than single pass

prev$="":count=0
for x = 1 to N
    if a$(x)=prev$ then
        count=count+1
    else 'first
        if x<>1 then print prev$, count
        count=1
        prev$=a$(x)
    end if
next
'last item
print prev$, count
t10=time$("ms")
print "sort&count ";t10-t8
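A comparable timing harness can be sketched in Python to try the same experiment on another machine. This is an illustration only: all function names are hypothetical, the 1% random perturbation from the JB listing is left out so the counts stay predictable, and the timings will not match the JB numbers above.

```python
import time


def make_data(n, unique):
    """Build n strings "item0".."item<unique-1>", like a$(i) = "item";(i mod 10)."""
    return ["item" + str(i % unique) for i in range(1, n + 1)]


def count_linear_list(data):
    """Bplus-style: scan a growing list of known items for every element."""
    items, counts = [], []
    for d in data:
        for i, item in enumerate(items):
            if item == d:
                counts[i] += 1
                break
        else:
            # Inner loop found nothing: first time we see this value.
            items.append(d)
            counts.append(1)
    return dict(zip(items, counts))


def count_sorted(data):
    """Sort first, then count runs of equal neighbours in a single pass."""
    result = {}
    prev, count = None, 0
    for d in sorted(data):
        if d == prev:
            count += 1
        else:
            if prev is not None:
                result[prev] = count
            prev, count = d, 1
    if prev is not None:
        result[prev] = count
    return result


def timed(fn, data):
    """Return (result, elapsed seconds) for one call of fn(data)."""
    start = time.perf_counter()
    result = fn(data)
    return result, time.perf_counter() - start


data = make_data(10000, 10)
for fn in (count_linear_list, count_sorted):
    result, seconds = timed(fn, data)
    print(fn.__name__, len(result), "unique,", round(seconds * 1000, 2), "ms")
```

Changing the arguments of `make_data` plays the role of editing N and the `mod` value in the JB listing.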
|
|
|
Post by plus on Nov 9, 2023 14:35:11 GMT
I was thinking we could speed up the bplus code by using a binary search to see whether an item is listed yet, say once the number of items exceeds a certain amount. Then you need code to keep the Item$() list sorted as you add to it. I have code that does that already, i.e. creating a sorted list while loading it from data, from a file, or from user input. Did I post that here already? But all this extra stuff won't pay off until you have hundreds, maybe thousands, of items to run through.
As I said and tsh73 confirmed, if you know all the items in the list ahead of time it will go faster, especially if you sort that list of items and use binary search to find the index for counting.
BTW if the items are numbers, then we already know what the items are AND they are almost automatically ordered, so in that case it should go much, much, much faster in one pass! So coders, give all your string items a code number and use that code to save your data and run your counts! Piece of cake!
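Both ideas in this post can be sketched in Python, assuming the standard `bisect` module for the binary search. The function names are hypothetical and this is not bplus's own sorted-list code, which is not shown in the thread.

```python
from bisect import bisect_left


def count_with_sorted_items(data):
    """Keep the item list sorted while loading, and find items by binary search."""
    items = []   # unique values, always in sorted order
    counts = []  # counts[i] belongs to items[i]
    for d in data:
        pos = bisect_left(items, d)
        if pos < len(items) and items[pos] == d:
            counts[pos] += 1
        else:
            # Not listed yet: insert at the position that keeps the list sorted.
            items.insert(pos, d)
            counts.insert(pos, 1)
    return list(zip(items, counts))


def count_codes(codes, n_codes):
    """If every item is a code number 0..n_codes-1, the code is the index."""
    counts = [0] * n_codes
    for code in codes:
        counts[code] += 1
    return counts


words = ["the", "rain", "in", "Spain", "falls", "mainly", "on", "the", "plain"]
print(count_with_sorted_items(words))
print(count_codes([0, 1, 0, 1, 0], 2))
```

The lookup is logarithmic in the number of unique items, but inserting into the middle of a list still shifts elements, so the gain only shows once there are many distinct items, as the post says. With code numbers no search is needed at all.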
|
|