Occurrences counting

Post by honky on Nov 4, 2023 13:47:46 GMT

Good morning,
I have an array of strings, etatag$(), such as:
"Tool_a nom_c"
"Tool_b nom_d"
"Tool_a nom_c"
"Tool_b nom_d"
"Tool_a nom_c"
... etc. ...
I would like the counts of "Tool_a nom_c" and "Tool_b nom_d".
I tried the code below, but it doesn't work.

How can I do it?

Thank you.

dim comptea$(100)
for x=1 to netatag 'netatag = end of array (number of items)
    mota$=word$(etatag$(x),1): motb$=word$(etatag$(x),2)
    c=0:comptea=0
    for y=netatag to 1 step-1
        if word$(etatag$(y),1)=mota$ and word$(etatag$(y),2)=motb$ then
            comptea=comptea+1: c=c+1: comptea$(c)=str$(comptea);" ";mota$;" for ";motb$
        end if
    next y
next x

Post by tsh73 on Nov 4, 2023 15:36:05 GMT

Hello,
as currently stated, you do not need to parse each line into words at all.
You just need to count the distinct lines.
In JB we have the "sort" command.
With that, the easiest way is to sort and then count the number of duplicates.

Something along these lines:

read netatag
dim etatag$(netatag )
for x = 1 to netatag
    read a$: etatag$(x) =a$
next

'sort
sort etatag$(), 1, netatag
'then count duplicates
prev$="":count=0
for x = 1 to netatag
    if etatag$(x)=prev$ then
        count=count+1
    else    'first
        if x<>1 then print prev$, count
        count=1
        prev$=etatag$(x)
    end if
next
'last item
print prev$, count

end
data 5  'num of lines
data "Tool_a  nom_c"
data "Tool_b  nom_d"
data "Tool_a  nom_c"
data "Tool_b  nom_d"
data "Tool_a  nom_c"
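For comparison outside JB, here is a minimal Python sketch of the same sort-then-count idea. Python is used only for illustration, and the name sort_and_count is hypothetical. After sorting, equal strings sit next to each other, so a single pass that watches for a change from the previous value counts each distinct line:

```python
def sort_and_count(items):
    """Sort a copy of items, then count runs of equal neighbours in one pass.

    Returns a list of (value, count) pairs in sorted order.
    """
    result = []
    prev, count = None, 0
    for value in sorted(items):  # sorted() returns a new list; the input is untouched
        if count and value == prev:
            count += 1  # same as the previous value: extend the current run
        else:
            if count:
                result.append((prev, count))  # close the previous run
            prev, count = value, 1  # start a new run
    if count:
        result.append((prev, count))  # close the last run
    return result


if __name__ == "__main__":
    data = ["Tool_a  nom_c", "Tool_b  nom_d", "Tool_a  nom_c",
            "Tool_b  nom_d", "Tool_a  nom_c"]
    for value, count in sort_and_count(data):
        print(value, count)
```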

Last Edit: Nov 4, 2023 17:47:52 GMT by tsh73

If you like a piece of my code, go ahead and use it.
I had my share of fun creating it - now it's free.

Post by honky on Nov 4, 2023 16:46:27 GMT

Thank you very much, tsh73. I put it in my code.

Post by plus on Nov 7, 2023 16:04:33 GMT

I am kind of curious whether sorting is really necessary. Theoretically, sorting the array would likely take longer than just running through it once and counting duplicates. Even if you had a bunch of duplicates to test for, it would take less fooling around than sorting and then counting duplicates, because you only have to run through the array once.

b = b + ...

Last Edit: Nov 7, 2023 16:05:21 GMT by plus

Post by tsh73 on Nov 7, 2023 18:08:54 GMT

How will you count duplicates in one pass over an unsorted array? You'll need some magic to keep track of the unique values and their counters. And implementing magic is costly.

Post by plus on Nov 7, 2023 18:57:10 GMT

Well, I am going by the OP's example with a few items, and just basically counting how many of each item is in the list.

Read it again and see if you find any word that mentions looking for "duplicates"! E is asking for counts of known items.

It's like an H, T list of coin flips where you want the count of H and of T; everything over a count of one is a duplicate. Certainly no magic involved in this case!
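A minimal Python sketch of counting known items in a single pass, as in the coin-flip example. The name count_known is hypothetical, and the known values are assumed to be distinct; items that match none of them are simply ignored:

```python
def count_known(items, known):
    """Count how often each value in `known` occurs in `items`, in one pass.

    Each item is compared against the short list of known values
    (assumed distinct); items matching none of them are ignored.
    """
    counts = [0] * len(known)  # one counter per known value, same order as `known`
    for item in items:
        for k, value in enumerate(known):
            if item == value:
                counts[k] += 1
                break  # known values are distinct, so stop at the first match
    return list(zip(known, counts))


flips = ["H", "T", "T", "H", "T"]
print(count_known(flips, ["H", "T"]))  # [('H', 2), ('T', 3)]
```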

BTW, I can't make H or T of Honky's code. WTH? Why is e using word$() if e has an array of strings?

WTH is <div> tag?

Last Edit: Nov 7, 2023 19:05:02 GMT by plus

Post by tsh73 on Nov 7, 2023 19:19:25 GMT

plus said: "E is asking for counts of known items."

Not sure, really.
But for a small number of known items, well, yes: a single pass, checking which item each entry is.

But the posted code takes item(x), splits it into words, then compares it to the split item(y).
It looks to me like an all-to-all comparison, for every x and y. That will be n^2.
And it will find all duplicates (once debugged, that is).
Sorting is n log n ;), and counting in a sorted array is a single pass, no magic involved.
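For a rough sense of scale, counting comparisons only and ignoring constant factors, take N = 10000 (the size used in the timing test further down this thread). The all-to-all loop does about N^2 comparisons, a sort does on the order of N log2 N, and counting runs in the sorted array adds N - 1 more:

```latex
N^2 = 10^8, \qquad
N \log_2 N \approx 10^4 \times 13.3 \approx 1.3 \times 10^5, \qquad
N - 1 = 9999
```

These are order-of-magnitude figures, not measured times.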

(By magic I mean something like a Python dictionary.
You run through the array in a single pass, but at each step you call the magic Python dictionary:
it keeps the unique values and lets you index a counter by those values.)
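To make the "magic" concrete, a minimal Python sketch: the dictionary maps each distinct string to its counter, so one pass is enough. Dictionary lookups are hash-table operations, O(1) on average, so the total work is roughly linear in the number of items. The name count_with_dict is hypothetical; collections.Counter is the standard-library shortcut for the same idea.

```python
from collections import Counter


def count_with_dict(items):
    """Single pass: the dict keeps the unique values and indexes a counter by each."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1  # hash lookup, O(1) on average
    return counts


data = ["Tool_a  nom_c", "Tool_b  nom_d", "Tool_a  nom_c",
        "Tool_b  nom_d", "Tool_a  nom_c"]
print(count_with_dict(data))
print(Counter(data).most_common())  # standard-library version, most frequent first
```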

Post by plus on Nov 7, 2023 19:41:25 GMT

Here is what I mean:

'Count Items in a data listing bplus 2023-11-07
dim S$(100) ' allow up to 100 strings; don't really need this but WTH?
dim Item$(100) ' allow up to 100 items
dim counts(100) ' allow 100 counts
itemCount = 0 : dataI = 0 : items = 0 ' dataI = 0 so the first string is stored in S$(1)

'restore [data2] ' <<<<<<<<<<<< uncomment for 2nd test

while d$ <> "EOD"
    read d$
    if d$ <> "EOD" then
        dataI = dataI + 1
        S$(dataI) = d$
        if items = 0 then
            items = 1
            Item$(1) = d$
            counts(1) = 1
        else
            found = 0
            for i = 1 to items
                if d$ = Item$(i) then counts(i) = counts(i) + 1 : found = 1 : exit for
            next
            if found = 0 then
                items = items + 1
                Item$(items) = d$
                counts(items) = 1
            end if
        end if
    end if
wend
for i = 1 to items
    print Item$(i);" Count is ";counts(i)
next

[data1]
data "Tool_a  nom_c", "Tool_b  nom_d", "Tool_a  nom_c", "Tool_b  nom_d", "Tool_a  nom_c", "EOD"

[data2] 'test without quotes: OK unless a word is a keyword, e.g. on <<< interesting
data the, rain, in, Spain, falls, mainly, "on", the, plain, "EOD"
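For readers coming from Python, here is a rough port of the approach above. The name count_with_item_list is hypothetical. It keeps the same two parallel lists, unique values and counts, and linearly searches the unique values for every item. Its cost therefore grows with the number of distinct items as well as with the length of the list:

```python
def count_with_item_list(items):
    """Unique-items list plus linear search, like Item$() and counts() above."""
    uniques = []  # like Item$()
    counts = []   # like counts(); counts[i] belongs to uniques[i]
    for d in items:
        for i, u in enumerate(uniques):
            if d == u:
                counts[i] += 1
                break
        else:  # loop ended without break: d is new, so append it
            uniques.append(d)
            counts.append(1)
    return list(zip(uniques, counts))


words = ["the", "rain", "in", "Spain", "falls", "mainly", "on", "the", "plain"]
for word, n in count_with_item_list(words):
    print(word, "Count is", n)
```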

Last Edit: Nov 7, 2023 19:47:45 GMT by plus

Post by plus on Nov 7, 2023 19:46:39 GMT

tsh73 said: "(By magic I mean something like a Python dictionary.
You run through the array in a single pass, but at each step you call the magic Python dictionary:
it keeps the unique values and lets you index a counter by those values.)"

Wow, I guess I am making magic in my post above!

Last Edit: Nov 7, 2023 19:46:57 GMT by plus

Post by tsh73 on Nov 7, 2023 20:09:24 GMT

plus said: "Wow, I guess I am making magic in my post above!"

:) Yes, you do. Programmers routinely make something from nothing. Isn't that magic?

Post by plus on Nov 7, 2023 20:36:27 GMT

tsh73 said (Nov 7, 2023 20:09:24 GMT): "Yes, you do. Programmers routinely make something from nothing. Isn't that magic?"

Indeed! That's why I keep coming back to it.

Post by tsh73 on Nov 8, 2023 19:23:19 GMT

I ran it against the timer.
Times really depend on the total size (currently N = 10000) and the number of distinct items (currently 10).
With these parameters, Bplus' code and mine take about the same time (on my machine),
and both are about 2x slower than just checking against the 10 known values.

If I increase the number of unique values (i mod 20 instead of i mod 10), Bplus' code starts running slower
(obviously because of the inner loop "for i = 1 to items").
If I double N, Bplus' code runs 2x longer (linear), while sorting takes longer, about 3x in total.
(Quadrupling N gives 4x for Bplus' code and 11x for sort-and-count: sorting loses!!!)

But of course it depends on the task at hand.
With N=1000, every approach takes under 0.1 sec...

randomize .001
N=100000   'sort takes too long!!!
'N=40000
'N=20000
N=10000     'sort and Bplus' magic take about the same time
'N=1000
dim a$(N)

t0=time$("ms")

for i = 1 to N
    'a$(i) = "item";i
    a$(i) = "item";(i mod 10)   '0..9
    'a$(i) = "item";(i mod 20)   '0..19   - increase number of unique numbers
next

'1% of items are changed: still the same set of items, but the counts differ
for i=1 to N/100
    r=int(rnd(0)*N)+1
    j=int(rnd(0)*10)
    a$(r)="item";j
next
t1=time$("ms")

print "generating ";t1-t0

'count for single
countA=0
for i = 1 to N
    if a$(i) = "item0" then countA=countA+1
next
t2=time$("ms")

print "count 1 ";t2-t1
print countA

'count for 3
countA=0
countB=0
countC=0
for i = 1 to N
    if a$(i) = "item0" then countA=countA+1
    if a$(i) = "item1" then countB=countB+1
    if a$(i) = "item2" then countC=countC+1
next

t3=time$("ms")
print "count 3 ";t3-t2
print countA
print countB
print countC

'count for 10
count0=0
count1=0
count2=0
count3=0
count4=0
count5=0
count6=0
count7=0
count8=0
count9=0
for i = 1 to N
    if a$(i) = "item0" then count0=count0+1
    if a$(i) = "item1" then count1=count1+1
    if a$(i) = "item2" then count2=count2+1
    if a$(i) = "item3" then count3=count3+1
    if a$(i) = "item4" then count4=count4+1
    if a$(i) = "item5" then count5=count5+1
    if a$(i) = "item6" then count6=count6+1
    if a$(i) = "item7" then count7=count7+1
    if a$(i) = "item8" then count8=count8+1
    if a$(i) = "item9" then count9=count9+1
next

t4=time$("ms")
print "count 10 ";t4-t3
print count0
print count1
print count2
print count3
print count4
print count5
print count6
print count7
print count8
print count9

t7=time$("ms")

'Count Items in a data listing bplus 2023-11-07
dim S$(100) ' allow up to 100 strings; don't really need this but WTH?
dim Item$(100) ' allow up to 100 items
dim counts(100) ' allow 100 counts
itemCount = 0 : dataI = 1 : items = 0

for x = 1 to N
    d$=a$(x)
        'dataI = dataI + 1
        'S$(dataI) = d$
        if items = 0 then
            items = 1
            Item$(1) = d$
            counts(1) = 1
        else
            found = 0
            for i = 1 to items
                if d$ = Item$(i) then counts(i) = counts(i) + 1 : found = 1 : exit for
            next
            if found = 0 then
                items = items + 1
                Item$(items) = d$
                counts(items) = 1
            end if
        end if
next
for i = 1 to items
    print Item$(i);" Count is ";counts(i)
next
t8=time$("ms")
print "Bplus' magic ";t8-t7

'sort'n count ======================================
sort a$(), 1, N   'sort items 1..N (a1 was a typo: an unset variable, i.e. 0)

t9=time$("ms")
print "sort ";t9-t8 '860 ms, faster than the single pass

prev$="":count=0
for x = 1 to N
    if a$(x)=prev$ then
        count=count+1
    else    'first
        if x<>1 then print prev$, count
        count=1
        prev$=a$(x)
    end if
next
'last item
print prev$, count
t10=time$("ms")
print "sort&count ";t10-t8
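A Python sketch of a similar experiment, assuming test data built like tsh73's ("item" plus i mod k, with about 1% of entries changed at random). make_data and the other function names are hypothetical. In CPython, sorted() and dict are implemented in C, so absolute and relative timings will not match Just BASIC's. Treat this as a harness for your own measurements, not a reproduction of the numbers above:

```python
import random
import time
from collections import Counter


def item_list_count(items):
    """Unique-values list plus linear search: cost ~ len(items) * distinct values."""
    uniques, counts = [], []
    for d in items:
        for i, u in enumerate(uniques):
            if d == u:
                counts[i] += 1
                break
        else:  # not found among the uniques seen so far
            uniques.append(d)
            counts.append(1)
    return dict(zip(uniques, counts))


def sort_count(items):
    """Sort a copy, then count runs of equal neighbours in one pass."""
    result = {}
    prev, run = None, 0
    for v in sorted(items):
        if run and v == prev:
            run += 1
        else:
            if run:
                result[prev] = run
            prev, run = v, 1
    if run:
        result[prev] = run
    return result


def dict_count(items):
    """The "magic" dictionary approach, via collections.Counter."""
    return dict(Counter(items))


def make_data(n, distinct, seed=1):
    """Hypothetical test data: "item" + (i mod distinct), about 1% of entries changed."""
    rng = random.Random(seed)
    data = ["item%d" % (i % distinct) for i in range(1, n + 1)]
    for _ in range(n // 100):
        data[rng.randrange(n)] = "item%d" % rng.randrange(distinct)
    return data


def time_it(func, data):
    """Return (result, elapsed seconds) for one call of func(data)."""
    start = time.perf_counter()
    result = func(data)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for n, distinct in [(10_000, 10), (10_000, 20), (20_000, 10), (40_000, 10)]:
        data = make_data(n, distinct)
        results = []
        for func in (item_list_count, sort_count, dict_count):
            result, secs = time_it(func, data)
            results.append(result)
            print(f"N={n:>6} distinct={distinct:>2} {func.__name__:<16} {secs * 1000:8.2f} ms")
        assert results[0] == results[1] == results[2]  # all methods must agree
```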

Last Edit: Nov 9, 2023 1:23:50 GMT by tsh73

Post by plus on Nov 9, 2023 14:35:11 GMT

I was thinking we could speed up the bplus code by using a binary search to check whether an item is listed yet, say once the number of items exceeds a certain amount.
Then you need code to keep the Item$() list sorted as you add to it. I have code that does that already, i.e. building a sorted list while loading it from data, from a file, or from user input. Did I post that here already? But all this extra stuff won't pay off until you have hundreds, maybe thousands, of items to run through. As I said, and tsh73 confirmed, if you know all the items in the list ahead of time it will go faster, especially if you sort that list of items and use a binary search to find the index for counting.
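A minimal Python sketch of the binary-search variant. It uses the standard bisect module to keep the list of unique items sorted as it grows. The name count_with_sorted_items is hypothetical, and this is not plus' existing code. Finding an item takes O(log k) comparisons for k unique items. Inserting a new one still shifts the list, which only matters when new items are frequent:

```python
from bisect import bisect_left


def count_with_sorted_items(items):
    """Keep unique values in a sorted list and find each one by binary search."""
    keys = []    # sorted unique values
    counts = []  # counts[i] belongs to keys[i]
    for d in items:
        i = bisect_left(keys, d)  # leftmost position where d fits
        if i < len(keys) and keys[i] == d:
            counts[i] += 1  # already listed: bump its counter
        else:
            keys.insert(i, d)  # new value: insert at its sorted position
            counts.insert(i, 1)
    return list(zip(keys, counts))


words = ["the", "rain", "in", "Spain", "falls", "mainly", "on", "the", "plain"]
print(count_with_sorted_items(words))
```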

BTW, if the items are numbers, then we already know what the items are AND they are almost automatically ordered, so in that case it should go much, much, much faster in one pass! So coders, give each of your string items a code number, and use that code to save your data and run your counts!
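A minimal Python sketch of the code-number idea, assuming the codes are small integers 0..k-1 assigned in advance. NAMES, data_codes and count_by_code are hypothetical. Counting then becomes direct indexing into an array of counters: one pass, no search and no sort.

```python
def count_by_code(codes, num_codes):
    """Single pass over integer codes 0..num_codes-1; the code is the array index."""
    counts = [0] * num_codes
    for c in codes:
        counts[c] += 1  # direct indexing: no search, no sorting
    return counts


# Hypothetical code table: each distinct string gets a number once, up front,
# and the data is saved as those numbers instead of the strings.
NAMES = ["Tool_a  nom_c", "Tool_b  nom_d"]
data_codes = [0, 1, 0, 1, 0]

for name, n in zip(NAMES, count_by_code(data_codes, len(NAMES))):
    print(name, n)
```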

Piece of cake!

Last Edit: Nov 9, 2023 14:41:36 GMT by plus
