Duplicates in array of strings.

honky
Full Member

Posts: 198

Duplicates in array of strings. Feb 16, 2023 12:02:09 GMT

Post by honky on Feb 16, 2023 12:02:09 GMT

I found (probably here) an algo that removes duplicate strings, but it uses asc() codes, which causes problems with accents. Does anyone have an algo based on pure string comparisons? Thank you.

tsh73
Global Moderator

Posts: 1,270

Duplicates in array of strings. Feb 16, 2023 12:54:06 GMT

Post by tsh73 on Feb 16, 2023 12:54:06 GMT

Give us examples.
What are the inputs?
Is it an array, or is it one huge string?
How big is it?
What do you need as output?
Is the order of the strings important?

What accents are you talking about?
- Don't accented characters have different codes?
- I'm pretty sure string comparison would treat them as different characters too.

If you like piece of my code, go ahead and use it.
I had my share of fun creating it - now it's free.

honky
Full Member

Posts: 198

Duplicates in array of strings. Feb 16, 2023 13:13:08 GMT

Post by honky on Feb 16, 2023 13:13:08 GMT

@tsh73:
The strings are just names,
but a comparison on only the first four letters would suffice.
The problem with asc is:
asc("d") = 100
asc("e") = 101
asc("é") = 233 <--- !!!
asc("f") = 102
If we compare the strings as is, it should solve the problem.

EDIT:


a$="a r de é g tà n"
b$="a r de é g tà n"
if a$=b$ then print "yess"
'It says: "yess"
a$="a à g é t"
b$="a a g é t"
if a$=b$ then
print "yess"
else
print "no"
end if
'It says: "no"
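To make the "first four letters" idea concrete, here is a minimal Just BASIC sketch, assuming the names are compared with a plain string comparison on left$(name$, 4). The function samePrefix() and the sample names are hypothetical, not from the thread. Whether the second pair counts as equal depends on how your JB installation compares "é" and "e", which is exactly the question being discussed.

```basic
' Hypothetical sketch: compare only the first four letters of two names
' samePrefix() and the sample names are illustrative, not from the thread
print samePrefix("Dupont", "Dupond")   ' compares "Dupo" with "Dupo"
print samePrefix("Hélène", "Helene")   ' compares "Hélè" with "Hele": depends on how JB compares é and e
end

function samePrefix(a$, b$)
    ' returns 1 if the first four characters match, 0 otherwise
    samePrefix = 0
    if left$(a$, 4) = left$(b$, 4) then samePrefix = 1
end function
```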

Last Edit: Feb 16, 2023 13:25:26 GMT by honky

tsh73
Global Moderator

Posts: 1,270

Duplicates in array of strings. Feb 16, 2023 13:26:24 GMT

Post by tsh73 on Feb 16, 2023 13:26:24 GMT

I've heard that in JB/LB it could depend on the international settings.
For me:

c$(1)=chr$(100)
c$(2)=chr$(101)
c$(3)=chr$(233)
c$(4)=chr$(102)

print c$(1)<c$(2)   '1- Yes
print c$(2)<c$(3)   '1- Yes
print c$(3)<c$(4)   '0- No

(and chr$(233) does not look like anything near "e" in my locale)

What numbers does it print for you?

Last Edit: Feb 16, 2023 13:28:05 GMT by tsh73

tsh73
Global Moderator

Posts: 1,270

Duplicates in array of strings. Feb 16, 2023 13:32:40 GMT

Post by tsh73 on Feb 16, 2023 13:32:40 GMT

So does your example prove that comparing the strings "as is" will NOT solve your problem?
Or will they not match exactly, but still sort correctly?

Oh.
Just run a sort on the letters and see for yourself whether the accented characters fall in the right places.

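dim c$(256)
' fill and print characters 32..255, 16 per row
for i = 2 to 15
    for j = 0 to 15
        n = i*16+j
        c$(n) = chr$(n)
        print c$(n);
    next
    print
next
print
print "after sort"
print

' sort the whole array in ascending order
sort c$(), 0, 255

' print the same slots again, now holding the sorted characters
for i = 2 to 15
    for j = 0 to 15
        n = i*16+j
        print c$(n);
    next
    print
next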

Funny. I was not aware that my locale has accented characters at all (though they are not used in my language),
and that sort orders them reasonably.
Good thing to know.

Last Edit: Feb 16, 2023 14:01:41 GMT by tsh73

honky
Full Member

Posts: 198

Duplicates in array of strings. Feb 16, 2023 13:35:58 GMT

Post by honky on Feb 16, 2023 13:35:58 GMT

233 is not "e" but "é" (with an accent).

EDIT: It's not about sorting but about eliminating duplicates.

Last Edit: Feb 16, 2023 13:40:16 GMT by honky

tsh73
Global Moderator

Posts: 1,270

Duplicates in array of strings. Feb 16, 2023 13:40:15 GMT

Post by tsh73 on Feb 16, 2023 13:40:15 GMT

So string comparison does not treat
asc("e") = 101
asc("é") = 233
as equal.

Do you need them to be considered equal for duplicate removal?

Then I would suggest recoding the strings, changing all variants of "e" to plain "e", and doing the same for all other accented letters,
and then looking for duplicates.
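A minimal Just BASIC sketch of the "recode, then look for duplicates" approach, assuming the Windows ANSI (Latin-1) code page seen in the thread (asc("é") = 233). The helper stripAccents$() and its table of accented letters are hypothetical and only cover common lowercase French letters; extend the two table strings in parallel for other letters. Because the recoded words contain only plain letters, the ordinary = comparison is enough afterwards.

```basic
' Sketch: recode accented letters to plain ones, then remove duplicates with "="
' Assumes the Windows ANSI (Latin-1) code page, e.g. asc("é") = 233
dim k$(100)
a$ = "aa bb ée ee âa aa"          ' hypothetical sample words
n = 0
while word$(a$, n + 1) <> ""
    n = n + 1
    k$(n) = stripAccents$(word$(a$, n))   ' store the recoded form
wend
for i = 1 to n
    isNew = 1
    ' look for the same recoded word earlier in the array
    for j = 1 to i - 1
        if k$(i) = k$(j) then isNew = 0
    next j
    if isNew = 1 then print k$(i)
next i
end

function stripAccents$(s$)
    ' hypothetical helper: map each listed accented letter to its base letter
    accented$ = "àâäéèêëîïôöùûüç"
    plain$    = "aaaeeeeiioouuuc"
    result$ = ""
    for i = 1 to len(s$)
        c$ = mid$(s$, i, 1)
        p = instr(accented$, c$)
        if p > 0 then c$ = mid$(plain$, p, 1)
        result$ = result$ + c$
    next i
    stripAccents$ = result$
end function
```

With the sample string it should print aa, bb and ee, one per line. It prints the recoded form; keep a second array with the original words if the original spelling is needed in the output.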

Last Edit: Feb 16, 2023 14:02:31 GMT by tsh73

honky
Full Member

Posts: 198

Duplicates in array of strings. Feb 16, 2023 14:04:06 GMT

Post by honky on Feb 16, 2023 14:04:06 GMT

Yes, it is possible to replace the accented letters with standard ones.
But I would like an algo based on comparisons of raw strings:
a$ = b$ or a$ <> b$
Not to mention the number of attempts (with transfer array(s)) I have already gone through.

tsh73
Global Moderator

Posts: 1,270

Duplicates in array of strings. Feb 16, 2023 15:05:58 GMT

Post by tsh73 on Feb 16, 2023 15:05:58 GMT

honky wrote:
But I would like an algo by comparisons of raw strings
a$ = b$ or a$ <> b$

But BASIC does not do that on its own.

Write your own function that takes the raw strings and (internally)
. replaces accented characters with ordinary ones
. and does the comparison.
Return True or False (1 or 0).

That's it.
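That suggestion can be sketched as two small Just BASIC functions, assuming the Windows ANSI (Latin-1) code page seen in the thread (asc("é") = 233). sameText() and baseLetter$() are hypothetical names, and the lookup table only covers a few lowercase accented letters. The raw strings go in, each character is mapped to its base letter internally, and the function returns 1 or 0.

```basic
' Sketch: compare two raw strings, treating accented letters as their base letter
' sameText() and baseLetter$() are hypothetical helpers, not JB built-ins
print sameText("aàgét", "aagét")
print sameText("aàgét", "aagat")
end

function sameText(a$, b$)
    ' returns 1 if the strings match letter by letter after mapping accents, else 0
    same = 0
    if len(a$) = len(b$) then
        same = 1
        for i = 1 to len(a$)
            if baseLetter$(mid$(a$, i, 1)) <> baseLetter$(mid$(b$, i, 1)) then same = 0
        next i
    end if
    sameText = same
end function

function baseLetter$(c$)
    ' hypothetical lookup table: accented letter at position p maps to plain letter at p
    accented$ = "àâäéèêëîïôöùûüç"
    plain$    = "aaaeeeeiioouuuc"
    result$ = c$
    p = instr(accented$, c$)
    if p > 0 then result$ = mid$(plain$, p, 1)
    baseLetter$ = result$
end function
```

With these inputs it should print 1 and then 0. Comparing character by character avoids building new strings, at the cost of one table lookup per character.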

Rod
Administrator

Posts: 677

Duplicates in array of strings. Feb 16, 2023 15:10:10 GMT

Post by Rod on Feb 16, 2023 15:10:10 GMT

A string is just a collection of bytes (character codes). So accented and plain letters will never compare as equal unless you substitute one for the other. It won't matter whether you do a string comparison or an asc code comparison; they will still differ.

a$="ardeégtàn"
b$="ardeégtàn"
if a$=b$ then print "yess"
'It says: "yess"
a$="aàgét"
b$="aagét"
if a$=b$ then
print "yess"
else
print "no"
end if
'It says: "no"
for n= 1 to len(a$)
print asc(mid$(a$,n,1)),asc(mid$(b$,n,1))
next
if a$>b$ then print "greater"
if a$<b$ then print "lesser"

c$(1)=a$
c$(2)=b$
print c$(1),c$(2)
sort c$(), 1, 2
print c$(1),c$(2)

a$=replstr$(a$,chr$(224),chr$(97))
if a$=b$ then
print "yess"
else
print "no"
end if
print a$,b$

Rod
Administrator

Posts: 677

Duplicates in array of strings. Feb 16, 2023 15:13:18 GMT

Post by Rod on Feb 16, 2023 15:13:18 GMT

Can I ask, since we don't use accents in Scotland: how is it that you can get both "a" and an accented "a" as input? Are both "a" and accented "a" on the keyboard? Which of a$ and b$ is correctly input and spelt?

Last Edit: Feb 16, 2023 15:14:01 GMT by Rod

honky
Full Member

Posts: 198

Duplicates in array of strings. Feb 16, 2023 16:45:47 GMT

Post by honky on Feb 16, 2023 16:45:47 GMT

On my (French) keyboard, the accented letters are on the first row (where the digits need Shift), except the "ù" of "où" (where), which is after the "m".
And JB clearly distinguishes between plain and accented letters.
How lucky you are not to have accents!

honky
Full Member

Posts: 198

Duplicates in array of strings. Feb 16, 2023 17:56:07 GMT

Post by honky on Feb 16, 2023 17:56:07 GMT

The solution exists and works. I'm holding off on posting it, in case anyone wants to think about it. Tell me when you give up (in French, "donner sa langue au chat", literally "give your tongue to the cat") and I will post it.

cundo
Global Moderator

Posts: 235

Duplicates in array of strings. Feb 16, 2023 21:51:40 GMT

Post by cundo on Feb 16, 2023 21:51:40 GMT

My keyboard has the accent on a key: the ¨´{ key. It has 3 functions; if I press that key alone, it doesn't do anything but wait for me to press a vowel.
So that key + vowel, pressed one after the other (not simultaneously), writes the accented vowel.

Plus I have the Ñ next to the L. That's all.

To run the code posted on this forum, download JustBASIC 2.0 if you haven't already.

honky
Full Member

Posts: 198

Duplicates in array of strings. Feb 21, 2023 14:58:05 GMT

Post by honky on Feb 21, 2023 14:58:05 GMT

For lack of tongues, the cat died of hunger.
So here is the solution.
Note: JB considers "é" (accented e) equal to "e",
but distinguishes "â" from "a".


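dim k$(500)
dim resu$(500)
a$="aa bb cc dd ee ff bbbb cc ée âa dd cc bbbb ee aa " '// A single string ---
n=15 '// number of words in a$
for x=1 to n '// Put the words into a single array ---
    k$(x)=word$(a$,x)
next x
dedoub=0 '// number of unique strings found so far
for i=1 to n
    nouveau=1 '// assume k$(i) is new until an earlier match is found
    for j=1 to i-1
        '// plain string comparison, no asc()
        if k$(i)=k$(j) then nouveau=0
    next j
    if nouveau=1 then
        dedoub=dedoub+1
        resu$(dedoub)=k$(i)
        print resu$(dedoub)
    end if
next i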
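Since it was noted earlier that JB/LB comparisons may depend on the international settings, a quick way to check the "é equals e, â differs from a" observation on a given machine is to compare each accented letter with its base letter directly. This is a hypothetical test program, assuming the Windows ANSI (Latin-1) code page; the list of letters is an illustrative subset.

```basic
' Hypothetical check: how does this JB installation compare accented and plain letters?
' The letter list is an illustrative subset (Windows ANSI / Latin-1 assumed)
accented$ = "àâäéèêëîïôöùûüç"
plain$    = "aaaeeeeiioouuuc"
for i = 1 to len(accented$)
    c$ = mid$(accented$, i, 1)
    p$ = mid$(plain$, i, 1)
    verdict$ = "different"
    if c$ = p$ then verdict$ = "equal"
    print c$; " ("; asc(c$); ") vs "; p$; " ("; asc(p$); "): "; verdict$
next i
```

Any letter reported as "equal" will be silently merged by the duplicate remover above; any reported as "different" will be kept as a separate entry.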
