Example: two separate Concept Words, one "el" and the other "él" (with an accent over the e), get interpreted as duplicate Concept Words.
In Spanish and French, for purposes of indexing, these words should actually be treated the same. (In any Spanish or French dictionary, accented characters are alphabetized just the same as their accent-free base letter.)
- When building the unique concept word list, accents should be stripped.
- When searching for concepts via concept words, accents should be stripped from the search terms.
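A minimal sketch of both steps, assuming Python and that concept words are plain strings. The helpers `strip_accents`, `build_unique_concept_words`, and `search_concepts` are hypothetical names for illustration, not part of the existing code. Stripping works by Unicode NFD decomposition (which splits "é" into "e" plus a combining acute accent) followed by dropping combining marks. Note that this removes every combining mark, including the tilde on ñ and the cedilla on ç.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Return `word` with combining marks (accents) removed.

    NFD splits a precomposed character such as "é" into "e" plus a
    combining accent; characters with a nonzero combining class are dropped.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_unique_concept_words(words):
    """Build the unique concept word list keyed on the accent-free form.

    The first spelling seen for each key is the one kept.
    """
    unique = {}
    for word in words:
        unique.setdefault(strip_accents(word), word)
    return unique


def search_concepts(index, term):
    """Strip accents from the search term before looking it up."""
    return index.get(strip_accents(term))


index = build_unique_concept_words(["el", "él", "café"])
print(sorted(index))                   # ['cafe', 'el']
print(search_concepts(index, "cafe"))  # café
```

Because both the stored keys and the search terms go through the same function, "el" and "él" collapse to one entry, and a search typed without accents still finds "café".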
From a native Spanish speaker:
"Ignore the accents; the majority of people don't know how to use them correctly. Plus, that's how indexes are in Spanish."
FYI: in Spanish, all accented vowels should be treated as the plain versions of those vowels. However, ñ is its own letter and should not be treated the same as n. I don't know how this applies to other languages, but as a first pass I think we can just map all accented vowels to their plain form.
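A minimal sketch of the "accented vowels only" mapping, assuming Python; `strip_vowel_accents` and `VOWELS` are hypothetical names. It decomposes with NFD, drops combining marks only when they sit on one of the five Latin base vowels, then recomposes with NFC so that ñ comes back as a single character. As a side effect, marks on other consonants (such as the cedilla in ç) are also kept.

```python
import unicodedata

# Assumption: "vowel" means the five Latin base vowels, either case.
VOWELS = set("aeiouAEIOU")


def strip_vowel_accents(word: str) -> str:
    """Remove diacritics from vowels only, leaving letters like ñ intact."""
    out = []
    base = ""
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # Drop a mark attached to a vowel; keep marks on other letters.
            if base in VOWELS:
                continue
        else:
            base = ch  # remember the base character the next marks attach to
        out.append(ch)
    # Recompose so "n" + combining tilde becomes the single character "ñ".
    return unicodedata.normalize("NFC", "".join(out))


print(strip_vowel_accents("él"))        # el
print(strip_vowel_accents("año"))       # año
print(strip_vowel_accents("pingüino"))  # pinguino
```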