trying to strip out non ascii.. or rather convert non ascii
wxjmfauth at gmail.com
Wed Oct 30 14:38:31 EDT 2013
On Wednesday, 30 October 2013 18:54:05 UTC+1, Michael Torrie wrote:
> On 10/30/2013 10:08 AM, wxjmfauth at gmail.com wrote:
> > My comment had nothing to do with Python, it was a
> > general comment. A diacritical mark just makes a letter
> > a different letter; an "ï" and an "i" are "as
> > different" as an "a" from a "z". A diacritical mark
> > is more than a simple ornamentation.
>
> That's nice, but you didn't actually read what Ned said (or the OP).
> The OP doesn't care that "ï" and "i" are as different as "a" and "z".
> For the purposes of his search he wants them treated as the same
> letter. A fuzzy search treats them all the same. For example, a
> search for "Godel, Escher, Bach" should find "Gödel, Escher, Bach" just
> fine, even though "o" and "ö" are different characters. And lo and
> behold, Google actually does this! Try it. It's nice for those of us
> who want to find something and whose US keyboards don't have the right marks.
>
> https://www.google.ca/search?q=godel+escher+bach
>
> After all this nonsense, that's what the original poster is looking for
> (I think... can't be sure since it's been so many days now). Seems to
> me a Python module does this quite nicely:
>
> https://pypi.python.org/pypi/Unidecode
OK. You are right. I recognize my mistake. Independently
of the original poster's task, I did not understand it that
way.
Let's say it depends on the context: for a general
search engine, it's good that diacritics are ignored.
For, let's say, a text processing system, it's good
to have only exact matches. That does not mean other
matching possibilities cannot exist.
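
To make the two contexts concrete, here is a minimal sketch using only the
standard library's unicodedata module (Python 3). It is not the Unidecode
module linked above, and the function names strip_diacritics, fuzzy_equal
and exact_equal are made up for this illustration. It assumes that dropping
combining marks after NFKD decomposition is "fuzzy enough"; letters with no
decomposition, such as "ø", pass through unchanged.

```python
import unicodedata


def strip_diacritics(text):
    """Return text with combining marks removed (hypothetical helper)."""
    # NFKD splits a character such as "ö" into "o" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() is non-zero for combining marks, so skip those.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fuzzy_equal(a, b):
    """Search-engine style comparison: ignore diacritics and case."""
    return strip_diacritics(a).casefold() == strip_diacritics(b).casefold()


def exact_equal(a, b):
    """Text-processing style comparison: diacritics and case both matter."""
    # NFC only makes precomposed and decomposed spellings of the same
    # character compare equal; it does not remove any marks.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(fuzzy_equal("godel, escher, bach", "Gödel, Escher, Bach"))
print(exact_equal("Godel, Escher, Bach", "Gödel, Escher, Bach"))
```

With this sketch the first comparison succeeds and the second does not,
which is the search engine versus text processing distinction.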
jmf