trying to strip out non ascii.. or rather convert non ascii

wxjmfauth at gmail.com wxjmfauth at gmail.com
Thu Oct 31 06:33:15 EDT 2013


On Thursday, 31 October 2013 08:10:18 UTC+1, Steven D'Aprano wrote:
> On Wed, 30 Oct 2013 01:49:28 -0700, wxjmfauth wrote:
>
> >> The right solution to that is to treat it no differently from other
> >> fuzzy searches. A good search engine should be tolerant of spelling
> >> errors and alternative spellings for any letter, not just those with
> >> diacritics. Ideally, a good search engine would successfully match all
> >> three of "naïve", "naive" and "niave", and it shouldn't rely on special
> >> handling of diacritics.
> >
> > This is a non sense. The purpose of a diacritical mark is to make a
> > letter a different letter. If a tool is supposed to match an ô, there is
> > absolutely no reason to match something else.
>
> I'm glad that you know so much better than Google, Bing, Yahoo, and other
> search engines. When I search for "mispealled" Google gives me:
>
>     Showing results for misspelled
>     Search instead for mispealled
>
> But I see now that this is nonsense and there is *absolutely no reason*
> to match something other than the ecaxt wrods I typed.
>
> Perhaps you should submit a bug report to Google:
>
> "When I mistype a word, Google correctly gives me the search results I
> wanted, instead of the wrong results I didn't want."
>
> -- 
> Steven


As far as I know, I already acknowledged my mistake. I had text
processing systems in mind more than search engines.

I can even tell you, I am really stupid. I wrote pure
Unicode software to sort French or German strings.

Pure Unicode == independent of any locale.
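
To illustrate what "independent of any locale" can mean in practice, here
is a minimal sketch, not the software mentioned above. It assumes that
comparing base letters first and using the original string only as a
tie-breaker is good enough. The name sort_key is hypothetical, and real
French or German collation rules are more subtle than this.

```python
import unicodedata

def sort_key(s):
    """Hypothetical sort key that never consults the process locale."""
    # NFD splits "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", s)
    # Drop the combining marks so only the base letters remain.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Base letters decide the order; the original string breaks ties.
    return (base.casefold(), s)

words = ["zèbre", "été", "etc", "Öl", "Ofen", "côte", "cote"]
print(sorted(words, key=sort_key))
# ['cote', 'côte', 'etc', 'été', 'Ofen', 'Öl', 'zèbre']
```

Only the unicodedata module from the standard library is used, so the
result does not change with the LANG or LC_COLLATE settings.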

jmf


