ascii to latin1
Serge.Orlov at gmail.com
Tue May 9 03:07:15 CEST 2006
Luis P. Mendes wrote:
> I'm developing a Django-based intranet web server that has a search page.
> Data contained in the database is mixed. Some of the words are
> accented, some are not but they should be. This is because the
> collection of data began a long time ago when ASCII was the only way to go.
> The problem is that users have to search more than once for some words,
> because the searched word may or may not be accented. If we consider
> that some expressions have several letters that can be accented, the
> search effort is too much.
> I've searched the net for some kind of solution but couldn't find one. I've
> only found solutions for the opposite problem.
> If the word searched for is 'televisão', I want a search for either
> 'televisao', 'televisão' or even 'télévisao' (this last one doesn't
> exist in Portuguese) to be successful.
> So, instead of only one search, several would have to be run.
> Is there anything already coded, or will I have to try to do it all by
You need to convert from latin1 to ascii, not from ascii to latin1. The
function below does that. Then you need to build the database index not on
the latin1 text but on the ascii text. After that, convert the user input to
ascii as well and search against that index.
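import unicodedata

def search_key(s):  # function name assumed; s must be a unicode string
    de_str = unicodedata.normalize("NFD", s)
    # assumed completion: keep only characters that are not combining marks
    return ''.join(cp for cp in de_str if not unicodedata.combining(cp))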
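
To illustrate the whole approach (index the accent-stripped text, then strip
the user input the same way before looking it up), here is a minimal sketch.
It assumes Python 3, where every str is Unicode; on the Python 2 of the time
you would pass unicode objects instead. The names search_key, build_index and
search are made up for this example, and the dict stands in for whatever
indexed column or table the real database would use.

```python
import unicodedata


def search_key(text):
    """Return text with combining marks (accents, tildes, cedillas) removed."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for combining marks, so this keeps base letters only.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(rows):
    """Map each accent-free key to the original strings that produce it."""
    index = {}
    for row in rows:
        index.setdefault(search_key(row), []).append(row)
    return index


def search(index, query):
    """Reduce the query to the same accent-free key and look it up."""
    return index.get(search_key(query), [])


rows = ["televisão", "televisao", "rádio"]
index = build_index(rows)
print(search(index, "télévisao"))
```

Because stored text and user input go through the same function, it does not
matter which of them carries the accents: 'televisao', 'televisão' and
'télévisao' all reduce to the same key, so a single lookup is enough.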