Least-lossy string.encode to us-ascii?
Vlastimil Brom
vlastimil.brom at gmail.com
Thu Sep 13 17:44:18 EDT 2012
2012/9/13 Tim Chase <python.list at tim.thechases.com>:
> I've got a bunch of text in Portuguese and to transmit them, need to
> have them in us-ascii (7-bit). I'd like to keep as much information
> as possible, just stripping accents, cedillas, tildes, etc. So
> "serviço móvil" becomes "servico movil". Is there anything stock
> that I've missed? I can do mystring.encode('us-ascii', 'replace')
> but that doesn't keep as much information as I'd hope.
>
> -tkc
>
Hi,
would something like the following be enough for your needs?
Unfortunately, I can't check it reliably with regard to Portuguese.
>>> import unicodedata
>>> unicodedata.normalize("NFD", u"serviço móvil").encode("ascii", "ignore").decode("ascii")
u'servico movil'
>>>
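The same idea can be spelled out a bit more explicitly. This is only a sketch under some assumptions: it uses Python 3 syntax (so no u"" prefix), and the function name to_ascii_lossy is made up for illustration, not anything from the standard library. It drops the combining marks itself before encoding, so the "ignore" handler only has to deal with whatever is left over.

```python
import unicodedata


def to_ascii_lossy(text):
    """Hypothetical helper: strip diacritics, then drop anything non-ASCII."""
    # Split each accented character into a base letter plus combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # Remove the combining marks (accents, cedillas, tildes, ...).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Characters that still fall outside ASCII are silently discarded.
    return stripped.encode("ascii", "ignore").decode("ascii")


print(to_ascii_lossy("serviço móvil"))
```

One caveat: characters that have no decomposition (for instance "ø" or "ß") are not transliterated but dropped entirely. As far as I can tell this does not affect the accented letters normally used in Portuguese.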
There is also "Unidecode", but I haven't used it myself so far...
http://pypi.python.org/pypi/Unidecode/
hth,
vbr