Replace accented chars with unaccented ones
AdSR
artur_spruce at yahoo.com
Thu Mar 18 04:38:04 EST 2004
Nicolas Bouillon <bouil at bouil.org.invalid> wrote:
> Hi
>
> I would like to replace accented chars (like "é", "è" or "à") with
> unaccented ones ("é" -> "e", "è" -> "e", "à" -> "a").
>
> I have tried the string.replace method, but it seems to dislike non-ASCII chars...
>
> Can you help me please?
> Thanks.
You could try experimenting with the 'unicodedata' module:
>>> import unicodedata
>>> [unicodedata.name(x) for x in u'123 abc @#$ \u00ff']
['DIGIT ONE', 'DIGIT TWO', 'DIGIT THREE', 'SPACE',
 'LATIN SMALL LETTER A', 'LATIN SMALL LETTER B', 'LATIN SMALL LETTER C',
 'SPACE', 'COMMERCIAL AT', 'NUMBER SIGN', 'DOLLAR SIGN', 'SPACE',
 'LATIN SMALL LETTER Y WITH DIAERESIS']
>>> unicodedata.lookup('latin capital letter a with grave')
u'\xc0'
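You could strip the ' WITH ...' part of a character's name when it is
present and convert the shortened name back to a character with
unicodedata.lookup. You would only need to process characters with
ord >= 160; everything below that is either plain ASCII or an unnamed
control character.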
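Here is a rough sketch of that idea, in case it helps. It is untested
against the full Unicode range and assumes Python 3, where every str is
Unicode. The function name strip_accents_by_name is made up for this
example; only unicodedata.name and unicodedata.lookup come from the
standard library. Characters whose name has no ' WITH ' part, or whose
shortened name does not resolve to a character, are passed through
unchanged.

```python
import unicodedata


def strip_accents_by_name(text):
    """Replace accented characters using their Unicode names.

    Hypothetical helper: drops everything from ' WITH ' onward in the
    character name and looks the remaining name up again.
    """
    result = []
    for ch in text:
        # Below 160 there is only ASCII and unnamed control characters.
        if ord(ch) < 160:
            result.append(ch)
            continue
        try:
            name = unicodedata.name(ch)
        except ValueError:
            # Character has no name in the Unicode database.
            result.append(ch)
            continue
        base_name, sep, _ = name.partition(' WITH ')
        if not sep:
            # No ' WITH ' in the name, so there is nothing to strip.
            result.append(ch)
            continue
        try:
            result.append(unicodedata.lookup(base_name))
        except KeyError:
            # The shortened name is not a real character name.
            result.append(ch)
    return ''.join(result)
```

With this, strip_accents_by_name('\u00e9 \u00e8 \u00e0') gives 'e e a'.
Note that the name-based approach also maps characters such as
'\u00f8' (LATIN SMALL LETTER O WITH STROKE) to 'o', which may or may not
be what you want.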
HTH,
AdSR