Replace accented chars with unaccented ones

AdSR artur_spruce at yahoo.com
Thu Mar 18 04:38:04 EST 2004


Nicolas Bouillon <bouil at bouil.org.invalid> wrote:
> Hi
> 
> I would like to replace accented chars (like "é", "è" or "à") with 
> unaccented ones ("é" -> "e", "è" -> "e", "à" -> "a").
> 
> I have tried the string.replace method, but it seems to dislike non-ASCII chars...
> 
> Can you help me please ?
> Thanks.

You could try experimenting with the 'unicodedata' module:

>>> import unicodedata
>>> [unicodedata.name(x) for x in u'123 abc @#$ \u00ff']
['DIGIT ONE', 'DIGIT TWO', 'DIGIT THREE', 'SPACE',
 'LATIN SMALL LETTER A', 'LATIN SMALL LETTER B', 'LATIN SMALL LETTER C',
 'SPACE', 'COMMERCIAL AT', 'NUMBER SIGN', 'DOLLAR SIGN', 'SPACE',
 'LATIN SMALL LETTER Y WITH DIAERESIS']
>>> unicodedata.lookup('latin capital letter a with grave')
u'\xc0'

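You could strip the ' WITH ...' part of the name when it is present and
convert the shortened name back to a character with unicodedata.lookup().
You would only need to process characters with ord >= 160, since
everything below that is plain ASCII or a control character.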
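
A minimal sketch of that idea, assuming Python 3 (where every str is
already Unicode). The helper names strip_accent and strip_accents are
made up for illustration; only unicodedata.name() and
unicodedata.lookup() come from the standard library. Characters whose
name has no ' WITH ' part, or whose shortened name does not resolve,
are left unchanged.

```python
import unicodedata


def strip_accent(ch):
    """Return ch without its accent, based on its Unicode character name.

    Hypothetical helper: drops everything from ' WITH ' onward in the
    name and looks the remaining name up again.
    """
    if ord(ch) < 160:
        # ASCII and control characters never need processing.
        return ch
    try:
        name = unicodedata.name(ch)
    except ValueError:
        # The character has no name in the Unicode database.
        return ch
    base, sep, _ = name.partition(' WITH ')
    if not sep:
        # No ' WITH ...' part, so there is nothing to strip.
        return ch
    try:
        return unicodedata.lookup(base)
    except KeyError:
        # The shortened name does not name a real character.
        return ch


def strip_accents(text):
    """Apply strip_accent to every character of text."""
    return ''.join(strip_accent(ch) for ch in text)


if __name__ == '__main__':
    print(strip_accents('é è à'))
```

Note that this also rewrites letters such as "ø" (LATIN SMALL LETTER O
WITH STROKE) to "o", which may or may not be what you want.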

HTH,

AdSR


