Replace accented chars with unaccented ones

Josiah Carlson jcarlson at
Tue Mar 16 03:19:00 CET 2004

Jeff Epler wrote:

> You have two options.  First, convert the string to Unicode and use code
> like the following:
>     replacements = [(u'\xe9', 'e'), ...]
>     def remove_accents(u):
>         for a, b in replacements:
>             u = u.replace(a, b)
>         return u
>     >>> remove_accents(u'\xe9')
>     u'e'
> Second, if you are using a single-byte encoding (iso8859-1, for
> instance), then work with byte strings:
>     import string
>     replacement_map = string.maketrans('\xe9...', 'e...')
>     def remove_accents(s):
>         return s.translate(replacement_map)
>     >>> remove_accents('\xe9')
>     'e'
> If you want to have strings like u'é' in your programs, you have to
> include a line at the top of the source file that tells Python the
> encoding, like the following line does:
>     # -*- coding: utf-8 -*-
> (except you have to name the encoding your editor uses, if it's not
> utf-8) See
> Once you've done that, you can write
>     replacements = [(u'é', 'e'), ...]
> instead of using the \xXX escape for it.

Translating the replacement pairs into a dictionary would give a
significant speedup when there are many replacements.

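# replacement_pairs is a list of (accented, unaccented) pairs, for
# example the `replacements` list quoted above.
mapping = dict(replacement_pairs)

def multi_replace(inp, mapping=mapping):
    # Look up each character; keep it unchanged if it has no mapping.
    return u''.join([mapping.get(i, i) for i in inp])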

One pass over the input gives an O(len(inp)) algorithm, much better
(running-time wise) than the replace() loop, which runs in
O(len(inp) * len(replacement_pairs)) time as given. Note that the
dictionary lookup works only when every key is a single character.
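
For what it's worth, here is a small self-contained sketch that puts the
two approaches side by side and checks that they agree. The sample pairs,
the sample string, and the name replace_loop are made up for illustration;
they are not from the posts above. It is written to run on a current
Python 3 interpreter, where the u'' prefix is still accepted.

```python
# Sketch only: the sample data and the name replace_loop are illustrative.
replacement_pairs = [
    (u'\xe9', u'e'),  # e with acute accent
    (u'\xe8', u'e'),  # e with grave accent
    (u'\xe0', u'a'),  # a with grave accent
    (u'\xf1', u'n'),  # n with tilde
]
mapping = dict(replacement_pairs)

def replace_loop(u, pairs=replacement_pairs):
    # One full scan of the string for every pair.
    for a, b in pairs:
        u = u.replace(a, b)
    return u

def multi_replace(inp, mapping=mapping):
    # One scan of the string, one dictionary lookup per character.
    return u''.join([mapping.get(ch, ch) for ch in inp])

sample = u'd\xe9j\xe0 vu, ma\xf1ana, cr\xe8me'
assert replace_loop(sample) == multi_replace(sample)
print(multi_replace(sample))  # prints: deja vu, manana, creme
```

The two functions agree here because every key is a single character and
no replacement value is itself a key. With multi-character keys, or with
pairs that chain into one another, the replace() loop and the dictionary
lookup can give different results.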

  - Josiah
