Replace accented chars with unaccented ones

Josiah Carlson jcarlson at nospam.uci.edu
Wed Mar 17 22:12:57 EST 2004


Noah wrote:

> Josiah Carlson <jcarlson at nospam.uci.edu> wrote in message news:<c37ugc$llq$1 at news.service.uci.edu>...
> 
>>>            r += xlate[ord(i)]
>>>            r += i
>>
>>Perhaps I'm going to have to create a signature and drop information 
>>about this in every post to c.l.py, but repeated string additions are 
>>slow as hell for any reasonably long string.  It is much 
>>faster to place characters into a list and ''.join() them.
> 
> 
> True. Is this better?
> 
>     ... body of latin1_to_ascii() ...
>     r = []
>     for i in unicrap:
>         if xlate.has_key(ord(i)):
>             r.append (xlate[ord(i)])
>         elif ord(i) >= 0x80:
>             pass
>         else:
>             r.append (i)
>     return ''.join(r)

I'd use:
''.join([xlate.get(ord(i), i) for i in unicrap
         if ord(i) in xlate or ord(i) < 0x80])
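A self-contained sketch of the one-liner in context follows. The post elides the full xlate table, so the small mapping below is a hypothetical subset of Latin-1 code points chosen only for illustration. The sketch also targets Python 3 `str` rather than the Python 2 unicode strings this thread was written against. Characters with a mapping are replaced, plain ASCII is kept, and any other character at or above 0x80 is dropped.

```python
# Hypothetical, abbreviated translation table: the original post elides the
# real xlate mapping, so these entries are illustrative only.
xlate = {
    0xC0: 'A', 0xC7: 'C', 0xC9: 'E', 0xDF: 'ss',
    0xE0: 'a', 0xE1: 'a', 0xE7: 'c', 0xE8: 'e',
    0xE9: 'e', 0xF1: 'n', 0xFA: 'u',
}

def latin1_to_ascii(unicrap):
    """Replace mapped accented chars, keep ASCII, drop other non-ASCII chars."""
    # Keep a char if it has a translation or is already ASCII; then either
    # substitute its translation or keep it unchanged.
    return ''.join([xlate.get(ord(i), i) for i in unicrap
                    if ord(i) in xlate or ord(i) < 0x80])

print(latin1_to_ascii('Ça va? Voilà, ñandú, Straße'))
# prints: Ca va? Voila, nandu, Strasse
```

Note that an accented character missing from the table (for example 'ï' here) is silently dropped, so the quality of the output depends entirely on how complete xlate is.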

In general, collecting characters with r.append() is faster than repeated 
string addition, but it is still noticeably slower than a list 
comprehension, because the loop has to look up and call r.append on 
every iteration.
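To check these claims on a given machine, a rough timing sketch follows. It runs the three approaches from this thread (string addition, r.append(), and the list comprehension) over the same input using the standard timeit module. The sample text and the small xlate table are hypothetical stand-ins, not data from the thread.

```python
import timeit

# Hypothetical sample data: mostly ASCII with a few accented characters and
# one unmapped non-ASCII character (a snowman) that should be dropped.
xlate = {0xE0: 'a', 0xE7: 'c', 0xE8: 'e', 0xE9: 'e'}
text = 'caf\u00e9 cr\u00e8me \u00e0 la fa\u00e7on \u2603 ' * 1000

def by_concat(s):
    # Repeated string addition.
    r = ''
    for i in s:
        if ord(i) in xlate:
            r += xlate[ord(i)]
        elif ord(i) < 0x80:
            r += i
    return r

def by_append(s):
    # Collect pieces with r.append(), then join once.
    r = []
    for i in s:
        if ord(i) in xlate:
            r.append(xlate[ord(i)])
        elif ord(i) < 0x80:
            r.append(i)
    return ''.join(r)

def by_listcomp(s):
    # Build the list with a comprehension, then join once.
    return ''.join([xlate.get(ord(i), i) for i in s
                    if ord(i) in xlate or ord(i) < 0x80])

if __name__ == '__main__':
    for f in (by_concat, by_append, by_listcomp):
        # Best of 3 runs, each calling the function 10 times.
        best = min(timeit.repeat(lambda: f(text), number=10, repeat=3))
        print('%-12s %.4f s' % (f.__name__, best))
```

Absolute numbers depend on the machine and interpreter. Later CPython releases also optimize some in-place `+=` string concatenation, so the string-addition version may fare better today than it did when this thread was written.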

  - Josiah



More information about the Python-list mailing list