String multi-replace
Sorin Schwimmer
sxn02 at yahoo.com
Wed Nov 17 23:21:06 EST 2010
Hi All,
I have to eliminate diacritics in a fairly large file.
Inspired by http://code.activestate.com/recipes/81330/, I came up with the following code:
#! /usr/bin/env python
import re
nodia={chr(196)+chr(130):'A', # mamaliga
chr(195)+chr(130):'A', # A^
chr(195)+chr(142):'I', # I^
chr(195)+chr(150):'O', # OE
chr(195)+chr(156):'U', # UE
chr(195)+chr(132):'A', # AE
chr(197)+chr(158):'S', # S cedilla
chr(197)+chr(162):'T', # T cedilla
chr(196)+chr(131):'a', # mamaliga
chr(195)+chr(162):'a', # a^
chr(195)+chr(174):'i', # i^
chr(195)+chr(182):'o', # oe
chr(195)+chr(188):'u', # ue
chr(195)+chr(164):'a', # ae
chr(197)+chr(159):'s', # s cedilla
chr(197)+chr(163):'t'  # t cedilla
}
name="R\xc3\xa2\xc5\x9fca"
regex = re.compile("(%s)" % "|".join(map(re.escape, nodia.keys())))
print regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], name)
But it won't work; I end up with:
Traceback (most recent call last):
  File "multirep.py", line 25, in <module>
    print regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], name)
  File "multirep.py", line 25, in <lambda>
    print regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], name)
TypeError: 'type' object is not subscriptable
What am I doing wrong?
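For comparison, here is a minimal Python 3 sketch of the same recipe. It assumes the file is UTF-8 and is decoded before matching, so the keys are single Unicode characters rather than UTF-8 byte pairs. The traceback points at the lambda: it indexes `dict`, the built-in type, instead of the `nodia` mapping. `mo.group(0)` is a shorter way to get the matched text than slicing `mo.string`. The names `NODIA`, `strip_diacritics` and `strip_file` are hypothetical, chosen for illustration.

```python
import re

# Hypothetical table: accented capitals and lowercase letters mapped to
# their plain ASCII base letter. Keys are decoded characters, not bytes.
NODIA = {
    "\u0102": "A",  # A with breve
    "\u00c2": "A",  # A with circumflex
    "\u00ce": "I",  # I with circumflex
    "\u00d6": "O",  # O with diaeresis
    "\u00dc": "U",  # U with diaeresis
    "\u00c4": "A",  # A with diaeresis
    "\u015e": "S",  # S with cedilla
    "\u0162": "T",  # T with cedilla
    "\u0103": "a",  # a with breve
    "\u00e2": "a",  # a with circumflex
    "\u00ee": "i",  # i with circumflex
    "\u00f6": "o",  # o with diaeresis
    "\u00fc": "u",  # u with diaeresis
    "\u00e4": "a",  # a with diaeresis
    "\u015f": "s",  # s with cedilla
    "\u0163": "t",  # t with cedilla
}

# One alternation of every key; re.escape guards against metacharacters.
REGEX = re.compile("|".join(map(re.escape, NODIA)))


def strip_diacritics(text):
    # Look the matched text up in NODIA (the mapping), not in `dict` (the type).
    return REGEX.sub(lambda mo: NODIA[mo.group(0)], text)


def strip_file(src, dst):
    # Hypothetical helper: stream a large UTF-8 file line by line so the
    # whole file never has to be held in memory at once.
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(strip_diacritics(line))


# The sample string from the post, decoded from its UTF-8 bytes.
name = b"R\xc3\xa2\xc5\x9fca".decode("utf-8")
print(strip_diacritics(name))  # prints: Rasca
```

In the original Python 2 code, the equivalent one-line change is to call `nodia[mo.group(0)]` instead of `dict[...]`. Once the text is decoded, every key is a single character, so `text.translate(str.maketrans(NODIA))` would do the same job without a regex. The regex form is kept here only to stay close to the recipe. Note that the table covers only the cedilla forms of S and T. Romanian text often uses the comma-below letters U+0218 to U+021B instead, and those would need their own entries.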
Thanks for your advice,
SxN
More information about the Python-list mailing list