substitution
Wilbert Berendsen
wbsoft at xs4all.nl
Thu Jan 21 09:18:54 EST 2010
On Monday 18 January 2010, Adi wrote:
> keys = [(len(key), key) for key in mapping.keys()]
> keys.sort(reverse=True)
> keys = [key for (_, key) in keys]
>
> pattern = "(%s)" % "|".join(keys)
> repl = lambda x : mapping[x.group(1)]
> s = "fooxxxbazyyyquuux"
>
> re.subn(pattern, repl, s)
I managed to make it even shorter by using the key argument of sorted, not
putting the whole regexp inside parentheses, and pre-compiling the regular
expression:
import re
mapping = {
"foo" : "bar",
"baz" : "quux",
"quuux" : "foo"
}
# sort the keys, longest first, so 'aa' gets matched before 'a', because
# in Python regexps the first match (going from left to right) in a
# |-separated group is taken
keys = sorted(mapping.keys(), key=len, reverse=True)
rx = re.compile("|".join(keys))
repl = lambda x: mapping[x.group()]
s = "fooxxxbazyyyquuux"
rx.sub(repl, s)
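To see why the order of the alternatives matters, here is a small sketch. The mapping and the helper name substitute are made up for illustration and are not part of the code above; the helper reads the module-level mapping.

```python
import re

# Hypothetical mapping in which one key is a prefix of another.
mapping = {"a": "1", "aa": "2"}

def substitute(text, keys):
    # Join the keys into one alternation, in exactly the order given.
    rx = re.compile("|".join(keys))
    return rx.sub(lambda m: mapping[m.group()], text)

shortest_first = sorted(mapping, key=len)
longest_first = sorted(mapping, key=len, reverse=True)

print(substitute("aa", shortest_first))  # 11 ('a' matches twice)
print(substitute("aa", longest_first))   # 2  ('aa' matches once)
```

With the shortest key first, 'aa' is never reached, because 'a' already matches at every position.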
One thing remains: if the keys to be replaced can contain non-alphanumeric
characters, they should be escaped with re.escape so they are matched literally:
rx = re.compile("|".join(re.escape(key) for key in keys))
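Putting the pieces together, a minimal sketch could look like this. The function name multi_replace and the sample mapping are my own choices for illustration; the sorting and escaping are the same as above.

```python
import re

def multi_replace(text, mapping):
    """Replace every key of mapping found in text by its value, in one pass."""
    # Longest keys first, so a key is tried before any of its prefixes.
    keys = sorted(mapping, key=len, reverse=True)
    # Escape each key so characters like '.', '*' and '+' match literally.
    rx = re.compile("|".join(re.escape(key) for key in keys))
    return rx.sub(lambda m: mapping[m.group()], text)

# Hypothetical mapping whose keys contain regexp metacharacters.
symbols = {"a.b": "dot", "a*b": "star", "+": "plus"}
print(multi_replace("a.b + a*b", symbols))  # dot plus star
```

Without re.escape, a key such as '+' would make the pattern fail to compile, and '.' would match any character instead of a literal dot.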
Kind regards,
Wilbert Berendsen
--
http://www.wilbertberendsen.nl/
"You must be the change you wish to see in the world."
-- Mahatma Gandhi