Wilbert Berendsen wbsoft at
Thu Jan 21 15:18:54 CET 2010

On Monday 18 January 2010, Adi wrote:
> keys = [(len(key), key) for key in mapping.keys()]
> keys.sort(reverse=True)
> keys = [key for (_, key) in keys]
> pattern = "(%s)" % "|".join(keys)
> repl = lambda x : mapping[x.group(1)]
> s = "fooxxxbazyyyquuux"
> re.subn(pattern, repl, s)

I managed to make it even shorter by using the key argument of sorted, not 
putting the whole regexp inside parentheses, and pre-compiling the regular 
expression:

import re

mapping = {
        "foo" : "bar",
        "baz" : "quux",
        "quuux" : "foo"
}

# sort the keys, longest first, so 'aa' gets matched before 'a', because
# in Python regexps the first match (going from left to right) in a
# |-separated group is taken
keys = sorted(mapping.keys(), key=len, reverse=True)

rx = re.compile("|".join(keys))
# x is the match object; x.group() is the key that was matched
repl = lambda x: mapping[x.group()]
s = "fooxxxbazyyyquuux"
rx.sub(repl, s)
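
To see why the longest-first ordering matters, here is a small sketch. The
helper name multi_replace, its longest_first flag and the "a"/"aa" mapping are
made up for illustration; they are not part of the code above.

```python
import re

def multi_replace(mapping, text, longest_first=True):
    # Hypothetical helper: replace every key of `mapping` found in `text`.
    # Keys are assumed to contain no regexp metacharacters.
    keys = sorted(mapping, key=len, reverse=longest_first)
    rx = re.compile("|".join(keys))
    # m.group() is the key that matched; look up its replacement
    return rx.sub(lambda m: mapping[m.group()], text)

mapping = {"a": "1", "aa": "2"}
print(multi_replace(mapping, "aaa"))                       # 21
print(multi_replace(mapping, "aaa", longest_first=False))  # 111
```

With the shortest key first, the alternative "a" always wins and "aa" is never
replaced.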

One thing remaining: if the keys to be replaced can contain characters that 
have a special meaning in regexps, they should be escaped using re.escape:

rx = re.compile("|".join(re.escape(key) for key in keys))
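
A quick sketch of what the escaping buys you, using a sample mapping and input
string that I made up for the purpose:

```python
import re

# Sample keys containing regexp metacharacters ('.' and '*')
mapping = {"a.b": "A", "x*y": "B"}
keys = sorted(mapping, key=len, reverse=True)

# re.escape makes every key match literally
rx = re.compile("|".join(re.escape(key) for key in keys))
result = rx.sub(lambda m: mapping[m.group()], "a.b axb x*y xy")
print(result)  # A axb B xy
```

Without re.escape, the pattern "a.b" would also match "axb", and "x*y" would
match "xy".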

Kind regards,
Wilbert Berendsen

"You must be the change you wish to see in the world."
        -- Mahatma Gandhi
