substitution

Iain King iainking at gmail.com
Thu Jan 21 09:58:14 EST 2010


On Jan 21, 2:18 pm, Wilbert Berendsen <wbs... at xs4all.nl> wrote:
> On Monday 18 January 2010, Adi wrote:
>
> > keys = [(len(key), key) for key in mapping.keys()]
> > keys.sort(reverse=True)
> > keys = [key for (_, key) in keys]
>
> > pattern = "(%s)" % "|".join(keys)
> > repl = lambda x : mapping[x.group(1)]
> > s = "fooxxxbazyyyquuux"
>
> > re.subn(pattern, repl, s)
>
> I managed to make it even shorter, using the key argument for sorted, not
> putting the whole regexp inside parentheses and pre-compiling the regular
> expression:
>
> import re
>
> mapping = {
>     "foo": "bar",
>     "baz": "quux",
>     "quuux": "foo",
> }
>
> # sort the keys, longest first, so 'aa' gets matched before 'a', because
> # in Python regexps the first match (going from left to right) in a
> # |-separated group is taken
> keys = sorted(mapping.keys(), key=len, reverse=True)
>
> rx = re.compile("|".join(keys))
> repl = lambda x: mapping[x.group()]
> s = "fooxxxbazyyyquuux"
> rx.sub(repl, s)
>
> One thing remaining: if the replacement keys could contain non-alphanumeric
> characters, they should be escaped using re.escape:
>
> rx = re.compile("|".join(re.escape(key) for key in keys))
>
> Kind regards,
> Wilbert Berendsen
>
> --http://www.wilbertberendsen.nl/
> "You must be the change you wish to see in the world."
>         -- Mahatma Gandhi
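
For reference, the quoted approach can be pulled together into one
runnable sketch.  The function name multi_replace and the empty-mapping
guard are my own additions for illustration, not part of the quoted
code; it assumes longest-first ordering and literal (escaped) keys:

```python
import re

def multi_replace(mapping, text):
    # Hypothetical helper: replace every key of `mapping` found in `text`
    # in a single left-to-right pass.
    if not mapping:
        return text  # an empty pattern would match everywhere
    # Longest keys first, so 'aa' is tried before 'a' at the same position.
    keys = sorted(mapping, key=len, reverse=True)
    # Escape each key so regex metacharacters are matched literally.
    rx = re.compile("|".join(re.escape(key) for key in keys))
    return rx.sub(lambda m: mapping[m.group()], text)

mapping = {"foo": "bar", "baz": "quux", "quuux": "foo"}
print(multi_replace(mapping, "fooxxxbazyyyquuux"))   # barxxxquuxyyyfoo
print(multi_replace({"a": "1", "aa": "2"}, "aaa"))  # 21
```

Because the regex scans the input once, replacement text is never
rescanned: the "foo" produced from "quuux" is not turned into "bar".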

Sorting isn't the right solution: it's easier to hold the substitutions
as a sequence of tuple pairs, which lets the user specify the order.
Consider the following substitutions:

"fooxx" -> "baz"
"oxxx" -> "bar"

Does the user want "bazxbazyyyquuux" or "fobarbazyyyquuux"?
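
A minimal sketch of what I mean, assuming plain sequential str.replace
calls are acceptable (replace_in_order is just a name I made up for
illustration):

```python
def replace_in_order(subs, text):
    # `subs` is a sequence of (old, new) pairs; earlier pairs are applied
    # first, so the caller decides which overlapping match wins.
    for old, new in subs:
        text = text.replace(old, new)
    return text

s = "fooxxxbazyyyquuux"
print(replace_in_order([("fooxx", "baz"), ("oxxx", "bar")], s))  # bazxbazyyyquuux
print(replace_in_order([("oxxx", "bar"), ("fooxx", "baz")], s))  # fobarbazyyyquuux
```

One caveat: each pass sees the output of the previous one, so a later
pair can rewrite text that an earlier pair produced.  If that is not
wanted, the pairs could instead feed a single-pass regex in the given
order.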

Iain


