re.sub(): replace longest match instead of leftmost match?

ting at thsu.org
Mon Dec 19 18:15:06 EST 2011


On Dec 16, 11:49 am, John Gordon <gor... at panix.com> wrote:
> I'm working with IPv6 CIDR strings, and I want to replace the longest
> match of "(0000:|0000$)+" with ":".  But when I use re.sub() it replaces
> the leftmost match, even if there is a longer match later in the string.

Typically this means that your regular expression is not specific
enough.

That is, if you get multiple matches, and you need to sort through
those matches before performing a replace, it usually means that you
should rewrite your expression to get a single match.

This usually happens when you try to take shortcuts. I can't blame
you for using a shortcut, as sometimes shortcuts just work, but once
you find that your shortcut fails, you need to step back and rethink
the problem, rather than try to hack around the shortcut.

I don't know exactly what you are doing, but off the top of my head,
I'd check whether the CIDR string is wrapped in a header message and,
if so, include the header as part of the search pattern. Or, if you
know the IPv6 strings are interspersed with IPv4 strings, I would
rewrite the regex to exclude IPv4 strings.
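
For reference, here is a rough sketch of what "sorting through the
matches" looks like if you do it by hand. It uses the pattern from the
question as-is; the helper name and the sample address are made up for
illustration, and it does not try to be a complete IPv6 compressor
(for example, it does nothing special for a run of zero groups at the
very start of the address).

```python
import re

# Pattern taken verbatim from the original question.
ZERO_RUN = re.compile(r"(0000:|0000$)+")


def replace_longest_zero_run(addr):
    """Replace only the longest run of zero groups with ':'.

    Hypothetical helper, not part of the re module.
    """
    matches = list(ZERO_RUN.finditer(addr))
    if not matches:
        return addr
    # max() keeps the first of several equally long matches,
    # so ties go to the leftmost run.
    longest = max(matches, key=lambda m: m.end() - m.start())
    return addr[:longest.start()] + ":" + addr[longest.end():]


addr = "1234:0000:5678:0000:0000:0000:9abc:def0"  # made-up sample

# re.sub() with count=1 replaces the leftmost match, as described.
print(ZERO_RUN.sub(":", addr, count=1))
# -> 1234::5678:0000:0000:0000:9abc:def0

print(replace_longest_zero_run(addr))
# -> 1234:0000:5678::9abc:def0
```

It works, but it is exactly the kind of post-processing that a more
specific pattern would make unnecessary.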
--
// T.Hsu


