negative lookahead question

Tim Peters tim.one at comcast.net
Mon Apr 21 13:40:21 EDT 2003


[Skip Montanaro]
> This re.sub call lives in Lib/smtplib.py as a way to make line endings
> canonical:
>
>     re.sub(r'(?:\r\n|\n|\r(?!\n))', CRLF, data)
>
> This certainly seems to do what's desired; however, it looks
> overly complex to me.  First, the non-grouping parens are unnecessary.

Yup.

> Second, I don't think the negative lookahead assertion is required.

That's also so.  Because alternatives are tried left to right, if we get to
the third alternative then we know we can't be looking at \r\n (that was the
first alternative, but the match attempt didn't stop there).
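A quick way to see this in Python is a sketch like the one below. The sample string is invented for illustration. At a CRLF the first alternative matches and consumes both characters, so the bare \r alternative is only ever reached for a lone \r. On mixed input the result matches the original smtplib pattern.

```python
import re

CRLF = "\r\n"

# Skip's simplified pattern: plain alternation, no lookahead.
pattern = re.compile(r"\r\n|\n|\r")

# At a CRLF the first alternative wins and consumes both characters,
# so the third (bare \r) alternative is never tried there.
at_crlf = pattern.match("\r\nabc").group()
print(repr(at_crlf))  # '\r\n'

# Only a \r that is not followed by \n falls through to the third alternative.
at_lone_cr = pattern.match("\rabc").group()
print(repr(at_lone_cr))  # '\r'

# Compare with the original smtplib pattern on mixed line endings.
data = "a\r\nb\nc\rd\r\r\ne"
original = re.sub(r"(?:\r\n|\n|\r(?!\n))", CRLF, data)
simpler = pattern.sub(CRLF, data)
print(original == simpler)  # True
```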

> This simpler function call seems to do the trick:
>
>     re.sub(r'\r\n|\n|\r', CRLF, data)

It should.  Other simpler equivalents are the regexps

    \r\n|[\r\n]

and

    \r\n?|\n
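Equivalence here is easy to argue by hand. A small Python sketch can still spot-check it by running all four spellings over a handful of invented sample strings and asserting that they agree. Passing these checks is evidence, not a proof.

```python
import re

CRLF = "\r\n"

# Four spellings of "any line ending"; each should normalize to CRLF identically.
patterns = [
    r"(?:\r\n|\n|\r(?!\n))",  # the original smtplib pattern
    r"\r\n|\n|\r",            # Skip's simplification
    r"\r\n|[\r\n]",           # CRLF first, else either single character
    r"\r\n?|\n",              # \r with an optional \n, or a bare \n
]

# Hand-picked samples covering lone \r, lone \n, CRLF, and awkward runs.
samples = ["", "abc", "a\nb", "a\rb", "a\r\nb", "\n\r", "\r\r\n", "x\r\n\n\ry\r"]

for s in samples:
    results = {re.sub(p, CRLF, s) for p in patterns}
    # Every pattern must yield the same single result for this sample.
    assert len(results) == 1, (s, results)
print("all patterns agree")
```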

An interesting question is whether it's really necessary to replace \r\n
with itself (CRLF == '\r\n').  IOW, what's wrong with

    \r(?!\n)|\n

?  The problem is that it doesn't actually leave \r\n alone.  There's no
match when looking at the \r in \r\n, but then re.sub moves forward a
character and tries again, looking at the \n in \r\n.  That one would match,
and \r\n would get changed to \r\r\n.
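The failure is easy to reproduce. In this Python sketch (the input string is invented for illustration), the lookahead-only pattern skips the \r of an existing CRLF but still matches the \n after it. Listing \r\n as its own first alternative avoids that, because the pair is consumed in a single match.

```python
import re

CRLF = "\r\n"
data = "one\r\ntwo\rthree\nfour"

# Tempting shortcut: rewrite only a \r not followed by \n, plus any bare \n.
broken = re.sub(r"\r(?!\n)|\n", CRLF, data)
# The \r of the existing CRLF is skipped, but the \n after it still matches,
# so that CRLF becomes \r\r\n.
print(repr(broken))  # 'one\r\r\ntwo\r\nthree\r\nfour'

# With \r\n as the first alternative, the pair is consumed in one match
# and its \n is never examined on its own.
fixed = re.sub(r"\r\n|\n|\r", CRLF, data)
print(repr(fixed))   # 'one\r\ntwo\r\nthree\r\nfour'
```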