Problem with re module

Tue Mar 22 14:35:57 EDT 2011

John Harrington wrote:

> I'm trying to use the following substitution,
> 
>      lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n
> \2',lineList[i])
> 
> I intend this to match any string "\begin{document}" that doesn't end
> in a line ending.  If there's no line ending, then, I want to place
> two carriage returns between the string and the non-line end
> character.
> 
> However, this places carriage returns even when the string is followed
> directly after with a line ending.  Can someone explain to me why this
> match is not behaving as I intend it to, especially the ([^$])?

Quoting http://docs.python.org/library/re.html:
"""
Special characters are not active inside sets. For example, [akm$] will 
match any of the characters 'a', 'k', 'm', or '$';
"""
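
So inside the brackets the $ is just a dollar sign: [^$] means "any
character except a literal '$'", and a newline qualifies. A small sketch
of that behaviour, assuming the line still carries its trailing "\n"
(the sample strings are made up for illustration):

```python
import re

pattern = re.compile(r'(\\begin{document})([^$])')

# Inside a set, '$' is a literal dollar sign, so [^$] also matches '\n'.
at_line_end = pattern.search('\\begin{document}\n')
before_dollar = pattern.search('\\begin{document}$x')

print(at_line_end is not None)   # True: the newline is consumed by [^$]
print(before_dollar is None)     # True: only a literal '$' is rejected
```

That is why the substitution fires even when the string is followed
directly by a line ending.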
>
> Also, how can I write a regex that matches what I wish to match, as
> described above?

I think you want a "negative lookahead assertion", (?!...):

>>> import re
>>> pattern = re.compile(r"(xxx)(?!$)", re.MULTILINE)
>>> print(pattern.sub(r"\1**", "aaa bbb xxx\naaa xxx bbb\nxxx"))
aaa bbb xxx
aaa xxx** bbb
xxx
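
Applied to the original problem, the same idea might look like the
sketch below. The helper name split_after_begin is hypothetical, and I
am assuming each entry of lineList is a single line that may or may not
end in "\n":

```python
import re

# (?!$) with re.MULTILINE rejects matches that sit at the end of a line.
pattern = re.compile(r'(\\begin{document})(?!$)', re.MULTILINE)


def split_after_begin(line):
    """Insert a blank line after \\begin{document} unless it ends the line."""
    # The lookahead consumes nothing, so no second group is needed.
    return pattern.sub(r'\1\n\n', line)
```

Usage would then be lineList[i] = split_after_begin(lineList[i]).
Because the lookahead does not consume the following character, the
replacement only has to put back group 1.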