Problem with re module
Benjamin Kaplan
benjamin.kaplan at
Tue Mar 22 15:07:06 EDT 2011
On Tue, Mar 22, 2011 at 2:40 PM, John Harrington
<beartiger.all at> wrote:
> On Mar 22, 11:16 am, John Bokma <j... at> wrote:
>> John Harrington <beartiger.... at> writes:
>> > I'm trying to use the following substitution,
>> > lineList[i]=re.sub(r'(\\begin{document})([^$])',r'\1\n\n\2',lineList[i])
>> > I intend this to match any string "\begin{document}" that doesn't end
>> > in a line ending. If there's no line ending, then, I want to place
>> > two carriage returns between the string and the non-line end
>> > character.
>> > However, this places carriage returns even when the string is followed
>> > directly after with a line ending. Can someone explain to me why this
>> > match is not behaving as I intend it to, especially the ([^$])?
>> [^$] matches: not a $ character
>> You might want [^\n]
> Thank you, John.
> I thought that when you use "r" before the regex, $ matches an end of
> line. But, in any case, if I use "[^\n]" as you suggest I get the
> same result.
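On the $ question first: inside square brackets, $ is just a literal dollar sign, so [^$] means "any character except $", and that includes a newline. A quick check you can run yourself (my own sketch, the variable names are made up for the example):

```python
import re

# Inside [...] the dollar sign loses its "end of line" meaning.
# [^$] means "any single character except a literal $".
matches_letter = re.match(r'[^$]', 'x') is not None    # ordinary character
matches_newline = re.match(r'[^$]', '\n') is not None  # a newline is "not $"
matches_dollar = re.match(r'[^$]', '$') is not None    # the one thing rejected

# Outside a character class, $ anchors at the end of the line,
# whether or not the string literal carries the r prefix.
anchor_plain = re.search('end$', 'the end\n') is not None
anchor_raw = re.search(r'end$', 'the end\n') is not None

print(matches_letter, matches_newline, matches_dollar, anchor_plain, anchor_raw)
```

So with [^$] the newline after \begin{document} is itself matched as the "following character", which is why the substitution fired on every line.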
r before a string has nothing to do with regexes. It signals a raw
string: escape sequences won't be interpreted.
>>> print 'a\tb'
a       b
>>> print r'a\tb'
a\tb
We use raw strings for regexes because otherwise you'd have to
remember to double up all your backslashes, and double up your
doubled-up backslashes when you really want a literal backslash.
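To make the doubling concrete, here is a small sketch (not taken from your script; the names plain, raw and text are just for illustration). It runs the same on Python 2 and 3:

```python
import re

# The same regex written without and with a raw string.
# The re module needs to receive: \\begin (escaped backslash, then "begin").
plain = '\\\\begin'   # four backslashes in the source -> two in the string
raw = r'\\begin'      # raw string: what you type is what the regex sees

same_pattern = (plain == raw)
pattern_length = len(raw)        # 2 backslashes + 5 letters

# The text being searched contains a single backslash.
text = '\\begin{document}'
matched = re.match(raw, text).group(0)

print(same_pattern, pattern_length, matched)
```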
> Here's a script that illustrates the problem. Any help would be
> appreciated!:
> import re
> outlist = []
> myfile = "raw.tex"
> fin = open(myfile, "r")
> lineList = fin.readlines()
> fin.close()
> for i in range(0,len(lineList)):
>     lineList[i]=re.sub(r'(\\begin{document})([^\n])',r'\1\n\n\2',lineList[i])
>     outlist.append(lineList[i])
> fou = open(myfile, "w")
> for i in range(len(outlist)):
>     fou.write(outlist[i])
> fou.close
> And the file raw.tex:
> \begin{document}
> This line should remain right after the above line in the output, but
> doesn't
> \begin{document}Extra stuff here should appear below the begin line
> and does in the output.
Works for me. Do you have a space after the \begin{document} or
something? Because that would get moved. You might want to check for
non-whitespace characters in the regex instead of just non-newlines.
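Something along these lines, for instance. This is only a sketch: fix_line and the STRICTER pattern are names I made up, and I'm assuming you are happy to drop any spaces or tabs sitting between \begin{document} and the text that follows it.

```python
import re

# The pattern from the script: [^\n] also matches a space or a tab.
ORIGINAL = re.compile(r'(\\begin{document})([^\n])')
# Assumed alternative: skip spaces/tabs (but not the newline),
# then require a real non-whitespace character.
STRICTER = re.compile(r'(\\begin\{document\})[^\S\n]*(\S)')

def fix_line(line, pattern=STRICTER):
    # Put a blank line after the begin marker wherever the pattern matches.
    return pattern.sub(r'\1\n\n\2', line)

trailing_space = '\\begin{document} \n'
with_text = '\\begin{document}Extra stuff\n'

print(repr(fix_line(trailing_space, ORIGINAL)))  # the stray space gets moved
print(repr(fix_line(trailing_space)))            # line left alone
print(repr(fix_line(with_text)))                 # text pushed below a blank line
```

The first call reproduces what I suspect you are seeing: the trailing space counts as a "non-newline", so it gets pushed onto its own line.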
> --