Problem with re module
Ethan Furman
ethan at stoneleaf.us
Tue Mar 22 19:26:21 EDT 2011
John Harrington wrote:
> Here's a script that illustrates the problem. Any help would be
> appreciated!:
>
> #BEGIN SCRIPT
> import re
>
> outlist = []
> myfile = "raw.tex"
>
> fin = open(myfile, "r")
> lineList = fin.readlines()
> fin.close()
>
> for i in range(0,len(lineList)):
>
>     lineList[i]=re.sub(r'(\\begin{document})([^\n])',r'\1\n\n\2',lineList[i])
>
>     outlist.append(lineList[i])
>
> fou = open(myfile, "w")
> for i in range(len(outlist)):
>     fou.write(outlist[i])
> fou.close
> #END SCRIPT
>
> And the file raw.tex:
>
> %BEGIN TeX FILE
> \begin{document}
> This line should remain right after the above line in the output, but
> doesn't
>
> \begin{document}Extra stuff here should appear below the begin line
> and does in the output.
> %END TeX FILE
Here's the important tidbit:
re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line)
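To see what that substitution does to individual lines, here is a minimal sketch that applies it to two sample lines. The strings are made up and are assumed to look the way `readlines()` returns them, each ending in `\n`:

```python
import re

PATTERN = r'(\\begin{document})(.+)'
REPLACEMENT = r'\1\n\n\2'

# A line holding only \begin{document}: the next character is the newline,
# which '.' cannot match, so '.+' fails and the line comes back unchanged.
bare = "\\begin{document}\n"

# A line with text after \begin{document}: '.+' captures that text, and the
# replacement puts two newlines between the command and the captured text.
extra = "\\begin{document}Extra stuff here\n"

print(repr(re.sub(PATTERN, REPLACEMENT, bare)))
print(repr(re.sub(PATTERN, REPLACEMENT, extra)))
```

The second line ends up as `\begin{document}`, a blank line, then `Extra stuff here`; the first line is left alone.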
From the docs:
'.'
(Dot.) In the default mode, this matches any character except a newline.
If the DOTALL flag has been specified, this matches any character
including a newline.
'+'
Causes the resulting RE to match 1 or more repetitions of the preceding
RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will
not match just ‘a’.
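Both doc entries are easy to check in the interpreter. A small sketch with made-up strings; it uses `re.fullmatch`, which assumes Python 3.4 or later:

```python
import re

text = "a\nb"

# Default mode: '.' matches any character except a newline,
# so 'a.b' cannot span the line break.
default_hit = re.search(r'a.b', text) is not None            # False

# With the DOTALL flag, '.' matches the newline as well.
dotall_hit = re.search(r'a.b', text, re.DOTALL) is not None  # True

# '+' means one or more of the preceding RE: 'ab+' matches 'ab' and 'abbb'
# but not a lone 'a'.
plus_hits = [re.fullmatch(r'ab+', s) is not None for s in ("a", "ab", "abbb")]

print(default_hit, dotall_hit, plus_hits)  # False True [False, True, True]
```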
And here's the entire program, a bit more pythonically:
8<---------------------------------------------------------------
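import re

myfile = "raw.tex"

# read the whole file; the with block closes it afterwards
with open(myfile, "r") as fin:
    lineList = fin.readlines()

# put a blank line between \begin{document} and any text that follows
# it on the same line; lines with nothing after it are left alone
outlist = []
for line in lineList:
    line = re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line)
    outlist.append(line)

# write the result back; the with block closes the file when it ends
# (a bare `fou.close` without parentheses never calls the method)
with open(myfile, "w") as fou:
    for line in outlist:
        fou.write(line)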
8<---------------------------------------------------------------
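To try the program without touching a real raw.tex, one option is to wrap it in a function and run it on a temporary copy. This is only a sketch: `add_blank_after_begin` is a hypothetical helper name, and the sample text is modelled on the raw.tex from the original question:

```python
import re
import tempfile
from pathlib import Path

def add_blank_after_begin(path):
    """Hypothetical helper: apply the substitution to every line of `path`
    and write the result back to the same file."""
    with open(path, "r") as fin:
        lines = fin.readlines()
    lines = [re.sub(r'(\\begin{document})(.+)', r'\1\n\n\2', line)
             for line in lines]
    with open(path, "w") as fou:
        fou.writelines(lines)

# Sample input modelled on raw.tex from the question.
sample = (
    "%BEGIN TeX FILE\n"
    "\\begin{document}\n"
    "This line should remain right after the above line\n"
    "\n"
    "\\begin{document}Extra stuff here should appear below the begin line\n"
    "%END TeX FILE\n"
)

with tempfile.TemporaryDirectory() as tmp:
    tex = Path(tmp) / "raw.tex"
    tex.write_text(sample)
    add_blank_after_begin(tex)
    result = tex.read_text()

print(result)
```

In the output the first `\begin{document}` is still directly followed by its next line, while the second one now has a blank line before "Extra stuff here".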
Hope this helps!
~Ethan~
More information about the Python-list mailing list