Questions about regex
Bobby
bobby.house at gmail.com
Fri May 29 21:27:32 EDT 2009
On May 29, 1:26 pm, Jared.S.Ba... at gmail.com wrote:
> Hello,
>
> I'm new to Python and I'm having problems with a regular expression. I
> use TextMate as my editor, and when I run the regex in TextMate it
> works fine, but when I run it as part of the script it freezes. Could
> anyone help me figure out why this is happening and how to fix it?
> Here is the script:
>
> ======================================================
> # regular expression search and replace
> import sys, os, re, string, csv
>
> #Open the file and taking its data
> myfile=open('Steve_query3.csv') #Steve_query_test.csv
> #create an error flag to loop the script twice
> #store all file's data in the string object 'text'
> myfile.seek(0)
> text = myfile.read()
>
> for i in range(2):
> #def textParse(text, reRun):
> print 'how many times is this getting executed', i
>
> #Now to create the newfile 'test' and write our 'text'
> newfile = open('Steve_query3_out.csv', 'w')
> #open the new file and set it with 'w' for "write"
> #loop through 'text', clean it up and write it into the 'newfile'
> #sub( pattern, repl, string[, count])
> #"sub("(?i)b+", "x", "bbbb BBBB")" returns 'x x'.
> text = re.sub('(\<(/?[^\>]+)\>)', "", text)#remove the HTML
> text = re.sub('/<!--(.|\s)*?-->/', "", text) #remove comments <!--[^\-]+-->
> text = re.sub('\/\*(.|\s)*?;}', "", text) #remove css formatting
> #remove a bunch of word formatting yuck
> text = re.sub("&nbsp;", " ", text)
> text = re.sub("&lt;", "<", text)
> text = re.sub("&gt;", ">", text)
> text = re.sub("&quot;|&rquot;|&ldquo;", "\'", text)
> #===================================
> #The two following lines are the ones giving me the problems
> text = re.sub("w:(.|\s)*?\n", "", text)
> text = re.sub("UnhideWhenUsed=(.|\s)*?\n", "", text)
> #===========================================
> text = re.sub(re.compile('^\r?\n?$', re.MULTILINE), '', text) #remove the extra whitespace
> #now write out the new file and close it
> newfile.write(text)
> newfile.close()
>
> #open the newfile and run the script again
> #Open the file and taking its data
>
> myfile=open('Steve_query3_out.csv') #Steve_query_test.csv
> #store all file's data in the string object 'text'
> myfile.seek(0)
> text = myfile.read()
>
> Thanks for the help,
>
> -Jared
Can you give a string that you would expect the regex to match, and
what the expected result would be? Currently, it looks like the
interesting part of the regex, (.|\s)*?, matches a run of any
characters of any length, as few as possible. There seems to be some
redundancy that makes it more confusing than it needs to be: . already
matches everything that \s matches except a newline (unless re.DOTALL
is set). Or maybe you just need to escape the . because you meant it
to be a literal.
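
To make the point about redundancy concrete, here is a small sketch. The
sample text is made up, since the original CSV was not posted, and I am
assuming the intent is "delete everything from the marker to the end of
that line". Because both branches of (.|\s) can match the same
character, a backtracking engine such as Python's re may have to try a
very large number of combinations when no \n follows, which is one
possible cause of the freeze. A negated character class avoids the
overlap:

```python
import re

# Hypothetical sample text; the original CSV file was not posted.
sample = (
    "keep this\n"
    "w:LsdException Locked=false\n"
    "UnhideWhenUsed=false Name=x\n"
    "keep that\n"
)

# Without re.DOTALL, "." matches every character except a newline,
# while "\s" matches whitespace including the newline.
dot_matches_newline = re.match(r".", "\n") is not None
space_matches_newline = re.match(r"\s", "\n") is not None

# "Everything up to and including the end of the line", written
# without the overlapping alternation (.|\s).
cleaned = re.sub(r"w:[^\n]*\n", "", sample)
cleaned = re.sub(r"UnhideWhenUsed=[^\n]*\n", "", cleaned)

print(repr(cleaned))
```

If the file uses \r line endings instead of \n, the patterns above will
not find an end of line at all, so that is worth checking as well.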