Replace and inserting strings within .txt files with the use of regex
jstrickler at gmail.com
Sun Aug 8 19:29:16 CEST 2010
On Aug 8, 10:59 am, Thomas Jollans <tho... at jollans.com> wrote:
> On 08/08/2010 04:06 PM, Νίκος wrote:
> > On 8 Aug, 15:40, Thomas Jollans <tho... at jollans.com> wrote:
> >> On 08/08/2010 01:41 PM, Νίκος wrote:
> >>> I was so dizzy and confused yesterday that I forgot to mention that
> >>> I need to remove not only the PHP opening and closing tags but also
> >>> whatever data lurks inside those tags, because now, with the
> >>> 'counter.py' script I wrote, the HTML files would be opened from there
> >>> and the template variables like %(counter)d substituted.
> >> I could just hand you a solution, but I'll be a bit of a bastard and
> >> just give you some hints.
> >> You could use regular expressions. If you know regular expressions, it's
> >> relatively trivial - but I doubt you know regexp.
> > Here is the code with some try-and-fail modifications I made, still
> > non-working, based on your hints:
> > ==========================================================
> > id = 0 # unique page_id
> > for currdir, files, dirs in os.walk('varsa'):
> >     for f in files:
> >         if f.endswith('php'):
> >             # get abs path to filename
> >             src_f = join(currdir, f)
> >             # open php src file
> >             print 'reading from %s' % src_f
> >             f = open(src_f, 'r')
> >             src_data = f.read() # read contents of PHP file
> >             f.close()
> >             # replace tags
> >             print 'replacing php tags and contents within'
> >             src_data = src_data.replace(r'<?.?>', '')
> >             # the dot matches any character i hope! no matter how many of them?!?
> Two problems here:
> str.replace doesn't use regular expressions. You'll have to use the re
> module to use regexps. (the re.sub function to be precise)
> '.' matches a single character. Any character, but only one.
> '.*' matches as many characters as possible. This is not what you want,
> since it will match everything between the *first* <? and the *last* ?>.
> You want non-greedy matching.
> '.*?' is the same thing, without the greed.
> >             # add ID
> >             print 'adding unique page_id'
> >             src_data = ( '<!-- %d -->' % id ) + src_data
> >             id += 1
> >             # add template variables
> >             print 'adding counter template variable'
> >             src_data = src_data + ''' <h4><font color=green> Αριθμός Επισκεπτών: %(counter)d </font></h4> '''
> >             # i can think of this but the above line must be above
> >             # </body></html>, NOT after, but how to write that?!?
> You will have to find the </body> tag before inserting the string.
> str.find should help -- or you could use str.replace and replace the
> </body> tag with your counter line, plus a new </body>.
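The str.replace variant could look roughly like this. It is only a sketch: the helper name add_counter and the COUNTER_LINE constant are made up for illustration, and it assumes the closing tag is written exactly as lowercase </body>.

```python
# Counter markup taken from the script above; %(counter)d is left as a
# template variable to be filled in later.
COUNTER_LINE = '<h4><font color=green> Αριθμός Επισκεπτών: %(counter)d </font></h4>\n'

def add_counter(src_data):
    """Insert COUNTER_LINE just before the first </body> tag (hypothetical helper)."""
    if '</body>' not in src_data:
        # No closing tag found: fall back to appending at the end.
        return src_data + COUNTER_LINE
    # Replace only the first occurrence, putting the tag back after the counter.
    return src_data.replace('</body>', COUNTER_LINE + '</body>', 1)
```

If the files mix </BODY> and </body>, a case-insensitive re.sub would be needed instead of str.replace.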
> >             # rename old php file to new with .html extension
> >             src_file = src_file.replace('.php', '.html')
> >             # open newly created html file for inserting data
> >             print 'writing to %s' % dest_f
> >             dest_f = open(src_f, 'w')
> >             dest_f.write(src_data) # write contents
> >             dest_f.close()
> > This is the best i can do.
> No it's not. You're just giving up too soon.
When replacing text in an HTML document with re.sub, you want to use
the re.S flag (also spelled re.DOTALL; "single-line" mode in Perl terms),
which makes '.' match newlines as well; otherwise your pattern won't match
when the opening tag is on one line and the closing tag is on another.
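Putting the hints together, a minimal sketch might look like the following. It assumes the PHP blocks are delimited by <? ... ?> (which also covers <?php ... ?>); the names PHP_BLOCK and strip_php are made up for illustration.

```python
import re

# '<\?' and '\?>' are escaped because '?' is a regex metacharacter.
# '.*?' is the non-greedy form, so each block is matched separately.
# re.S (re.DOTALL) lets '.' match newlines, so multi-line blocks match too.
PHP_BLOCK = re.compile(r'<\?.*?\?>', re.S)

def strip_php(src_data):
    """Return src_data with every <? ... ?> block removed (hypothetical helper)."""
    return PHP_BLOCK.sub('', src_data)
```

Note that this stops at the first ?> it sees, so a PHP string literal that itself contains ?> would cut the match short.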