Replacing and inserting strings within .txt files using regex

Thomas Jollans thomas at jollans.com
Sun Aug 8 10:59:34 EDT 2010


On 08/08/2010 04:06 PM, Νίκος wrote:
> On 8 Aug, 15:40, Thomas Jollans <tho... at jollans.com> wrote:
>> On 08/08/2010 01:41 PM, Νίκος wrote:
>>
>>> I was so dizzy and confused yesterday that I forgot to mention that
>>> I need to remove not only the php opening and closing tags but also
>>> whatever data lurks inside those tags, because now, with the
>>> 'counter.py' script I wrote, the html files would be opened from there
>>> and their template variables, like %(counter)d, substituted.
>>
>> I could just hand you a solution, but I'll be a bit of a bastard and
>> just give you some hints.
>>
>> You could use regular expressions. If you know regular expressions, it's
>> relatively trivial - but I doubt you know regexp.
> 
> Here is the code with some trial-and-error modifications I made based
> on your hints; it is still not working:
> ==========================================================
> 
> import os
> from os.path import join
>
> id = 0  # unique page_id
> 
> for currdir, dirs, files in os.walk('varsa'):  # yields (dirpath, dirnames, filenames)
> 
>     for f in files:
> 
>         if f.endswith('php'):
> 
>             # get abs path to filename
>             src_f = join(currdir, f)
> 
>             # open php src file
>             print 'reading from %s' % src_f
>             f = open(src_f, 'r')
>             src_data = f.read()         # read contents of PHP file
>             f.close()
> 
>             # replace tags
>             print 'replacing php tags and contents within'
>             # the dot matches any character, I hope! No matter how many of them?!?
>             src_data = src_data.replace(r'<?.?>', '')

Two problems here:

str.replace doesn't use regular expressions. You'll have to use the re
module for regexps (the re.sub function, to be precise).

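'.'  matches a single character. Any character (except a newline, unless
you pass the re.DOTALL flag), but only one.
'.*' matches as many characters as possible. This is not what you want,
since it will match everything between the *first* <? and the *last* ?>.
You want non-greedy matching.

'.*?' is the same thing, without the greed. Note also that '?' is itself
a regex metacharacter, so the literal '?' in '<?' and '?>' must be
escaped as '\?'.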
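A short Python 3 demonstration of these points on a made-up sample string, assuming the PHP blocks are delimited by <? and ?> (a '?>' inside a PHP string literal would end a match early). It compares str.replace, a greedy pattern, a non-greedy one, and the non-greedy one without re.DOTALL:

```python
import re

# Made-up sample: two PHP blocks, the second one spanning several lines.
page = 'a<?php echo 1; ?>b<?php\n  echo 2;\n?>c'

# str.replace searches for the literal text "<?.?>", which never occurs,
# so nothing is removed.
literal = page.replace(r'<?.?>', '')

# '?' is escaped as '\?' because it is a regex metacharacter.
# Greedy: '.*' runs from the first '<?' to the last '?>', eating the "b" too.
# DOTALL lets '.' match newlines as well.
greedy = re.sub(r'<\?.*\?>', '', page, flags=re.DOTALL)

# Non-greedy: '.*?' stops at the first '?>' after each '<?'.
lazy = re.sub(r'<\?.*?\?>', '', page, flags=re.DOTALL)

# Without DOTALL, '.' cannot cross the newline, so the multi-line block survives.
no_dotall = re.sub(r'<\?.*?\?>', '', page)

print(repr(literal))    # 'a<?php echo 1; ?>b<?php\n  echo 2;\n?>c'
print(repr(greedy))     # 'ac'
print(repr(lazy))       # 'abc'
print(repr(no_dotall))  # 'ab<?php\n  echo 2;\n?>c'
```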

> 
>             # add ID
>             print 'adding unique page_id'
>             src_data = ( '<!-- %d -->' % id ) + src_data
>             id += 1
> 
>             # add template variables
>             print 'adding counter template variable'
>             # "Αριθμός Επισκεπτών" is Greek for "Number of visitors"
>             src_data = src_data + ''' <h4><font color=green> Αριθμός Επισκεπτών: %(counter)d </font></h4> '''
>             # I can think of this, but the above line must go before
>             # </body></html>, NOT after. How do I write that?!?

You will have to find the </body> tag before inserting the string.
str.find should help -- or you could use str.replace and replace the
</body> tag with your counter line, plus a new </body>.
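Both variants in a short Python 3 sketch; the helper names insert_before_body and insert_with_replace and the sample page are made up for illustration. str.rfind returns -1 when the tag is missing, and the third argument to str.replace limits it to one replacement. Both searches are case-sensitive, so a page that spells the tag </BODY> would not be matched:

```python
# Hypothetical helper: place `snippet` right before the last </body> tag,
# or append it if the page has no </body> at all.
def insert_before_body(html, snippet):
    pos = html.rfind('</body>')          # -1 if the tag is missing
    if pos == -1:
        return html + snippet
    return html[:pos] + snippet + html[pos:]


# Hypothetical helper using str.replace: swap the first "</body>" for
# "snippet + </body>". Leaves the page unchanged if there is no </body>.
def insert_with_replace(html, snippet):
    return html.replace('</body>', snippet + '</body>', 1)


page = '<html><body><p>Hello</p></body></html>'
line = '<h4>Visitors: %(counter)d</h4>'

result = insert_before_body(page, line)
print(result)
# <html><body><p>Hello</p><h4>Visitors: %(counter)d</h4></body></html>
print(insert_with_replace(page, line) == result)   # True

# Later, the counter script fills the placeholder with %-formatting:
print(result % {'counter': 42})
# <html><body><p>Hello</p><h4>Visitors: 42</h4></body></html>
```

The two helpers differ only when the tag is missing: the rfind version appends the line anyway, while the replace version silently leaves the page unchanged.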

> 
>             # build the new filename: same path, .html extension instead of .php
>             dest_f = os.path.splitext(src_f)[0] + '.html'
>
>             # create the new html file and write the data into it
>             print 'writing to %s' % dest_f
>             f = open(dest_f, 'w')
>             f.write(src_data)      # write contents
>             f.close()
> 
> This is the best I can do.

No, it's not. You're just giving up too soon.
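For comparison, here is a minimal Python 3 sketch that combines the hints above into one pass over a directory tree. Assumptions: the source files are UTF-8, only names ending in '.php' are converted, the .html copy is written next to the original, and convert_tree, PHP_BLOCK and COUNTER_LINE are illustrative names (the counter text is English here rather than the original Greek). It also unpacks os.walk in its actual order, (dirpath, dirnames, filenames):

```python
import os
import re

# '?' is escaped because it is a regex metacharacter; '.*?' is non-greedy;
# DOTALL lets '.' match newlines so multi-line PHP blocks are removed too.
PHP_BLOCK = re.compile(r'<\?.*?\?>', re.DOTALL)

# Illustrative counter line; %(counter)d is left for the counter script.
COUNTER_LINE = '<h4><font color=green> Visitors: %(counter)d </font></h4>\n'


def convert_tree(root):
    """Write an .html copy of every .php file under root; return the new paths."""
    written = []
    page_id = 0  # unique page id, prepended as an HTML comment
    # os.walk yields (dirpath, dirnames, filenames), in that order.
    for currdir, dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith('.php'):
                continue
            src_path = os.path.join(currdir, name)
            # Assumption: the source files are UTF-8 encoded.
            with open(src_path, encoding='utf-8') as src:
                data = src.read()

            data = PHP_BLOCK.sub('', data)          # drop PHP tags and their contents
            data = '<!-- %d -->' % page_id + data   # prepend the unique id
            page_id += 1
            # Put the counter line just before the first </body> (no-op if absent).
            data = data.replace('</body>', COUNTER_LINE + '</body>', 1)

            # Same path with an .html extension; the .php original is left untouched.
            dest_path = os.path.splitext(src_path)[0] + '.html'
            with open(dest_path, 'w', encoding='utf-8') as dest:
                dest.write(data)
            written.append(dest_path)
    return written
```

Writing a separate .html file instead of overwriting the .php keeps the originals intact if a pattern turns out to be wrong, so the script can simply be rerun after a fix.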




More information about the Python-list mailing list