Replace various regex

Mon Feb 15 09:27:40 EST 2010

Martin wrote:
> On Feb 15, 2:03 pm, Jean-Michel Pichavant <jeanmic... at sequans.com> wrote:
>   
>> Martin wrote:
>>     
>>> Hi,
>>>       
>>> I am trying to come up with a more generic scheme to match and replace
>>> a series of regex, which look something like this...
>>>       
>>> 19.01,16.38,0.79,1.26,1.00   !  canht_ft(1:npft)
>>> 5.0, 4.0, 2.0, 4.0, 1.0      !  lai(1:npft)
>>>       
>>> Ideally I would match the tag to the right of the "!" sign (e.g. lai)
>>> and then be able to replace one or all of the corresponding numbers on
>>> that line. So far I have a rather unsatisfactory solution; any
>>> suggestions would be appreciated...
>>>       
>>> The input file is an ASCII file.
>>>       
>>> import re
>>>
>>> f = open(fname, 'r')
>>> s = f.read()
>>> f.close()
>>>
>>> if CANHT:
>>>     s = re.sub(r"\d+\.\d+,\d+\.\d+,\d+\.\d+,\d+\.\d+,\d+\.\d+   !  canht_ft", CANHT, s)
>>>       
>>> where CANHT might be
>>>       
>>> CANHT = '115.01,16.38,0.79,1.26,1.00   !  canht_ft'
>>>       
>>> But this requires me to pass the entire string (all the numbers plus the tag).
>>>       
>>> Thanks.
>>>       
>>> Martin
>>>       
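A minimal Python 3 sketch of the tag-based replacement asked for above, so that only the new values need to be passed in. The helper name replace_values, the LINE_RE pattern and the 0-based index convention are illustrative, not from the thread. It assumes each line looks like "numbers ! tag...", with a single "!" and the tag starting right after it, and it only rewrites the line whose tag matches exactly:

```python
import re

# One line of the parameter file: "<numbers> ! <tag><anything else>".
# Assumption: the numbers contain no "!" and the tag is a run of word
# characters right after the "!" (plus optional spaces).
LINE_RE = re.compile(r'^(?P<values>[^!]*?)(?P<sep>\s*!\s*)(?P<tag>\w+)(?P<rest>.*)$')


def replace_values(text, tag, new_values, index=None):
    """Return text with the numbers on the line tagged `tag` replaced.

    index=None: new_values is a sequence that replaces all the numbers.
    index=i:    new_values is one value that replaces the i-th (0-based) number.
    """
    out = []
    for line in text.split('\n'):
        m = LINE_RE.match(line)
        if m is None or m.group('tag') != tag:
            out.append(line)  # not the line we want: copy it unchanged
            continue
        values = [v.strip() for v in m.group('values').split(',')]
        if index is None:
            values = [str(v) for v in new_values]
        else:
            values[index] = str(new_values)
        # Rebuild the line, keeping the original "!" spacing and the tag suffix.
        out.append(','.join(values) + m.group('sep') + m.group('tag') + m.group('rest'))
    return '\n'.join(out)
```

On an edited line the separators are normalised to "," and any leading whitespace is dropped; all other lines are copied as they are.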
>> I removed all lines containing things like 9*0.0 from your file, because I
>> don't know what they mean or how to handle them. These are not numbers.
>>
>> import re
>>
>> replace = {
>>     'snow_grnd' : (1, '99.99,'), # replace the 1st number with 99.99
>>     't_soil' : (2, '88.8,'), # replace the 2nd number with 88.8
>>     }
>>
>> testBuffer = """
>>  0.749, 0.743, 0.754, 0.759  !  stheta(1:sm_levels)(top to bottom)
>> 0.46                         !  snow_grnd
>> 276.78,277.46,278.99,282.48  !  t_soil(1:sm_levels)(top to bottom)
>> 19.01,16.38,0.79,1.26,1.00   !  canht_ft(1:npft)
>> 200.0, 4.0, 2.0, 4.0, 1.0 !  lai(1:npft)
>> """
>>
>> outputBuffer = ''
>> for line in testBuffer.split('\n'):
>>     for key, (index, repl) in replace.items():
>>         if key in line:
>>             parameters = {
>>                 # given your example you may have to change this one,
>>                 # I don't know what 9*0.0 means in your file
>>                 'n' : r'[\d\.]+',
>>                 'index' : index - 1,
>>             }
>>             # the following pattern silently matches any leading whitespace
>>             # and the numbers before the <index>th one (group 1), then the
>>             # <index>th number itself (group 2), then the rest of the line
>>             # including the "!" comment (group 3); regexps are sometimes a
>>             # nightmare to read
>>             pattern = (r'(\s*(?:(?:%(n)s)[,\s]+){0,%(index)s})'
>>                        r'(?:(%(n)s)[,\s]+)(.*!.*)' % parameters)
>>             line = re.sub(pattern, r'\1 '+repl+r'\3' , line)
>>             break
>>     outputBuffer += line +'\n'
>>
>> print outputBuffer
>>     
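On the 9*0.0 entries: if the file is read with Fortran list-directed input (an assumption; the "(1:npft)" array notation hints at Fortran), "9*0.0" is a repeat count meaning nine copies of 0.0. A minimal Python 3 sketch that expands such items into plain floats, so those lines would not need to be removed (expand_repeats is a hypothetical helper):

```python
import re


def expand_repeats(field_text):
    """Turn the text left of the "!" into a list of floats.

    Assumption: 'r*c' follows Fortran list-directed input and means r
    copies of the value c, so '9*0.0' becomes nine 0.0 values. Items may
    be separated by commas and/or whitespace. The null form 'r*' (no
    value) is not handled and raises ValueError.
    """
    values = []
    for item in re.split(r'[,\s]+', field_text.strip()):
        if not item:
            continue  # e.g. empty input
        if '*' in item:
            count, value = item.split('*', 1)
            values.extend([float(value)] * int(count))
        else:
            values.append(float(item))
    return values
```

With the values expanded to a list, replacing one of them is a plain list assignment before the line is written back.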
>
> Thanks, I will take a look. I think I was having a very slow day when
> I posted: I have since realised I could solve the original problem more
> efficiently, and the problem wasn't quite what I first perceived. It is
> enough to match the tag to the right of the "!" sign and use it to
> rewrite what lies to the left of the "!" sign. Currently I have this;
> if anyone thinks there is a neater solution I am happy to hear it.
> Many thanks.
>
> variable_tag = 'lai'
> variable = [200.0, 60.030, 0.060, 0.030, 0.030]
>
> # generate adjustment string
> variable = ",".join(["%s" % i for i in variable]) + ' !  ' + variable_tag
>
> # call func to adjust input file
> adjustStandardPftParams(variable, variable_tag, in_param_fname, out_param_fname)
>
> and the function itself looks like this:
>
> def adjustStandardPftParams(variable, variable_tag, in_fname, out_fname):
>
>     f = open(in_fname, 'r')
>     of = open(out_fname, 'w')
>     pattern_found = False
>
>     while True:
>         line = f.readline()
>         if not line:
>             break
>         pattern = re.findall(r"!\s+"+variable_tag, line)
>         if pattern:
>             print 'yes'
>             print >> of, "%s" % variable
>             pattern_found = True
>
>         if pattern_found:
>             pattern_found = False
>         else:
>             of.write(line)
>
>     f.close()
>     of.close()
>
>     return
>   

Are you sure a simple
if variable_tag in line:
    # do some stuff

is not enough?
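The substring test is probably enough for most tags. One case where it can misfire is a file that also contains another tag starting with the same letters (lai_max below is a made-up example). A Python 3 sketch, where has_tag is a hypothetical helper that only accepts the tag as a whole name right after the "!":

```python
import re


def has_tag(line, tag):
    """Return True if `tag` appears as a whole name right after the "!"."""
    # re.escape() treats the tag literally; \b stops 'lai' from matching
    # a longer name such as 'lai_max' ('_' counts as a word character).
    return re.search(r'!\s*' + re.escape(tag) + r'\b', line) is not None


line = "5.0, 4.0, 2.0, 4.0, 1.0   !  lai(1:npft)"
other = "1.0, 2.0   !  lai_max(1:npft)"  # hypothetical second tag

print('lai' in line, 'lai' in other)                 # True True
print(has_tag(line, 'lai'), has_tag(other, 'lai'))  # True False
```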

People will usually prefer to write

for line in open(in_fname, 'r'):

instead of your ugly while loop ;-)
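For illustration, a Python 3 sketch of the function rewritten along those lines: it iterates over the file object directly, uses with so both files get closed, and drops the pattern_found flag and the debug print. It keeps the original name and argument order but takes no self; looking for the tag only in the text after the "!" is an added assumption, so the tag cannot accidentally match something on the left-hand side:

```python
def adjustStandardPftParams(variable, variable_tag, in_fname, out_fname):
    """Copy in_fname to out_fname, replacing every line tagged variable_tag.

    `variable` is the complete replacement line (without newline), e.g.
    '200.0,60.03,0.06,0.03,0.03 !  lai'.
    """
    with open(in_fname) as src, open(out_fname, 'w') as dst:
        for line in src:  # iterating the file yields one line at a time
            # Only look for the tag in the part after the "!" (if any).
            _, bang, comment = line.partition('!')
            if bang and variable_tag in comment:
                dst.write(variable + '\n')
            else:
                dst.write(line)
```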

JM