Replace various regex

Martin mdekauwe at gmail.com
Mon Feb 15 17:26:16 EST 2010


On Feb 15, 2:27 pm, Jean-Michel Pichavant <jeanmic... at sequans.com>
wrote:
> Martin wrote:
> > On Feb 15, 2:03 pm, Jean-Michel Pichavant <jeanmic... at sequans.com>
> > wrote:
>
> >> Martin wrote:
>
> >>> Hi,
>
> >>> I am trying to come up with a more generic scheme to match and replace
> >>> a series of regex, which look something like this...
>
> >>> 19.01,16.38,0.79,1.26,1.00   !  canht_ft(1:npft)
> >>> 5.0, 4.0, 2.0, 4.0, 1.0      !  lai(1:npft)
>
> >>> Ideally match the pattern to the right of the "!" sign (e.g. lai), I
> >>> would then like to be able to replace one or all of the corresponding
> >>> numbers on the line. So far I have a rather unsatisfactory solution,
> >>> any suggestions would be appreciated...
>
> >>> The file read in is an ascii file.
>
> >>> f = open(fname, 'r')
> >>> s = f.read()
>
> >>> if CANHT:
> >>>     s = re.sub(r"\d+.\d+,\d+.\d+,\d+.\d+,\d+.\d+,\d+.\d+   !
> >>> canht_ft", CANHT, s)
>
> >>> where CANHT might be
>
> >>> CANHT = '115.01,16.38,0.79,1.26,1.00   !  canht_ft'
>
> >>> But this involves me passing the entire string.
>
> >>> Thanks.
>
> >>> Martin
>
> >> I remove all lines containing things like 9*0.0 in your file, cause I
> >> don't know what they mean and how to handle them. These are not numbers.
>
> >> import re
>
> >> replace = {
> >>     'snow_grnd' : (1, '99.99,'), # replace the 1st number by 99.99
> >>     't_soil' : (2, '88.8,'), # replace the 2nd number by 88.8
> >>     }
>
> >> testBuffer = """
> >>  0.749, 0.743, 0.754, 0.759  !  stheta(1:sm_levels)(top to bottom)
> >> 0.46                         !  snow_grnd
> >> 276.78,277.46,278.99,282.48  !  t_soil(1:sm_levels)(top to bottom)
> >> 19.01,16.38,0.79,1.26,1.00   !  canht_ft(1:npft)
> >> 200.0, 4.0, 2.0, 4.0, 1.0 !  lai(1:npft)
> >> """
>
> >> outputBuffer = ''
> >> for line in testBuffer.split('\n'):
> >>     for key, (index, repl) in replace.items():
> >>         if key in line:
> >>             parameters = {
> >>                 # given your example you have to change 'n'; I don't
> >>                 # know what 9*0.0 means in your file
> >>                 'n' : r'[\d\.]+',
> >>                 'index' : index - 1,
> >>             }
> >>             # the following pattern will silently match any number before
> >>             # the <index>th number is found, and use a capturing parenthesis
> >>             # for the last (regexps are sometimes a nightmare to read)
> >>             pattern = (r'(\s*(?:(?:%(n)s)[,\s]+){0,%(index)s})'
> >>                        r'(?:(%(n)s)[,\s]+)(.*!.*)') % parameters
> >>             line = re.sub(pattern, r'\1 '+repl+r'\3' , line)
> >>             break
> >>     outputBuffer += line +'\n'
>
> >> print outputBuffer
>
> > Thanks I will take a look. I think perhaps I was having a very slow
> > day when I posted and realised I could solve the original problem more
> > efficiently and the problem wasn't perhaps as I first perceived. It is
> > enough to match the tag to the right of the "!" sign and use this to
> > adjust what lies on the left of the "!" sign. Currently I have
> > this...if anyone thinks there is a neater solution I am happy to hear
> > it. Many thanks.
>
> > variable_tag = 'lai'
> > variable = [200.0, 60.030, 0.060, 0.030, 0.030]
>
> > # generate adjustment string
> > variable = ",".join(["%s" % i for i in variable]) + ' !  ' + variable_tag
>
> > # call func to adjust input file
> > adjustStandardPftParams(variable, variable_tag, in_param_fname,
> > out_param_fname)
>
> > and the inside of this func looks like this
>
> > def adjustStandardPftParams(self, variable, variable_tag, in_fname,
> > out_fname):
>
> >     f = open(in_fname, 'r')
> >     of = open(out_fname, 'w')
> >     pattern_found = False
>
> >     while True:
> >         line = f.readline()
> >         if not line:
> >             break
> >         pattern = re.findall(r"!\s+"+variable_tag, line)
> >         if pattern:
> >             print 'yes'
> >             print >> of, "%s" % variable
> >             pattern_found = True
>
> >         if pattern_found:
> >             pattern_found = False
> >         else:
> >             of.write(line)
>
> >     f.close()
> >     of.close()
>
> >     return
>
> Are you sure a simple
> if variable_tag in line:
>     # do some stuff
>
> is not enough ?
>
> People will usually prefer to write
>
> for line in open(in_fname, 'r') :
>
> instead of your ugly while loop ;-)
>
> JM

My while loop is suitably offended. I have changed it as you
suggested. However, if I test "if variable_tag in line", as you
suggested, I would correctly pick up the tag lai in my example, but also
one called dcatch_lai, which I don't want. No doubt there is an
obvious solution I am again missing!

of = open(out_fname, 'w')
pattern_found = False

for line in open(in_fname, 'r'):
    pattern = re.findall(r"!\s+"+variable_tag, line)
    if pattern:
        print >> of, "%s" % variable
        pattern_found = True

    if pattern_found:
        pattern_found = False
    else:
        of.write(line)

of.close()
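
To illustrate the lai versus dcatch_lai point, here is a minimal sketch
(written for Python 3, which is an assumption, as the code above is
Python 2). It compares a plain substring test with a regex tied to the
"!" separator. The helper name tag_matches and the dcatch_lai sample
values are hypothetical, and the (?!\w) lookahead is an extra guard that
is not in the loop above.

```python
import re

def tag_matches(variable_tag, line):
    """Return True if `line` carries exactly `variable_tag` right after the "!"."""
    # "!\s+" ties the tag to the separator, so "dcatch_lai" cannot match "lai".
    # The (?!\w) lookahead (an addition, not in the original loop) also
    # rejects longer names that merely start with the tag, e.g. "lai_max".
    pattern = r"!\s+" + re.escape(variable_tag) + r"(?!\w)"
    return re.search(pattern, line) is not None

# Sample lines; the dcatch_lai values are made up for illustration.
lines = [
    "5.0, 4.0, 2.0, 4.0, 1.0      !  lai(1:npft)",
    "0.5, 0.5, 0.5, 0.5, 0.5      !  dcatch_lai(1:npft)",
]

for line in lines:
    # First column: anchored regex test; second column: plain substring test.
    print(tag_matches("lai", line), "lai" in line)
```

The substring test is true for both lines, whereas the anchored regex
should only accept the first, which is the behaviour wanted here.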

Many Thanks.


