help with simple regular expression grouping with re
Bob Horvath
bob at horvath.com
Sun May 9 00:21:12 EDT 1999
Tim Peters wrote:
> [Bob Horvath]
> > Being relatively new to Python, I am trying to do something using re and
> > cannot figure out the right pattern to do what I want.
>
> That's OK -- regular expressions are tricky! Be sure to read
>
> http://www.python.org/doc/howto/regex/regex.html
>
> for a gentler intro than the reference manual has time to give.
>
Thanks, it is a little easier.
>
> > The input that I am parsing is a typical "mail merge" file, containing
> > comma separated fields that are surrounded by double quotes. A typical
> > line is:
> >
> > "field 1", "field 2","field 3 has is different, it has an embedded
> > comma","this one doesn't"
> >
> > I am trying to get a list of fields that are the strings that are
> > between the quotes, including any embedded commas.
>
> Note that regexps are utterly unforgiving -- the first two fields in your
> example aren't separated by a comma, but by a comma followed by a blank. I
> don't know whether that was a typo or a requirement, so let's write
> something that doesn't care <wink>:
It was a typo. The commas do not have blanks around them when separating
fields. Nor are there any blanks or other white space outside of the
double quoted fields.
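
If the format really is that strict, a regular expression may not even be needed. Here is a minimal sketch, assuming every field is quoted, the separator is exactly `","`, and no field contains that three-character sequence itself. The helper name `split_quoted_fields` is made up for illustration, and the code uses string methods from current Python rather than anything checked against 1.5.1:

```python
def split_quoted_fields(line):
    """Split a line of the form "a","b","c" into ['a', 'b', 'c'].

    Assumes (hypothetically) that every field is double quoted and that
    fields are separated by exactly "," with no surrounding whitespace.
    """
    line = line.strip()  # drop a trailing newline, if any
    if len(line) < 2 or line[0] != '"' or line[-1] != '"':
        raise ValueError("line is not wrapped in double quotes")
    # Remove the outermost quotes, then split on the quote-comma-quote separator.
    return line[1:-1].split('","')


example = ('"field 1","field 2","field 3 has is different, it has an '
           'embedded comma","this one doesn\'t"')
fields = split_quoted_fields(example)
```

An embedded comma is harmless here because the split is on `","`, not on a bare comma. The regular expression approach is more forgiving if stray blanks ever do show up between fields.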
>
>
> import re
> pattern = re.compile(r"""
>     "           # match an open quote
>     (           # start a group so re.findall returns only this part
>         [^"]*?  # match shortest run of non-quote characters
>     )           # close the group
>     "           # and match the close quote
> """, re.VERBOSE)
>
> answer = re.findall(pattern, your_example)
> for field in answer:
>     print field
>
> That prints:
>
> field 1
> field 2
> field 3 has is different, it has an embedded comma
> this one doesn't
>
> Just study that until your eyes bleed <wink>.
>
Well, I did a lot of searching around before and after my original post, and
while findall seems to be the thing I want, I am using 1.5.1, which apparently
does not have it. I can upgrade my Linux system, but the system where it will
ultimately run might be a different story.
Is there a way to do the equivalent of findall on releases that do not have it?
Downloading-a-new-version-now-to-see-if-there-is-a-re.findall.py,
Bob
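
One possible way to get the effect of findall without the function itself is to call the compiled pattern's search method in a loop, restarting each search where the previous match ended. The following is only a sketch: the helper name `find_all_fields` is invented for illustration, it assumes that search accepts a start position on the older release, and it was written against current Python rather than tested on 1.5.1.

```python
import re

# Same idea as the quoted pattern: capture the text between double quotes.
pattern = re.compile(r'"([^"]*)"')


def find_all_fields(pattern, text):
    """Return group 1 of every non-overlapping match, scanning left to right."""
    results = []
    pos = 0
    while 1:
        match = pattern.search(text, pos)
        if match is None:
            break
        results.append(match.group(1))
        if match.end() == match.start():
            # Empty match: step forward one character to avoid looping forever.
            pos = match.end() + 1
        else:
            pos = match.end()
    return results


example = ('"field 1","field 2","field 3 has is different, it has an '
           'embedded comma","this one doesn\'t"')
fields = find_all_fields(pattern, example)
```

The empty-match guard never triggers for this particular pattern, since every match includes at least the two quote characters, but it keeps the loop safe if the helper is reused with a pattern that can match nothing.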