split string at commas respecting quotes when string not in csv format
Paul McGuire
ptmcg at austin.rr.com
Fri Mar 27 09:54:48 EDT 2009
On Mar 27, 5:19 am, Tim Chase <python.l... at tim.thechases.com> wrote:
> >> >>> import re
> >> >>> s = """a=1,b="0234,)#($)@", k="7" """
> >> >>> rx = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')
> >> >>> rx.findall(s)
> >> [('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]
> >> >>> rx.findall('a=1, *DODGY*SYNTAX* b=2')
> >> [('a', '1'), ('b', '2')]
>
> > I'm going to save this one and study it, too. I'd like to learn
> > to use regexes better, even if I do try to avoid them when possible :)
>
> This regexp is fairly close to the one I used, but I employed the
> re.VERBOSE flag to split it out for readability. The above
> breaks down as
>
> [ ]* # optional whitespace, traditionally "\s*"
> (\w+) # tag the variable name as one or more "word" chars
> = # the literal equals sign
> ( # tag the value
> [^",]+ # one or more non-[quote/comma] chars
> | # or
> "[^"]*" # quotes around a bunch of non-quote chars
> ) # end of the value being tagged
> [ ]* # same as previously, optional whitespace ("\s*")
> (?: # a non-capturing group (why?)
> , # a literal comma
> | # or
> $ # the end-of-line/string
> ) # end of the non-capturing group
>
> Hope this helps,
>
> -tkc
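For reference, here is a minimal sketch of how the breakdown quoted above might be compiled with re.VERBOSE. The name rx_verbose is mine, not Tim's, and the comments are paraphrased. I am only assuming the verbose form is meant to match exactly what the one-line pattern matches.

```python
import re

# The one-line pattern from the quoted post.
rx = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')

# Hypothetical verbose spelling of the same pattern (rx_verbose is an
# illustrative name). Under re.VERBOSE, whitespace outside a character
# class is ignored, so the literal space is kept inside [ ].
rx_verbose = re.compile(r'''
    [ ]*        # optional leading spaces
    (\w+)       # group 1: the variable name
    =           # the literal equals sign
    (           # group 2: the value
      [^",]+    #   one or more chars that are neither quote nor comma
      |         #   or
      "[^"]*"   #   a double-quoted run of non-quote chars
    )
    [ ]*        # optional trailing spaces
    (?:,|$)     # a comma or the end of the string, not captured
''', re.VERBOSE)

s = """a=1,b="0234,)#($)@", k="7" """
print(rx_verbose.findall(s))
# [('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]
```

Both spellings should give the same findall results on the sample string.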
Mightn't there be whitespace on either side of the '=' sign? And if
you are using findall, why is the bit with the delimiting commas or
end of line/string necessary? I should think findall would just skip
over this stuff, like it skips over *DODGY*SYNTAX* in your example.
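
A rough experiment along these lines, as a sketch only: rx_ws and rx_no_tail are hypothetical variants I made up for illustration, not patterns from the thread. The first allows whitespace around the '=' sign; the second drops the trailing comma-or-end group to see whether findall's results change.

```python
import re

# The pattern as posted.
rx = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')

# Hypothetical variant: tolerate whitespace around '=' (and use \s for spaces).
rx_ws = re.compile(r'\s*(\w+)\s*=\s*([^",]+|"[^"]*")\s*(?:,|$)')

# Hypothetical variant: the posted pattern minus the trailing (?:,|$) group.
rx_no_tail = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")')

spaced = 'a = 1, b = "x,y"'
print(rx.findall(spaced))      # []
print(rx_ws.findall(spaced))   # [('a', '1'), ('b', '"x,y"')]

dodgy = 'a=1, *DODGY*SYNTAX* b=2'
print(rx.findall(dodgy))           # [('a', '1'), ('b', '2')]
print(rx_no_tail.findall(dodgy))   # [('a', '1'), ('b', '2')]

junk = 'b="x"junk, c=3'
print(rx.findall(junk))            # [('c', '3')]
print(rx_no_tail.findall(junk))    # [('b', '"x"'), ('c', '3')]
```

On these inputs the trailing group makes no difference for the *DODGY*SYNTAX* string. It only changes the result when a quoted value is followed by something other than a comma or the end of the string, in which case the posted pattern rejects that pair and the shortened one accepts it.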
-- Paul