split string at commas respecting quotes when string not in csv format

John Machin sjmachin at lexicon.net
Fri Mar 27 09:49:08 EDT 2009


On Mar 27, 9:19 pm, Tim Chase <python.l... at tim.thechases.com> wrote:
> >>  >>> import re
> >>  >>> s = """a=1,b="0234,)#($)@", k="7" """
> >>  >>> rx = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')
> >>  >>> rx.findall(s)
> >>  [('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]
> >>  >>> rx.findall('a=1, *DODGY*SYNTAX* b=2')
> >>  [('a', '1'), ('b', '2')]
>
> > I'm going to save this one and study it, too.  I'd like to learn
> > to use regexes better, even if I do try to avoid them when possible :)
>
> This regexp is fairly close to the one I used, but I employed the
> re.VERBOSE flag to split it out for readability.  The above
> breaks down as
>
>   [ ]*       # optional whitespace, traditionally "\s*"

No, it's optional space characters -- I'd regard any other type of
whitespace there as a stuff-up.
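
To make the difference concrete, here is a small sketch comparing the
pattern as posted with a variant that uses "\s*" instead. The names
strict and loose and the tab-containing test string are made up for
illustration; they are not from the original thread.

```python
import re

# The pattern as posted: only space characters are tolerated as padding.
strict = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')
# Hypothetical variant: any whitespace (tabs, newlines, ...) is tolerated.
loose = re.compile(r'\s*(\w+)=([^",]+|"[^"]*")\s*(?:,|$)')

# A tab sits between the closing quote and the comma.
s = 'a="1"\t,b=2'

strict_pairs = strict.findall(s)  # the a="1" item is silently skipped
loose_pairs = loose.findall(s)    # the tab is accepted as padding

print(strict_pairs)
print(loose_pairs)
```

Note that findall() skips anything it cannot match, so the strict
pattern drops the odd item rather than raising an error.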

>   (\w+)      # tag the variable name as one or more "word" chars
>   =          # the literal equals sign
>   (          # tag the value
>   [^",]+     # one or more non-[quote/comma] chars
>   |          # or
>   "[^"]*"    # quotes around a bunch of non-quote chars
>   )          # end of the value being tagged
>   [ ]*       # same as previously, optional whitespace  ("\s*")

Same correction as previously: optional space characters, not whitespace.

>   (?:        # a non-capturing group (why?)

a group because I couldn't be bothered thinking too hard about the
precedence of the | operator, and non-capturing because the OP didn't
want it captured.
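
A rough sketch of both points, using a deliberately simplified pattern
(digits-only values; the names grouped, ungrouped and capturing are
made up for illustration). Without the group, the | splits the whole
pattern into "everything up to the comma" or "end of string"; with a
capturing group, the delimiter shows up in the results.

```python
import re

# Simplified stand-ins for the real pattern, for illustration only.
grouped = re.compile(r'(\w+)=(\d+)(?:,|$)')   # group, not captured
ungrouped = re.compile(r'(\w+)=(\d+),|$')     # | now splits the whole pattern
capturing = re.compile(r'(\w+)=(\d+)(,|$)')   # group, captured

s = 'a=1,b=2'

grouped_pairs = grouped.findall(s)      # name/value pairs only
ungrouped_pairs = ungrouped.findall(s)  # loses b=2, gains an empty match
capturing_pairs = capturing.findall(s)  # delimiter appears as a third item

print(grouped_pairs)
print(ungrouped_pairs)
print(capturing_pairs)
```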

>   ,          # a literal comma
>   |          # or
>   $          # the end-of-line/string
>   )          # end of the non-capturing group
>
> Hope this helps,

Me too :-)
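
For anyone who wants to run the breakdown as-is, here is a sketch of
the same pattern compiled with re.VERBOSE. The layout and comments are
one possible arrangement, not necessarily what Tim used. Under
re.VERBOSE, whitespace inside a character class is still significant,
so "[ ]" keeps meaning a literal space.

```python
import re

# Same pattern as the one-liner above, spread out with re.VERBOSE.
rx = re.compile(r'''
    [ ]*            # optional space characters
    (\w+)           # the variable name: one or more "word" chars
    =               # the literal equals sign
    (               # start of the captured value
      [^",]+        # one or more non-quote, non-comma chars
      |             # or
      "[^"]*"       # quotes around a run of non-quote chars
    )               # end of the captured value
    [ ]*            # optional space characters
    (?: , | $ )     # a literal comma or end of string, not captured
''', re.VERBOSE)

s = '''a=1,b="0234,)#($)@", k="7" '''
pairs = rx.findall(s)
print(pairs)
# [('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]
```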

Cheers,
John