split string at commas respecting quotes when string not in csv format
Tim Chase
python.list at tim.thechases.com
Fri Mar 27 06:19:01 EDT 2009
>> >>> import re
>> >>> s = """a=1,b="0234,)#($)@", k="7" """
>> >>> rx = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')
>> >>> rx.findall(s)
>> [('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]
>> >>> rx.findall('a=1, *DODGY*SYNTAX* b=2')
>> [('a', '1'), ('b', '2')]
>> >>>
>
> I'm going to save this one and study it, too. I'd like to learn
> to use regexes better, even if I do try to avoid them when possible :)
This regexp is fairly close to the one I used, but I employed the
re.VERBOSE flag to split it out for readability. The above
breaks down as:
[ ]*          # optional spaces (often written "\s*", which also matches tabs and newlines)
(\w+)         # tag the variable name as one or more "word" chars
=             # the literal equals sign
(             # tag the value
  [^",]+      # one or more non-[quote/comma] chars
  |           # or
  "[^"]*"     # quotes around a bunch of non-quote chars
)             # end of the value being tagged
[ ]*          # same as previously, optional spaces ("\s*")
(?:           # a non-capturing group (why? so findall() returns only the two tagged groups)
  ,           # a literal comma
  |           # or
  $           # the end-of-line/string
)             # end of the non-capturing group
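
For anyone who wants to try it, here is a minimal sketch of how that breakdown might be compiled with re.VERBOSE. It is my reconstruction from the quoted one-liner, not necessarily the exact pattern used originally; the names `pairs` and `values` and the final quote-stripping step are assumptions added purely for illustration.

```python
import re

# Same pattern as the quoted one-liner, spread out with re.VERBOSE.
# In verbose mode, whitespace outside a character class is ignored,
# which is why the literal space is written as "[ ]".
rx = re.compile(r'''
    [ ]*            # optional leading spaces
    (\w+)           # group 1: the variable name
    =               # the literal equals sign
    (               # group 2: the value
        [^",]+      #   a bare value: no quotes or commas
      |             #   or
        "[^"]*"     #   a quoted value, which may contain commas
    )
    [ ]*            # optional trailing spaces
    (?: , | $ )     # a comma or the end of the string, not captured
''', re.VERBOSE)

s = 'a=1,b="0234,)#($)@", k="7" '
pairs = rx.findall(s)
print(pairs)

# Illustrative extra step (an assumption, not part of the original post):
# drop the surrounding quotes and collect the pairs into a dict.
values = {name: value.strip('"') for name, value in pairs}
print(values)
```

Because the trailing comma-or-end group is non-capturing, findall() hands back two-element tuples (name, value) rather than three-element ones.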
Hope this helps,
-tkc