split string at commas respecting quotes when string not in csv format
John Machin
sjmachin at lexicon.net
Thu Mar 26 16:46:08 EDT 2009
On Mar 27, 6:51 am, "R. David Murray" <rdmur... at bitdance.com> wrote:
> OK, I've got a little problem that I'd like to ask the assembled minds
> for help with. I can write code to parse this, but I'm thinking it may
> be possible to do it with regexes. My regex-fu isn't that good, so if
> anyone is willing to help (or offer an alternate parsing suggestion)
> I would be grateful. (This has to be stdlib only, by the way, I
> can't introduce any new modules into the application so pyparsing is
> not an option.)
>
> The challenge is to turn a string like this:
>
> a=1,b="0234,)#($)@", k="7"
>
> into this:
>
> [("a", "1"), ("b", "0234,)#($)#"), ("k", "7")]
The challenge is for you to explain unambiguously what you want.
1. a=1 => "1" and k="7" => "7" ... is this a mistake or are the quotes
optional in the original string when not required to protect a comma?
2. What is the rule that explains the transmogrification of @ to # in
your example?
3. Is the input guaranteed to be syntactically correct?
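The following should come close enough to what you want; adjust as
appropriate. Note that it leaves the quotes on quoted values and
silently skips over any text it cannot match.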
>>> import re
>>> s = """a=1,b="0234,)#($)@", k="7" """
>>> rx = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')
>>> rx.findall(s)
[('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]
>>> rx.findall('a=1, *DODGY*SYNTAX* b=2')
[('a', '1'), ('b', '2')]
>>>
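If the answers to questions 1 and 3 turn out to be "the quotes are optional" and "no, the input is not guaranteed correct", something along these lines may be closer to what is wanted. It is only a sketch under those assumptions: the names PAIR and parse_pairs are made up for illustration, the pattern is the one above with the two value alternatives captured separately so the quotes can be dropped, and anchoring each match at the end of the previous one is one way (not the only way) to make unparseable text raise an error instead of being skipped.

```python
import re

# Same pattern as in the session above, except that the quoted and unquoted
# value alternatives get their own groups so the quotes can be discarded.
# PAIR and parse_pairs are hypothetical names, not part of the original reply.
PAIR = re.compile(r'[ ]*(\w+)=(?:"([^"]*)"|([^",]+))[ ]*(?:,|$)')


def parse_pairs(s):
    """Return [(key, value), ...]; raise ValueError on text the pattern rejects."""
    s = s.rstrip()
    pairs = []
    pos = 0
    while pos < len(s):
        # match() is anchored at pos, so nothing between pairs can be skipped.
        m = PAIR.match(s, pos)
        if m is None:
            raise ValueError("cannot parse at offset %d: %r" % (pos, s[pos:]))
        key, quoted, bare = m.groups()
        # Assumption: an unquoted value should lose its surrounding blanks.
        value = quoted if quoted is not None else bare.strip()
        pairs.append((key, value))
        pos = m.end()
    return pairs
```

With the original string, parse_pairs(s) gives [('a', '1'), ('b', '0234,)#($)@'), ('k', '7')], and the *DODGY*SYNTAX* example raises ValueError rather than quietly returning two pairs.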
HTH,
John