split string at commas respecting quotes when string not in csv format

Tim Chase python.list at tim.thechases.com
Fri Mar 27 06:19:01 EDT 2009


>>  >>> import re
>>  >>> s = """a=1,b="0234,)#($)@", k="7" """
>>  >>> rx = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')
>>  >>> rx.findall(s)
>>  [('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]
>>  >>> rx.findall('a=1, *DODGY*SYNTAX* b=2')
>>  [('a', '1'), ('b', '2')]
>>  >>>
> 
> I'm going to save this one and study it, too.  I'd like to learn
> to use regexes better, even if I do try to avoid them when possible :)

This regexp is fairly close to the one I used, but I employed the 
re.VERBOSE flag to split it out for readability.  The pattern 
quoted above breaks down as:

  [ ]*       # optional spaces; "\s*" would allow any whitespace
  (\w+)      # tag the variable name as one or more "word" chars
  =          # the literal equals sign
  (          # tag the value
  [^",]+     # one or more non-[quote/comma] chars
  |          # or
  "[^"]*"    # quotes around a bunch of non-quote chars
  )          # end of the value being tagged
  [ ]*       # optional spaces again (or "\s*" for any whitespace)
  (?:        # a non-capturing group (why?)
  ,          # a literal comma
  |          # or
  $          # the end-of-line/string
  )          # end of the non-capturing group
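
For anyone who wants to run the breakdown, here is a minimal sketch 
of the same pattern compiled with re.VERBOSE.  It is reconstructed 
from the one-line version quoted above, so it may differ in detail 
from the verbose pattern mentioned earlier; the names rx_verbose and 
rx_capturing are only for illustration.  The second pattern swaps the 
non-capturing group for a capturing one, to show what that choice 
changes in findall()'s output.

```python
import re

# Under re.VERBOSE, whitespace and "#" comments inside the pattern are
# ignored, which is why a literal space has to be written as "[ ]".
rx_verbose = re.compile(r'''
    [ ]*            # optional spaces
    (\w+)           # group 1: the variable name
    =               # the literal equals sign
    (               # group 2: the value
      [^",]+        #   one or more non-quote, non-comma chars
      |             #   or
      "[^"]*"       #   quotes around a run of non-quote chars
    )               # end of the value
    [ ]*            # optional spaces
    (?:,|$)         # a comma or end of string, not captured
''', re.VERBOSE)

s = 'a=1,b="0234,)#($)@", k="7" '
print(rx_verbose.findall(s))
# [('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]

# Same pattern on one line, but with a capturing group at the end.
rx_capturing = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(,|$)')
print(rx_capturing.findall('a=1,b=2'))
# [('a', '1', ','), ('b', '2', '')]
```

With the capturing version, findall() hands back the delimiter as a 
third item in each tuple, so the (?:...) form keeps the results down 
to just the name and the value.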

Hope this helps,

-tkc




