Trouble splitting strings with consecutive delimiters
Jussi Piitulainen
jpiitula at ling.helsinki.fi
Tue May 1 02:14:54 EDT 2012
deuteros writes:
> I'm using regular expressions to split a string using multiple
> delimiters. But if two or more of my delimiters occur next to each
> other in the string, it puts an empty string in the resulting
> list. For example:
>
> re.split(':|;|px', "width:150px;height:50px;float:right")
>
> Results in
>
> ['width', '150', '', 'height', '50', '', 'float', 'right']
>
> Is there any way to avoid getting '' in my list without adding px;
> as a delimiter?
You could split on a sequence of one or more such delimiters.
>>> re.split('(?::|;|px)+', "width:150px;height:50px;float:right")
['width', '150', 'height', '50', 'float', 'right']
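One caveat with the pattern above: the + only merges adjacent delimiters, so a delimiter at the very start or end of the string still leaves an empty string in the list. A minimal sketch of filtering those out follows; the helper name split_fields is made up for illustration.

```python
import re

def split_fields(text):
    # Split on runs of one or more delimiters, then drop the empty
    # strings left by a delimiter at the start or end of the input.
    return [part for part in re.split(r'(?::|;|px)+', text) if part]

print(split_fields("width:150px;height:50px;"))
```

Without the filter, the trailing semicolon in this input would produce a final '' in the result.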
Consider splitting twice instead: first into key-value substrings at
semicolons, then each of those into a key-value pair at the colon.
Here the result is collected in a dict. The units are better handled
after that, on the values.
>>> dict(kv.split(':') for kv in "width:150px;height:50px;float:right".split(';'))
{'width': '150px', 'float': 'right', 'height': '50px'}
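Handling the units afterwards might look something like the sketch below. It assumes that a value made of digits followed by "px" should become an int and that everything else stays a string; the helper name parse_value is hypothetical.

```python
import re

def parse_value(value):
    # Assumed convention: digits followed by "px" become an int;
    # any other value is returned unchanged as a string.
    match = re.fullmatch(r'(\d+)px', value)
    return int(match.group(1)) if match else value

style = "width:150px;height:50px;float:right"
pairs = dict(kv.split(':') for kv in style.split(';'))
parsed = {key: parse_value(value) for key, value in pairs.items()}
print(parsed)
```

This keeps "px" out of the delimiter set, so a key or value that merely contains those two letters is not cut apart.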
You might also want to accept whitespace as part of the delimiters.
(There might be a parser for such data formats somewhere in the
library already. CSV?)
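As a rough sketch of the whitespace idea, \s* can be placed on both sides of each delimiter. The helper name parse_style is made up, and the code assumes every non-empty chunk contains a colon; a chunk without one would raise ValueError at the unpacking step.

```python
import re

def parse_style(text):
    # Whitespace around ';' and ':' is treated as part of the delimiter.
    # Empty chunks (for example after a trailing ';') are skipped.
    result = {}
    for chunk in re.split(r'\s*;\s*', text.strip()):
        if chunk:
            # Split only at the first colon, so a value may contain colons.
            key, value = re.split(r'\s*:\s*', chunk, maxsplit=1)
            result[key] = value
    return result

print(parse_style("width: 150px ; height:50px;  float : right;"))
```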