Trouble splitting strings with consecutive delimiters

Peter Otten __peter__ at web.de
Tue May 1 08:55:13 EDT 2012


deuteros wrote:

> I'm using regular expressions to split a string using multiple delimiters.
> But if two or more of my delimiters occur next to each other in the
> string, it puts an empty string in the resulting list. For example:
> 
>     re.split(':|;|px', "width:150px;height:50px;float:right")
> 
> Results in
> 
>     ['width', '150', '', 'height', '50', '', 'float', 'right']
> 
> Is there any way to avoid getting '' in my list without adding px; as a
> delimiter?

That looks like a CSS style; to parse it you should use a tool that was
built for the job. The first one I came across is cssutils (it is included
in the Linux distro I'm using and has "css" in its name, so this is not an
endorsement):

http://packages.python.org/cssutils/

>>> import cssutils
>>> style = cssutils.parseStyle("width:150px;height:50px;float:right")
>>> for property in style.getProperties():
...     print property.name, "-->", property.value
... 
width --> 150px
height --> 50px
float --> right

OK, so you still need to strip off the unit suffix manually:

>>> def strip_suffix(s, *suffixes):
...     for suffix in suffixes:
...         if s.endswith(suffix):
...             return s[:-len(suffix)]
...     return s
... 
>>> strip_suffix(style.float, "pt", "px")
u'right'
>>> strip_suffix(style.width, "pt", "px")
u'150'
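
If installing a CSS parser is not an option, here is a rough stdlib-only
sketch of two alternatives. The helper names split_nonempty and parse_style
are made up for this example; neither is part of re or cssutils. The first
keeps your original pattern and simply filters out the empty strings. The
second splits on ";" and ":" only and strips the unit afterwards, which
avoids treating "px" as a delimiter at all. Both assume simple declarations
whose values contain no ";" or ":" (a url(...) value would break them).

```python
import re


def split_nonempty(pattern, text):
    """Split text on pattern and drop the empty strings that
    adjacent delimiters leave behind."""
    return [part for part in re.split(pattern, text) if part]


def parse_style(text, units=("px", "pt")):
    """Turn 'name:value;name:value' into a dict of strings,
    stripping one trailing unit from each value.

    Assumption: values never contain ';' or ':' themselves.
    """
    result = {}
    for declaration in text.split(";"):
        if not declaration.strip():
            continue  # tolerate a trailing or doubled ';'
        name, _, value = declaration.partition(":")
        value = value.strip()
        for unit in units:
            if value.endswith(unit):
                value = value[:-len(unit)]
                break
        result[name.strip()] = value
    return result


style = "width:150px;height:50px;float:right"
print(split_nonempty(":|;|px", style))
print(parse_style(style))
```

The second variant keeps each property name paired with its value, so you
do not have to rely on positions in a flat list, and a "px" that happens to
appear inside a name or value is not split on.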




