string split
Greg Jorgensen
gregj at pobox.com
Sun Dec 24 04:28:12 EST 2000
"Jacek Popławski" <jp at ulgo.koti.com.pl> wrote:
> I have string:
>
> s="one two <br> three"
> ...
> is it possible to split not by space (" ") but by "<" and ">" ?
>
> I need:
>
> ['one two','<br>','three']
>
> How to do it in simple way?
The split function in the regular expression (re) module may be what you want:
import re
# pattern matches <...>, but doesn't handle nested < ... < ...> ... >
# the \s* serve to remove any whitespace before and after the < ... >
rx = re.compile(r'\s*(<[^<>]*>)\s*')
s = "one two <br> three"
rx.split(s) # ['one two', '<br>', 'three']
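The '<br>' survives in the result because of the parentheses in the pattern: when the pattern contains a capturing group, re.split() also returns the captured text. A small sketch of the difference, reusing the same sample string (the variable names without_group and with_group are only for illustration):

```python
import re

s = "one two <br> three"

# Without a capturing group, the matched delimiter is discarded.
without_group = re.split(r'\s*<[^<>]*>\s*', s)

# With a capturing group, the captured text is kept in the result list.
with_group = re.split(r'\s*(<[^<>]*>)\s*', s)

print(without_group)  # ['one two', 'three']
print(with_group)     # ['one two', '<br>', 'three']
```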
The rx pattern won't handle nested <...>, e.g. <a <b>>, and split() will insert
an empty string between adjacent <...><...> substrings, or before/after <...>
substrings that appear at the beginning or end of the string. Empty strings
can be removed with filter() or a list comprehension:
s = "one two <br><br> three<br>"
t = rx.split(s) # t = ['one two', '<br>', '', '<br>', 'three', '<br>', '']
t = [i for i in t if i] # t = ['one two', '<br>', '<br>', 'three', '<br>']
or
t = list(filter(lambda i: i, t)) # t = ['one two', '<br>', '<br>', 'three', '<br>']
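If this comes up often, the split and the cleanup can be folded into one helper. This is only a sketch: the names TAG_RX and split_tags are made up for the example, and the same caveat about nested <...> applies.

```python
import re

# Same pattern as above: a <...> run with optional surrounding whitespace.
TAG_RX = re.compile(r'\s*(<[^<>]*>)\s*')

def split_tags(text):
    """Split text into <...> pieces and the text between them, dropping empty strings."""
    return [piece for piece in TAG_RX.split(text) if piece]

print(split_tags("one two <br><br> three<br>"))
# ['one two', '<br>', '<br>', 'three', '<br>']
```

Note that whitespace is only trimmed next to a <...> match, so leading or trailing spaces elsewhere in the string are kept.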
--
Greg Jorgensen
Deschooling Society
Portland, Oregon, USA
gregj at pobox.com