string split

Greg Jorgensen gregj at pobox.com
Sun Dec 24 04:28:12 EST 2000


"Jacek Popławski" <jp at ulgo.koti.com.pl> wrote:

> I have string:
>
> s="one two <br> three"
> ...
> is it possible to split not by space (" ") but by "<" and ">" ?
>
> I need:
>
> ['one two','<br>','three']
>
> How to do it in simple way?

The split function in the regular expression (re) module may be what you want:

    import re

    # pattern matches <...>, but doesn't handle nested < ... < ...> ... >
    # the \s* serve to remove any whitespace before and after the < ... >
    rx = re.compile(r'\s*(<[^<>]*>)\s*')

    s = "one two <br> three"
    rx.split(s)        # ['one two', '<br>', 'three']
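
The parentheses in the pattern matter here: when the pattern contains a
capturing group, split returns the matched text along with the pieces around
it. A small sketch of the difference (the variable names are only for
illustration):

```python
import re

s = "one two <br> three"

# With a capturing group, re.split keeps the matched <...> in the result.
with_group = re.split(r'\s*(<[^<>]*>)\s*', s)

# Without the group, the matched <...> is discarded.
without_group = re.split(r'\s*<[^<>]*>\s*', s)

print(with_group)     # ['one two', '<br>', 'three']
print(without_group)  # ['one two', 'three']
```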

This won't handle nested <...>, e.g. <a <b>>, and it will insert an empty
element between adjacent <...><...> substrings, or before/after <...>
substrings that appear at the beginning or end of the string. Empty elements
can be removed with a filter() or a list comprehension:

    s = "one two <br><br> three<br>"
    t = rx.split(s)
    # t = ['one two', '<br>', '', '<br>', 'three', '<br>', '']
    t = [i for i in t if i]
    # t = ['one two', '<br>', '<br>', 'three', '<br>']

or

    t = list(filter(lambda i: i, t))
    # t = ['one two', '<br>', '<br>', 'three', '<br>']
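
If you need this in more than one place, the two steps could be wrapped up
roughly like this. This is only a sketch: split_tags and TAG_RX are names I
made up for the example, and the last call shows what the nested case
mentioned above actually produces.

```python
import re

# Same pattern as above: one non-nested <...> plus surrounding whitespace.
TAG_RX = re.compile(r'\s*(<[^<>]*>)\s*')

def split_tags(text):
    """Split text around <...> substrings, keeping them and dropping empty pieces."""
    return [piece for piece in TAG_RX.split(text) if piece]

print(split_tags("one two <br><br> three<br>"))
# ['one two', '<br>', '<br>', 'three', '<br>']

# Nested <...> is not handled: only the innermost <b> is recognized.
print(split_tags("x <a <b>> y"))
# ['x <a', '<b>', '> y']
```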

--
Greg Jorgensen
Deschooling Society
Portland, Oregon, USA
gregj at pobox.com




