help with re.split()

Ben Hutchings ben.hutchings at roundpoint.com
Tue Feb 20 19:16:29 EST 2001


Steve Mak <stevemak at softhome.net> writes:

> Hi guys,
> 
>     How do I use the re.split() function so it splits a line of text,
> keeping only the words, i.e. it excludes any symbols, spaces, etc.? I tried
> p=re.split('[. ]+', line), but some spaces are being kept.

If the line has a leading and/or trailing section that matches the
pattern, then the result list will begin and/or end with an empty
string, but no spaces will be kept.  If you then join the strings in
the list back together with spaces, you will get a leading and/or
trailing space.

For example:
    >>> import re
    >>> p=re.split(' ', ' string with leading and trailing space ')
    >>> p
    ['', 'string', 'with', 'leading', 'and', 'trailing', 'space', '']
    >>> ' '.join(p)
    ' string with leading and trailing space '

Perhaps this is what you are seeing?
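
If the empty strings are the problem, one way to get rid of them is to
filter the list that re.split() returns.  This is only a minimal
sketch, reusing the pattern from the question; the names line, parts
and words are made up for illustration:

```python
import re

line = ' string with leading and trailing space '

# The line starts and ends with text that matches the pattern, so
# re.split() puts an empty string at each end of the result.
parts = re.split('[. ]+', line)

# Keep only the non-empty pieces.
words = [part for part in parts if part]
print(words)
```

Joining words with spaces then gives no leading or trailing space.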

> if the line is Hello, how are you?
> 
> I need to split it so I get:
> Hello
> how
> are
> you

Then you must include ',' and '?' in the character class.
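
As a rough sketch of what that could look like (the exact set of
characters in the class is an assumption; add whatever other symbols
your lines contain):

```python
import re

line = 'Hello, how are you?'

# Character class extended with ',' and '?'.  The trailing '?' matches
# the pattern, so the raw result still ends with an empty string.
parts = re.split('[.,? ]+', line)

# Drop the empty string left over at the end.
words = [part for part in parts if part]
print(words)

# Alternative: match the words themselves rather than the separators.
# This assumes a "word" is a run of ASCII letters.
words_alt = re.findall('[A-Za-z]+', line)
print(words_alt)
```

The re.findall() form avoids the empty strings altogether, at the cost
of having to say what counts as a word.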

-- 
Any opinions expressed are my own and not necessarily those of Roundpoint.


