help with re.split()
Carel Fellinger
cfelling at iae.nl
Tue Feb 20 18:31:24 EST 2001
Steve Mak <stevemak at softhome.net> wrote:
> Hi guys,
> How do I use the re.split() function so it splits a line of text,
> keeping only the word. ie: it excludes any symbols, spaces, etc. I tried
> p=re.split('[. ]+', line), but some spaces are being kept.
Maybe some of those spaces are tabs in disguise?
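A quick way to check that guess, as a small sketch: the sample line is made up by me, with a tab hidden right after the period. Your character class only lists the period and the space, so a tab is not treated as a separator; `\s` covers any whitespace.

```python
import re

# Made-up sample line: there is a tab (not a space) after the period.
line = 'Hello.\thow are  you'

# The original pattern only splits on periods and plain spaces,
# so the tab stays glued to the front of the next word.
by_dot_space = re.split('[. ]+', line)

# \s matches any whitespace character, tabs included.
by_whitespace = re.split(r'[.\s]+', line)

print(by_dot_space)    # ['Hello', '\thow', 'are', 'you']
print(by_whitespace)   # ['Hello', 'how', 'are', 'you']
```

If the first list shows a '\t' in one of your own words, tabs are the culprit.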
The approach you took has its pitfalls, as in
>>> import re
>>> re.split(r'\W+', 'Hello, how are you?')
['Hello', 'how', 'are', 'you', '']
Notice the empty string at the end of the resulting list.
That's because after the matched "?" there is still an empty string
(there always is one :). The same would happen with a leading match.
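To illustrate the leading case too, here is a small sketch with a sample string of my own that has separators at both ends. If you want to stay with re.split(), one possible workaround (just an option, not the only one) is to throw the empty strings away afterwards.

```python
import re

# Sample text with non-word characters at both the start and the end.
text = ' Hello, how are you? '

# The leading ' ' and the trailing '? ' each leave an empty string behind.
parts = re.split(r'\W+', text)

# Keep only the non-empty pieces.
words = [p for p in parts if p]

print(parts)   # ['', 'Hello', 'how', 'are', 'you', '']
print(words)   # ['Hello', 'how', 'are', 'you']
```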
Maybe finding all the words in that line is simpler, like this (\w+ matches
any run of alphanumeric characters, the underscore included):
>>> import re
>>> re.findall(r'\w+', ' Hello, how are you? ')
['Hello', 'how', 'are', 'you']
>>> re.findall(r'\w+', ' Hello, 2 you 2! ')
['Hello', '2', 'you', '2']
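As the last example shows, digits count as "words" for \w+, and so does the underscore. If that is not what you want, a narrower character class is one option. A minimal sketch, assuming plain ASCII letters are enough for your text (that is my assumption, your data may need more):

```python
import re

text = ' Hello, 2 you 2! '

# \w+ keeps digits (and would keep underscores) as part of words.
with_digits = re.findall(r'\w+', text)

# Assumption: only ASCII letters should count as word characters.
letters_only = re.findall(r'[A-Za-z]+', text)

# The underscore is a \w character, so it does not separate words.
with_underscore = re.findall(r'\w+', 'foo_bar baz')

print(with_digits)      # ['Hello', '2', 'you', '2']
print(letters_only)     # ['Hello', 'you']
print(with_underscore)  # ['foo_bar', 'baz']
```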
--
groetjes, carel