Splitting a string on non-alpha characters

Tim Chase python.list at tim.thechases.com
Sat Sep 23 16:59:26 CEST 2006


> I'm relatively new to python but I already noticed that many lines of
> python code can be simplified to a one-liner by some clever coder. As
> the topic says, I'm trying to split lines like this:
> 
> 'foo bar- blah/hm.lala' -> [foo, bar, blah, hm, lala]
> 
> 'foo////bbbar.. xyz' -> [foo, bbbar, xyz]
> 
> Obviously a for loop catching just chars could do the trick, but I'm
> looking for a more elegant way. Can anyone help?

First, I presume you mean that you want back

['foo', 'bar', 'blah', 'hm', 'lala']

instead of

[foo, bar, blah, hm, lala]

(which would require you to have variables with those names, which is 
kinda funky)

That said...

Well, I'm sure there are scads of ways to do this.  I know 
regexps can do it fairly cleanly:

 >>> import re
 >>> r = re.compile(r'\w+')
 >>> s = 'foo bar- blah/hm.lala'
 >>> s2 = 'foo////bbbar.. xyz'
 >>> r.findall(s)
['foo', 'bar', 'blah', 'hm', 'lala']
 >>> r.findall(s2)
['foo', 'bbbar', 'xyz']

The regexp in question (r'\w+') translates to "one or more 'word' 
characters".  The definition of a 'word' character depends on your 
locale/encoding, but at a minimum it includes the standard ASCII 
letters, the digits, and the underscore.
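
To make the "word character" definition concrete, here's a small 
sketch.  The sample string is made up for illustration and isn't 
from the original question; it just shows that \w+ treats digits and 
underscores as part of a word:

```python
import re

# Hypothetical sample input, chosen to contain digits and an underscore.
sample = 'foo_bar 42-baz9/qux'

# \w matches letters, digits and the underscore, so 'foo_bar' stays in
# one piece and the digit runs are kept as (parts of) words.
words = re.findall(r'\w+', sample)
print(words)  # ['foo_bar', '42', 'baz9', 'qux']
```

So if your input might contain underscores or numbers that you want 
treated as separators, \w+ alone probably isn't what you're after.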

If you're not interested in digits or underscores, and only want the 
26*2 ASCII letters, you can use

 >>> r = re.compile(r'[a-zA-Z]+')

instead (which would be "one or more letters in the set [a-zA-Z]").
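
As a rough sketch of how the letters-only version behaves, here it is 
wrapped in a helper.  The name split_alpha is just something I made 
up for this example, and the third test string is hypothetical:

```python
import re

# Letters-only pattern: runs of ASCII a-z / A-Z, nothing else.
LETTERS = re.compile(r'[a-zA-Z]+')

def split_alpha(line):
    """Return the runs of ASCII letters found in line (hypothetical helper)."""
    return LETTERS.findall(line)

# The two inputs from the original question.
print(split_alpha('foo bar- blah/hm.lala'))  # ['foo', 'bar', 'blah', 'hm', 'lala']
print(split_alpha('foo////bbbar.. xyz'))     # ['foo', 'bbbar', 'xyz']

# Digits and underscores now act as separators too.
print(split_alpha('foo_bar 42-baz9/qux'))    # ['foo', 'bar', 'baz', 'qux']
```

Note that this drops accented and other non-ASCII letters as well, 
so it only fits if plain a-z/A-Z is really all you expect.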

-tkc







