[Tutor] tokenizing a simple string with split()

Kent Johnson kent37 at tds.net
Sun Apr 1 05:42:14 CEST 2007


Andrei Petre wrote:
> I want to split a string like "C:\My\Doc\;D:\backup\" with two 
> separators: \ and ;
> I found that \ is handled with /raw string/ notation r"". But the 
> problem I encountered is with the split() function.
> The 2.5 reference says that "The sep argument of the split() 
> function may consist of multiple characters". 

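The argument to split() is the literal string to split on, not a list of 
potential splitting characters. A multi-character sep matches only that 
exact sequence, so to split on '; ' your string would have to be 
'spam; egg; mail', with a semicolon and a space between every item.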
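
A small sketch of the difference, using the example strings from this 
message (the variable names are only for illustration):

```python
# str.split() treats sep as one literal string, not a set of characters.
spaced = 'spam; egg; mail'
print(spaced.split('; '))   # ['spam', 'egg', 'mail']

# Without the exact '; ' sequence in the string, nothing is split.
mixed = 'spam;egg mail'
print(mixed.split('; '))    # ['spam;egg mail']
```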

To split on any one of several characters you have to use a regular 
expression and re.split().

In [1]: import re
In [3]: re.split('[; ]', "spam;egg mail")
Out[3]: ['spam', 'egg', 'mail']

[; ] is a regular expression that means, "match either of ; or space".
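
Applied to the string from the original question, the same idea might 
look like the sketch below. The pattern and the variable names are my own 
illustration and the filtering step is an assumption about what is wanted. 
Note that a raw string cannot end in a backslash, so the path is written 
with ordinary escaping instead of r"".

```python
import re

# The string from the question. A raw string cannot end in a backslash,
# so ordinary escaping ("\\") is used here instead of r"".
path = "C:\\My\\Doc\\;D:\\backup\\"

# [\\;] is a character class matching either a backslash or a semicolon.
pieces = re.split(r'[\\;]', path)
print(pieces)   # ['C:', 'My', 'Doc', '', 'D:', 'backup', '']

# Adjacent or trailing separators leave empty strings; drop them if unwanted.
parts = [p for p in pieces if p]
print(parts)    # ['C:', 'My', 'Doc', 'D:', 'backup']
```

The empty strings come from the "\;" pair and the trailing backslash, 
where two separators sit next to each other or at the end of the string.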

Kent

