[Python-Dev] "".tokenize() ?
Fredrik Lundh
fredrik@pythonware.com
Fri, 4 May 2001 12:50:06 +0200
mal wrote:
> > > "one, two and three".tokenize([",", "and"])
> > > -> ["one", " two ", "three"]
> > >
> > > I like this method -- should I review the code and then check it in ?
> >
> > -1. method bloat. not exactly something you do every day, and
> > when you do, it's a one-liner:
> >
> > def tokenize(string, ignore):
> >     return [word for word in re.findall(r"\w+", string) if word not in ignore]
>
> This is not the same as what .tokenize() does: it cuts at each
> occurrence of a substring rather than splitting on words, as in
> your example.
oh, I didn't see the spaces. splitting on all substrings is even
easier (but perhaps a bit more obscure, at least when written
on one line):
import re

def tokenize(string, seps):
    return re.split("|".join(map(re.escape, seps)), string)
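for example, on mal's input (a quick sketch, untested here, but the
re semantics are straightforward; note that the split version keeps
the surrounding whitespace, while the findall one-liner above throws
the whitespace away along with the separator words):

    >>> import re
    >>> def tokenize(string, seps):
    ...     return re.split("|".join(map(re.escape, seps)), string)
    ...
    >>> tokenize("one, two and three", [",", "and"])
    ['one', ' two ', ' three']
    >>> [w for w in re.findall(r"\w+", "one, two and three")
    ...     if w not in [",", "and"]]
    ['one', 'two', 'three']

(mal's example shows the last item as "three" without the leading
space, so the proposed method may do some extra trimming; the sketch
above makes no attempt at that.)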
Cheers /F