aleaxit at yahoo.com
Wed Sep 1 13:26:22 CEST 2004
Angus Mackay <yeah at right.com> wrote:
> I remember Python having a generic tokenizer in the library. All I want
> is to set a list of token separators and then read tokens out of a
> stream; the token separators should be returned as themselves.
> Is there anything like this?
Not as such in the standard library: the functions in module tokenize
do not let you 'set a list of token separators'.  If what you're
tokenizing fits in a string in memory, module re can help:
>>> import re
>>> x = re.compile(r'([,; ])')
>>> for w in x.split('a,b, c;d; e'): print repr(w),'+',
...
'a' + ',' + 'b' + ',' + '' + ' ' + 'c' + ';' + 'd' + ';' + '' + ' ' + 'e' +
Note that you get empty-string items when two separators abut.
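In today's Python 3 the same idea can be packaged up in a small helper (the name tokenize_string is just for illustration): a capturing group in the pattern makes re.split return the separators themselves, and a filter drops the empty strings that appear between abutting separators.

```python
import re

def tokenize_string(text, separators=",; "):
    # The parentheses (capturing group) make re.split keep the
    # separators in the result, not just the pieces between them.
    pattern = "(%s)" % "|".join(re.escape(s) for s in separators)
    # Drop the empty strings produced when two separators abut.
    return [tok for tok in re.split(pattern, text) if tok != ""]

print(tokenize_string("a,b, c;d; e"))
# -> ['a', ',', 'b', ',', ' ', 'c', ';', 'd', ';', ' ', 'e']
```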
If the limitations of re.split (stuff must fit in memory, etc.) are a
problem, then the lex-like solutions I see somebody else suggested may
be more appropriate for your needs.
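For the stream case, one sketch (not a standard-library facility; the name itertokens is made up here) is a generator that reads the stream in chunks and holds back the last, possibly incomplete piece until the next chunk arrives, so the whole input never has to fit in memory:

```python
import io
import re

def itertokens(stream, separators=",; ", chunksize=4096):
    # Capturing group: re.split returns the separators as tokens too.
    pattern = re.compile("(%s)" % "|".join(re.escape(s) for s in separators))
    pending = ""
    while True:
        chunk = stream.read(chunksize)
        if not chunk:
            break
        parts = pattern.split(pending + chunk)
        # The last piece may be a token cut off mid-way by the chunk
        # boundary; hold it back until the next chunk completes it.
        pending = parts.pop()
        for tok in parts:
            if tok:          # skip empties between abutting separators
                yield tok
    if pending:
        yield pending

for tok in itertokens(io.StringIO("a,b, c;d; e")):
    print(repr(tok))
```

The same list of tokens comes out regardless of chunksize, since only complete tokens are ever yielded before end of stream.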