tokenizer..
Alex Martelli
aleaxit at yahoo.com
Wed Jun 6 18:47:17 EDT 2001
"John" <john.thai at dspfactory.com> wrote in message
news:fhxT6.245523$Z2.2804247 at nnrp1.uunet.ca...
> Hello,
>
> Is there a built in function or module which allows me to get tokens
> from a string?
The .split() method of strings is good for many purposes (but
doesn't respect many definitions of "tokens" -- still, it does
work just as well as C's strtok:-).
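As a small illustration of that limitation (a sketch, not part of the original session; the sample sentence is the one tokenized further down), splitting on whitespace leaves punctuation glued to the words and breaks the quoted string apart:

```python
# str.split() with no argument splits on runs of whitespace only,
# so commas stay attached and the quoted phrase is cut in two.
text = '"Hello", said Bill, "what\'s up?"'
words = text.split()
print(words)
```

That is fine when whitespace-separated words are all you need, but not when quoted strings or operators should come out as tokens of their own.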
> I know there's the tokenize module but that needs a callable object and it
> seems to loop forever...
Is it so bad after all...?
D:\py21>python
Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
Alternative ReadLine 1.1 -- Copyright 2001, Chris Gonnerman
>>> class ReadOneLine:
...     def __init__(self,data): self.data=data
...     def __call__(self):
...         result = self.data
...         self.data = ''
...         return result
...
>>> import tokenize
>>> tokenize.tokenize(ReadOneLine('''"Hello", said Bill, "what's up?"'''))
1,0-1,7: STRING '"Hello"'
1,7-1,8: OP ','
1,9-1,13: NAME 'said'
1,14-1,18: NAME 'Bill'
1,18-1,19: OP ','
1,20-1,32: STRING '"what\'s up?"'
2,0-2,0: ENDMARKER ''
>>> class Accum:
...     def __init__(self): self.toks=[]
...     def __call__(self, tp, st, *junk): self.toks.append(st)
...
>>> ac=Accum()
>>> tokenize.tokenize(ReadOneLine('''"Hello", said Bill, "what's up?"'''),ac)
>>> ac.toks
['"Hello"', ',', 'said', 'Bill', ',', '"what\'s up?"', '']
>>>
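For readers on a current Python 3 (an assumption; the session above is Python 2.1, where tokenize.tokenize takes a token-eater callback), roughly the same result can be had from tokenize.generate_tokens, which takes a readline callable and yields tokens instead of calling back. The helper name string_tokens below is made up for illustration, and dropping the NEWLINE, NL and ENDMARKER tokens is a choice of this sketch rather than anything the module requires:

```python
import io
import tokenize

def string_tokens(text):
    """Return the token strings of text, using Python's own tokenizer.

    string_tokens is a hypothetical helper name, not a library function.
    """
    # generate_tokens wants a readline-style callable; StringIO supplies one.
    readline = io.StringIO(text).readline
    # Skip the bookkeeping tokens that carry no text of interest.
    skip = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type not in skip]

toks = string_tokens('"Hello", said Bill, "what\'s up?"')
print(toks)
```

Keep in mind that this applies Python's lexical rules, so it only suits input whose tokens look like Python tokens.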
Alex