tokenizer..

Alex Martelli aleaxit at yahoo.com
Wed Jun 6 18:47:17 EDT 2001


"John" <john.thai at dspfactory.com> wrote in message
news:fhxT6.245523$Z2.2804247 at nnrp1.uunet.ca...
> Hello,
>
>     Is there a built in function or module which allows me to get tokens
> from a string?

The .split() method of strings is good for many purposes (but
doesn't respect many definitions of "tokens" -- still, it does
work just as well as C's strtok:-).
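
As a rough illustration of where .split() helps and where it falls short, here is a minimal sketch. It assumes a current Python 3 interpreter rather than the 2.1 used in the session further down, and the names text, words and fields are made up for the example.

```python
text = '"Hello", said Bill, "what\'s up?"'

# Whitespace split: punctuation stays glued to the neighbouring word,
# and the quoted phrase is broken in two.
words = text.split()
print(words)

# Splitting on an explicit separator, roughly what strtok(s, ",") gives in C.
fields = [part.strip() for part in text.split(",")]
print(fields)
```

The first call yields five pieces with the commas still attached, which is why .split() only fits the simpler notions of "token".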


> I know there's the tokenize module but that needs a callable object and it
> seems to loop forever...

Is it so bad after all...?

D:\py21>python
Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
Alternative ReadLine 1.1 -- Copyright 2001, Chris Gonnerman
>>> class ReadOneLine:
...     def __init__(self,data): self.data=data
...     def __call__(self):
...         result = self.data
...         self.data = ''
...         return result
...
>>> import tokenize
>>> tokenize.tokenize(ReadOneLine('''"Hello", said Bill, "what's up?"'''))
1,0-1,7:        STRING  '"Hello"'
1,7-1,8:        OP      ','
1,9-1,13:       NAME    'said'
1,14-1,18:      NAME    'Bill'
1,18-1,19:      OP      ','
1,20-1,32:      STRING  '"what\'s up?"'
2,0-2,0:        ENDMARKER       ''
>>> class Accum:
...     def __init__(self): self.toks=[]
...     def __call__(self, tp, st, *junk): self.toks.append(st)
...
>>> ac=Accum()
>>> tokenize.tokenize(ReadOneLine('''"Hello", said Bill, "what's up?"'''),ac)
>>> ac.toks
['"Hello"', ',', 'said', 'Bill', ',', '"what\'s up?"', '']
>>>
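
For readers on Python 3, the same idea can be sketched with tokenize.generate_tokens, which takes a readline-style callable returning str and yields token tuples instead of calling a token-eater. This is an assumption about the reader's setup, not what the 2.1 session above ran, and the helper name string_tokens is hypothetical. Using the readline of an io.StringIO gives a callable that returns '' at end of input, which is how the tokenizer knows to stop; a callable that never returns '' is one plausible cause of the "loops forever" symptom.

```python
import io
import tokenize

# Token types that carry no text of interest for this purpose.
SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
        tokenize.INDENT, tokenize.DEDENT}

def string_tokens(text):
    """Return the token strings of `text`, split by Python's own lexical rules."""
    # StringIO.readline returns '' once the data is exhausted, ending the loop.
    readline = io.StringIO(text).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type not in SKIP]

toks = string_tokens('''"Hello", said Bill, "what's up?"''')
print(toks)
```

Unlike the Accum class above, this sketch filters by token type, so the trailing empty ENDMARKER string is dropped from the result.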


Alex





