ka-ping yee tokenizer.py

Karl Kobata karl.kobata at syncira.com
Tue Sep 16 21:48:51 CEST 2008

Hi Fredrik,

This is exactly what I need.  Thank you.
I would like to add one more piece of functionality.  I am not using the
tokenizer to parse Python code; it just happens to work very well for my
application.  However, I would like either or both of the following variations:
1) add 2 other characters as comment designators
2) write a module that reads a line, modifies it as required, and can then
be passed as the argument to the tokenizer.

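def modifyLine(fileHandle):
    # return a readline-style callable, since generate_tokens calls it per line
    def readline():
        line = fileHandle.readline()
        # modify this string if required
        return line
    return readline

for token in tokenize.generate_tokens(modifyLine(myFileHandle)):
    print token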
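
A rough sketch of how options 1 and 2 might fit together, written for Python 3 rather than the Python 2 of this thread: a wrapper returns a readline-style callable that rewrites full-line comments starting with an alternate character into '#' comments before the tokenizer sees them. The characters ";" and "!" and the names ALT_COMMENT_CHARS and modify_line are placeholders made up for illustration, and only whole-line comments are handled, since rewriting trailing comments safely would require knowing whether the character sits inside a string.

```python
import io
import tokenize

# Hypothetical alternate comment characters; the post does not name them.
ALT_COMMENT_CHARS = (";", "!")


def modify_line(file_handle):
    """Return a readline-style callable that rewrites alternate comments."""

    def readline():
        line = file_handle.readline()  # returns "" at end of input
        stripped = line.lstrip()
        if stripped.startswith(ALT_COMMENT_CHARS):
            # Keep the indentation, swap the leading character for '#'.
            indent = line[: len(line) - len(stripped)]
            line = indent + "#" + stripped[1:]
        return line

    return readline


# Made-up sample input standing in for a real file handle.
sample = io.StringIO("x = 1\n; an alternate comment\ny = x + 2\n")
for token in tokenize.generate_tokens(modify_line(sample)):
    print(token)
```

Because the wrapper only needs a readline() method on its argument, an open file or a StringIO buffer can be passed in unchanged.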

Anxiously looking forward to your thoughts.

-----Original Message-----
From: python-list-bounces+kkobata=syncira.com at python.org
[mailto:python-list-bounces+kkobata=syncira.com at python.org] On Behalf Of
Fredrik Lundh
Sent: Monday, September 15, 2008 2:04 PM
To: python-list at python.org
Subject: Re: ka-ping yee tokenizer.py

Karl Kobata wrote:

> I have enjoyed using ka-ping yee's tokenizer.py.  I would like to 
> replace the readline parameter input with my own and pass a list of 
> strings to the tokenizer.  I understand it must be a callable object and 
> iterable, but it is obvious from the errors I am getting that these are 
> not the only functions required.

not sure I can decipher your detailed requirements, but to use Python's 
standard "tokenize" module (written by ping) on a list, you can simply 
do as follows:

     import tokenize

     program = [ ... program given as list ... ]

     for token in tokenize.generate_tokens(iter(program).next):
         print token

another approach is to turn the list back into a string, and wrap that 
in a StringIO object:

     import tokenize
     import StringIO

     program = [ ... program given as list ... ]

     program_buffer = StringIO.StringIO("".join(program))

     for token in tokenize.generate_tokens(program_buffer.readline):
         print token
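
for readers on Python 3, where the iterator method `next` became `__next__`, `StringIO` moved into the `io` module, and `print` is a function, the two approaches above might be sketched as follows. the sample `program` list is made up for illustration, with each entry assumed to end in a newline, and the first variant uses `next(lines, "")` so that the callable returns an empty string at the end of the list, the same way a file's readline does.

```python
import io
import tokenize

# Made-up sample input; each list item is one source line ending in "\n".
program = ["x = 1\n", "y = x + 2  # add\n"]

# Approach 1: feed the list through a readline-style callable.
lines = iter(program)
tokens_from_list = list(tokenize.generate_tokens(lambda: next(lines, "")))

# Approach 2: join the list back into a string and wrap it in StringIO.
program_buffer = io.StringIO("".join(program))
tokens_from_buffer = list(tokenize.generate_tokens(program_buffer.readline))

for token in tokens_from_buffer:
    print(token)
```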


