Looking for very simple general purpose tokenizer

python at g2swaroop.net python at g2swaroop.net
Mon Jan 19 15:39:21 CET 2004

    Maybe the Spark module can help you.
Here's a DeveloperWorks tutorial on that :


Hope it helps,

Hi group,

I need to parse various text files in python. I was wondering if there was a
general purpose tokenizer available. I know about split(), but this
(otherwise very handy method does not allow me to specify a list of
splitting characters, only one at the time and it removes my splitting
operators (OK for spaces and \n's but not for =, / etc. Furthermore I tried
tokenize but this specifically for Python and is way too heavy for me. I am
looking for something like this:

splitchars = [' ', '\n', '=', '/', ....]
tokenlist = tokenize(rawfile, splitchars)

Is there something like this available inside Python or did anyone already
make this? Thank you in advance

Maarten van Reeuwijk                        Heat and Fluid Sciences
Phd student                             dept. of Multiscale Physics
www.ws.tn.tudelft.nl                 Delft University of Technology

More information about the Python-list mailing list