[Tutor] Tokenizing Help
William Witteman
yam at nerd.cx
Wed Apr 22 20:35:29 CEST 2009
I need to be able to decompose a formatted text file into identifiable,
possibly named pieces. To tokenize it, in other words. There seems to
be a vast array of modules to do this with (simpleparse, pyparsing, etc.),
but I cannot understand their documentation.
The file format I am looking at (it is a bibliographic reference file)
looks like this:
<1> # the references are enumerated
AU - some text
perhaps across lines
AB - some other text
AB - there may be multiples of some fields
UN - any 2-letter combination may exist; other than by exhaustion, I
cannot anticipate what will be found
What I am looking for is some help to get started: either an
explanation of how to apply one of these modules to my format, or an
approach that I could build using only the standard library.
Thanks.
--
yours,
William