[Tutor] Tokenizing Help
Paul McGuire
ptmcg at austin.rr.com
Thu Apr 23 16:24:16 CEST 2009
For the given test case, this pyparsing sample parses the data, without
having to anticipate all the possible 2-letter keys.
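from pyparsing import *

# basic tokens; punctuation is suppressed so it stays out of the results
integer = Word(nums)
DASH = Literal('-').suppress()
LT = Literal('<').suppress()
GT = Literal('>').suppress()

# an entry starts with a number in angle brackets, e.g. <567>
entrynum = LT + integer + GT

# a key is any two capital letters starting in column 1, followed by a dash
keycode = Word(alphas.upper(), exact=2)
key = GoToColumn(1).suppress() + keycode + DASH

# a value runs up to the next key, the next entry number, or end of input
data = Group(key("key") + Empty() +
             SkipTo(key | entrynum | StringEnd())("value"))
entry = entrynum("refnum") + OneOrMore(data)("data")

# test is the input string from the test case
for e in entry.searchString(test):
    print e.refnum
    for dd in e.data:
        print dd.key,':', dd.value
    print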
Prints:
['567']
['AU'] : Bibliographical Theory and Practice - Volume 1 - The AU - Tag
and its applications
['AB'] : Texts in Library Science
['568']
['AU'] : Bibliographical Theory and Practice - Volume 2 - The
['AB'] : Tag and its applications
['AB'] : Texts in Library Science
['569']
['AU'] : AU - AU - AU - AU - AU - AU - AU - AU - AU - AU -
['AU'] : AU - AU - AU - AU - AU - AU - AU - AU - AU - AU
['AB'] : AU - AU - AU - AU - AU - AU - AU - AU - AU - AU -
['AU'] : AU - AU - AU - AU - AU - AU - AU - AU - AU - AU
['ZZ'] : Somewhat nonsensical case
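To see the rule in isolation ("a key is two capital letters in column 1 followed by a dash; everything else is part of the current value"), here is a minimal sketch using only the standard re module. It is a stand-in for illustration, not the pyparsing grammar: the sample text, the split_entries name, and the requirement that entry numbers sit on their own line are all assumptions.

```python
import re

# Hypothetical sample input; the real test data is not shown in this message.
sample = """\
<567>
AU - Some Author
AB - First line of an abstract - with an AU - inside
     continued on a second line
<568>
ZZ - Another field
"""

ENTRY = re.compile(r"^<(\d+)>\s*$")          # entry number on its own line
KEY = re.compile(r"^([A-Z]{2})\s*-\s*(.*)$")  # two capitals in column 1, then a dash

def split_entries(text):
    """Return a list of (refnum, [[key, value], ...]) tuples."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            entries.append((m.group(1), []))
            continue
        if not entries:
            continue  # ignore anything before the first entry number
        fields = entries[-1][1]
        m = KEY.match(line)
        if m:
            # a new key starts a new field
            fields.append([m.group(1), m.group(2).strip()])
        elif fields and line.strip():
            # no key in column 1: the line continues the current value
            fields[-1][1] += "\n" + line.strip()
    return entries

for refnum, fields in split_entries(sample):
    print(refnum)
    for key, value in fields:
        print(key, ":", value)
```

Because the key test is anchored at the start of the line, an "AU -" in the middle of a value is left alone, which is the same effect GoToColumn(1) has in the grammar above.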
If you find that you have to also accept keycodes that consist of a capital
letter followed by a numeric digit (like "B7"), modify the keycode
definition to be:
keycode = Word(alphas.upper(), alphanums.upper(), exact=2)
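As a rough way to compare what the two keycode definitions accept, the sketch below uses regular expressions that I am assuming are equivalent to the two Word(...) forms for a standalone 2-character code. The pattern names and the accepts helper are made up for this illustration and are not part of pyparsing.

```python
import re

# Assumed regex equivalents of the two Word(...) definitions (not pyparsing itself)
LETTERS_ONLY = re.compile(r"[A-Z]{2}")            # Word(alphas.upper(), exact=2)
LETTER_THEN_ALNUM = re.compile(r"[A-Z][A-Z0-9]")  # Word(alphas.upper(), alphanums.upper(), exact=2)

def accepts(pattern, code):
    """True if the whole string is a valid keycode under the given pattern."""
    return pattern.fullmatch(code) is not None

for code in ("AU", "B7", "7B", "au"):
    print(code, accepts(LETTERS_ONLY, code), accepts(LETTER_THEN_ALNUM, code))
```

The first character must still be a capital letter in both forms, so a code like "7B" is rejected either way; only the second position is loosened to allow a digit.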
-- Paul