[Tutor] token parser
Kent Johnson
kent37 at tds.net
Sun Feb 11 13:54:30 CET 2007
Dj Gilcrease wrote:
> How would I go about writing a fast token parser to parse a string like
> "[4d6.takeHighest(3)+(2d6*3)-5.5]"
>
> and get a list like
> ['+',
> ['takeHighest',
> ['d',
> 4,
> 6
> ],
> 3
> ],
> ['-',
> ['*',
> ['d',
> 2,
> 6
> ],
> 3
> ],
> 5.5
> ]
> ]
>
> back? (I put it all separated and indented like that so it is easier
> to read, it is for me anyways)
If your input is valid Python (which the above is not: 4d6 and 2d6 are
not valid identifiers), then the compiler.parse() function might be a
good starting point. It generates an abstract syntax tree that you could
transform into the format you want. For example, with the dice counts
dropped so that the expression is valid Python:
    In [13]: import compiler
    In [19]: compiler.parse("[d6.takeHighest(3)+(d6*3)-5.5]")
    Out[19]: Module(None, Stmt([Discard(List([Sub((Add((CallFunc(Getattr(
        Name('d6'), 'takeHighest'), [Const(3)], None, None),
        Mul((Name('d6'), Const(3))))), Const(5.5)))]))]))
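The compiler module exists only in Python 2; it was removed in Python 3, where the standard ast module plays the same role. The following is a minimal sketch of the "transform the tree" idea, assuming Python 3.8 or later. It makes two assumptions of its own: a hypothetical regular-expression step rewrites dice notation such as 4d6 into a call d(4, 6) so the original string becomes valid Python, and the names DICE, to_tree and parse_with_ast are illustrative only. It handles just the small subset of syntax in the example.

```python
import ast
import re

# Hypothetical preprocessing: rewrite dice notation such as "4d6" as a call
# "d(4, 6)" so the whole expression becomes valid Python syntax.
DICE = re.compile(r"(\d+)d(\d+)")

# Map the AST operator classes this sketch supports to output symbols.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_tree(node):
    """Convert a small subset of Python's AST into nested prefix lists."""
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return [OPS[type(node.op)], to_tree(node.left), to_tree(node.right)]
    if isinstance(node, ast.Call):
        args = [to_tree(arg) for arg in node.args]
        if isinstance(node.func, ast.Attribute):
            # obj.method(args) becomes [method, obj, args...]
            return [node.func.attr, to_tree(node.func.value)] + args
        if isinstance(node.func, ast.Name):
            # func(args) becomes [func, args...], e.g. d(4, 6) -> ['d', 4, 6]
            return [node.func.id] + args
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError("unsupported syntax: " + ast.dump(node))


def parse_with_ast(text):
    """Parse a roll such as "[4d6.takeHighest(3)+5]" into nested lists."""
    # Drop the surrounding square brackets, then rewrite the dice notation.
    source = DICE.sub(r"d(\1, \2)", text.strip().strip("[]"))
    return to_tree(ast.parse(source, mode="eval").body)


print(parse_with_ast("[4d6.takeHighest(3)+(2d6*3)-5.5]"))
# ['-', ['+', ['takeHighest', ['d', 4, 6], 3], ['*', ['d', 2, 6], 3]], 5.5]
```

Python groups + and - from left to right, so this returns '-' at the top with '+' nested inside, rather than the ['+', ..., ['-', ...]] layout in the question. The two groupings evaluate to the same value; if you need the exact layout shown, you would have to rearrange the tree afterwards.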
If this doesn't work for you, then I would look to one of the many
parser-generator packages available for Python. I don't know which is
fastest; I have found pyparsing and PLY to be fairly easy to use.
pyparsing comes with a lot of examples which might help you get started.
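For a grammar this small, a hand-written tokenizer plus a recursive-descent parser is another option. Note that this is a swapped-in technique, not pyparsing or PLY. It is a minimal sketch that assumes the input contains only numbers, NdM dice, method calls such as .takeHighest(n), the operators + - * /, and parentheses or square brackets for grouping. TOKEN_RE, tokenize, Parser and parse_roll are hypothetical names chosen for illustration.

```python
import re

# One alternative per token type. Dice are tried before plain numbers so that
# "4d6" is not split into the number 4 and the name "d6".
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<dice>\d+d\d+)            # dice roll such as 4d6
      | (?P<number>\d+(?:\.\d+)?)    # integer or decimal number
      | (?P<name>[A-Za-z_]\w*)       # method name such as takeHighest
      | (?P<op>[-+*/().,\[\]])       # operators and punctuation
    )""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, value) pairs, ending with ('end', None)."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at position %d" % pos)
        kind = match.lastgroup
        value = match.group(kind)
        if kind == "dice":
            count, sides = value.split("d")
            value = (int(count), int(sides))
        elif kind == "number":
            value = float(value) if "." in value else int(value)
        tokens.append((kind, value))
        pos = match.end()
    tokens.append(("end", None))
    return tokens


class Parser:
    """Recursive descent: one method per precedence level."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def take(self, expected=None):
        kind, value = self.tokens[self.pos]
        if expected is not None and value != expected:
            raise ValueError("expected %r, got %r" % (expected, value))
        self.pos += 1
        return kind, value

    def parse(self):
        tree = self.expr()
        if self.peek()[0] != "end":
            raise ValueError("unexpected %r" % (self.peek()[1],))
        return tree

    def expr(self):
        # Lowest precedence: + and -, grouped left to right.
        tree = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            tree = [self.take()[1], tree, self.term()]
        return tree

    def term(self):
        # Next level: * and /, grouped left to right.
        tree = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            tree = [self.take()[1], tree, self.factor()]
        return tree

    def factor(self):
        # Method calls such as 4d6.takeHighest(3) bind tightest.
        tree = self.atom()
        while self.peek() == ("op", "."):
            self.take(".")
            kind, name = self.take()
            if kind != "name":
                raise ValueError("expected a method name after '.'")
            self.take("(")
            args = []
            if self.peek() != ("op", ")"):
                args.append(self.expr())
                while self.peek() == ("op", ","):
                    self.take(",")
                    args.append(self.expr())
            self.take(")")
            tree = [name, tree] + args
        return tree

    def atom(self):
        kind, value = self.take()
        if kind == "number":
            return value
        if kind == "dice":
            return ["d", value[0], value[1]]
        if (kind, value) in (("op", "("), ("op", "[")):
            tree = self.expr()
            self.take(")" if value == "(" else "]")
            return tree
        raise ValueError("unexpected %r" % (value,))


def parse_roll(text):
    return Parser(text).parse()


print(parse_roll("[4d6.takeHighest(3)+(2d6*3)-5.5]"))
# ['-', ['+', ['takeHighest', ['d', 4, 6], 3], ['*', ['d', 2, 6], 3]], 5.5]
```

Because each precedence level has its own method, and lower levels call higher ones, 2d6*3 is grouped before the surrounding + and -. Operators of equal precedence group left to right, so '-' ends up outside '+' rather than in the nesting shown in the question; both groupings evaluate to the same value.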
Here are some summaries of the options:
http://www.nedbatchelder.com/text/python-parsers.html
http://wiki.python.org/moin/LanguageParsing
http://radio.weblogs.com/0100945/2004/04/24.html
Here are some articles that give examples:
http://www.rexx.com/~dkuhlman/python_201/python_201.html#SECTION007000000000000000000
http://www-128.ibm.com/developerworks/linux/library/l-cpdpars.html?ca=dgr-lnxw02DParser
and the references in those articles.
Kent