Regex: Parsing Lisp with Python
Mike C. Fletcher
mcfletch at rogers.com
Thu Aug 8 22:51:13 CEST 2002
Not what the original poster wanted, but oh well...
This appears to be a functional LISP parser using the CVS version of
SimpleParse 2.0.0 (though it's been so long since I used LISP I can't be
sure it's entirely correct). As an example, the results of parsing:
'''("this\n\r" ' those (+ a b) (23s 0xa3 55.3) "s")'''
(specified as a Python string) are as follows:
[('string_double_quote', 1, 9, [('char_no_quote', 2, 8, )]),
('quote', 10, 11, ),
('name', 12, 17, ),
[('name', 19, 20, ), ('name', 21, 22, ), ('name', 23, 24, )]
[('name', 27, 30, ),
[('hex', 31, 35, [('hexdigits', 33, 35, )])])]),
[('int_unsigned', 36, 38, ),
[('int_unsigned', 39, 40, )])])])])])]),
('string_double_quote', 42, 45, [('char_no_quote', 43, 44, )])])]
"""Basic LISP parser adapted from the YAPPS documentation's sample
We use shortcuts, so we get " strings, float, int, and hex
atoms, as well as regular list objects. Note: Lisp doesn't
appear to use , for separating atoms in lists, not sure if
that's just a feature of the YAPPS version or not.
"""
definition = r"""
### A LISP parser based on a parser in YAPPS documentation
<ts> := [ \t\n\r]*
<nameChar> := [-+*/!@%^&=.a-zA-Z0-9_]
quote := "'"
name := nameChar+
>atom< := quote / string_double_quote / list / number_expr / name
# numbers are regular number values followed
# by something that is _not_ a nameCharacter
number_expr := number, ?-(nameChar)
list := "(", seq?, ")"
>seq< := ts, atom, (ts,atom)*, ts
"""
from simpleparse.parser import Parser
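# Imported for their side effect: these modules register the shared
# productions (string_double_quote, number, ...) used by the grammar.
from simpleparse.common import strings, numbers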
from simpleparse.dispatchprocessor import *
parser = Parser( definition, 'atom' )
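To get from a result tree to the nested lists the original poster asked for, something like the following might do. It is only a sketch: it assumes each node is a (tag, start, stop, children) tuple as in the output above, and that nested lists carry the tag 'list' (the production name in the grammar). The to_nested helper and the sample tree are hypothetical; the tree is written by hand for illustration and is not actual parser output.

```python
def to_nested(text, nodes):
    """Turn (tag, start, stop, children) tuples into nested Python lists.

    Nodes tagged 'list' become Python lists; every other node is
    replaced by the slice of the source text that it covers.
    """
    result = []
    for tag, start, stop, children in nodes or []:
        if tag == 'list':
            result.append(to_nested(text, children))
        else:
            result.append(text[start:stop])
    return result


source = "(+ a (b c))"
# Hand-written stand-in for a result tree (an assumption, not real
# parser output); the offsets index into `source`.
tree = [
    ('list', 0, 11, [
        ('name', 1, 2, None),
        ('name', 3, 4, None),
        ('list', 5, 10, [
            ('name', 6, 7, None),
            ('name', 8, 9, None),
        ]),
    ]),
]

nested = to_nested(source, tree)
print(nested)
```

With the structure nested this way, "the second symbol" of the outer list is just nested[0][1], with no counting of open and close tokens.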
Paul Rubin wrote:
> Thomas Guettler <zopestoller at thomas-guettler.de> writes:
>>I tried it like this, but this gives me all tokens
>>serialized. It is hard to get the second symbol without
>>counting all open and close tokens. Is there a way to get
>>the tokens in nested lists?
> No there's no way to do that with traditional regexps.
> You have to parse the s-expressions. Normally you do that with
> recursion: on seeing an open-paren, parse additional s-expressions
> til you see a close-paren, and make a list of them.
> You might look at source code of some lisp interpreters to see how
> this works. SIOD (Scheme In One Day) is a nice simple one written in
> C, that you can probably find on Google.
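The recursion described above can be sketched in plain Python without SimpleParse. This is a minimal illustration, not a full Lisp reader: the names TOKEN, tokenize, read and parse are made up for the example, atoms are left as strings, and a quote mark is kept as its own atom (as in the grammar earlier in this post) rather than expanded to (quote x).

```python
import re

# One token per match: a paren or quote mark, a double-quoted string,
# or a run of characters that is none of those.
TOKEN = re.compile(r'''\s*(?:([()'])|("(?:\\.|[^"\\])*")|([^\s()'"]+))''')


def tokenize(text):
    """Split the source text into a flat list of token strings."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("cannot tokenize at offset %d" % pos)
        tokens.append(match.group(match.lastindex))
        pos = match.end()
    return tokens


def read(tokens, pos=0):
    """Read one s-expression at tokens[pos]; return (value, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == '(':
        # Open paren: read s-expressions until the matching close
        # paren and collect them into a list.
        items = []
        pos += 1
        while pos < len(tokens) and tokens[pos] != ')':
            item, pos = read(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, pos + 1
    if tok == ')':
        raise SyntaxError("unexpected ')'")
    return tok, pos + 1  # any other token is an atom


def parse(text):
    """Parse a single s-expression into nested Python lists."""
    tokens = tokenize(text)
    value, pos = read(tokens)
    if pos != len(tokens):
        raise SyntaxError("unexpected text after the first expression")
    return value


expr = parse("(define (sq x) (* x x))")
print(expr)
print(expr[1])
```

Because each open paren starts a fresh Python list, picking out the second element is just expr[1], however deeply the elements around it are nested.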
Mike C. Fletcher
Designer, VR Plumber, Coder