[Python-ideas] Hooking between lexer and parser

Neil Girdhar mistersheik at gmail.com
Sat Jun 6 20:44:38 CEST 2015


Ryan: I'm trying to figure out how the parsing library should be
designed, not trying to work around other designs.
Stefan: maybe this is a better answer to your question.

So, thinking about this more, here is how I think it should be done:

Each grammar rule is expressed as a class whose match method is a
generator: it yields the sub-rules and terminals to match, and is sent
back what actually matched.


class FileInput:
    # The top-level rule: indent_level is None because no indentation
    # is open yet.
    def __init__(self):
        self.indent_level = None

    def match(self):
        while True:
            # Each line is either a bare newline or an (indentation,
            # statement) pair; the driver sends back which alternative
            # matched.
            matched = yield Disjunction(
                '\n',
                [Whitespace(self.indent_level, indent=False), Statement()])
            if matched == '\n':
                # A bare newline ends the loop; the file must then be over.
                break
        yield EndOfFile()


class Suite:
    def __init__(self, indent_level):
        self.indent_level = indent_level

    def match(self):
        # A suite is either a simple statement on the same line, or a
        # newline followed by a more deeply indented run of statements.
        yield Disjunction(
            SimpleStatement(),
            ['\n', Whitespace(self.indent_level, indent=True),
             Repeat(Statement)])
        # A dedent token is not required because the next statement knows
        # its own indent level.
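
To make the protocol concrete, here is a rough, untested sketch of a
driver that could run such rules over a token stream.  Everything in it
(Disjunction, the token conventions, the driver itself) is hypothetical,
and a real matcher would also need backtracking within a rule and proper
error reporting:

class Disjunction:
    """An ordered choice: match the first alternative that succeeds."""
    def __init__(self, *alternatives):
        self.alternatives = alternatives


def attempt(pattern, tokens, pos):
    """Try to match one pattern at tokens[pos].

    Returns (new_pos, what_matched) on success and (None, None) on
    failure.  A string is a terminal token, a list is a sequence, a
    Disjunction tries its alternatives in order, and anything else is
    taken to be a rule object whose match() generator is run, with each
    match result sent back in as the value of the yield.
    """
    if isinstance(pattern, str):
        if pos < len(tokens) and tokens[pos] == pattern:
            return pos + 1, pattern
        return None, None
    if isinstance(pattern, list):
        for part in pattern:
            pos, _ = attempt(part, tokens, pos)
            if pos is None:
                return None, None
        return pos, pattern
    if isinstance(pattern, Disjunction):
        for alternative in pattern.alternatives:
            new_pos, _ = attempt(alternative, tokens, pos)
            if new_pos is not None:
                return new_pos, alternative
        return None, None
    # Otherwise: a rule object such as FileInput or Suite.
    gen = pattern.match()
    matched = None
    try:
        while True:
            sub_pattern = gen.send(matched)
            pos, matched = attempt(sub_pattern, tokens, pos)
            if pos is None:
                return None, None   # no backtracking across yields here
    except StopIteration:
        return pos, pattern

Parsing a file would then be attempt(FileInput(), tokens, 0), with
Whitespace, Statement, EndOfFile, and Repeat written as rule classes in
the same style.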



On Sat, Jun 6, 2015 at 9:36 AM, s.krah <stefan at bytereef.org> wrote:

>
> Neil Girdhar <mistersheik at gmail.com> wrote:
> > Along with the grammar, you also give it code that it can execute as
> > it matches each symbol in a rule.  In Python, for example, as it
> > matches each argument passed to a function, it would keep track of the
> > counts of *args, **kwargs, keyword arguments, and regular arguments,
> > and then raise a syntax error if it encounters anything out of order.
> > Right now that check is done in validate.c, which is really annoying.
>
> Agreed.  For 3.4 it was possible to encode these particular semantics
> into the grammar itself, but it would no longer be LL(1).
>
> If I understood correctly, you wanted to handle lexing and parsing
> together.  How would the INDENT/DEDENT tokens be generated?
>
> For my private ast generator, I did the opposite: I wanted to formalize
> the token preprocessing step, so I have:
>
>     lexer -> parser1 (generates INDENT/DEDENT) -> parser2 (generates
>     the ast directly)
>
> It isn't slower than what is in Python right now, and you can hook into
> the token stream at any place.
>
> Stefan Krah
>
>
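
To answer the INDENT/DEDENT question quoted above: in the scheme I have
in mind, the Whitespace rule matches indentation directly, so those
tokens never need to exist.  But if you do want a separate pass like
Stefan's parser1, it can be a simple filter over the token stream.  The
sketch below is hypothetical (not Stefan's actual code); it assumes the
lexer hands over (indent_width, tokens_for_line) pairs, one per logical
line, with blank lines already filtered out:

def add_indent_tokens(lines):
    """Yield tokens, inserting INDENT/DEDENT wherever the indentation
    of a line differs from that of the line before it."""
    levels = [0]                      # stack of open indentation widths
    for width, tokens in lines:
        if width > levels[-1]:
            levels.append(width)
            yield 'INDENT'
        else:
            while width < levels[-1]:
                levels.pop()
                yield 'DEDENT'
            if width != levels[-1]:
                raise SyntaxError('unindent does not match any outer '
                                  'indentation level')
        for token in tokens:
            yield token
    while levels.pop() != 0:          # close blocks still open at EOF
        yield 'DEDENT'

For example, feeding it [(0, ['if', 'x', ':', 'NEWLINE']),
(4, ['pass', 'NEWLINE'])] yields the 'if' line, INDENT, the 'pass'
line, and a closing DEDENT.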
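And on the validate.c point quoted at the top, the kind of action I mean
is something like the following, run as the arguments of a call are
matched.  This is a deliberately simplified, hypothetical sketch; the
real rules in CPython have more cases:

def check_call_arguments(kinds):
    """Reject out-of-order arguments in a call.

    kinds: the kind of each argument in source order, drawn from
    'positional', 'star' (*args), 'keyword' (name=value), and
    'double_star' (**kwargs).
    """
    seen_star = seen_keyword = seen_double_star = False
    for kind in kinds:
        if seen_double_star:
            raise SyntaxError('argument after **kwargs')
        if kind == 'positional':
            if seen_keyword:
                raise SyntaxError(
                    'positional argument follows keyword argument')
            if seen_star:
                raise SyntaxError(
                    'only named arguments may follow *expression')
        elif kind == 'star':
            seen_star = True
        elif kind == 'keyword':
            seen_keyword = True
        elif kind == 'double_star':
            seen_double_star = True

So check_call_arguments(['positional', 'keyword', 'positional']) raises,
while ['positional', 'star', 'keyword', 'double_star'] passes.  Done
this way, the check happens during matching instead of in a separate
validation pass.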