Where regexs listed for Python language's tokenizer/lexer?
Robert Kern
robert.kern at gmail.com
Sat Sep 12 19:07:05 EDT 2009
Dennis Lee Bieber wrote:
> On Fri, 11 Sep 2009 23:10:39 -0700 (PDT), Chris Seberino
> <cseberino at gmail.com> declaimed the following in
> gmane.comp.python.general:
>
>> Where regexs listed for Python language's tokenizer/lexer?
>>
>> If I'm not mistaken, the grammar is not sufficient to specify the
>> language....
>> you also need to specify the regexs that define the tokens
>> right?..where is that?
>>
> Pardon... I've been out of the "market", but I don't recall EVER
> seeing a "regex" used in a textbook for compiler/interpreter design.
>
> BNF (or Pascal's bubble diagram equivalent) has always been used to
> define the syntactical components in those books in my possession, and
> parsers (tokenizers) were written using those implied algorithms (if the
> first character is numeric or "." it starts a number, otherwise treat it
> as an identifier, etc.),
In actual implementations of lexers and the lexical analysis components of
parsers, regexes are fairly common. For example, PLY (Python Lex-Yacc)
has you specify each token as a regular expression:
http://www.dabeaz.com/ply/ply.html#ply_nn6
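
To make the idea concrete without depending on PLY, here is a minimal
sketch of a regex-driven tokenizer using only the standard library's re
module. The token names and patterns (TOKEN_SPEC, MASTER_RE, tokenize) are
made up for illustration; they are neither PLY's API nor Python's actual
lexical rules, just one way the "starts with a digit or '.' means number,
otherwise identifier" logic can be written as patterns.

```python
import re

# Hypothetical token set for illustration only; this is not Python's real
# lexical grammar and not PLY's interface.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d*)?|\.\d+"),  # 3, 3.14, 3., .5
    ("NAME", r"[A-Za-z_]\w*"),           # identifiers
    ("OP", r"[-+*/=()]"),                # single-character operators
    ("SKIP", r"[ \t]+"),                 # whitespace, discarded
    ("MISMATCH", r"."),                  # any other character is an error
]

# Combine the patterns into one alternation of named groups. Earlier
# entries take priority when more than one could match at a position.
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (kind, value) pairs for the given text."""
    tokens = []
    for match in MASTER_RE.finditer(text):
        kind = match.lastgroup  # name of the group that matched
        value = match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {value!r} at offset {match.start()}"
            )
        tokens.append((kind, value))
    return tokens


if __name__ == "__main__":
    for token in tokenize("x = 3.14 * (y + .5)"):
        print(token)
```

Ordering matters in a scheme like this: the catch-all MISMATCH pattern has
to come last, or it would shadow the real tokens.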
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco