Where regexs listed for Python language's tokenizer/lexer?

Duncan Booth duncan.booth at invalid.invalid
Sat Sep 12 17:12:48 EDT 2009


Paul McGuire <ptmcg at austin.rr.com> wrote:

> On Sep 12, 1:10 am, Chris Seberino <cseber... at gmail.com> wrote:
>> Where regexs listed for Python language's tokenizer/lexer?
>>
>> If I'm not mistaken, the grammar is not sufficient to specify the
>> language....
>> you also need to specify the regexs that define the tokens
>> right?..where is that?
> 
> I think the OP is asking for the regexs that define the terminals
> referenced in the Python grammar, similar to those found in yacc token
> definitions.  He's not implying that there are regexs that implement
> the whole grammar.
> 

The OP should read 
http://www.python.org/doc/current/reference/lexical_analysis.html

That page gives the BNF for the tokens that can be defined that way, 
such as identifiers, numbers and strings. The tricky bit is that the INDENT 
and DEDENT tokens depend on context, so no regex can describe them: see 
section 2.1.8.
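
To see the context-dependent tokens in practice, the standard library's 
tokenize module can be run over a small snippet. This is only a sketch, 
assuming a Python 3 interpreter; the sample source string and the helper 
name token_names are made up for illustration and are not part of the 
reference documentation.

```python
import io
import tokenize

# Hypothetical sample: one indented block followed by a top-level statement.
source = "if x:\n    y = 1\nz = 2\n"

def token_names(text):
    """Return the token type names that tokenize produces for text."""
    readline = io.StringIO(text).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]

names = token_names(source)
print(names)
```

The printed list contains an INDENT where the block starts and a DEDENT 
where it ends, even though neither corresponds to a fixed character pattern 
in the source: the tokenizer emits them by comparing each line's leading 
whitespace against a stack of earlier indentation levels.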


