Where can I find a lexical spec of python?

Shaofei Cheng csf178 at 163.com
Wed Sep 21 20:01:14 CEST 2011

Yes, I'm using this document now, but I was wondering whether there is a formal spec for the lexical grammar. It looks like part of the doc "http://docs.python.org/py3k/reference/grammar.html" is missing.
We can find some replacement in lexical_analysis.html, but that document seems to be written for a Python user rather than for someone trying to implement Python.
At 2011-09-22 00:55:45, "Thomas Jollans" <t at jollybox.de> wrote:
>On 21/09/11 18:33, 程劭非 wrote:
>> Thanks Thomas.
>> I've read the document http://docs.python.org/py3k/reference/lexical_analysis.html 
>> but I worry it might leave out some language features like "tab magic".
>> Since I'm working on a parser in JavaScript, I need a more strictly defined spec.
>> Currently I have a highlighter here ->http://shaofei.name/python/PyHighlighter.html
>> (Also the lexer  http://shaofei.name/python/PyLexer.html)
>> As you can see, I just made its behavior match CPython's, but I'm not sure what the real Python lexical grammar is like.
>> Does anyone know if there is a lexical grammar spec like those of other languages (e.g. http://bclary.com/2004/11/07/#annex-a)?
>I believe the language documentation on docs.python.org is all the
>documentation of the language there is. It may not be completely formal,
>and in parts it concentrates not on the actual rules but on the original
>implementation, but, as far as I can tell, it tells you everything you
>need to know to write a new parser for the Python language, without any
>You appear to be anxious about implementing the indentation mechanism
>correctly. The language documentation describes a behaviour precisely.
>What is the problem?
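The indentation behaviour referred to here (section 2.1.8 of the language reference) can be sketched roughly as follows. This is a simplified illustration, not CPython's tokenizer: the function name `indent_tokens` and the `LINE` marker are made up for the example, tabs are expanded to multiples of 8 columns as the reference describes, and line joining, brackets and form feeds are ignored.

```python
def indent_tokens(source, tabsize=8):
    """Return INDENT/DEDENT/LINE markers using the stack rule from section 2.1.8."""
    stack = [0]          # the stack always starts with a single zero
    out = []
    for line in source.splitlines():
        stripped = line.lstrip(" \t")
        if not stripped or stripped.startswith("#"):
            continue     # blank and comment-only lines do not affect indentation
        leading = line[:len(line) - len(stripped)]
        width = len(leading.expandtabs(tabsize))
        if width > stack[-1]:
            stack.append(width)
            out.append("INDENT")
        else:
            while width < stack[-1]:
                stack.pop()
                out.append("DEDENT")
            if width != stack[-1]:
                raise IndentationError(
                    "unindent does not match any outer indentation level")
        out.append("LINE")
    # at end of file, one DEDENT for every level still open
    out.extend("DEDENT" for _ in stack[1:])
    return out


print(indent_tokens("if a:\n    b\n    if c:\n        d\ne\n"))
```

A real lexer would also have to apply the tab/space consistency check and suppress indentation handling inside parentheses and continuation lines.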
>> Please help me. Thanks a lot.
>> At 2011-09-21 19:41:33, "Thomas Jollans" <t at jollybox.de> wrote:
>>> On 21/09/11 11:44, 程劭非 wrote:
>>>> Hi, everyone, 
>>>> I've found that there are several tokens used in Python's
>>>> grammar (http://docs.python.org/reference/grammar.html) but I didn't see
>>>> their definitions anywhere.  The tokens are listed here:
>>> They should be documented in
>>> http://docs.python.org/py3k/reference/lexical_analysis.html - though
>>> apparently not using these exact terms.
>>>> NEWLINE
>>> Trivial: U+000A
>>>> ENDMARKER
>>> End of file.
>>>> NAME
>>> documented as "identifier" in 2.3
>>>> INDENT
>>>> DEDENT
>>> Documented in 2.1.8.
>>>> NUMBER
>>> Documented in 2.4.3 - 2.4.6
>>>> STRING
>>> Documented in 2.4.2
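One way to see how these token names line up with real source text is the standard library `tokenize` module. The sketch below is only an illustration: the helper name `token_names` and the sample source are made up, and the output reflects what the CPython implementation does rather than a formal specification.

```python
import io
import tokenize


def token_names(source):
    """Return the token type names that tokenize emits for a source string."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


SOURCE = "if x:\n    y = 'a' + 1\n"
names = token_names(SOURCE)
print(names)
```

Operators and delimiters all come out with the generic type `OP`; the remaining names are the ones used in the grammar file.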
>>>> I've got some information from the source
>>>> code (http://svn.python.org/projects/python/trunk/Parser/tokenizer.c) but
>>>> I'm not sure which features are specific to this implementation.  (I
>>>> saw that the tabstop could be modified with comments using "tab-width:",
>>>> ":tabstop=", ":ts=" or "set tabsize="; is this feature really in the spec?)
>>> That sounds like a legacy feature that is no longer used. Somebody
>>> familiar with the early history of Python might be able to shed more
>>> light on the situation. It is inconsisten with the spec (section 2.1.8):
>>> """
>>> Indentation is rejected as inconsistent if a source file mixes tabs and
>>> spaces in a way that makes the meaning dependent on the worth of a tab
>>> in spaces; a TabError is raised in that case.
>>> """
>>> - Thomas
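The rule quoted from section 2.1.8 can be observed in Python 3 with the built-in `compile()`. This is a small sketch under that assumption: the helper name `indentation_error_name` and the two sample strings are made up for illustration. In the second sample, eight spaces and one tab are the same indentation only if a tab is worth eight columns, so the meaning depends on the tab size.

```python
def indentation_error_name(source):
    """Compile source and return the name of the syntax error raised, if any."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        # TabError and IndentationError are both subclasses of SyntaxError
        return type(exc).__name__
    return None


CONSISTENT = "if True:\n    x = 1\n    y = 2\n"
MIXED = "if True:\n        x = 1\n\ty = 2\n"   # 8 spaces, then a single tab

print(indentation_error_name(CONSISTENT))
print(indentation_error_name(MIXED))
```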
