Where can I find a lexical spec of Python?

Wed Sep 21 07:41:33 EDT 2011

On 21/09/11 11:44, 程劭非 wrote:
> Hi, everyone, 
> I've found several tokens used in Python's
> grammar (http://docs.python.org/reference/grammar.html), but I couldn't find
> their definitions anywhere. Here are the tokens:

They should be documented in
http://docs.python.org/py3k/reference/lexical_analysis.html, though
apparently not under these exact names.
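One quick way to see these names in practice, assuming a recent Python 3: each terminal in the grammar file is an integer constant in the standard `token` module, and the standard `tokenize` module labels the tokens of a source string with those names. `GRAMMAR_TOKENS` and `token_names` are only illustrative names. The exact token stream can differ slightly between CPython versions, so treat this as a sketch rather than a spec.

```python
import io
import token
import tokenize

# Terminal names from the grammar file; each is a constant in the `token` module.
GRAMMAR_TOKENS = ["NEWLINE", "ENDMARKER", "NAME", "INDENT", "DEDENT", "NUMBER", "STRING"]

def token_names(source: str) -> list[tuple[str, str]]:
    """Tokenize `source` and return (token type name, token text) pairs."""
    readline = io.StringIO(source).readline
    return [(token.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)]

for name in GRAMMAR_TOKENS:
    print(name, "=", getattr(token, name))

# This small snippet produces all seven token types listed above.
for pair in token_names("if x:\n    y = 'hi' * 2\n"):
    print(pair)
```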

> NEWLINE
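Mostly just U+000A (LF). More precisely, NEWLINE marks the end of a logical line (2.1.1); physical lines may also end in CR LF or CR (2.1.2), and blank lines (2.1.7) or line breaks inside brackets (2.1.6) produce no NEWLINE token.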

> ENDMARKER
End of file.
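To see NEWLINE and ENDMARKER from the tokenizer's side, here is a small sketch using Python 3's standard `tokenize` module (`type_names` is a made-up helper). Only the end of a logical line is reported as NEWLINE. A blank line, or a line break inside brackets, is reported as NL, a token type that only the `tokenize` module produces and that never appears in the grammar. ENDMARKER is always the last token.

```python
import io
import tokenize
from token import tok_name

def type_names(source: str) -> list[str]:
    """Return the token type name of every token in `source`."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok_name[tok.type] for tok in tokens]

# One logical line spread over two physical lines, a blank line,
# then a second logical line.
names = type_names("x = (1,\n     2)\n\ny = 3\n")
print(names)
```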

> NAME
Documented as "identifier" in 2.3.
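Here is a sketch of the identifier rules under Python 3. It uses only standard calls (`str.isidentifier`, `keyword.iskeyword`, `tokenize`), and `classify` and `first_token_type` are hypothetical helpers. Keywords are lexically identifiers too: the tokenizer emits them as NAME and the parser tells them apart. Python 3 also accepts non-ASCII identifiers (PEP 3131), which Python 2 does not.

```python
import io
import keyword
import tokenize
from token import tok_name

def classify(word: str) -> str:
    """Rough lexical classification of a single word under Python 3 rules."""
    if not word.isidentifier():
        return "not an identifier"
    if keyword.iskeyword(word):
        return "keyword (still tokenized as NAME)"
    return "identifier"

def first_token_type(source: str) -> str:
    """Type name of the first token the tokenizer produces for `source`."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return tok_name[next(tokens).type]

for word in ["spam", "_private", "名字", "if", "2fast"]:
    print(f"{word!r}: {classify(word)}")

print(first_token_type("if x: pass\n"))   # prints NAME: keywords come out as NAME
print(first_token_type("名字 = 1\n"))      # prints NAME: non-ASCII identifier
```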

> INDENT
> DEDENT
Documented in 2.1.8.
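Section 2.1.8 describes INDENT/DEDENT generation in terms of a stack of indentation levels. Below is a simplified re-implementation of that algorithm. It assumes spaces-only indentation and ignores comments, backslash continuations and bracketed lines. `indent_tokens` and `real_indent_tokens` are hypothetical helpers. The second helper pulls the actual INDENT/DEDENT tokens out of Python 3's `tokenize` module for comparison.

```python
import io
import token
import tokenize

def indent_tokens(source: str) -> list[str]:
    """Simplified INDENT/DEDENT generation following the stack algorithm of
    section 2.1.8. Assumes spaces only; comments, backslash continuations and
    bracketed multi-line expressions are not handled."""
    stack = [0]          # indentation levels; the bottom 0 is never popped
    out = []
    for line in source.splitlines():
        if not line.strip():
            continue     # blank lines do not affect indentation
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            out.append("INDENT")
        else:
            # Pop every level larger than the current one, one DEDENT each.
            while width < stack[-1]:
                stack.pop()
                out.append("DEDENT")
            # The new level must match a level that was on the stack.
            if width != stack[-1]:
                raise IndentationError("unindent does not match any outer indentation level")
    # At end of input, one DEDENT for every level still above zero.
    out.extend("DEDENT" for _ in stack[1:])
    return out

def real_indent_tokens(source: str) -> list[str]:
    """INDENT/DEDENT tokens as reported by the standard tokenize module."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [token.tok_name[t.type] for t in tokens
            if t.type in (token.INDENT, token.DEDENT)]

src = "if a:\n    if b:\n        x = 1\ny = 2\n"
print(indent_tokens(src))
print(real_indent_tokens(src))
```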

> NUMBER
Documented in 2.4.3 - 2.4.6
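Here is a quick check with Python 3's `tokenize` module that each numeric literal form comes out as a single NUMBER token (`literal_tokens` is a hypothetical helper). The reference also notes that numeric literals do not include a sign, so `-3` is an operator followed by a NUMBER. The spellings below are Python 3 ones. Python 2 also accepted forms like `0777` and `10L`.

```python
import io
import token
import tokenize

def literal_tokens(expr: str) -> list[tuple[str, str]]:
    """(type name, text) pairs for `expr`, minus the trailing NEWLINE/ENDMARKER."""
    tokens = tokenize.generate_tokens(io.StringIO(expr + "\n").readline)
    return [(token.tok_name[t.type], t.string) for t in tokens
            if t.type not in (token.NEWLINE, token.ENDMARKER)]

for expr in ["42", "0x2A", "0o52", "0b101010", "3.14", "1e-3", "2j", "-3"]:
    print(expr, "->", literal_tokens(expr))
```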

> STRING
Documented in 2.4.1 (concatenation of adjacent literals in 2.4.2).
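Here is a sketch using Python 3's `tokenize` module (`string_tokens` is a hypothetical helper). Prefixes and triple quotes are part of a single STRING token. Adjacent literals stay separate tokens and are only concatenated later. f-strings are left out on purpose, because Python 3.12 and later tokenize them differently.

```python
import io
import token
import tokenize

def string_tokens(source: str) -> list[str]:
    """Return the text of every STRING token in `source`."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [t.string for t in tokens if t.type == token.STRING]

samples = [
    "'single'",
    '"double"',
    "r'\\d+'",                  # raw string: the backslash is kept literally
    "b'bytes'",                 # bytes literal (Python 3)
    '"""triple\nquoted"""',     # spans two lines but is still one token
    "'adjacent' 'literals'",    # two STRING tokens, joined later by the compiler
]
for src in samples:
    print(repr(src), "->", string_tokens(src + "\n"))
```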

> I've got some information from the source
> code (http://svn.python.org/projects/python/trunk/Parser/tokenizer.c), but
> I'm not sure which features are specific to this implementation. (I
> saw that the tab stop could be set by comments containing "tab-width:",
> ":tabstop=", ":ts=" or "set tabsize=". Is this feature really in the spec?)

That sounds like a legacy feature that is no longer used. Somebody
familiar with the early history of Python might be able to shed more
light on the situation. It is inconsistent with the spec (section 2.1.8):

"""
Indentation is rejected as inconsistent if a source file mixes tabs and
spaces in a way that makes the meaning dependent on the worth of a tab
in spaces; a TabError is raised in that case.
"""

- Thomas