Where can I find a lexical spec of python?

程劭非 csf178 at 163.com
Wed Sep 21 18:33:03 CEST 2011


Thanks Thomas.
I've read the document at http://docs.python.org/py3k/reference/lexical_analysis.html
but I'm worried that it might omit some language features, such as the "tab magic".

Since I'm writing a parser in JavaScript, I need a more strictly defined spec.

Currently I have a highlighter here: http://shaofei.name/python/PyHighlighter.html
(and the lexer here: http://shaofei.name/python/PyLexer.html)

As you can see, I have simply made its behavior match CPython's, but I'm not sure what the actual Python lexical grammar is.
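
One way to check the lexer against CPython is to dump CPython's own token stream and compare it with the JavaScript lexer's output. The following is a minimal sketch, assuming Python 3 and the standard-library tokenize module; dump_tokens and sample are illustrative names, not part of either lexer. Note that tokenize also reports COMMENT and NL tokens, which the grammar itself never sees, so it is a reference for comparison rather than a formal spec.

```python
import io
import tokenize


def dump_tokens(source):
    """Return (token name, token text) pairs as reported by the tokenize module."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# Illustrative input: one indented block with a string and a number literal.
sample = "if x:\n    y = 'a' + 1\n"

for name, text in dump_tokens(sample):
    print(f"{name:10} {text!r}")
```

For this sample the stream contains the NAME, NEWLINE, INDENT, STRING, NUMBER, DEDENT and ENDMARKER tokens asked about in the quoted message below.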

Does anyone know whether there is a formal lexical grammar spec like those of other languages (e.g. http://bclary.com/2004/11/07/#annex-a)?

Please help me. Thanks a lot.
At 2011-09-21 19:41:33, "Thomas Jollans" <t at jollybox.de> wrote:
>On 21/09/11 11:44, 程劭非 wrote:
>> Hi, everyone, 
>> I've found that there are several tokens used in Python's
>> grammar (http://docs.python.org/reference/grammar.html) but I didn't see
>> their definitions anywhere. The tokens are listed here:
>
>They should be documented in
>http://docs.python.org/py3k/reference/lexical_analysis.html - though
>apparently not using these exact terms.
>
>> NEWLINE
>Trivial: U+000A
>
>> ENDMARKER
>End of file.
>
>> NAME
>documented as "identifier" in 2.3
>
>> INDENT
>> DEDENT
>Documented in 2.1.8.
>
>> NUMBER
>Documented in 2.4.3 - 2.4.6
>
>> STRING
>Documented in 2.4.2
>
>> I've got some information from the source
>> code (http://svn.python.org/projects/python/trunk/Parser/tokenizer.c) but
>> I'm not sure which features are specific to this implementation. (I
>> saw that the tab stop can be modified with comments using "tab-width:",
>> ":tabstop=", ":ts=" or "set tabsize="; is this feature really in the spec?)
>
>That sounds like a legacy feature that is no longer used. Somebody
>familiar with the early history of Python might be able to shed more
>light on the situation. It is inconsistent with the spec (section 2.1.8):
>
>"""
>Indentation is rejected as inconsistent if a source file mixes tabs and
>spaces in a way that makes the meaning dependent on the worth of a tab
>in spaces; a TabError is raised in that case.
>"""
>
>- Thomas
>-- 
>http://mail.python.org/mailman/listinfo/python-list
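
The TabError rule quoted above from section 2.1.8 can be observed directly. This is a minimal sketch, assuming Python 3; indentation_is_consistent and the two sample strings are illustrative names. The second sample indents one line with eight spaces and the next with a single tab, so whether the two lines sit at the same level depends on how many spaces a tab is worth.

```python
def indentation_is_consistent(source):
    """Return False if compiling the source raises TabError, True otherwise."""
    try:
        compile(source, "<example>", "exec")
    except TabError:
        return False
    return True


# Both body lines are indented with four spaces.
spaces_only = "if True:\n    x = 1\n    y = 2\n"

# First body line uses eight spaces, the second uses one tab character.
mixed = "if True:\n        x = 1\n\ty = 2\n"

print(indentation_is_consistent(spaces_only))
print(indentation_is_consistent(mixed))
```

A lexer that follows the spec rather than a fixed tab width would need to reject the second sample instead of picking one interpretation.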



