[issue40678] Full list of Python lexical rules

New submission from Ram Rachum <ram@rachum.com>:

I'm a noob on parsing, learning about it, so it's possible I've made a mistake somewhere.

I know there's this page: https://docs.python.org/3/reference/grammar.html
which is a full listing of Python's grammar.

However, looking at this page: https://docs.python.org/3/reference/lexical_analysis.html
I see rules that aren't written there, like longstringitem. I'm guessing that's because these are lexing rules, while the former was a list of parsing rules?

If that's the case, shouldn't there also be a full, authoritative list of Python's lexical rules? Possibly alongside the parsing rules?

----------
assignee: docs@python
components: Documentation
messages: 369320
nosy: cool-RR, docs@python, georg.brandl, gvanrossum
priority: normal
severity: normal
status: open
title: Full list of Python lexical rules
type: enhancement
versions: Python 3.6, Python 3.7, Python 3.8, Python 3.9

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue40678>
_______________________________________
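As a rough illustration of the lexing/parsing split the question is about, the sketch below runs the standard-library tokenize module over a line of source and lists the token names it produces. The helper name token_names is made up for this example, and tokenize is the pure-Python tokenizer, which may not behave identically to the C lexer the interpreter itself uses.

```python
import io
import tokenize


def token_names(source):
    """Return (token name, token text) pairs for a piece of source code.

    Hypothetical helper for illustration only; it relies on the
    pure-Python tokenize module rather than the interpreter's C lexer.
    """
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


if __name__ == "__main__":
    # Each pair is one lexer-level token; the parser only ever sees these.
    for name, text in token_names("x = 1.5e3\n"):
        print(f"{name:10} {text!r}")
```

The names in the first column (NAME, OP, NUMBER, and so on) are the kind of lexer-level units that the lexical_analysis page describes, as opposed to the statement and expression rules on the grammar page.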

Terry J. Reedy <tjreedy@udel.edu> added the comment:

First note that the 3.8.3 grammar.html is stated to be the actual grammar used by the old parser, and is a bit different from the more human-readable grammar given in the reference manual. It is a bit different in 3.9, and I expect it will be much more different in 3.10 with the new PEG parser.

In the grammar, the CAPITALIZED_NAMES are token names returned by the tokenizer/lexer. This is a standard convention.

I am pretty sure that the human-readable lexing rules in lexical_analysis are not what the lexer uses. I presume the latter uses barely readable regular expressions, as does the tokenize module. Compare the float grammar in
https://docs.python.org/3/reference/lexical_analysis.html#floating-point-lit...
to the float REs in tokenize.py.

    def group(*choices): return '(' + '|'.join(choices) + ')'
    def maybe(*choices): return group(*choices) + '?'
    # The above are reused for multiple REs.

    Exponent = r'[eE][-+]?[0-9](?:_?[0-9])*'
    Pointfloat = group(r'[0-9](?:_?[0-9])*\.(?:[0-9](?:_?[0-9])*)?',
                       r'\.[0-9](?:_?[0-9])*') + maybe(Exponent)
    Expfloat = r'[0-9](?:_?[0-9])*' + Exponent
    Floatnumber = group(Pointfloat, Expfloat)

Note that this is (Python) code, not a text specification. You or someone else can look at what the C lexer does. But I think that the proposal should be rejected.

----------
nosy: +terry.reedy
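To make the comparison above concrete, here is a minimal sketch that compiles the quoted float pattern and checks a few strings against it. The pattern definitions are copied from the comment; the names FLOAT_RE and is_float_literal are assumptions added for this example and are not part of tokenize.py.

```python
import re


def group(*choices):
    return '(' + '|'.join(choices) + ')'


def maybe(*choices):
    return group(*choices) + '?'


# Pattern pieces as quoted from tokenize.py in the comment above.
Exponent = r'[eE][-+]?[0-9](?:_?[0-9])*'
Pointfloat = group(r'[0-9](?:_?[0-9])*\.(?:[0-9](?:_?[0-9])*)?',
                   r'\.[0-9](?:_?[0-9])*') + maybe(Exponent)
Expfloat = r'[0-9](?:_?[0-9])*' + Exponent
Floatnumber = group(Pointfloat, Expfloat)

# Hypothetical names, introduced only for this sketch.
FLOAT_RE = re.compile(Floatnumber)


def is_float_literal(text):
    """Return True if the whole string matches the float pattern."""
    return FLOAT_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["3.14", "10.", ".001", "1e100", "1_000.5e-3",
                      "123", "1__0.5", ".", "1e"]:
        print(f"{candidate!r:14} {is_float_literal(candidate)}")
```

A plain integer such as "123" is not matched because the pattern requires either a decimal point or an exponent, which lines up with the point-float and exponent-float alternatives in the documented float grammar.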

Change by Terry J. Reedy <tjreedy@udel.edu>:

----------
versions: +Python 3.10 -Python 3.6, Python 3.7, Python 3.8, Python 3.9

Ram Rachum <ram@rachum.com> added the comment:

Hmm, I feel this isn't right, because I still feel like there should be one place where one can see the full Python syntax specification, lexing and parsing and all. But I'm underqualified to argue because I don't understand the details. Is someone more knowledgeable interested in arguing this point?

----------

Terry J. Reedy <tjreedy@udel.edu> added the comment:

What you literally seem to ask for does not exist. If you want to pursue this, I suggest posting to python-ideas and you might get support for an acceptable alternative.

----------

Ram Rachum <ram@rachum.com> added the comment:

I understand, thank you.

----------
stage: -> resolved
status: open -> closed
participants (2)
- Ram Rachum
- Terry J. Reedy