Is it possible to view tokenizer output?

Hi, I'm getting into the CPython codebase for fun, and I've started messing around with the tokenizer and the grammar. I was wondering, is there a way to print out the results of the tokenizer (that is, the stream of tokens it generates) in a human-readable format? It would be really helpful for debugging. Hope the question's not too basic. Cheers.

python -m tokenize file.py ?
See https://docs.python.org/3/library/tokenize.html#command-line-usage
Cheers, Jean
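
For reference, roughly the same listing can be produced from Python code instead of the command line. This is a minimal sketch using the documented tokenize.generate_tokens function; the helper name format_tokens is made up for illustration. It goes through Lib/tokenize.py, just like the command-line tool, so it shows what that module reports.

```python
import io
import tokenize


def format_tokens(source):
    """Return one human-readable line per token in `source` (a str)."""
    lines = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # exact_type distinguishes operators (e.g. PLUS) from the generic OP type.
        name = tokenize.tok_name[tok.exact_type]
        (srow, scol), (erow, ecol) = tok.start, tok.end
        lines.append(f"{srow},{scol}-{erow},{ecol}:\t{name}\t{tok.string!r}")
    return lines


if __name__ == "__main__":
    print("\n".join(format_tokens("x = 1 + 2\n")))
```

Each line gives the start and end position, the token name, and the token text, which is close to what the command-line tool prints when run with -e.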

Thanks! I didn't even know about that module. Does this take into account your local changes to the tokenizer, though? I've added a new token type to Grammar/Tokens, and some code to tokenizer.c to return that token type in appropriate circumstances. I've stepped through the tokenizer in the debugger, so I *think* it's working. When I run -m tokenize as you suggest, I don't see my custom token type. The devguide mentions that "Lib/tokenize.py needs changes to match changes to the tokenizer.", so I'm guessing I would have to manually repeat my changes in tokenize.py to see them, right? But what I want to see is what tokenizer.c is producing when my newly built Python binary actually reads a file.
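
One quick way to check whether the Python-level modules know about a new token type at all is to look it up in token.tok_name. This is a rough sketch: MYTOKEN is a hypothetical placeholder for the custom token, and is_known_to_python is a made-up helper name. Even if the name does show up there, Lib/tokenize.py may still need its own changes before it emits the token, as the devguide note says.

```python
import token


def python_level_token_names():
    """Names of all token types known to the Python-level `token` module."""
    return set(token.tok_name.values())


def is_known_to_python(name):
    """True if `name` is a token type the Python-level modules can report."""
    return name in python_level_token_names()


if __name__ == "__main__":
    # "MYTOKEN" is a hypothetical stand-in for a token added to Grammar/Tokens.
    for name in ("NAME", "MYTOKEN"):
        print(name, is_known_to_python(name))
```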

python -m tokenize < file-to-parse.py
See the comment at the top of tokenize.py. IIRC, it re-implements the tokenizer; it does not call the one used for Python code.
Eric
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2ZTZBAN5... Code of Conduct: http://python.org/psf/codeofconduct/

Well, I just stuck a print statement in _PyTokenizer_Get() and it's done the job for me for now.
Thanks, Jack

Ah right, I would be surprised if there were a public Python API to get the tokenizer output, since there is no public C API for that :-)
I just removed the <token.h> header file since it was never usable outside Python C internals: there is no public C API to just run the tokenizer and get its output.
Victor

There is no *public* one, but there is a private one accessible from Python that I added for testing purposes.

Hi Pablo, could you clarify please? Is that on the main branch, or would you be willing to share the code?

It is on the main branch but, as I mentioned, it is **exclusively** for internal consumption:
https://github.com/python/cpython/blob/8136606769661c103c46d142e52ecbbbb8880...

You should maybe move the code out of the stdlib (to tests?) if it should not be used. Otherwise, someone will somehow start to rely on it, and then complain when it breaks :-)
Victor
participants (5): Eric V. Smith, Jack, Jean Abou Samra, Pablo Galindo Salgado, Victor Stinner