Is it possible to view tokenizer output?
Hi, I'm getting into the CPython codebase for fun, and I've started messing around with the tokenizer and the grammar. I was wondering, is there a way to print out the results of the tokenizer (that is, the stream of tokens it generates) in a human-readable format? It would be really helpful for debugging. Hope the question's not too basic. Cheers.
On 30/05/2022 at 00:59, Jack wrote:
python -m tokenize file.py ?
See https://docs.python.org/3/library/tokenize.html#command-line-usage
Cheers, Jean
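The same dump can be produced programmatically with the documented tokenize API. The following is only a minimal sketch: the helper name dump_tokens and the output layout are made up for illustration, and it shows what Lib/tokenize.py reports, which is not necessarily what tokenizer.c produces in a locally modified build.

```python
import io
import tokenize


def dump_tokens(source: str) -> list[str]:
    """Return one human-readable line per token reported by Lib/tokenize.py.

    dump_tokens is a hypothetical helper, not part of the stdlib.
    """
    lines = []
    # generate_tokens() takes a readline callable that returns str lines.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # exact_type distinguishes operators (PLUS, EQUAL, ...) from generic OP.
        name = tokenize.tok_name[tok.exact_type]
        span = f"{tok.start[0]},{tok.start[1]}-{tok.end[0]},{tok.end[1]}"
        lines.append(f"{span}:\t{name}\t{tok.string!r}")
    return lines


if __name__ == "__main__":
    print("\n".join(dump_tokens("x = 1 + 2\n")))
```

Running python -m tokenize -e file.py gives a comparable listing from the command line, with exact operator types instead of the generic OP.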
Thanks! I didn't even know about that module. Does this take into account your local changes to the tokenizer, though? I've added a new token type to Grammar/Tokens, and some code to tokenizer.c to return that token type in appropriate circumstances. I've stepped through the tokenizer in the debugger, so I /think/ it's working. When I run -m tokenize as you suggest, I don't see my custom token type. The devguide mentions that "Lib/tokenize.py needs changes to match changes to the tokenizer", so I'm guessing I would have to manually repeat my changes in tokenize.py to see them, right? But what I want to see is what tokenizer.c is producing when my newly built Python binary actually reads a file.
On 30/05/2022 00:09, Jean Abou Samra wrote:
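One cheap sanity check, sketched here under the assumption that Lib/token.py was regenerated from Grammar/Tokens (the devguide describes make regen-token for this), is to ask the freshly built interpreter whether the new token name is known at the Python level. The helper name has_token_type and the token name MY_HYPOTHETICAL_TOKEN are placeholders for illustration. This only confirms that the generated token tables were updated; it says nothing about what tokenizer.c actually emits.

```python
import token


def has_token_type(name: str) -> bool:
    """Report whether the running interpreter's token table knows `name`.

    has_token_type is a hypothetical helper, not part of the stdlib.
    """
    # token.tok_name maps numeric token types to their names.
    return name in token.tok_name.values()


if __name__ == "__main__":
    # Replace the second entry with the name added to Grammar/Tokens.
    for candidate in ("NAME", "MY_HYPOTHETICAL_TOKEN"):
        print(candidate, has_token_type(candidate))
```

Run it with the locally built binary (for example ./python on Linux) rather than the system Python, otherwise it inspects the wrong token table.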
On Mon, May 30, 2022 at 1:40 AM Eric V. Smith <eric@trueblade.com> wrote:
Ah right, I would be surprised if there were a public Python API to get the tokenizer output, since there is no public C API for that :-) I just removed the <token.h> header file since it was never usable outside Python C internals: there is no public C API to just run the tokenizer and get its output.
Victor
It is on the main branch but, as I mentioned, it is **exclusively** for internal consumption: https://github.com/python/cpython/blob/8136606769661c103c46d142e52ecbbbb8880...
On Mon, 30 May 2022 at 17:37, Jack <jack.jjmillist@gmail.com> wrote:
You should maybe move the code out of the stdlib (to tests?) if it should not be used. Otherwise, someone somehow will start to rely on it, and then complain when it breaks :-)
Victor
On Mon, May 30, 2022 at 6:51 PM Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
-- Night gathers, and now my watch begins. It shall not end until my death.
participants (5)
- Eric V. Smith
- Jack
- Jean Abou Samra
- Pablo Galindo Salgado
- Victor Stinner