[docs] [issue2134] Add new attribute to TokenInfo to report specific token IDs
report at bugs.python.org
Thu Dec 15 02:13:35 CET 2011
Nick Coghlan <ncoghlan at gmail.com> added the comment:
There are a *lot* of characters with semantic significance that are reported by the tokenize module as generic "OP" tokens.
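The listing that originally accompanied this remark is not preserved in this copy. As a stand-in, here is a minimal sketch (not the original example) that collects every token tokenize reports as OP for a short snippet. The helper name op_strings and the sample source are made up for illustration.

```python
import io
import token
import tokenize

# Hypothetical sample input, chosen only for illustration.
source = "def f(a, b): return a + b\n"

def op_strings(src):
    """Return the source text of every token that tokenize reports as OP."""
    readline = io.StringIO(src).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type == token.OP]

# Parentheses, comma, colon and plus all come back with the same generic type.
print(op_strings(source))
```

Every punctuation character in the snippet ends up in the returned list, even though the token module defines more specific IDs (LPAR, COMMA, COLON, PLUS and so on) for each of them.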
However, I can't fault tokenize for deciding to treat all of those tokens the same way - for many source code manipulation purposes, these just need to be transcribed literally, and the "OP" token serves that purpose just fine.
As the extensive test updates in the current patch suggest, AMK is also correct that changing this away from always returning "OP" tokens (even for characters with more specialised tokens available) would be a backwards incompatible change.
I think there are two parts to this problem: a documentation-related one (affecting 2.7, 3.2 and 3.3) and one that would be an actual change in 3.3:
1. First, I think 3.3 should add an "exact_type" attribute to TokenInfo instances (without making it part of the tuple-based API). For most tokens, this would be the same as "type", but for OP tokens, it would provide the appropriate more specific token ID.
2. Second, the tokenize module documentation should state *explicitly* which tokens it collapses down into the generic "OP" token, and explain how to use the "string" attribute to recover the more detailed information.
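Both suggestions can be sketched side by side. This is a rough illustration under stated assumptions, not the patch itself: OP_STRING_TO_NAME is a hand-written, deliberately incomplete table and describe_ops is a hypothetical helper. The "exact_type" lookup is guarded with getattr because the attribute is only present on Python versions that adopted this proposal (3.3 and later).

```python
import io
import token
import tokenize

# Hypothetical, partial table: maps an OP token's string to a specific name.
OP_STRING_TO_NAME = {
    "(": "LPAR",
    ")": "RPAR",
    ":": "COLON",
    ",": "COMMA",
    "+": "PLUS",
}

def describe_ops(src):
    """For each OP token return (string, name via string, name via exact_type)."""
    readline = io.StringIO(src).readline
    result = []
    for tok in tokenize.generate_tokens(readline):
        if tok.type != token.OP:
            continue
        # Approach 2: recover the detail from the "string" attribute.
        by_string = OP_STRING_TO_NAME.get(tok.string, "OP")
        # Approach 1: use exact_type where available, else fall back to type.
        by_attr = token.tok_name[getattr(tok, "exact_type", tok.type)]
        result.append((tok.string, by_string, by_attr))
    return result

for row in describe_ops("f(a): b + c\n"):
    print(row)
```

Note that tok.type stays OP in both approaches, so code relying on the existing tuple-based API sees no change.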
assignee: -> docs at python
nosy: +docs at python, ncoghlan
stage: -> needs patch
title: function generate_tokens at tokenize.py yields wrong token for colon -> Add new attribute to TokenInfo to report specific token IDs
versions: +Python 2.7, Python 3.3
Python tracker <report at bugs.python.org>