[docs] [issue2134] Add new attribute to TokenInfo to report specific token IDs

Nick Coghlan report at bugs.python.org
Thu Dec 15 02:13:35 CET 2011


Nick Coghlan <ncoghlan at gmail.com> added the comment:

There are a *lot* of characters with semantic significance that are reported by the tokenize module as generic "OP" tokens:

token.LPAR
token.RPAR
token.LSQB
token.RSQB
token.COLON
token.COMMA
token.SEMI
token.PLUS
token.MINUS
token.STAR
token.SLASH
token.VBAR
token.AMPER
token.LESS
token.GREATER
token.EQUAL
token.DOT
token.PERCENT
token.BACKQUOTE
token.LBRACE
token.RBRACE
token.EQEQUAL
token.NOTEQUAL
token.LESSEQUAL
token.GREATEREQUAL
token.TILDE
token.CIRCUMFLEX
token.LEFTSHIFT
token.RIGHTSHIFT
token.DOUBLESTAR
token.PLUSEQUAL
token.MINEQUAL
token.STAREQUAL
token.SLASHEQUAL
token.PERCENTEQUAL
token.AMPEREQUAL
token.VBAREQUAL
token.CIRCUMFLEXEQUAL
token.LEFTSHIFTEQUAL
token.RIGHTSHIFTEQUAL
token.DOUBLESTAREQUAL
token.DOUBLESLASH
token.DOUBLESLASHEQUAL
token.AT

However, I can't fault tokenize for deciding to treat all of those tokens the same way - for many source code manipulation purposes, these just need to be transcribed literally, and the "OP" token serves that purpose just fine.
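To illustrate the behaviour described above, here is a minimal sketch for Python 3 (the helper name op_tokens is made up for this example). It tokenizes a short line and lists every token that tokenize reports with the generic OP type:

```python
import io
import token
import tokenize


def op_tokens(source):
    """Return (string, type name) pairs for tokens reported as OP."""
    readline = io.StringIO(source).readline
    return [
        (tok.string, token.tok_name[tok.type])
        for tok in tokenize.generate_tokens(readline)
        if tok.type == token.OP
    ]


# Assignment, parentheses and arithmetic operators all come back as "OP".
for string, type_name in op_tokens("x = (a + b) * c\n"):
    print(repr(string), type_name)
```

The only way to tell these tokens apart is the "string" field, which is enough when the source just needs to be transcribed literally.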

As the extensive test updates in the current patch suggest, AMK is also correct that changing this away from always returning "OP" tokens (even for characters with more specialised tokens available) would be a backwards incompatible change.

I think there are two parts to this problem, one that would be an actual change in 3.3 and another that is documentation related (affecting 2.7, 3.2, 3.3):

1. First, I think 3.3 should add an "exact_type" attribute to TokenInfo instances (without making it part of the tuple-based API). For most tokens, this would be the same as "type", but for OP tokens, it would provide the appropriate more specific token ID.

2. Second, the tokenize module documentation should state *explicitly* which tokens it collapses down into the generic "OP" token, and explain how to use the "string" attribute to recover the more detailed information.
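As a rough sketch of the second point, the more specific token ID can be recovered from the "string" attribute with a lookup table. The table and function names below (OP_STRING_TO_TYPE, exact_types) are hypothetical, and the table is deliberately partial; this is an illustration for Python 3, not the proposed implementation:

```python
import io
import token
import tokenize

# Hypothetical, partial mapping from operator text to the specific token ID.
OP_STRING_TO_TYPE = {
    "(": token.LPAR,
    ")": token.RPAR,
    "[": token.LSQB,
    "]": token.RSQB,
    ":": token.COLON,
    ",": token.COMMA,
    "+": token.PLUS,
    "-": token.MINUS,
    "=": token.EQUAL,
    "**": token.DOUBLESTAR,
}


def exact_types(source):
    """Return (string, type name) pairs, refining OP tokens where possible."""
    readline = io.StringIO(source).readline
    result = []
    for tok in tokenize.generate_tokens(readline):
        exact = tok.type
        if tok.type == token.OP:
            # Fall back to the generic OP type for strings not in the table.
            exact = OP_STRING_TO_TYPE.get(tok.string, token.OP)
        result.append((tok.string, token.tok_name[exact]))
    return result


for string, type_name in exact_types("f(a, b)\n"):
    print(repr(string), type_name)
```

Non-OP tokens keep their original type, which matches the idea that the more specific ID would equal "type" for most tokens.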

----------
assignee:  -> docs at python
components: +Documentation
nosy: +docs at python, ncoghlan
stage:  -> needs patch
title: function generate_tokens at tokenize.py yields wrong token for colon -> Add new attribute to TokenInfo to report specific token IDs
versions: +Python 2.7, Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2134>
_______________________________________

