[Python-Dev] issue2180 and using 'tokenize' with Python 3 'str's

Michael Foord fuzzyman at voidspace.org.uk
Tue Sep 28 13:55:25 CEST 2010

On 28 September 2010 12:29, Michael Foord <fuzzyman at voidspace.org.uk> wrote:

> On 28/09/2010 12:19, Antoine Pitrou wrote:
>> On Mon, 27 Sep 2010 23:45:45 -0400
>> Steve Holden <steve at holdenweb.com> wrote:
>>> On 9/27/2010 11:27 PM, Benjamin Peterson wrote:
>>>> 2010/9/27 Meador Inge <meadori at gmail.com>:
>>>>> which, as seen in the trace, is because the 'detect_encoding' function in
>>>>> 'Lib/tokenize.py' searches for 'BOM_UTF8' (a 'bytes' object) in the string
>>>>> to tokenize 'first' (a 'str' object).  It seems to me that strings should
>>>>> still be able to be tokenized, but maybe I am missing something.
>>>>> Is the implementation of 'detect_encoding' correct in how it attempts to
>>>>> determine an encoding or should I open an issue for this?
>>>> Tokenize only works on bytes. You can open a feature request if you
>>>> desire.
>>> Working only on bytes does seem rather perverse.
>> I agree, the morality of bytes objects could have been better :)
> The reason for working with bytes is that source data can only be
> correctly decoded to text once the encoding is known. The encoding is
> determined by reading the encoding cookie.
> I certainly wouldn't be opposed to an API that accepts a string as well
> though.
Ah, and to explain the design decision made when tokenize was ported to py3k:
the Python 2 APIs take the readline method of a file object (not a string).

For this to work correctly in Python 3 it *has* to be a file object opened in
binary read mode, so that tokenize can find the encoding first and then decode
the source code correctly.
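
As a rough illustration of the point above, here is a minimal sketch of
tokenizing from a binary stream. The sample source bytes are made up for the
example, and io.BytesIO stands in for a file opened with mode 'rb'.

```python
import io
import tokenize

# Made-up sample source, as bytes, with an encoding cookie on line one.
source = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"

# detect_encoding reads at most two lines, looking for a BOM or a cookie,
# and returns the encoding name plus the raw lines it consumed.
encoding, first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)

# tokenize.tokenize takes the readline of a binary stream. The first token
# it yields is an ENCODING token; the rest are decoded with that encoding.
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

Passing the readline of a text-mode stream instead is what produces the
failure Meador describes, since the bytes BOM is compared against a str.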

A new API that takes a string would certainly be nice. The Python 2 API for
tokenize is 'interesting'...
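
One way such a string-accepting API could look, purely as a sketch: the
helper name tokenize_str below is hypothetical and not part of the stdlib.
It assumes the str carries no coding cookie naming a different encoding,
since the text is re-encoded as UTF-8 before being handed to tokenize.

```python
import io
import tokenize

def tokenize_str(source):
    """Hypothetical helper: tokenize a str by re-encoding it as UTF-8.

    Assumes the text has no coding cookie that names another encoding;
    otherwise tokenize would decode the UTF-8 bytes with the wrong codec.
    """
    readline = io.BytesIO(source.encode("utf-8")).readline
    return list(tokenize.tokenize(readline))

tokens = tokenize_str("x = 1\n")

# Later Python 3 releases also document tokenize.generate_tokens, which
# takes a readline that returns str and yields no ENCODING token.
text_tokens = list(tokenize.generate_tokens(io.StringIO("x = 1\n").readline))
```

The round trip through bytes is wasteful, which is part of why a native
str entry point is the nicer option where it is available.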

All the best,

Michael Foord

