[Python-Dev] What does a double coding cookie mean?

Wed Mar 16 21:54:02 EDT 2016

On 3/16/2016 5:29 PM, Guido van Rossum wrote:
> I've updated the PEP. Please review. I decided not to update the
> Unicode howto (the thing is too obscure). Serhiy, you're probably in a
> better position to fix the code looking for cookies to pick the first
> one if there are two on the same line (or do whatever you think should
> be done there).
>
> Should we recommend that everyone use tokenize.detect_encoding()?
>
> On Wed, Mar 16, 2016 at 5:05 PM, Guido van Rossum <guido at python.org> wrote:
>> On Wed, Mar 16, 2016 at 12:59 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> The only reason to read up to two lines was to address the use of
>>> the shebang on Unix, not to be able to define two competing
>>> source code encodings :-)
>> I know. I was just surprised that the PEP was sufficiently vague about
>> it that when I found that mypy picked the second if there were two, I
>> couldn't prove to myself that it was violating the PEP. I'd rather
>> clarify the PEP than rely on the reasoning presented earlier here.

Oh sure.  Updating the PEP is the best way forward. But the reasoning, 
although drawn from a somewhat vague specification, seems sound enough to 
conclude that the PEP meant "find the first cookie in the first two lines".

Which is what you've said in the update, although not quite that 
tersely.  It now leaves no room for ambiguous interpretations.
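
To make that reading concrete, here is a minimal sketch of "find the 
first cookie in the first two lines". The pattern is modeled on the one 
in PEP 263, and the name first_cookie is made up for illustration. It 
ignores BOMs and the other details a real implementation has to handle, 
so take it as a sketch of the rule, not as what CPython does.

```python
import re

# Pattern modeled on the regex in PEP 263 (an assumption for this sketch).
# The non-greedy ".*?" means the first cookie on a line wins.
COOKIE_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def first_cookie(source_lines):
    """Return the encoding named by the first cookie found in the
    first two lines, or None if neither line carries one."""
    for line in source_lines[:2]:
        match = COOKIE_RE.match(line)
        if match:
            # Stop at the first hit; a later cookie is never consulted.
            return match.group(1)
    return None
```

With a shebang on line one and a cookie on line two this returns the 
cookie from line two; with cookies on both lines it returns the one on 
line one.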

>>
>> I don't like erroring out when there are two different cookies on two
>> lines; I feel that the spirit of the PEP is to read up to two lines
>> until a cookie is found, whichever comes first.

The only reason for an error would be to alert people who had depended 
on the bugs or misinterpretations.

Personally, I think if they haven't converted to UTF-8 by now, they've 
got bigger problems than this change.
>>
>> I will update the regex in the PEP too (or change the wording to avoid "match").
>>
>> I'm not sure what to do if there are two cookies on one line. If
>> CPython currently picks the latter we may want to preserve that
>> behavior.
>>
>> Should we recommend that everyone use tokenize.detect_encoding()?
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>
>
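
On the tokenize.detect_encoding() question: for anyone who wants to try 
it, a minimal sketch of calling it on a bytes object follows. The 
wrapper name source_encoding is hypothetical; detect_encoding() itself 
takes a readline callable and returns the encoding together with the 
lines it consumed.

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding tokenize detects for the given source bytes."""
    # detect_encoding() wants a readline callable that yields bytes.
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding
```

Note that the returned name is normalized, so a "latin-1" cookie comes 
back as "iso-8859-1", and a file with no cookie and no BOM comes back 
as "utf-8".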
