[Python-Dev] eval and triple quoted strings

Tue Jun 18 02:06:03 CEST 2013

On Mon, Jun 17, 2013 at 5:02 PM, Benjamin Peterson <benjamin at python.org> wrote:
> 2013/6/17 Guido van Rossum <guido at python.org>:
>> On Mon, Jun 17, 2013 at 4:40 PM, Benjamin Peterson <benjamin at python.org> wrote:
>>> 2013/6/17 Greg Ewing <greg.ewing at canterbury.ac.nz>:
>>>> Guido van Rossum wrote:
>>>>>
>>>>> No. Executing a file containing those exact characters produces a
>>>>> string containing only '\n' and exec/eval is meant to behave the same
>>>>> way. The string may not have originated from a file, so the universal
>>>>> newlines behavior of the io module is irrelevant here -- the parser
>>>>> must implement its own equivalent processing, and it does.
>>>>
>>>>
>>>> I'm still not convinced that this is necessary or desirable
>>>> behaviour. I can understand the parser doing this as a
>>>> workaround before we had universal newlines, but now that
>>>> we do, I'd expect any Python string to already have newlines
>>>> converted to their canonical representation, and that any CRs
>>>> it contains are meant to be there. The parser shouldn't need
>>>> to do newline translation a second time.
>>>
>>> It used to be that way until 2.7. People like to do things like
>>>
>>>     with open("myfile.py", "rb") as fp:
>>>         exec fp.read() in ns
>>>
>>> which used to fail with CRLF newlines because binary mode doesn't
>>> translate them. I think this is actually the correct way to execute Python
>>> sources because the parser then handles the somewhat complicated
>>> process of decoding Python source for you.
>>
>> What exactly does the parser handle better than the io module? Is it
>> just the coding cookies? I suppose that works as long as the file is
>> encoded using an ASCII superset like the Latin-N variants or UTF-8. It
>> would fail pretty badly if it was UTF-16 (and yes, that's an
>> abominable encoding for other reasons :-).
>
> The coding cookie is the main one. In fact, if you can't parse that,
> you don't really know what encoding to open the file with at all.
> There are also small things like BOM handling (you have to use the
> utf-8-sig encoding with TextIO to get it removed) and defaulting to
> UTF-8 (which the io module doesn't do), which are better left to the
> parser.
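
The three cases above (coding cookie, BOM, UTF-8 default) can be checked with the standard library's tokenize.detect_encoding(). This is only a minimal sketch for Python 3; the helper name sniff_source_encoding and the sample byte strings are made up for illustration.

```python
import io
import tokenize


def sniff_source_encoding(raw: bytes) -> str:
    """Return the encoding the tokenizer would pick for raw source bytes.

    Hypothetical helper, not part of any library.
    """
    # detect_encoding() takes a readline callable and reads at most two
    # lines, looking for a UTF-8 BOM and/or a PEP 263 coding cookie.
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return encoding


# Illustrative inputs: a coding cookie, a UTF-8 BOM, and neither.
print(sniff_source_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))
print(sniff_source_encoding(b"\xef\xbb\xbfx = 1\n"))
print(sniff_source_encoding(b"x = 1\n"))
```

The cookie case reports the normalised name "iso-8859-1", the BOM case reports "utf-8-sig", and the bare case falls back to "utf-8" regardless of the locale.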

Maybe there are some lessons here that the TextIO module could learn?
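
To make the comparison concrete, here is a rough sketch (Python 3 syntax, names invented for the example) of the same CRLF, Latin-1 source bytes going through the parser directly and through TextIO. In the TextIO path the caller has to know the encoding up front; in the parser path the coding cookie supplies it.

```python
import io

# Illustrative source bytes: Latin-1 cookie, CRLF line endings, and a
# triple-quoted string that spans a CRLF.
RAW = b"# -*- coding: latin-1 -*-\r\ntext = '''caf\xe9\r\nbar'''\r\n"


def run_via_parser(raw: bytes) -> dict:
    """Hand the raw bytes to compile(); the parser decodes and normalises."""
    ns = {}
    exec(compile(raw, "<demo>", "exec"), ns)
    return ns


def run_via_textio(raw: bytes, encoding: str) -> dict:
    """Decode with TextIOWrapper first; the encoding must be passed in."""
    # The default newline=None enables universal newlines on read.
    with io.TextIOWrapper(io.BytesIO(raw), encoding=encoding) as fp:
        source = fp.read()
    ns = {}
    exec(compile(source, "<demo>", "exec"), ns)
    return ns


print(repr(run_via_parser(RAW)["text"]))
print(repr(run_via_textio(RAW, "latin-1")["text"]))
```

Both paths should yield the same string with a plain '\n' inside it; the difference is only in who has to work out the encoding.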

-- 
--Guido van Rossum (python.org/~guido)