[Python-ideas] Python 3000 TIOBE -3%

Sat Feb 11 20:35:28 CET 2012

Masklinn, 11.02.2012 17:18:
> On 2012-02-11, at 13:53 , Stefan Behnel wrote:
>> Well, except for the cases where that didn't work. Remember that implicit
>> encoding behaves in a platform-dependent way in Python 2, so the fact that
>> your code runs on your machine doesn't mean it will work for anyone else.
> 
> Sure, I said it allowed you, not that this allowance actually worked.
> 
>>> And using latin-1 in that context looks and feels weird/icky: the file is not
>>> encoded using latin-1, the encoding just happens to work for manipulating
>>> bytes as ascii text + non-ascii stuff.
>>
>> Correct. That's precisely the use case described above.
> 
> Yes, but now instead of just ignoring that stuff you have to actively and
> knowingly lie to Python to get it to shut up.

The advantage is that it becomes explicit what you are doing. In Python 2,
without any encoding, you are implicitly assuming that the encoding is
Latin-1, because that's how you are processing it. You're just not spelling
it out anywhere, thus leaving it to the innocent reader to guess what's
happening. In Python 3, and in better Python 2 code (using codecs.open(),
for example), you'd make it clear right in the open() call that Latin-1 is
the way you are going to process the data.
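
A minimal sketch of what "spelling it out" could look like, assuming Python 3
and a file whose real encoding is unknown but ASCII-compatible. The helper name
replace_ascii_token and the sample bytes are made up for illustration. Latin-1
maps each byte value to the code point with the same number, so decoding cannot
fail and the non-ASCII bytes pass through unchanged when written back.

```python
import os
import tempfile


def replace_ascii_token(path, old, new):
    """Replace an ASCII token in a file of unknown 8-bit encoding.

    Hypothetical helper: the explicit encoding="latin-1" documents that
    the data is treated as ASCII text plus opaque non-ASCII bytes.
    """
    # newline="" disables line-ending translation in both directions,
    # so "\r\n" sequences survive the round trip untouched.
    with open(path, encoding="latin-1", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(text.replace(old, new))


# Sample data: the first line happens to be UTF-8, which the code never
# needs to know about.
raw = b"name=caf\xc3\xa9\r\nmode=old\r\n"

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(raw)
    replace_ascii_token(path, "mode=old", "mode=new")
    with open(path, "rb") as f:
        result = f.read()
finally:
    os.remove(path)
```

Only the ASCII token changes; the bytes 0xC3 0xA9 come back exactly as they
went in. In Python 2, codecs.open() accepts an encoding argument in the same
way.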

>> Besides, it's perfectly possible to process bytes in Python 3. You just
>> have to open the file in binary mode and do the processing at the byte
>> string level.
> 
> I think that's the route which should be taken

Oh, absolutely not. When it's text, it's best to process it as Unicode.
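
A small sketch of the difference, assuming Python 3 and a file that is known to
be UTF-8 (the sample content is made up for illustration). Byte-level
processing is possible, but operations such as case conversion and length then
work on bytes rather than on characters.

```python
import os
import tempfile

# Hypothetical sample: the word "café" encoded as UTF-8.
raw = "caf\u00e9".encode("utf-8")

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(raw)

    # Binary mode: the result is a bytes object, no decoding happens.
    with open(path, "rb") as f:
        data = f.read()

    # Text mode with an explicit encoding: the result is a str.
    with open(path, encoding="utf-8") as f:
        text = f.read()
finally:
    os.remove(path)

# bytes.upper() only touches ASCII letters, so the accented letter
# keeps its lowercase byte sequence.
upper_bytes = data.upper()

# str.upper() works on characters and handles the accented letter.
upper_text = text.upper()

byte_count = len(data)   # counts bytes
char_count = len(text)   # counts characters
```

Here upper_bytes still ends in the bytes for a lowercase "é", while upper_text
is "CAFÉ"; byte_count is 5 and char_count is 4.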

Stefan