[Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)

Nick Coghlan ncoghlan at gmail.com
Thu Jan 9 08:09:10 CET 2014


On 9 January 2014 15:22, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Kristján Valur Jónsson wrote:
>>
>> all you want is to open that .txt
>> file on the drive and extract some phone numbers and merge in some email
>> addresses. What encoding does the file have? Do I care? Must I care?
>
>
> To some extent, yes. If the encoding happens to be an
> ascii-compatible one, such as latin-1 or utf-8, you can
> probably extract the phone numbers without caring what
> the rest of the bytes mean. But not if it's utf-16,
> for example.
>
> If you know that all the files on your system have an
> ascii-compatible encoding, you can use the surrogateescape
> error handler to avoid having to know about the exact
> encoding. Granted, that makes it slightly more complicated
> than it was in Python 2, but not much.
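
As a rough illustration of Greg's point, here is a minimal sketch. The
file name, the sample contents and the phone number pattern are all
made up for the example. It reads a file with mixed Latin-1 and UTF-8
bytes as ASCII with surrogateescape, pulls out the phone numbers, and
checks that the same trick does not work for UTF-16 data:

```python
import os
import re
import tempfile

# Hypothetical sample: one line in Latin-1, one in UTF-8, in the same file.
raw = ("Kristján: 555-0101\n".encode("latin-1")
       + "Jónsson: 555-0199\n".encode("utf-8"))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "contacts.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Decode as ASCII; bytes that are not ASCII become lone surrogates
    # instead of raising UnicodeDecodeError.
    with open(path, encoding="ascii", errors="surrogateescape",
              newline="") as f:
        text = f.read()

# The ASCII parts (digits, hyphens) can be searched as usual.
phones = re.findall(r"\d{3}-\d{4}", text)

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("ascii", errors="surrogateescape")

# UTF-16 is not ASCII-compatible: every digit is followed by a NUL byte,
# so the same pattern finds nothing.
utf16_text = "Tel: 555-0101\n".encode("utf-16-le").decode(
    "ascii", errors="surrogateescape")
utf16_phones = re.findall(r"\d{3}-\d{4}", utf16_text)
```

The exact encoding of the names never has to be known, as long as the
data is written back out with the same error handler.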

There's also the fact that POSIX folks are used to "r" and "rb" being
the same thing.

Python 3 chose to open files as text by default, decoding them with
the default system encoding. This created two significant
user-visible changes:

- POSIX users could no longer ignore the difference between binary
mode and text mode when opening files (Windows users have always had
to care due to the line ending problem)

- POSIX users could no longer ignore locale configuration errors
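
A small sketch of the first difference, using a made-up temporary file.
The encoding is passed explicitly here so that the result does not
depend on the locale of the machine running it:

```python
import os
import tempfile

# UTF-8 bytes with a Windows-style line ending.
data = b"caf\xc3\xa9\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(data)

    # Binary mode: bytes in, bytes out, identical on every platform.
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Text mode: the bytes are decoded to str, and universal newlines
    # translate "\r\n" to "\n".
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()

    # Text mode with a codec that cannot decode the data fails loudly.
    # This is roughly what a misconfigured locale looks like.
    try:
        with open(path, "r", encoding="ascii") as f:
            f.read()
    except UnicodeDecodeError:
        ascii_failed = True
    else:
        ascii_failed = False
```

In Python 2 on POSIX, both "r" and "rb" would have handed back the
same byte string.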

We're aiming to resolve the most common locale configuration issue by
configuring surrogateescape on the standard streams when the OS claims
that the default encoding is ASCII, but ultimately, the long-term fix
is for POSIX platforms to standardise on and consistently report UTF-8
as the system encoding (as well as configuring ssh environments
properly by default).
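
To give a feel for what that means in practice, here is a sketch that
uses in-memory byte buffers as stand-ins for the real standard streams.
This is not how the interpreter itself sets the streams up, only an
approximation of the resulting behaviour:

```python
import io

# Stand-ins for stdin/stdout; b"\xe9" is not valid ASCII.
payload = b"r\xe9sum\xe9.txt\n"
raw_in = io.BytesIO(payload)
raw_out = io.BytesIO()

reader = io.TextIOWrapper(raw_in, encoding="ascii",
                          errors="surrogateescape", newline="")
writer = io.TextIOWrapper(raw_out, encoding="ascii",
                          errors="surrogateescape", newline="")

# Undecodable bytes are smuggled through as lone surrogates...
line = reader.readline()
# ...and turned back into the original bytes on output.
writer.write(line)
writer.flush()
passed_through = raw_out.getvalue()

# A strict ASCII stream rejects the same input.
strict = io.TextIOWrapper(io.BytesIO(payload), encoding="ascii",
                          errors="strict")
try:
    strict.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

With surrogateescape, data that merely passes through the process
survives a wrong "ASCII" locale intact, instead of raising an error.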

Python 2 is *very* much a POSIX-first language, with Windows, the JVM
and other non-POSIX environments as an afterthought. Python 3
intentionally offers more consistent cross-platform behaviour, which
means it no longer aligns as neatly with the sensibilities of
experienced users of POSIX systems.

Cheers,
Nick.




-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

