[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Fri Apr 24 17:59:59 CEST 2009

2009/4/24 Antoine Pitrou <solipsis at pitrou.net>:
> Aahz <aahz <at> pythoncraft.com> writes:
>>
>> The part that I haven't seen clearly addressed so far is what happens
>> when disks get mounted across OSes (e.g. NFS).
>
> Unless there's some kind of native NFS API for file access, it is hopelessly out
> of scope for Python. We use whatever the C library exports to us, and don't have
> any control over filesystem details.

For the "raw" level (bytes on Unix, nearly-Unicode (:-) on Windows),
that's right. Resist the temptation to guess, and all that.

For the level Martin is (as far as I can tell) aiming at [1], we need
some defined rules on how to behave (relatively) sanely. Windows is
fairly easy - "nearly-Unicode" to Unicode isn't too bad. But on Unix,
you're dealing with bytes-to-Unicode in the absence of a clearly
stated encoding - which is a known can of worms...
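
To make the can of worms concrete, here is a minimal sketch of decoding
a filename whose encoding is unknown. It assumes the error handler that
Python 3.1 and later ship as "surrogateescape" (the mechanism PEP 383
describes); the filename bytes are made up for illustration.

```python
# Hypothetical filename: Latin-1 bytes, which are not valid UTF-8.
raw = b"caf\xe9.txt"

# Strict decoding with a guessed encoding simply fails.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape, each undecodable byte is mapped to a lone
# surrogate (0xE9 becomes U+DCE9), so no information is lost.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtrip = name.encode("utf-8", errors="surrogateescape")
```

The resulting string is not well-formed Unicode, so encoding it without
the handler raises UnicodeEncodeError. That is the kind of corner case
listed under the cons in this message.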

In my view:

The pros for Martin's proposal are a uniform cross-platform interface,
and a user-friendly API for the common case.
The cons are subtle and complex corner cases, and lack of agreement on
the validity of the proposed encoding in those cases.

The fact that the bytes APIs won't go away probably mitigates the cons
to a large extent (again, in my view...)
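
As a rough illustration of the bytes APIs staying available, the sketch
below lists one directory both ways. It assumes Python 3.2 or later
(for os.fsencode); the directory and file name are invented for the
example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Create one file with a plain ASCII name.
    open(os.path.join(d, "example.txt"), "w").close()

    # str argument in, str names out (the character-level interface).
    as_str = os.listdir(d)

    # bytes argument in, bytes names out (the raw interface).
    as_bytes = os.listdir(os.fsencode(d))
```

Code that cannot tolerate any guessing about encodings can stay with
the bytes form throughout.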

Paul.

[1] Actually, all the PEP says is "With this PEP, a uniform treatment
of these data as characters becomes possible." An argument as to why
this is a good thing would be a useful addition to the PEP. At the
moment it's more or less treated as self-evident - which I agree with,
but which clearly the Unix people here are not as certain of.