[Python-Dev] Windows: Remove support of bytes filenames in the os module?

Stephen J. Turnbull stephen at xemacs.org
Thu Feb 11 22:10:15 EST 2016


Executive summary:

My experience is that having bytes APIs in the os module is very
useful.  But perhaps higher-level functions like os.scandir can do
without (I present no arguments either way on that, just acknowledge
it).

Andrew Barnert writes:

 > Anyway, Windows CDs can't cause this problem.

My bad.  I meant archival Mac CDs (or perhaps they were taken from a
network filesystem), which is where I see MacRoman, and Windows (i.e.,
FAT-formatted) USB drives, which is where I see Shift JIS.  The point
here is not what is technically possible or even standard, it's that
though what I see in practice may not *require* bytes APIs, it's *very
convenient* to have them (especially interactively).

 > The same thing is true with NTFS external drives, VFAT USB drives,
 > etc. Generally, it's usually not Windows media on *nix systems that
 > break Python 2 unicode; it's native *nix filesystems where users
 > mix locales.

IMHO, Python 2 unicode is not breakable, let alone broken. ;-)  Mailman
2 has managed to almost get to a state where you can't get it to raise
a Unicode exception (except where deliberately used as EAFP), let
alone one that is not handled (before the catch-all "except Exception"
that keeps the daemon running).  And that's in an application whose
original encoding support assumed standard conformance by design in a
realm where spammers and junior high school hackers regularly violate
the most ancient of RFCs (the restriction to ASCII in headers goes
back to a 6xx RFC at the latest!)  Python 2 Unicode turns out to have
been an excellent compromise between the needs of backward
compatibility with uniformly encoded bytestreams for Europe, and the
forward-looking needs of a globalizing Internet.  (But you knew that! 
:-)  As I wrote earlier, the world is broken, or at least Japan.  The
world "got bettah", thus Python 3.  And most of the time Python 3 is
wonderful in Japan (specifically, it's trivial to get recalcitrant
students to use best I18N practice).

My point is that *where I live* the experience is very different.
There are *no* Japanese who use *nix (other than Mac OS X) for
paperwork in my neighborhood.  Shift JIS filenames *are* from Windows
media recently written, though probably not by Microsoft-provided
software.  Bytes APIs are a very useful tool in dealing with these
issues, at least in the hands of someone who has become expert in
dealing with them.
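
To make that concrete, here is a minimal sketch (not part of the
original message) of the sort of thing the bytes APIs make easy on a
*nix box: list a directory as raw bytes, then try candidate encodings
one name at a time.  The helper names guess_decode and list_names, and
the order of the candidate encodings, are assumptions made purely for
illustration.

```python
import os

# Candidate encodings, tried in order. The order is an assumption made for
# illustration; mac_roman maps every byte value, so it acts as a catch-all
# and has to come last.
CANDIDATES = ("utf-8", "shift_jis", "mac_roman")


def guess_decode(raw, encodings=CANDIDATES):
    """Return (text, encoding) for the first encoding that decodes raw."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Nothing matched: keep the bytes visible as escapes instead of raising.
    return raw.decode("ascii", "backslashreplace"), None


def list_names(path):
    """List a directory through the bytes API and guess each name's encoding."""
    # Passing bytes to os.listdir makes it return bytes names, so each name
    # arrives exactly as it is stored on the filesystem.
    return [(raw,) + guess_decode(raw) for raw in os.listdir(os.fsencode(path))]
```

This is only a heuristic: a successful decode does not prove the guess
is right, which is why list_names keeps the raw bytes next to each
guess so a human can make the final call.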

I suspect the same is true of China, except that like their business
partner Apple they are in a position to legislate uniformity, and do.
(Unfortunately that's GB18030, not Unicode.)  So maybe they're better
off than a place that coined the phrase "politics that can't decide".

I admit I've not yet used os.scandir, let alone its bytes API.  Perhaps
we can, and perhaps we should, restrict the bytes API in the os module
to a few basic functions, and require that the environment be sane for
cases where we want to use higher-level or optimized functions.
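
For reference, a minimal sketch of what the bytes side of os.scandir
looks like in current Python 3: given a bytes path it yields entries
whose names are bytes, and given a str path it yields str names.  The
helper scandir_names is hypothetical, and the "with" form assumes
Python 3.6 or later.

```python
import os
import tempfile


def scandir_names(path):
    """Return sorted entry names, in the same type (bytes or str) as path."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)


# Small demonstration in a throwaway directory holding one ASCII-named file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()
    as_bytes = scandir_names(os.fsencode(tmp))  # names come back as bytes
    as_text = scandir_names(tmp)                # names come back as str
```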

 > > You contradict yourself! ;-)
 > 
 > I'm perfectly happy to have been wrong earlier. And if catching
 > myself before someone else did makes me a flip-flopper, well, I'm
 > not running for president. :P

I consider that the most important qualification for President,
especially if your name is Trump or Sanders.  That's one of the things
I respect most about Python: with a few (negligible) exceptions, minds
change to fit the facts.

And, BTW, EAFP applies here, too.  Make mistakes on the mailing lists
before you commit them to code.  Please!<wink/>


