[Python-Dev] Bytes path support

Stephen J. Turnbull stephen at xemacs.org
Sat Aug 23 11:02:06 CEST 2014


Chris Barker writes:

 > So I write bytes that are encoded one way into a text file that's encoded
 > another way, and expect to be abel to read that later?

No, not you.  Crap software does that.  Your MUD server.  Oleg's
favorite web pages with ads, or more likely the ad servers.

 > Not for me (or many other users) -- terminals are sometimes set
 > with ascii-only encoding,

So?  That means you can't handle text files in general, only those
restricted to ASCII.  That's a completely different issue.

 > Python3 supports this case very well. But it does indeed make it
 > hard to work with filenames when you don't know the encoding they
 > are in.

No, it doesn't.  Reasonably handling "text streams" in unknown,
possibly multiple, encodings is just hard.  Python 3 has nothing to do
with it, and Oleg should know that very well.

It's true that code written in Python 2 to handle these issues needs
to be ported to Python 3.  Things is, Oleg says "another tool" -- any
non-Python-2 tool will need porting of his code too.

 > And apparently that's pretty common -- or common enough that it
 > would be nice for Python to support it well. This trick is how --
 > we'd like the "just pass it around and do path manipulations" case
 > to work with (almost) arbitrary bytes,

It does.  That's what os.path is for.

 > but everything else to work naturally with text (unicode text).

No gloss, please.  It's text, period.  The internal Unicode encoding
is *not exposed*, with a few (important) exceptions such as Han
unification.

 > I think the way to do this is to abstract the path concept, like pathlib
 > does.

You forgot to append the word "well".<wink/>

 > From my personal experience, non-ascii filenames are much easier to
 > deal with if I use unicode for filenames everywhere (py2). Somehow,
 > I have yet to be bitten by mixed encoding in filenames.

.gov domain?  ASCII-only terminal settings?  It's not "somehow", it's
that you live a sheltered life.<wink/>

 > So will using a surrogate-escape error handling with pathlib make
 > all this just work?

Not answerable until you define "all this" more precisely.

And that's the big problem with Oleg's complaint, too.  It's not at
all clear what he wants, except that all of his current code should
continue to work in Python 3.  Just like all of us.  The question then
is persuading him that it's worth moving to Python 3 despite the
effort of porting Python-2-specific code.  Maybe he can be persuaded,
maybe not.  Python 2 is a better than average language.



More information about the Python-Dev mailing list