[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
Stephen J. Turnbull
stephen at xemacs.org
Thu Apr 14 20:20:42 EDT 2016
Ethan Furman writes:
> Substitute open() with sending those bytes somewhere else:
Eg, pathlib.Path, which will raise? Surely it should be safe to pass
a DirEntry to a pathlib constructor? Note that having Path call
fsdecode implicitly is a bad idea, because we don't know the
provenance of generic bytes. But by design of __fspath__, its value
(if str) is suitable for passing to Path, for further processing.
> why should I have to reencode this str back to bytes, when bytes
> are what I asked for in the first place?
Erm, you didn't *ask* for bytes. You asked for whatever __fspath__ is
going to give you. And in many cases, like pathlib, it will be str.
I imagine that doesn't bother you; you plan to use antipathy anyway.
But if there's uptake on the protocol, I'll bet that str-only
implementations are the majority.
And your question also cuts the other way. Why should *I* have to
decode bytes to str, or suffer unexpected TypeErrors, or deal with the
possibility of TypeErrors, just because __fspath__ is polymorphic?
We're here to improve pathlib. There's been a huge amount of mission
creep, with no use cases to provide intuition. You pit your abstract
inconvenience against my 20 years of whack-a-mole with UnicodeErrors
and TypeErrors in Mailman. I *know* that if you let bytes that
represent text loose inside an application, eventually they'll end up
in a str context and "blooey!"
> How did this application get a bytes path object to begin with?
> Either it explicitly used bytes when calling scandir and friends
> (in which case it shouldn't be surprised to be working with bytes);
> or it got that bytes object from a database, over-the-wire,
> an-other-language-lib, etc.
No, it got it from an __fspath__-toting object (such as a DirEntry) it
received from some library, which constructed it polymorphically from
bytes it got from some other place -- and so lost the original
encoding. That's the scenario I think is impossible to rule out, and
reducing that kind of scenario to the bare minimum is why bytes got
demoted from being the default representation of text in Python 3 in
the first place.
> If I'm working with bytes, why would I want to work with str?
First, are you actually *working* on those bytes, or are you just
passing them to os functions? If the latter, you shouldn't care.
Second, because paths are conceptually text (you may not agree, but
Nick inter alia has indicated he does). Working with bytes paths
(except literals) is a good way to get in trouble, because there are
all kinds of ways they can end up inappropriately encoded. For
example, the odds are very high that a bytes path read from a file
(including from a zipfile directory) in Japan will be encoded in Shift
JIS. On Mac OS X, that will either produce mojibake in the directory
(if the access creates the file) or fail to access the intended file,
because the filesystem encoding is UTF-8.
Third, because you want to be portable to Windows, where you have no
choice about whether paths are str or bytes.
These reasons probably don't apply to you with much strength, but the
question is how typical you are, vs. the nearly universal experience
of mojibake and the dominant market share of Windows.
> Python is a glue language, and Python practitioners don't always
> have the luxury of working only with text.
For paths? Of course you can work with them as text. ISTM what you
really want is the luxury of working only with bytes, because you're
in the habit of pretending they are text. I don't object to you
having your luxury as long as it doesn't increase risk for my use
cases. I think you're asking for trouble, and the practice is
definitely nonportable, but consenting adults applies.
However, the proposed polymorphism does create ambiguity and risk for
my uses. I rarely have the luxury of *not* ensuring paths are text,
regardless of the bytes-ness of the underlying application, because I
can be pretty darn sure that somebody's going to feed me non-
filesystem encodings, and soon. Even when I am working with bytes
representing paths in the filesystem encoding, I need to convert to
text to read the darn things when debugging! So I don't consent;
you'll have to impose it on me.
More information about the Python-Dev