[Python-Dev] When should pathlib stop being provisional?
Chris Angelico
rosuav at gmail.com
Tue Apr 5 20:02:30 EDT 2016
On Wed, Apr 6, 2016 at 9:45 AM, Guido van Rossum <guido at python.org> wrote:
> On Tue, Apr 5, 2016 at 4:13 PM, Chris Angelico <rosuav at gmail.com> wrote:
>> On Wed, Apr 6, 2016 at 9:08 AM, Alexander Walters
>> <tritium-list at sdamon.com> wrote:
>>> * pathlib should be improved (specifically by making it inherit from str)
>>
>> I'd like to see this specific change settled on in the PEP, actually.
>> There are some arguments on both sides, and some hybrid solutions
>> being proposed, and it looks to be an important enough issue to people
>> for there to be an answer somewhere. It seems to come down to a
>> sloppiness vs strictness concern, I think, but I'm not sure.
>
> This does sound like it's the crucial issue, and it is worth writing
> up clearly the pros and cons. Let's draft those lists in a thread
> (this one's fine) and then add them to the PEP. We can then decide to:
>
> - keep the status quo
> - change PurePath to inherit from str
> - decide it's never going to be settled and kill pathlib.py
>
> (And yes, I'm dead serious about the latter, rather Solomonic option.)
Summarizing from memory to get things started.
Inheriting from str makes it easier for code to support pathlib
without really caring about the details.
NOT inheriting from str forces code to be aware that it's working with
a path, in the same way that text and bytes are fundamentally
different things, and the Unicode string doesn't inherit from the byte
string, nor vice versa.
If a few crucial built-in functions support Path objects (notably
open() and a handful of os.* functions), the bulk of stdlib support
will be easy (sometimes trivial) to implement.
Paths are [or are not] fundamentally different from strings. <-- argued point
Paths might be backed by Unicode text, and might be backed by bytes.
Should a Path be able to be implicitly constructed from either?
Should there be some sort of "Path literal"? <-- possibly a completely
separate question, to be resolved after this one
How should .. be handled? Can you canonicalize a Path?
Can Path handle URIs as well as file system paths?
-----
My personal view on the text/bytes debate is that a path is
fundamentally a human concept, and consists therefore of text. The
fact that some file systems store (at the low level) bytes and some
store (I think) UTF-16 code units should be immaterial; path
components exist for people. We can smuggle unrecognized bytes around,
but ultimately, those bytes came from characters at some point - we
just don't know the encoding. So a Path object has no relationship
with bytes, only with str.
Whether a Path is fundamentally "a text string that uses slashes to
separate components" or "a tuple of path components" is up for debate.
Both make a lot of sense, and I'm somewhat inclined to the latter
view; it allows for other forms of path component, such as an open
directory (for statat/openat etc), or a special thing representing
"current directory" or "root directory".
ChrisA
More information about the Python-Dev
mailing list