[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
Ethan Furman
ethan at stoneleaf.us
Sat Apr 9 12:41:01 EDT 2016
On 04/09/2016 12:48 AM, Nick Coghlan wrote:
> Considering the helper function usage, here's some examples in
> combination with os.fsencode and os.fsdecode:
>
> # Status quo for binary/text path conversions
> text_path = os.fsdecode(bytes_path)
> bytes_path = os.fsencode(text_path)
>
> # Getting a text path from an arbitrary object
> text_path = os.fspath(obj) # This doesn't scream "returns text!"
> text_path = os.fspathname(obj) # This does
>
> # Getting a binary path from an arbitrary object
> bytes_path = os.fsencode(os.fspath(obj))
> bytes_path = os.fsencode(os.fspathname(obj))
>
> I'm starting to think the semantic nudge from the "name" suffix when
> reading the code is worth the extra four characters when writing it
> (keeping in mind that the whole point of this exercise is that most
> folks *won't* be writing explicit conversions - the stdlib will handle
> it on their behalf).
>
> I also think the more explicit name helps answer some of the type
> signature questions that have arisen:
>
> 1. Does os.fspathname return rich Path objects? No, it returns names
> as str objects
> 2. Will file descriptors pass through os.fspathname? No, as they're
> not names, they're numeric descriptors.
> 3. Will bytes-like objects pass through os.fspathname? No, as they're
> not names, they're encodings of names
This worries me.
I know the primary purpose of this change is to enable pathlib and os
and the rest of the stdlib to work together, but consider . . .
If adding a new attribute/method was as far as we went, new code (stdlib
or otherwise) would look like:
    if isinstance(a_path_thingy, bytes):
        # because os can accept bytes
        pass
    elif isinstance(a_path_thingy, str):
        # but it's usually text
        pass
    elif hasattr(a_path_thingy, '__fspath__'):
        a_path_thingy = a_path_thingy.__fspath__()
    else:
        raise TypeError('not a valid path')

    # do something with the path
If we add os.fspath(), but don't allow bytes to be returned from it, our
above example looks more like:
    if isinstance(a_path_thingy, bytes):
        # because os can accept bytes
        pass
    else:
        a_path_thingy = os.fspath(a_path_thingy)

    # do something with the path
Yes, it's better -- but it still requires a pre-check before calling
os.fspath().
It is my contention that this is better:
    a_path_thingy = os.fspath(a_path_thingy)
This raises two issues:
  1) Part of the stdlib is the new os.scandir(), which can work
     with, and return, both bytes and text -- if __fspath__ can only
     hold text, DirEntry will not get the __fspath__ method added,
     and the pre-check boilerplate code will flourish;

  2) pathlib.Path accepts bytes -- so what happens when a byte-derived
     Path is passed to os.fspath()? Is a TypeError raised? Do we
     guess and auto-convert with fsdecode()?
I think the best answer is to
- let __fspath__ hold bytes as well as text
- let fspath() return bytes as well as text
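
To make the proposal concrete, here is a minimal pure-Python sketch of an
fspath() that passes both str and bytes through. It is only an illustration
under my assumptions: the function body and the Entry class are hypothetical
stand-ins, not an actual stdlib implementation.

```python
def fspath(path):
    """Return the str or bytes form of a path-like object (sketch only)."""
    # str and bytes are already acceptable to the os module; pass them through.
    if isinstance(path, (str, bytes)):
        return path
    # Look the method up on the type, as is done for other special methods.
    try:
        method = type(path).__fspath__
    except AttributeError:
        raise TypeError('not a valid path: %r' % type(path).__name__) from None
    result = method(path)
    # Assumption: __fspath__ may hold either text or bytes, nothing else.
    if not isinstance(result, (str, bytes)):
        raise TypeError('__fspath__ must return str or bytes, not %r'
                        % type(result).__name__)
    return result


class Entry:
    """Hypothetical DirEntry-like object that may hold text or bytes."""

    def __init__(self, name):
        self._name = name

    def __fspath__(self):
        return self._name


# No pre-check is needed at the call site, whatever the input type:
text_path = fspath(Entry('spam.txt'))
bytes_path = fspath(Entry(b'spam.txt'))
```

With this shape, a caller that really needs one type can still wrap the
result in os.fsencode() or os.fsdecode(), both of which accept str and bytes.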
--
~Ethan~