[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

Nick Coghlan ncoghlan at gmail.com
Wed Apr 13 22:49:09 EDT 2016


On 14 April 2016 at 07:37, Victor Stinner <victor.stinner at gmail.com> wrote:
> On Wednesday, 13 April 2016, Brett Cannon <brett at python.org> wrote:
>>
>> All of this is demonstrated in
>> https://gist.github.com/brettcannon/b3719f54715787d54a206bc011869aa1 by the
>> various possibilities. In the end it's not a corner case because the
>> definition of __fspath__ will be such that there's no ambiguity in what
>> os.fspath() will accept and what __fspath__ can return and the code will be
>> written to conform to what the PEP dictates (IOW I'm aware that this needs
>> to be considered in the implementation :) .
>
> I'm not a big fan of a flag parameter to change the return type of a
> function. Usually, two functions are preferred. In the os module we have
> getcwd/getcwdb for example. I don't know if it's a good example.

It is, as one of the benefits of the "two separate functions" model is
to improve type inference during static analysis - you don't
necessarily know the values of parameters at analysis time, but you do
know which function is being called.
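
As a rough illustration of the static analysis point (cwd_with_flag is
a hypothetical helper written only for contrast, not a proposed API):

```python
import os
from typing import Union


def cwd_with_flag(as_bytes: bool = False) -> Union[str, bytes]:
    # Hypothetical flag-based variant: the return type depends on a
    # runtime value, so the best static annotation is a union.
    return os.getcwdb() if as_bytes else os.getcwd()


# Two-function model: the callee alone determines the result type.
text_cwd = os.getcwd()     # always str
bytes_cwd = os.getcwdb()   # always bytes

# Flag model: a checker only sees Union[str, bytes] here unless it
# can also prove the value of the argument.
flagged_cwd = cwd_with_flag(as_bytes=True)
```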

> Do you know other examples of Python functions taking a (flag) parameter to
> change the result type?

subprocess.Popen has a couple of flags that can do that (more
precisely, they change the return type of some methods on the
resulting object), but that's not an especially pretty API in general.
String based type variations are more common (e.g. file mode flags,
using the codecs module registry), but they're still used only
sparingly (since they make the code harder to reason about for both
humans and static analysers).
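
A small sketch of both kinds of variation, assuming a child Python
interpreter can be launched via sys.executable (run_child and
read_back are hypothetical helper names):

```python
import subprocess
import sys
import tempfile


def run_child(text_mode):
    # universal_newlines is a Popen flag that switches the data read
    # from the pipes between bytes (False) and str (True).
    proc = subprocess.Popen(
        [sys.executable, "-c", "print('hi')"],
        stdout=subprocess.PIPE,
        universal_newlines=text_mode,
    )
    out, _ = proc.communicate()
    return out


def read_back(mode):
    # String-based variation: the mode string decides whether
    # read() returns str or bytes.
    with tempfile.TemporaryFile(mode) as f:
        f.write(b"data" if "b" in mode else "data")
        f.seek(0)
        return f.read()
```

In both cases the result type can only be worked out by inspecting
an argument value, not just the name of the callable.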

In terms of types for filesystem path APIs:

1. I assume we'll want a fast path for bytes & str to avoid
performance regressions (especially in os.path, where we may be doing
pure data manipulation without any IO operations)
2. I favour defining __fspath__ and os.fspath() in terms of what the
os and os.path modules need to handle both DirEntry and pathlib (which
I currently expect to be str-or-bytes)
3. For the benefit of higher level cross-platform code like pathlib,
it likely makes sense to also have a str-only API that raises an
exception rather than returning bytes

However, I also suggest deferring a decision on 3 until 2 has been
definitively answered by way of implementing the changes. If I'm right
about 2, then the API could be something like:

- os.fspath -> str-or-bytes
- os.fsencode -> bytes (with coercion from str)
- os.fsdecode -> str (with coercion from bytes)
- os.strpath -> str (no coercion)
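
A minimal pure-Python sketch of how the first and last of those could
behave, assuming __fspath__ is allowed to return str or bytes. The
function bodies, error messages, and the RichPath class are
illustrative assumptions, not settled API:

```python
class RichPath:
    # Hypothetical stand-in for a rich path object (pathlib, DirEntry).
    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


def fspath(path):
    # Fast path: plain str/bytes are returned unchanged.
    if isinstance(path, (str, bytes)):
        return path
    try:
        method = type(path).__fspath__
    except AttributeError:
        raise TypeError(
            "expected str, bytes or an object with __fspath__, not "
            + type(path).__name__
        ) from None
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ must return str or bytes")
    return result


def strpath(path):
    # str-only variant: no coercion, bytes results are rejected.
    result = fspath(path)
    if isinstance(result, bytes):
        raise TypeError("expected a str path, got bytes")
    return result
```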

It's also worth noting that os.fsencode and os.fsdecode are already
idempotent - their current signatures are "str-or-bytes -> bytes" and
"str-or-bytes -> str". With a str-or-bytes return type on os.fspath,
adapting them to handle rich path objects should just be a matter of
adding an os.fspath call as the first step.
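
Roughly like this, where _fspath is a local stand-in for the proposed
os.fspath and RichPath is a hypothetical rich path type (both are
assumptions made for the sake of the example):

```python
import os


class RichPath:
    # Hypothetical rich path object used only for this example.
    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


def _fspath(path):
    # Local stand-in for the proposed os.fspath: str-or-bytes out.
    if isinstance(path, (str, bytes)):
        return path
    return type(path).__fspath__(path)


def fsencode(path):
    # New first step: unwrap rich path objects, then defer to the
    # existing str-or-bytes -> bytes behaviour.
    return os.fsencode(_fspath(path))


def fsdecode(path):
    # Same idea for the str-or-bytes -> str direction.
    return os.fsdecode(_fspath(path))
```

Applying either wrapper to its own output still gives the same
result, so the idempotence noted above is preserved.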

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

