[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

Paul Moore p.f.moore at gmail.com
Sat Apr 16 12:30:15 EDT 2016


On 16 April 2016 at 14:46, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Paul Moore writes:
[...]
>  > 1. I just want to pass the argument on to other functions - just do
>  > so, stdlib functions will work fine.
>
> I think this is a bad idea unless you *need* polymorphism, but OK,
> it's "consenting adults".

All I'm really saying here is that if you don't need to care about
type checking (and 99% of Python programs rely on duck typing, so this
is pretty much the norm), then everything will be OK. I'm not
suggesting we encourage polymorphism, just pointing out that most code
should simply work and this whole debate is a non-issue for code like
that. (That's the whole point of getting the stdlib functions to
accept Path objects, after all :-))
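
To illustrate case 1, here is a minimal sketch. It assumes Python 3.6
or later, where open() and the os.path functions accept str, bytes and
path-like objects. The helper name read_text_from is hypothetical, not
something from this thread.

```python
import os
import pathlib
import tempfile


def read_text_from(path):
    """Hypothetical helper: forwards `path` untouched to stdlib calls."""
    # No isinstance checks: open() and os.path.getsize() accept str,
    # bytes, and path-like objects alike (assumption: Python 3.6+).
    size = os.path.getsize(path)
    with open(path, encoding="utf-8") as f:
        return size, f.read()


with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "example.txt"
    target.write_text("hello", encoding="utf-8")

    # The same function body serves all three input types.
    as_path = read_text_from(target)
    as_str = read_text_from(str(target))
    as_bytes = read_text_from(os.fsencode(target))
```

The function never asks what type it was given; it relies on the
stdlib to do the right thing.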

>  > 2. I need a string - use os.fsdecode(p)
>  > 3. I need bytes - use os.fsencode(p)
>  > 4. I need a guaranteed pathlib.Path object so that I can use Path
>  > methods - convert via pathlib.Path(os.fsdecode(p))
>
> LGTM.  Applications or user toolkits could provide a derived
> IFeelLuckyPath(Path) for symmetry with the os functions.<wink/>
>
>  > I guess there's the possibility that you want to deliberately reject
>  > bytes-like paths,
>
> I wouldn't put it that way.  I think more likely is the possibility
> that you want to restrict yourself to a particular type, as all your
> code is written in terms of that type and expects that type.  Note
> that Nick's example shows that in both the bytes domain and the text
> domain you can easily end up with a filelike.name of the wrong type.

But within your own code, you do that by convention and good coding
practices, not by explicit type checks (except in boundary code). If
you're writing a library to be used by others, you should be as
permissive as possible - you may not expect your code to be called
with bytes-like paths, but why go out of your way to reject them? That's
not Pythonic, IMO. (On the other hand, documenting that only text-like
path objects are supported by your library is fine.)

In my experience, bytes/text safety is about being aware of where the
two different types appear in your program, not about forcing only one
type. So my cases are about keeping the types clear - the output of
(1) is "same as input", of (2) is "string", of (3) is "bytes" and of
(4) is "Path". Call me with whatever you like, I can work with it in
terms I need.
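
The four cases can be sketched as follows. This is a rough
illustration, assuming Python 3.6 or later, where os.fsdecode() and
os.fsencode() accept path-like objects as well as str and bytes. The
function names as_given, as_text, as_bytes and as_path are made up
for the example.

```python
import os
import pathlib


def as_given(p):
    # Case 1: pass the argument through; output type is "same as input".
    return p


def as_text(p):
    # Case 2: always a str, whatever came in.
    return os.fsdecode(p)


def as_bytes(p):
    # Case 3: always bytes, whatever came in.
    return os.fsencode(p)


def as_path(p):
    # Case 4: always a pathlib.Path. Decode first, because Path()
    # itself does not accept bytes.
    return pathlib.Path(os.fsdecode(p))


inputs = [
    "data/report.txt",
    b"data/report.txt",
    pathlib.Path("data/report.txt"),
]
for p in inputs:
    print(
        type(p).__name__,
        "->",
        type(as_text(p)).__name__,
        type(as_bytes(p)).__name__,
        type(as_path(p)).__name__,
    )
```

Whatever the caller supplies, the type on the way out is determined by
which helper was used, not by the input.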

But we're mostly just debating coding style here; I think we agree on
the basic principle.

>  > and it's not immediately obvious how you'd do that without
>  > os.fspath or using the __fspath__ protocol directly, but I'm not
>  > sure what anyone gains by doing so (maybe the chance to fail early?
>  > but doesn't using fsdecode mean I never need to fail at all?)
>
> Well, wouldn't you like to raise there if your dataflow spec says only
> one type should ever be observed?

Meh. Maybe asserts, maybe unit tests. But typechecks throughout my
code sound more like strong typing than Python. But as I say, coding
style - I write scripts, glue code, and general-use libraries. None of
these lend themselves to that sort of rigorous dataflow analysis (this
is the same reason I have little personal use for the new typechecking
stuff).
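
For anyone who does want to fail early at a boundary, one possible
shape is sketched below. It assumes os.fspath() as available in Python
3.6 and later, which returns str or bytes unchanged and raises
TypeError for anything that is not path-like. The name
require_text_path is hypothetical.

```python
import os
import pathlib


def require_text_path(p):
    """Hypothetical boundary check: accept only text-based paths."""
    # os.fspath() raises TypeError for objects that are not str,
    # bytes, or path-like (assumption: Python 3.6+).
    raw = os.fspath(p)
    if isinstance(raw, bytes):
        # Deliberately reject bytes rather than decoding them.
        raise TypeError("bytes paths are not supported here: %r" % (raw,))
    return raw


def rejects(value):
    # Small usage helper: True if require_text_path() refuses `value`.
    try:
        require_text_path(value)
    except TypeError:
        return True
    return False
```

A single check like this at the edge of a library keeps the rest of
the code free of type checks.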

> The reasons that I wouldn't bother are that (1) I suspect it's going
> to be very rare to see bytes in a text application, and (2) in
> bytes-oriented code I would be fairly likely to either specify
> literals as str (a bug, but nobody would ever notice) or import them
> from an .ini or other text source (which might very well be in a
> non-filesystem encoding in my environment!)  In either case it's
> probably the filename I want but specified in the wrong form.

Also, that feels very much like the sort of boundary code that needs
to do the fiddly rigorous stuff so the rest of us don't have to :-)

Paul

