
On Tue, Nov 30, 2021 at 1:34 PM Paul Moore <p.f.moore@gmail.com> wrote:
On Tue, 30 Nov 2021 at 19:07, Brett Cannon <brett@python.org> wrote:
On Tue, Nov 30, 2021 at 9:09 AM Steven D'Aprano <steve@pearwood.info>
wrote:
On Tue, Nov 30, 2021 at 02:30:18PM +0000, Paul Moore wrote:
And to be clear, it's often very non-obvious how to annotate something - in https://github.com/pfmoore/editables I basically gave up because I couldn't work out how to write a maintainable annotation for an argument that is "a Path, or something that can be passed to the Path constructor to create a Path" (it's essentially impossible without copy/pasting the argument annotation for the Path constructor).
You're after https://docs.python.org/3/library/os.html?highlight=pathlike#os.PathLike: `str | PathLike[str]` (if you're only accepting string paths).
Well, it's not really. What I'm after, as I stated, is "anything that can be passed to the Path constructor". Yes, str | PathLike[str] is probably close enough (although why is it OK for me to prohibit bytes paths?)
It's your code; do what you want. 😁 If you want to accept bytes, then you can accept bytes. I personally just don't care about the bytes edge case, so I leave it out. Plus I don't think pathlib supports bytes paths to begin with, so if you are after just the pathlib.Path case then you don't want bytes unless you're going to handle the decoding. But if you want bytes then you're after https://github.com/python/typeshed/blob/8542b3518a36313af57b13227996168e592f... which is what `open()` takes (plus ints): `str | bytes | os.PathLike[str] | os.PathLike[bytes]`.
but that's what I mean about copying the Path constructor's annotations. If Path changes, I have to change my code.
Yep. You are writing down your expectations of what is acceptable to pass in and you're choosing to be very flexible, and so you have to write down more. I don't think anyone is claiming that typing your code takes no effort. But practically speaking the constructor to pathlib.Path isn't going to change.
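For illustration, the two unions mentioned above could be written down once as aliases and reused. This is only a sketch: the names `StrPath`, `AnyPath`, `to_path` and `to_fs_name` are made up for this example and are not provided by `os` or `pathlib`.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Union

# Hypothetical alias names; neither os nor pathlib exports these.
StrPath = Union[str, "os.PathLike[str]"]
AnyPath = Union[str, bytes, "os.PathLike[str]", "os.PathLike[bytes]"]


def to_path(p: StrPath) -> Path:
    """Convert a string path or a str-based os.PathLike into a Path."""
    return Path(p)


def to_fs_name(p: AnyPath) -> Union[str, bytes]:
    """Return the str or bytes file system representation of p."""
    return os.fspath(p)
```

If the set of accepted types ever changed, only the alias would need updating, although the alias itself still has to be kept in sync with pathlib by hand.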
This is a very common idiom:
    def f(p: ???):
        p = Path(p)
        ...
Why isn't it correspondingly straightforward to annotate?
If we added an object to pathlib that represented the type annotation for the constructor of pathlib.Path, then it would be. It's probably not an unreasonable idea since this idiom is common enough; I just don't know if anyone has bothered to suggest it for the module.
If PathLike[str] included str, then it would be a lot easier. It's not at all obvious to me why it doesn't (well, that's not entirely true - it's because PathLike is an ABC, not a protocol, and it's not intended to define "the type of objects that the Path constructor takes"). It would still not be documented anywhere, though.
You're essentially wrapping the constructor to `pathlib.Path` as broadly as possible, which means you want types as broad as possible while still being accurate. You could also tell users, "give me pathlib.Path objects" and your troubles go away or "an object that implements the `__fspath__` protocol" (which is what `os.PathLike` represents, hence why `str` isn't covered by it); it all depends on what sort of assumptions you want to be able to make about what you're given. But because you're wanting to accept multiple types to support both the "old" way of string paths and the "new" way of `__fspath__` you then need to put the work in to accept those types. Personally, I treat string paths vs pathlib object paths like encoding and decoding strings; get it converted immediately at the boundary of your code and then just assume pathlib everywhere else. Pathlib was added in Python 3.4 and is well-known enough at this point that I don't worry about people not being aware of it, so I let users do the glue from any string paths they get to my APIs. But to be clear, path representations != pathlib.Path type parameters != `__fspath__` protocol; there's subtlety here regardless of the types.
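A small runtime sketch of the distinction between string paths and `__fspath__` objects; the `Workspace` class is a hypothetical example, not something from the standard library.

```python
import os
from pathlib import Path


class Workspace:
    """Hypothetical class that opts in to the __fspath__ protocol."""

    def __init__(self, root: str) -> None:
        self.root = root

    def __fspath__(self) -> str:
        return self.root


ws = Workspace("projects/demo")

# An object defining __fspath__ is recognized as an os.PathLike at runtime.
assert isinstance(ws, os.PathLike)
# A plain str has no __fspath__, so it is not an os.PathLike.
assert not isinstance("projects/demo", os.PathLike)

# Both forms can still be converted at the boundary.
assert os.fspath(ws) == "projects/demo"
assert Path(ws) == Path("projects/demo")
```

This is why an annotation that accepts both has to spell out `str` in addition to `os.PathLike[str]`.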
I thought that type inference was supposed to solve that sort of problem? If the typechecker can see that an argument is passed to the Path constructor, it should be able to infer that it must be the same types as accepted by Path.
I would change that "should" to "may". Python's dynamism makes inference really hard.
That's fair. That's why I think it should be straightforward for the user to explicitly say "this argument should accept the same types as pathlib.Path does". If inference can't do it automatically, and the user can't (easily) let the checker know that it's OK to do it, then we're left with no easy way to express a very common pattern.
But how would you specify that? And what if the thing you're wrapping isn't typed itself? And what if the object you're wrapping takes multiple arguments? You could talk to the typing-sig and see if they have discussed some sort of `Parameter[pathlib.Path][0]` object to specify the type of the first argument to `pathlib.Path`.
Aside: I'm a little disappointed in the way the typing ecosystem has developed. What I understood was that we'd get type inference like ML or Haskell use, so we wouldn't need to annotate *everything*, only the bits needed to resolve ambiguity. But what we seem to have got is typing like C, Pascal and Java, except gradual. Am I being unreasonable to be disappointed? I'm not a heavy mypy user, I just dabble with it occasionally, so maybe I've missed something.
It really depends on the code base. Type checkers can make guesses based on the code they have available to them, but that only works if the usage is really clear and the dynamic nature of the code doesn't make things murky. For instance, look at open() and how having `b` in the mode or not influences whether the object's methods return strings or bytes. What would you expect to be inferred in that case if you didn't annotate open() with overloads to specify how its arguments influence the returned object?
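As a rough sketch of what such annotations look like, here is a heavily simplified, hypothetical wrapper named `open_file`. It is not how typeshed annotates the real `open()`, which covers many more modes and parameters; it only shows the idea of letting the mode argument select the return type.

```python
from __future__ import annotations

from typing import IO, Literal, Union, overload


@overload
def open_file(path: str, mode: Literal["r", "w", "a"] = ...) -> IO[str]: ...
@overload
def open_file(path: str, mode: Literal["rb", "wb", "ab"]) -> IO[bytes]: ...


def open_file(path: str, mode: str = "r") -> Union[IO[str], IO[bytes]]:
    """Hypothetical thin wrapper around the built-in open()."""
    # The overloads above exist only for the type checker; at runtime
    # this single implementation handles every mode.
    return open(path, mode)
```

With these overloads a checker can treat `open_file(p, "rb").read()` as returning `bytes` and `open_file(p).read()` as returning `str`. Without them, the best it could infer is the broad union.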
Personally, I'd be quite happy leaving open() as duck typed. I see this as what Steven was getting at - the "typing ecosystem" has moved into a situation where it's acknowledged that some typing problems are really hard, due to Python's dynamism, and yet there's still a drive to try to express such highly dynamic type constraints statically. And worse still, to insist that doing so is somehow necessary.
Sure, but how would you have expected old code that isn't about to change to be typed?
Whatever happened to "practicality beats purity", and typing being "gradual"? Surely annotating everything except open() is practical and gradual?
Yep, hence why people can still use code you didn't type and still get *some* benefit from typing. But there are limits, as flowing types through untyped code is hard, hence why your users are asking for types; it limits what *they* can check for without writing type annotations for your code (at which point they are now the ones worrying about a drifting of types, much like you are with pathlib.Path). Code like pathlib.Path that can take various types, or other things that can return different types, becomes something of a black hole for typing because the type checker either can't figure out what you want or figures it out to be so broad as to be useless.
Paul