pathlib update idea: iterable Path that returns subpaths

I'd like to propose an enhancement to the wonderful pathlib module: make Path instances iterable, where the iterator returns subpaths. This would be functionally very similar to Path(some_directory).rglob('*').

Why not just use rglob('*')? While globs are an extremely useful shorthand, in my experience they're a bit of an artifact of Unix shells that programmers frequently aren't aware of or don't understand.

Python does a great job of distilling the concept of what it means to iterate over an object. IO objects come to mind, where [line for line in sys.stdin] very intuitively iterates through each line of input. I believe making Path iterable over its subpaths provides a similarly intuitive concept of iterating over a path.

Strawman implementation:

    class Path:
        # ...
        def __iter__(self):
            return self.rglob('*')

Questions: If this were added, where in the Path class hierarchy[1] would it belong?

[1]: https://docs.python.org/3/library/pathlib.html#module-pathlib
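For anyone who wants to try the strawman without patching pathlib, here is a minimal runnable sketch. The name IterablePath is hypothetical and not part of pathlib; it subclasses the concrete class returned by Path() as a workaround, since subclassing Path directly is only supported on newer Python versions. The temporary directory tree is made up purely for the demonstration.

```python
import tempfile
from pathlib import Path


# Hypothetical subclass, not part of pathlib. type(Path()) is the concrete
# PosixPath or WindowsPath class, which can be subclassed on older Pythons too.
class IterablePath(type(Path())):
    """Iterating over an instance yields every subpath, recursively."""

    def __iter__(self):
        # Same behavior as the strawman: delegate to rglob('*').
        return self.rglob('*')


# Build a small throwaway tree and iterate over it.
with tempfile.TemporaryDirectory() as tmp:
    root = IterablePath(tmp)
    (root / 'pkg').mkdir()
    (root / 'pkg' / 'mod.py').touch()
    (root / 'README').touch()
    found = sorted(p.relative_to(root).as_posix() for p in root)

print(found)
```

Because the iterator is just rglob('*'), it inherits whatever rglob does about symlinks, permissions, and non-directory paths.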

On Sat, 2 Oct 2021 at 13:22, Aaron Stacy <aaron.r.stacy@gmail.com> wrote:
> I'd like to propose an enhancement to the wonderful pathlib module: make Path instances iterable, where the iterator returns subpaths.
> This would be functionally very similar to Path(some_directory).rglob('*').

... which is of course the major argument against this - it's already very easy to do this, so adding *yet another* way of iterating over the contents of a directory (we have Path.iterdir, Path.[r]glob, os.walk, os.scandir, os.listdir, ...) is just making things even more confusing.

The counter-argument is "there should be one obvious way" - we definitely don't only have *one* way, at the moment, but none of them are "obvious". My big problem is that I don't think that making Path instances iterable is "obvious", either. What if the path is a file, not a directory? Why are we doing a recursive traversal, not just doing iterdir?

If you want an "obvious" (IMO - this is all very subjective, and I'm not Dutch ;-)) approach, I'd argue that Path.iterdir(recursive=True) would be a more reasonable place for this functionality. But I know that Guido doesn't like functions that behave differently based on an argument that is typically always supplied as a literal value, so maybe my intuition isn't correct.

There's also a lot of design decisions around things like whether to follow symlinks, how permission problems should be handled, etc. Clearly we could just say that we do what rglob("*") does, but then we're back to why we need something else that just does what rglob("*") does...

I have some sympathy with the idea that rglob("*") isn't very discoverable, and os.walk is over-complex, but I'm not convinced this proposal is the right solution, either.

Paul
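To make the iterdir(recursive=True) idea concrete, here is a rough sketch written as a free function. The function and its recursive parameter are hypothetical: Path.iterdir() in pathlib takes no such argument. The sketch simply delegates to the two existing methods, so it answers none of the open questions about symlinks or permission errors.

```python
import tempfile
from pathlib import Path


def iterdir(path, recursive=False):
    """Hypothetical helper; pathlib's Path.iterdir() has no 'recursive' flag.

    Yields direct children by default, or every subpath when recursive=True.
    """
    path = Path(path)
    # Delegate to what already exists: rglob('*') for the recursive case,
    # iterdir() for the shallow one.
    return path.rglob('*') if recursive else path.iterdir()


# Throwaway tree for the demonstration.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'pkg').mkdir()
    (root / 'pkg' / 'mod.py').touch()
    (root / 'README').touch()
    shallow = sorted(p.relative_to(root).as_posix() for p in iterdir(root))
    deep = sorted(
        p.relative_to(root).as_posix() for p in iterdir(root, recursive=True)
    )

print(shallow)
print(deep)
```

Note that this is exactly the shape of API the objection above is about: the flag would almost always be passed as a literal True or False.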

02.10.21 15:44, Paul Moore wrote:
Ideas for making Path iterable are proposed every few months. The problem is that they are different ideas. Some people want to iterate over path components. Others want to iterate over the directory specified by the path (recursively or not). Originally it was rejected because some third-party Path-like implementations can subclass str, and therefore inherit yet another iteration behavior from strings. It is safer not to make Path iterable and to provide different methods for iterating over different things in different ways.
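The competing meanings of "iterate over a path" can be shown side by side with what exists today. This is only an illustration: StrPath is a made-up stand-in for a third-party path class that subclasses str, and the example path is arbitrary.

```python
from pathlib import PurePosixPath

p = PurePosixPath('/usr/lib/python3')

# Reading 1: iterate over path components, already available as .parts.
components = list(p.parts)


# Reading 2: a path type that subclasses str iterates over characters.
# StrPath is a hypothetical stand-in for such a third-party class.
class StrPath(str):
    pass


characters = list(StrPath('/usr'))

# Reading 3: iterate over directory contents, which needs a concrete Path
# and an explicit method such as iterdir() or rglob('*').

print(components)
print(characters)
```

With explicit methods, each reading keeps its own name and none of them has to win the bare `for x in path` spelling.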

participants (3)
- Aaron Stacy
- Paul Moore
- Serhiy Storchaka