PEP 428 - pathlib API questions

PEP 428 looks nice. Thanks, Antoine! I have a couple of questions about the module name and API. I think I've read through most of the previous discussion, but may have missed some, so please point me to the right place if these things have already been discussed.
1) Someone on reddit.com/r/Python asked "Is the import going to be 'pathlib'? I thought the renaming going on of std lib things with the transition to Python 3 sought to remove the spurious usage of appending 'lib' to libs?" I wondered about this too. Has this been discussed/answered?
2) I think the operation of "suffix" and "suffixes" is good, but not so much the name. I saw Ben Finney's original suggestion about multiple extensions etc (https://mail.python.org/pipermail/python-ideas/2012-October/016437.html). However, it seems there was no further discussion about why not "extension" and "extensions"? I have never heard a filename extension being called a "suffix". I know it is a suffix in the sense of the English word, but I've never heard it called that in this context, and I think context is important. Put another way, "extension" is obvious and guessable, "suffix" isn't.
3) Obviously pathlib isn't going in the stdlib in Python 2.x, but I'm wondering about writing portable code when you want the string version of the path. In Python 3.x you'll call str(path_obj), but in Python 2.x that will fail if the path has unicode chars in it, and you'll need to use unicode(path_obj), which of course doesn't work in 3.x. Is this just a fact of life, or would .str() or .as_string() help for 2.x/3.x portability?
4) Is path_obj.glob() recursive? In the PEP it looks like it is if the pattern starts with '**', but in the pep428 branch of the code there are both glob() and rglob() functions. I've never seen the ** syntax before (though admittedly I'm a Windows dev), and much prefer the explicitness of having two functions, or maybe even better, path_obj.glob('*.py', recursive=True). Seems much more Pythonic to provide an actual argument (or different function) for this change in behaviour, rather than stuffing the "recursive flag" inside the pattern string. Has this ship already sailed with http://bugs.python.org/issue13968? I think that one should also be rglob(pattern) or glob(pattern, recursive=True). Of course, if this ship has already sailed, it's definitely better for pathlib's glob to match glob.glob.
Thanks, Ben

Hello, On Mon, 25 Nov 2013 11:00:09 +1300 Ben Hoyt <benhoyt@gmail.com> wrote:
1) Someone on reddit.com/r/Python asked "Is the import going to be 'pathlib'? I thought the renaming going on of std lib things with the transition to Python 3 sought to remove the spurious usage of appending 'lib' to libs?" I wondered about this too. Has this been discussed/answered?
Well, "path" is much too common already, and it's an obvious variable name for a filesystem path, so "pathlib" is better to avoid name clashes.
2) I think the operation of "suffix" and "suffixes" is good, but not so much the name. I saw Ben Finney's original suggestion about multiple extensions etc (https://mail.python.org/pipermail/python-ideas/2012-October/016437.html).
However, it seems there was no further discussion about why not "extension" and "extensions"? I have never heard a filename extension being called a "suffix". I know it is a suffix in the sense of the English word, but I've never heard it called that in this context, and I think context is important. Put another way, "extension" is obvious and guessable, "suffix" isn't.
Well, perhaps :-), but nobody opposed suffix and suffixes at the time. Note the API is provisional, so we can still change it, but obviously the barrier for changes is higher now that the PEP is accepted and the beta has been cut.
3) Obviously pathlib isn't going in the stdlib in Python 2.x, but I'm wondering about writing portable code when you want the string version of the path. In Python 3.x you'll call str(path_obj), but in Python 2.x that will fail if the path has unicode chars in it, and you'll need to use unicode(path_obj), which of course doesn't work in 3.x.
The behaviour of unicode paths in Python 2 is erratic (system-dependent). pathlib can't really fix it: Python 2 doesn't know about a well-defined filesystem encoding.
4) Is path_obj.glob() recursive?
This is documented: http://docs.python.org/dev/library/pathlib.html#pathlib.Path.glob http://docs.python.org/dev/library/pathlib.html#pathlib.Path.rglob
Seems much more Pythonic to provide an actual argument (or different function) for this change in behaviour, rather than stuffing the "recursive flag" inside the pattern string.
It's not a flag, it's a different wildcard. This allows e.g. a library function to call glob() and users to pass a recursive or non-recursive pattern as they wish.
Has this ship already sailed with http://bugs.python.org/issue13968?
This issue is still open, so no :-) Regards Antoine.
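As an illustration of the point that "**" is a wildcard rather than a flag, here is a minimal sketch in which a library-style helper simply forwards whatever pattern the caller supplies to Path.glob(). The helper names (find_files, build_demo_tree) and the file layout are hypothetical and exist only for this example.

```python
import tempfile
from pathlib import Path


def find_files(root, pattern):
    """Hypothetical library helper: forwards the caller's pattern unchanged."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


def build_demo_tree(root):
    """Create a small throwaway tree (assumed layout, for illustration only)."""
    for rel in ("a.py", "pkg/b.py", "pkg/sub/c.py", "pkg/notes.txt"):
        target = Path(root) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()


with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(tmp)
    # The caller, not the library, decides whether the search recurses.
    top_level = find_files(tmp, "*.py")      # ['a.py']
    recursive = find_files(tmp, "**/*.py")   # ['a.py', 'pkg/b.py', 'pkg/sub/c.py']
```

The helper needs no recursive=True parameter of its own; the choice travels inside the pattern string.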

Well, "path" is much too common already, and it's an obvious variable name for a filesystem path, so "pathlib" is better to avoid name clashes.
Yep, that makes total sense, thanks.
However, it seems there was no further discussion about why not "extension" and "extensions"? I have never heard a filename extension being called a "suffix". I know it is a suffix in the sense of the English word, but I've never heard it called that in this context, and I think context is important. Put another way, "extension" is obvious and guessable, "suffix" isn't.
Well, perhaps :-), but nobody opposed suffix and suffixes at the time. Note the API is provisional, so we can still change it, but obviously the barrier for changes is higher now that the PEP is accepted and the beta has been cut.
Okay. I won't push hard :-) as "suffix" isn't terrible, but has anyone else never (or rarely) heard the term "suffix" applied to filename extensions?
3) Obviously pathlib isn't going in the stdlib in Python 2.x, but I'm wondering about writing portable code when you want the string version of the path. In Python 3.x you'll call str(path_obj), but in Python 2.x that will fail if the path has unicode chars in it, and you'll need to use unicode(path_obj), which of course doesn't work in 3.x.
The behaviour of unicode paths in Python 2 is erratic (system-dependent). pathlib can't really fix it: Python 2 doesn't know about a well-defined filesystem encoding.
Fair enough.
4) Is path_obj.glob() recursive?
This is documented: http://docs.python.org/dev/library/pathlib.html#pathlib.Path.glob http://docs.python.org/dev/library/pathlib.html#pathlib.Path.rglob
Seems much more Pythonic to provide an actual argument (or different function) for this change in behaviour, rather than stuffing the "recursive flag" inside the pattern string.
It's not a flag, it's a different wildcard. This allows e.g. a library function to call glob() and users to pass a recursive or non-recursive pattern as they wish.
Okay, just saw those docs now -- thanks. Fair enough re "it's a different wildcard". At the least I don't think there should be two ways to do it -- in other words, pick either rglob() or glob('**'); offering both seems very un-PEP 20 to me. My preference is rglob(), but glob(recursive=True) would be fine too.
Has this ship already sailed with http://bugs.python.org/issue13968?
This issue is still open, so no :-)
Same goes for this issue -- there should be OOWTDI, and my preference is rglob() or glob(recursive=True). But maybe issue 13968's behaviour can be determined by pathlib's now that pathlib is the one getting done first. Thanks, Ben.

Ben Hoyt wrote:
However, it seems there was no further discussion about why not "extension" and "extensions"? I have never heard a filename extension being called a "suffix".
You can't have read many unix man pages, then! I just searched for "suffix" in the gcc man page, and found this: For any given input file, the file name suffix determines what kind of compilation is done:
I know it is a suffix in the sense of the English word, but I've never heard it called that in this context, and I think context is important.
This probably depends on your background. In my experience, the term "extension" arose in OSes where it was a formal part of the filename syntax, often highly constrained. E.g. RT11, CP/M, early MS-DOS. Unix has never had a formal notion of extensions like that, only informal conventions, and has called them suffixes at least some of the time for as long as I can remember.
4) Is path_obj.glob() recursive? In the PEP it looks like it is if the pattern starts with '**',
I don't think it has to *start* with **. Rather, the ** is a pattern that can span directory separators. It's not a flag that applies to the whole thing -- a pattern could have a * in one place and a ** in another. -- Greg
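To make the "a * in one place and a ** in another" remark concrete, the following sketch mixes both wildcards in a single pattern. The directory layout is an assumption made up for the example; only Path.glob() itself comes from pathlib.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Assumed demo layout, not taken from the thread.
    for rel in (
        "proj1/tests/unit/test_a.py",
        "proj1/src/mod.py",
        "proj2/tests/test_b.py",
        "tests/test_c.py",
    ):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    # '*' matches exactly one path component (proj1, proj2);
    # '**' spans zero or more directory levels below 'tests'.
    matches = sorted(
        p.relative_to(root).as_posix()
        for p in root.glob("*/tests/**/test_*.py")
    )
    # matches == ['proj1/tests/unit/test_a.py', 'proj2/tests/test_b.py']
```

Note that tests/test_c.py is not matched: the leading "*" must consume one directory before the literal "tests" component.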

However, it seems there was no further discussion about why not "extension" and "extensions"? I have never heard a filename extension being called a "suffix".
You can't have read many unix man pages, then!
Huh, no I haven't! Certainly not regularly, as I'm almost exclusively a Windows user. :-)
This probably depends on your background. In my experience, the term "extension" arose in OSes where it was a formal part of the filename syntax, often highly constrained. E.g. RT11, CP/M, early MS-DOS.
Unix has never had a formal notion of extensions like that, only informal conventions, and has called them suffixes at least some of the time for as long as I can remember.
Yes, seems like it definitely is background-dependent. I'm Windows-centric. I stand corrected, and recant my position on "suffix". :-)
4) Is path_obj.glob() recursive? In the PEP it looks like it is if the pattern starts with '**',
I don't think it has to *start* with **. Rather, the ** is a pattern that can span directory separators. It's not a flag that applies to the whole thing -- a pattern could have a * in one place and a ** in another.
Oh okay, that makes more sense. It definitely needs more thorough documentation in that case. I would still prefer the simpler and more explicit rglob() / recursive=True rather than new pattern syntax, but I don't feel as strongly anymore. -Ben

On 25 Nov 2013 09:14, "Ben Hoyt" <benhoyt@gmail.com> wrote:
4) Is path_obj.glob() recursive? In the PEP it looks like it is if the pattern starts with '**',
I don't think it has to *start* with **. Rather, the ** is a pattern that can span directory separators. It's not a flag that applies to the whole thing -- a pattern could have a * in one place and a ** in another.
Oh okay, that makes more sense. It definitely needs more thorough documentation in that case. I would still prefer the simpler and more explicit rglob() / recursive=True rather than new pattern syntax, but I don't feel as strongly anymore.
Using "**" for directory spanning globs is also another case of us borrowing a reasonably common idiom from *nix systems that may not be familiar to Windows users. Cheers, Nick.

Using "**" for directory spanning globs is also another case of us borrowing a reasonably common idiom from *nix systems that may not be familiar to Windows users.
Okay, *nix wins then. :-) Python's stdlib is already fairly *nix-oriented (even when it's being cross-platform), so I guess it's not a big deal. My only remaining concern then is that there shouldn't be more than one way to do recursive globbing in a new API like this. Why does rglob() exist when the documentation simply says "like calling glob() but with '**' added in front of the pattern"? http://docs.python.org/dev/library/pathlib.html#pathlib.Path.rglob -Ben

On 25 Nov 2013 09:42, "Ben Hoyt" <benhoyt@gmail.com> wrote:
Using "**" for directory spanning globs is also another case of us borrowing a reasonably common idiom from *nix systems that may not be familiar to Windows users.
Okay, *nix wins then. :-) Python's stdlib is already fairly *nix-oriented (even when it's being cross-platform), so I guess it's not a big deal.
My only remaining concern then is that there shouldn't be more than one way to do recursive globbing in a new API like this. Why does rglob() exist when the documentation simply says "like calling glob() but with '**' added in front of the pattern"?
http://docs.python.org/dev/library/pathlib.html#pathlib.Path.rglob
Because it's a layered API - embedding ** in the pattern is a strictly more powerful interface, but can be a little tricky to get your head around (especially if you don't use a shell that has the feature). rglob() is simpler, but not as flexible. We offer that kind of multi-level API fairly often. For example, subprocess.call() and friends are simpler interfaces for particular ways of using the powerful-but-complex subprocess.Popen API. The metaprogramming stack (functions, classes, decorators, descriptors, metaclasses) similarly offers the ability to trade increased complexity for increases in power and flexibility. In these cases, the "obvious way" is to use the simplest API that covers the use case, and only reach for the more complex API when you genuinely need it. Cheers, Nick.
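The layering described above can be sketched roughly as follows. The first part checks, on a made-up temporary tree, that rglob(pattern) yields the same files as glob("**/" + pattern); the second part shows the analogous pairing of subprocess.call() with subprocess.Popen. The helper relative_names and the file names are hypothetical, and the child command is just an arbitrary process that exits with status 3.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def relative_names(root, paths):
    """Hypothetical helper: sorted POSIX-style names relative to root."""
    return sorted(p.relative_to(root).as_posix() for p in paths)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("setup.py", "pkg/core.py", "pkg/util/helpers.py", "README.txt"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    # Simple layer: rglob() hides the '**' syntax.
    simple = relative_names(root, root.rglob("*.py"))
    # More flexible layer: the same search spelled out with '**'.
    explicit = relative_names(root, root.glob("**/*.py"))

# The same trade-off in subprocess: call() wraps the common
# "run and wait for the exit status" use of Popen.
child = [sys.executable, "-c", "raise SystemExit(3)"]
code_simple = subprocess.call(child)
proc = subprocess.Popen(child)
code_explicit = proc.wait()
```

In both pairs the simpler call covers the common case, and the lower-level form is only needed when the extra control matters.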

2013/11/25 Greg Ewing <greg.ewing@canterbury.ac.nz>:
Ben Hoyt wrote:
However, it seems there was no further discussion about why not "extension" and "extensions"? I have never heard a filename extension being called a "suffix".
You can't have read many unix man pages, then! I just searched for "suffix" in the gcc man page, and found this:
For any given input file, the file name suffix determines what kind of compilation is done:
I know it is a suffix in the sense of the English word, but I've never heard it called that in this context, and I think context is important.
This probably depends on your background. In my experience, the term "extension" arose in OSes where it was a formal part of the filename syntax, often highly constrained. E.g. RT11, CP/M, early MS-DOS.
Unix has never had a formal notion of extensions like that, only informal conventions, and has called them suffixes at least some of the time for as long as I can remember.
Indeed. Just for reference, here's an extract of POSIX basename(1) man page [1]:
"""
SYNOPSIS
    basename string [suffix]
DESCRIPTION
    The string operand shall be treated as a pathname, as defined in XBD Pathname. The string string shall be converted to the filename corresponding to the last pathname component in string and then the suffix string suffix, if present, shall be removed.
"""
[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/basename.html
cf
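For comparison with the POSIX wording, here is a short sketch of the pathlib attributes under discussion, next to a rough Python analogue of basename(1). The basename function below is a hypothetical helper written for this example, not part of pathlib, and it ignores edge cases such as the root path "/".

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/user/archive.tar.gz")

name = p.name          # 'archive.tar.gz'
suffix = p.suffix      # '.gz'  (last suffix only)
suffixes = p.suffixes  # ['.tar', '.gz']
stem = p.stem          # 'archive.tar'  (name without the last suffix)


def basename(string, suffix=""):
    """Rough, hypothetical analogue of POSIX basename(1).

    Takes the last path component, then strips `suffix` if it is a
    proper trailing part of that component (not the whole of it).
    """
    component = PurePosixPath(string).name
    if suffix and component != suffix and component.endswith(suffix):
        component = component[: -len(suffix)]
    return component


stripped = basename("/home/user/archive.tar.gz", ".gz")  # 'archive.tar'
```

In this sketch, basename(path, ".gz") and PurePosixPath(path).stem agree because ".gz" is the path's last suffix.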
participants (6)
- Antoine Pitrou
- Ben Hoyt
- Charles-François Natali
- Greg Ewing
- Nick Coghlan
- Serhiy Storchaka