
I have been using pathlib, and I have come up with a few suggestions on what would make the module more useful for me (and hopefully others):

First, for me, extensions are primarily useful as a single unit. So, practically speaking, the extension of "spam.tar.gz" isn't ".gz", it is ".tar.gz". So it would be nice to have some properties to make it easier to deal with the "complete" extension like this. There is a "suffixes" property, but it returns a list, which you then have to recombine manually. And as far as I can tell there is no method to return the name without any extension. And there is no method for replacing all the extensions at once. So although the names are tentative, perhaps there could be a "fullsuffix" property to return the extensions as a single string, a "nosuffix" extension to return the path without any extensions, and a "with_suffixes" method that replaces all the suffixes and can accept multiple arguments (which would then be joined to create the extensions).

Second, for methods like "rename" and "replace", it would be nice if there was an "exist_ok" argument that defaults to "True" to allow for safe renaming.

Third, it would be nice if there was a "uid" and "gid" method for getting the numeric user and group IDs for a file, or alternatively a "numeric" argument for the "owner" and "group" methods.

Fourth, for the "is_*" methods, it would be nice if there was a "strict" argument that would raise an exception if the file or directory doesn't exist.

Finally, although not a problem with the module per se, the example for the "parts" property should probably show at least one file with an extension, to make it clear how it deals with extensions (since the documentation is ambiguous in this regard).

Thanks for your time.

As another suggestion, I'd love an rmtree method analogous to shutil.rmtree. And maybe also a remove method that basically does:

    if path.is_dir():
        path.rmtree()
    else:
        path.unlink()

--
Ryan (ライアン)
Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else
<http://refi64.com/>

(In reply to Todd <toddrjen@gmail.com>, Jan 24 2017, 2:32 pm.)

On Wed, Jan 25, 2017 at 7:30 AM, Todd <toddrjen@gmail.com> wrote:
+0. Not all files with multiple dots in them are actually using them to mean multiple file extensions. Every day I'm working with files that use dots to separate words in a title, or have section numbers ("4.2.5 Yada Yada Yada.md" does not have a base name of "4"), etc. Since there's no perfect way to pin these down, this needs to be a completely separate feature, and it'd only really be useful for some situations. So go ahead, if there's interest, but the current one shouldn't be deprecated or anything. ChrisA

On Tue, Jan 24, 2017 at 4:27 PM, Chris Angelico <rosuav@gmail.com> wrote:
Of course the current ones shouldn't be deprecated, I never suggested they should be. The whole point of using new method and property names was to avoid any conflict with the existing methods. And yes, it won't work in all situations. Which method or property you would use depends on your specific needs.

(I have a small question, I hope it's not off-topic for this thread.) What was the rationale behind an explicit `iterdir` method? Why not simply make the `Path` objects iterable?

(In reply to Todd <toddrjen@gmail.com>, Wednesday, January 25, 2017 3:32:14 AM, Subject: Re: [Python-ideas] pathlib suggestions.)

I'm just going to let fly with the +1s and -1s, don't take them too seriously, they're basically impressionistic (I'm not a huge user of pathlib yet). Todd writes:
So although the names are tentative, perhaps there could be a "fullsuffix" property to return the extensions as a single string,
-0 ''.join(p.suffixes) vs. p.fullsuffix? (Each entry of .suffixes already carries its leading dot.) TOOWTDI says no. I also don't really see the use case.
a "nosuffix" extension to return the path without any extensions,
+1 (subject to name bikeshedding). .suffixes itself is kinda useless without this, and you shouldn't have to roll your own. Do you propose to return a Path or a str here?
Do you propose to return a Path or a str here? +1 for a Path, +0 for a str.
-1 I don't see how this is an improvement. If it would raise if exist_ok == False, then

    try:
        p.rename(another_p, exist_ok=False)
    except ExistNotOKError:
        take_evasive_action(p)

doesn't seem like a big improvement over

    if p.exists():
        take_evasive_action(p)
    else:
        p.rename(another_p)

And if it doesn't raise, then the action just silently fails?

Name bikeshedding: IIRC, if an argument is essentially always going to be one of a small number of literals, Guido strongly prefers a new method (eg, rename_safely). I will admit that the current API seems strange to me: on Unix, .rename and .replace are apparently the same, and both unsafe? I would prefer

    .rename          Unix semantics (deprecated)
    .rename_safely   replacement for .rename, raises if exists
    .replace         silently replace

Names to be bikeshedded per usual.
Third, it would be nice if there was a "uid" and "gid" method for getting the numeric user and group IDs for a file,
+1
or alternatively a "numeric" argument for the "owner" and "group" methods.
-1 (see "Guido prefers" above)
-1 That seems weird in a library intended for the syntactic manipulation of uninterpreted paths (even though this is a semantic operation). TOOWTDI and EIBTI, as well. For backward compatibility, strict would have to default to False.
the example for the "parts" property should probably show at least one file with an extension,
+1 Steve
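For reference, the three items that drew a +1 above can be approximated with the current API. The sketch below is only an illustration: `nosuffix` and `numeric_owner` are hypothetical helper names, not pathlib methods, and returning a str from `nosuffix` is an assumption (the Path-or-str question is still open at this point in the thread).

```python
import tempfile
from pathlib import Path, PurePosixPath


def nosuffix(path):
    """Hypothetical helper: the final path component minus all of its suffixes."""
    p = PurePosixPath(path)
    # Each entry of .suffixes keeps its leading dot, so "".join() rebuilds the tail.
    tail = "".join(p.suffixes)
    return p.name[: len(p.name) - len(tail)]


def numeric_owner(path):
    """Hypothetical helper: numeric user and group IDs from a single stat() call."""
    st = Path(path).stat()
    return st.st_uid, st.st_gid


# .parts splits on path separators only, so the extension stays on the last part.
example_parts = PurePosixPath("/usr/lib/spam.tar.gz").parts

with tempfile.NamedTemporaryFile() as handle:
    example_owner = numeric_owner(handle.name)
```

Here `example_parts` is `('/', 'usr', 'lib', 'spam.tar.gz')`, which is the kind of example the "parts" documentation could show.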

On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
The whole point of pathlib is to provide convenience functions for common path-related operations. It is full of methods and properties that could be implemented other ways. Dealing with multi-part extensions, at least for me, is extremely common. A ".tar.gz" file is not the same as a ".tar.bz2" or a ".svg.gz". When I want to find a ".tar.gz" file, having to deal with the ".tar" and ".gz" parts separately is nothing but a nuisance. If I want to find and extract ".rar" files, I don't want ".part1.rar" files, ".part2.rar" files, and so on. So for me dealing with the extension as a single unit, rather than individual parts, is the most common approach.
I intend it to behave as much as possible like the existing "stem" property, so a string.
It is intended to behave as much as possible like the existing "with_suffix" method, so a Path.
As Ed said, this can lead to race conditions. Something could happen after you check "exists". Also, the "mkdir" method already has an "exist_ok" argument, and the "open" function has the "x" flag to raise an exception if the file already exists. It seems like a major omission to me that there are safe ways to make files and safe ways to make directories, but no safe way to move files or directories.
File and directory handling is already full of flags like this. This argument was taken verbatim from the existing "mkdir" method for consistency.
First, these methods only exist for "concrete" paths, which are explicitly intended for use in I/O operations. Second, as before, this argument is taken from another method. In this case, the "resolve" method has a "strict" argument. Any other approach suffers from the same race conditions as "rename" and "replace", and again it seems weird that resolving a path can be done safely but testing it can't be. And yes, the argument would have to default to "False". All of my suggestions are intended to be completely backwards-compatible. I don't see that as a problem, though.
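To make the race-condition argument concrete, here is a minimal sketch of a no-clobber rename and a strict file test built from existing calls. Both function names are hypothetical. The rename relies on `os.link` failing when the target exists, which assumes a regular file, a filesystem with hard-link support, and source and target on the same filesystem; it is not how pathlib implements anything today.

```python
import os
import stat
import tempfile
from pathlib import Path


def rename_no_clobber(src, dst):
    """Hypothetical helper: move src to dst, failing if dst already exists."""
    src, dst = Path(src), Path(dst)
    # os.link raises FileExistsError if dst exists; there is no separate
    # exists() check that another process could slip in behind.
    os.link(src, dst)
    src.unlink()
    return dst


def is_file_strict(path):
    """Hypothetical helper: like is_file(), but a missing path raises."""
    # stat() raises FileNotFoundError instead of being folded into False.
    mode = Path(path).stat().st_mode
    return stat.S_ISREG(mode)


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "first.txt")
    first.write_text("first")
    moved = rename_no_clobber(first, Path(tmp, "moved.txt"))
    moved_is_file = is_file_strict(moved)
```

The trade-off is that the link-then-unlink approach is briefly visible under both names, whereas a plain rename is atomic but silently replaces the target on Unix.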

On 01/25/2017 04:04 PM, Todd wrote:
But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"? Existing tools like glob and endswith() can deal with the ".tar.gz" extension reliably, but "fullsuffix" would, arguably, not give the answers you want. Perhaps more specialized tools would be useful, though, for example: repacked_path = original_path.replace_suffix(".tar.gz", ".zip")
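A rough sketch of what such a `replace_suffix` could look like as a free function follows. The name comes from the example above, but the behaviour when the old suffix is absent (raising ValueError here) is an assumption, since the message does not specify it.

```python
from pathlib import PurePath


def replace_suffix(path, old, new):
    """Hypothetical helper: swap a literal trailing suffix such as '.tar.gz'."""
    path = PurePath(path)
    name = path.name
    # Require a non-empty suffix that leaves a non-empty base name behind.
    if not old or not name.endswith(old) or len(name) == len(old):
        raise ValueError(f"{name!r} does not end with suffix {old!r}")
    return path.with_name(name[: -len(old)] + new)


repacked_path = replace_suffix("spam-4.2.5-final.tar.gz", ".tar.gz", ".zip")
```

Because the caller names the suffix explicitly, the dots inside "4.2.5" never come into play.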

On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin <encukou@gmail.com> wrote:
I wouldn't use it in that situation. The existing "suffix" and "stem" properties also only work reliably under certain situations.
Perhaps more specialized tools would be useful, though, for example: repacked_path = original_path.replace_suffix(".tar.gz", ".zip")
That is helpful if I want to rename, not if I want to (for example) uncompress a file.

On Wed, Jan 25, 2017, at 03:33 PM, Todd wrote:
I wouldn't use it in that situation.
You might not, but it seems like an attractive nuisance. You can't reliably use it as a test for .tar.gz files, but it would be easy to think that you can and write buggy code using it. And I can't currently think of a general example where it would be useful. I thought about suggesting a 'hassuffix' method, but it doesn't pass the 'one way to do it' test when you can do: p.name.endswith('.tar.gz')

On Wed, Jan 25, 2017 at 10:45 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
From my perspective at least, those arguments apply just as well to the existing "suffix" and "stem" properties.
Then why is there a "match" method? It doesn't seem like the "one way to do it test" is being used for pathlib, nor do I think it really applies for a module whose whole point is to provide convenience tools.

On Wed, Jan 25, 2017, at 03:58 PM, Todd wrote:
From my perspective at least, those arguments apply just as well to the existing "suffix" and "stem" properties.
To some extent it does. But the convention of looking at a single extension is common enough that there's a stronger case for providing easy access to that.
I thought about suggesting a 'hassuffix' method, but it doesn't pass the 'one way to do it' test when you can do:
p.name.endswith('.tar.gz')
Everything is trade-offs: if you can justify why a new thing is useful enough, that can override the 'one way to do it' consideration. That's why we now have four kinds of string formatting. But I don't think 'X got away with it so we should allow Y too' is a compelling argument.

Hi all,

It seems to me that the correct algorithm to get the "full suffix" is not to take everything after the FIRST dot, but rather to:

1. Recognize that the last suffix is one of the UNIX-style compression tools: .Z, .gz, .bz2, .xz, .lzma (at least).
2. Then add the next-to-last suffix.

So we can then determine that the suffix of order.for.tar.ps.gz is .ps.gz and the basename is order.for.tar.

However, I am not sure if we want to hard-code a list of such suffixes in the standard library. (Even though it could be user-extensible.)

Stephan

(In reply to Todd <toddrjen@gmail.com>, 2017-01-25 16:33 GMT+01:00.)
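The two-step rule in Stephan's message can be sketched as below. The function name and the default set are illustrative assumptions; the set simply mirrors the suffixes listed in the message and is passed as a parameter to reflect the "user-extensible" remark.

```python
from pathlib import PurePath

# Assumed default, copied from the list in the message above.
COMPRESSION_SUFFIXES = frozenset({".Z", ".gz", ".bz2", ".xz", ".lzma"})


def split_full_suffix(path, compression_suffixes=COMPRESSION_SUFFIXES):
    """Hypothetical helper: return (basename, full_suffix) for a file name."""
    outer = PurePath(PurePath(path).name)
    last = outer.suffix
    if last in compression_suffixes:
        # Step 2: the suffix before the compression suffix also belongs to it.
        inner = PurePath(outer.stem)
        return inner.stem, inner.suffix + last
    return outer.stem, last


example = split_full_suffix("order.for.tar.ps.gz")
```

For the name used in the message, `example` comes out as `('order.for.tar', '.ps.gz')`.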

On Wed, Jan 25, 2017, at 03:54 PM, Todd wrote:
Those [.tar.foo] are just examples that I encounter a lot, there can be other cases where multiple extensions are used.
The real issue is that there's no definition of what an extension is. You can have dots anywhere in a filename, and it's not at all unusual for them to be used before the bit we recognise as the extension. Almost every package on PyPI has files named like 'pip-9.0.1.tar.gz', but '.0.1.tar.gz' clearly doesn't make any sense as an extension. Without a good definition of what the 'full extension' is, we can't have code to find it. Thomas
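The PyPI example is easy to reproduce with the existing "suffixes" property. In this small sketch, `fullsuffix` is a stand-in function for the proposed property, not an existing pathlib attribute.

```python
from pathlib import PurePath


def fullsuffix(path):
    """Stand-in for the proposed property: everything after the first dot."""
    return "".join(PurePath(path).suffixes)


simple = fullsuffix("spam.tar.gz")
versioned = fullsuffix("pip-9.0.1.tar.gz")

# The string test still answers "is this a .tar.gz?" for the versioned name.
is_targz = PurePath("pip-9.0.1.tar.gz").name.endswith(".tar.gz")
```

`simple` is `'.tar.gz'`, but `versioned` is `'.0.1.tar.gz'`, so comparing it against `'.tar.gz'` would miss the file even though `is_targz` is True.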

On 25 January 2017 at 16:04, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
More precisely, we *can* have code to find it, but it's of necessity application-specific, and so not a good fit for a general library like the stdlib. One of the design principles for code in the stdlib is "does it solve a sufficiently general problem?" In this case, there's a general problem, which is "give me back what I think of as the suffix in this case" - but the proposed method doesn't solve that problem (because of the cases already quoted). Conversely, the problem which the proposed solution *does* solve ("give me the part of the filename after the first dot") isn't general enough to warrant going into the stdlib, because it's too often not what people actually want. Paul

On Wed, Jan 25, 2017 at 11:04 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Right, that is why we would have three properties:

1. suffix: gets the part after the last period as a string, including the period (already exists), so "spam.tar.gz" -> ".gz"
2. fullsuffix: gets the part after the first period as a string, including the period (this is what I am proposing), so "spam.tar.gz" -> ".tar.gz"
3. suffixes: gets the part after the first period as a list of strings split on the leading period, each including the leading period (already exists), so "spam.tar.gz" -> [".tar", ".gz"]

"suffix" is only useful if you are sure only the part after the last period is useful, "fullsuffix" is only useful if you are sure the entire part after the first period is useful, and "suffixes" is needed in more complicated situations.

This is similar in principle to having "str.split", "str.rsplit", "str.partition", and "str.rpartition". pathlib currently has the equivalent of "str.split" (suffixes) and "str.rpartition" (suffix), but lacks the equivalent of "str.partition" (fullsuffix).
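The str analogy can be spelled out on the "spam.tar.gz" example. This is only an illustration of the comparison; `full_suffix` emulates the proposed property with `str.partition` and is not an existing pathlib attribute.

```python
from pathlib import PurePosixPath

name = "spam.tar.gz"
path = PurePosixPath(name)

# Existing properties.
last_suffix = path.suffix      # comparable to name.rpartition(".")
all_suffixes = path.suffixes   # comparable to name.split(".")[1:]

# Proposed "fullsuffix", emulated by partitioning on the first period.
_, sep, tail = name.partition(".")
full_suffix = sep + tail
```

For this name the three values are `'.gz'`, `['.tar', '.gz']`, and `'.tar.gz'`, and `full_suffix` equals `''.join(all_suffixes)`.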

On 01/25/2017 04:33 PM, Todd wrote:
Which situations do you mean? It works quite fine with multiple suffixes: The suffix of "pip-9.0.1.tar.gz" is ".gz", and sure enough, you can reasonably expect it's a gz-compressed file. If you uncompress it and strip the extension, you'll end up with a "pip-9.0.1.tar", where the suffix is ".tar" -- and humans would be surprised if it wasn't a tar archive. The function can't determine what a particular human would think of as the full (or "real") suffix in a particular situation -- but I wouldn't call it unreliable.
Something like this? uncompressed = original_path.replace_suffix(".tar.gz", "")

On 25Jan2017 0816, Petr Viktorin wrote:
It may be handy if suffixes was a reversed tuple of suffixes (or possibly a cumulative tuple):
    >>> Path('pip-9.0.1.tar.gz').suffixes
    ('.gz', '.tar', '.1', '.0')
This has a nice benefit for comparisons:
targzs = [f for f in all_files if f.suffixes[:2] == ('.gz', '.tar')]
It doesn't necessarily improve over .endswith(), but it has a slight convenience over .split() and arguably demonstrates intent more clearly. (Though my biggest issue with all of this is case-sensitivity, which probably means we need to add comparison functions to Path flavours in order to do this stuff properly.) The "cumulative tuple" version would be like this:
    >>> Path('pip-9.0.1.tar.gz').suffixes
    ('.gz', '.tar.gz', '.1.tar.gz', '.0.1.tar.gz')
This doesn't compare as nicely, since now we would use f.suffixes[1] which will raise if there is only one suffix (likely). But it does return a value which cannot be easily recreated using other functions. Cheers, Steve
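Both variants can be tried out today as plain functions over the existing list-valued "suffixes". The function names below are made up for this sketch; neither is part of pathlib.

```python
from pathlib import PurePath


def reversed_suffixes(path):
    """Hypothetical: suffixes as a tuple, last suffix first."""
    return tuple(reversed(PurePath(path).suffixes))


def cumulative_suffixes(path):
    """Hypothetical: progressively longer tails, shortest first."""
    suffixes = PurePath(path).suffixes
    return tuple(
        "".join(suffixes[start:])
        for start in range(len(suffixes) - 1, -1, -1)
    )


all_files = ["pip-9.0.1.tar.gz", "drawing.svg.gz", "notes.txt"]
# Slicing never raises, so files with fewer than two suffixes are just skipped.
targzs = [f for f in all_files if reversed_suffixes(f)[:2] == (".gz", ".tar")]
```

With these inputs `targzs` is `['pip-9.0.1.tar.gz']`, matching the comparison shown above.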

On Wed, Jan 25, 2017 at 11:16 AM, Petr Viktorin <encukou@gmail.com> wrote:
A ".tar.gz" is not the same as a ".svg.gz". The fact that they are both gzip-compressed is an implementation detail as far as most software I deal with is concerned. My unarchiver will extract a ".tar.gz" into a directory as if it was just a ".tar", while my image viewer will view a ".svg.gz" as a vector image as if it was just a ".svg". From a user-interaction standpoint, the ".gz" part is ignored.

Just to be sure we're on the same page:

- A .tar file is an uncompressed bundle of files.
- A .gz file is a compressed version of a single file.
- Technically, there's no such thing as a .tar.gz file.

"x.tar.gz" means that if you unwrap it with gunzip, you'll get a file called "x.tar", which you can then unpack with tar. "x.tar.gz" is not a tar file using the gzip compression. It's a gz file which unpacks to a tar file. Conceptually, your unarchiver does it in two separate steps.

Similarly, "x.svg.gz" is a gz file which unpacks to an svg file. Your viewer just knows to unzip it before use.

I don't wanna appear as a naysayer, so here's an alternative suggestion: A parameter for a collection of "extension suffixes". The function will try to eat extensions from the end until it finds one NOT on the list (or it runs out). The docs can recommend `('gz', 'xz', 'bz', 'bz2', ...)`. Maybe a later Python version can use that recommendation as the default.

IMO, ".part1" is not a part of the extension. You'd usually have "x.part1.rar" and "x.part2.rar" in the same folder, and it makes more sense that there are two files with base names "x.part1" and "x.part2" than to have two different files with the same base name and an extension which just keeps them ordered.
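One possible reading of the "eat extensions from the end" suggestion is sketched below. The function name, the return shape, and the choice to include the first suffix that is not on the list as part of the extension are all assumptions; the message leaves those details open.

```python
from pathlib import PurePath

# The collection recommended in the message, without its elided entries.
WRAPPER_EXTENSIONS = ("gz", "xz", "bz", "bz2")


def split_extension(path, wrappers=WRAPPER_EXTENSIONS):
    """Hypothetical helper: return (base_name, extension) for a file name."""
    name = PurePath(path).name
    suffixes = PurePath(name).suffixes
    taken = []
    while suffixes:
        suffix = suffixes.pop()
        taken.append(suffix)
        # Stop at the first suffix that is not a known wrapper.
        if suffix[1:] not in wrappers:
            break
    extension = "".join(reversed(taken))
    base = name[: len(name) - len(extension)]
    return base, extension


tarball = split_extension("pip-9.0.1.tar.gz")
rar_part = split_extension("x.part1.rar")
```

Under this reading `tarball` is `('pip-9.0.1', '.tar.gz')` and `rar_part` is `('x.part1', '.rar')`, which agrees with the ".part1" remark above.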

How about adding a new argument to with_suffix?

    Path.with_suffix(suffix: str, stripped: Union[int, str, Iterable[str]]=1)

stripped would either receive an int (in which case it will greedily strip up to that many suffixes), or a (optionally compound) suffix which would be stripped if present verbatim, or an iterable of suffix strings, in which case it would strip all suffixes in the iterable as many times as available.

Examples:

    Path('flop.pkg.tar.gz').with_suffix('') → Path('flop.pkg.tar')  # current behavior
    Path('flop.pkg.tar.gz').with_suffix('', 2) → Path('flop.pkg')  # you have to know what you’re doing. 3 would have stripped '.pkg' too
    Path('flop.pkg.tar.gz').with_suffix('', '.tar.gz') → Path('flop.pkg')
    Path('flop.pkg.tar.gz').with_suffix('', '.gz.tar') → Path('flop.pkg.tar.gz')  # not stripped, the suffix doesn’t appear verbatim
    Path('flop.pkg.tar.gz.tar').with_suffix('', ['.gz', '.tar']) → Path('flop.pkg')  # all instances stripped. probably useless.

(In reply to Franklin? Lee <leewangzhong+python@gmail.com>, Wed, 25 Jan 2017, 21:44.)
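The proposed "stripped" argument can be prototyped as a standalone function to check it against the examples above. This is a sketch under assumptions: `with_suffix_stripped` is a hypothetical name, the real `with_suffix` takes no such argument, and edge cases the message does not mention (for instance a bool passed as the count) are not handled.

```python
from pathlib import PurePath
from typing import Iterable, Union


def with_suffix_stripped(
    path: PurePath,
    suffix: str,
    stripped: Union[int, str, Iterable[str]] = 1,
) -> PurePath:
    """Hypothetical prototype of with_suffix(suffix, stripped)."""
    name = path.name
    if isinstance(stripped, int):
        # Greedily strip up to that many suffixes.
        for _ in range(stripped):
            last = PurePath(name).suffix
            if not last:
                break
            name = name[: -len(last)]
    elif isinstance(stripped, str):
        # Strip a (possibly compound) suffix only if it appears verbatim.
        if stripped and name.endswith(stripped) and len(name) > len(stripped):
            name = name[: -len(stripped)]
    else:
        # Strip listed suffixes repeatedly for as long as one is at the end.
        allowed = set(stripped)
        while True:
            last = PurePath(name).suffix
            if not last or last not in allowed:
                break
            name = name[: -len(last)]
    return path.with_name(name + suffix)


two_stripped = with_suffix_stripped(PurePath("flop.pkg.tar.gz"), "", 2)
```

`two_stripped` is `PurePath('flop.pkg')`, as in the second example; the other four examples behave the same way under this prototype.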

As another suggestion, I'd love an rmtree method analogous to shutil.rmtree. And maybe also a remove method, that basically does: if path.is_dir(): path.rmtree() else: path.unlink() \-- Ryan (ライアン) Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else <http://refi64.com/> On Jan 24 2017, at 2:32 pm, Todd <toddrjen@gmail.com> wrote:
I have been using pathlib, and I have come up with a few suggestions on what would make the module more useful for me (and hopefully others):
First, for me, extensions are primarily useful as a single unit. So, practically speaking, the extension of "spam.tar.gz" isn't ".gz", it is ".tar.gz". So it would be nice to have some properties to make it easier to deal with the "complete" extension like this. There is a "suffixes" property, but it returns a list, which you then have to recombine manually. And as far as I can tell there is no method to return the name without any extension. And there is no method for replacing all the extensions at once. So although the names are tentative, perhaps there could be a "fullsuffix" property to return the extensions as a single string, a "nosuffix" extension to return the path without any extensions, and a "with_suffixes" method that replaces all the suffix and can accept multiple arguments (which would then be joined to create the extensions). Second, for methods like "rename" and "replace", it would be nice if there was an "exist_ok" argument that defaults to "True" to allow for safe renaming. Third, it would be nice if there was a "uid" and "gid" method for getting the numeric user and group IDs for a file, or alternatively a "numeric" argument for the "owner" and "group" methods. Fourth, for the "is_*" methods, it would be nice if there was a "strict" argument that would raise an exception if the file or directory doesn't exist. Finally, although not problem with the module per se, the example for the "parts" property should probably show at least one file with an extension, to make it clear how it deals with extensions (since the documentation is ambiguous in this regard). Thanks for your time.

On Wed, Jan 25, 2017 at 7:30 AM, Todd <toddrjen@gmail.com> wrote:
+0. Not all files with multiple dots in them are actually using them to mean multiple file extensions. Every day I'm working with files that use dots to separate words in a title, or have section numbers ("4.2.5 Yada Yada Yada.md" does not have a base name of "4"), etc. Since there's no perfect way to pin these down, this needs to be a completely separate feature, and it'd only really be useful for some situations. So go ahead, if there's interest, but the current one shouldn't be deprecated or anything. ChrisA

On Tue, Jan 24, 2017 at 4:27 PM, Chris Angelico <rosuav@gmail.com> wrote:
Of course the current ones shouldn't be deprecated, I never suggested they should be. The whole point of using new method and property names was to avoid any conflict with the existing methods. And yes, it won't work in all situations. Which method or property you would use depends on your specific needs.

(I have a small question, I hope it's not off-topic for this thread.) What was the rationale behind an explicit `iterdir` method? Why not simply make the `Path` objects iterable? ________________________________________ From: Python-ideas <python-ideas-bounces+vamsi_ism=outlook.com@python.org> on behalf of Todd <toddrjen@gmail.com> Sent: Wednesday, January 25, 2017 3:32:14 AM To: python-ideas Subject: Re: [Python-ideas] pathlib suggestions On Tue, Jan 24, 2017 at 4:27 PM, Chris Angelico <rosuav@gmail.com<mailto:rosuav@gmail.com>> wrote: On Wed, Jan 25, 2017 at 7:30 AM, Todd <toddrjen@gmail.com<mailto:toddrjen@gmail.com>> wrote:
+0. Not all files with multiple dots in them are actually using them to mean multiple file extensions. Every day I'm working with files that use dots to separate words in a title, or have section numbers ("4.2.5 Yada Yada Yada.md" does not have a base name of "4"), etc. Since there's no perfect way to pin these down, this needs to be a completely separate feature, and it'd only really be useful for some situations. So go ahead, if there's interest, but the current one shouldn't be deprecated or anything. ChrisA Of course the current ones shouldn't be deprecated, I never suggested they should be. The whole point of using new method and property names was to avoid any conflict with the existing methods. And yes, it won't work in all situations. Which method or property you would use depends on your specific needs.

I'm just going to let fly with the +1s and -1s, don't take them too seriously, they're basically impressionistic (I'm not a huge user of pathlib yet). Todd writes:
So although the names are tentative, perhaps there could be a "fullsuffix" property to return the extensions as a single string,
-0 '.'.join(p.suffixes) vs. p.fullsuffix? TOOWTDI says no. I also don't really see the use case.
a "nosuffix" extension to return the path without any extensions,
+1 (subject to name bikeshedding) .suffixes itself is kinda useless without this, and you shouldn't have to roll your own Do you propose to return a Path or a str here?
Do you propose to return a Path or a str here? +1 for a Path, +0 for a str.
-1 I don't see how this is an improvement. If it would raise if exist_ok == False, then try: p.rename(another_p, exist_ok=False) except ExistNotOKError: take_evasive_action(p) doesn't seem like a big improvement over if p.exists(): take_evasive_action(p) else: p.rename(another_p) And if it doesn't raise, then the action just silently fails? Name bikeshedding: IIRC, if an argument is essentially always going to be one of a small number of literals, Guido strongly prefers a new method (eg, rename_safely). I will admit that the current API seems strange to me: on Unix, .rename and .replace are apparently the same, and both unsafe? I would prefer .rename Unix semantics (deprecated) .rename_safely replacement for .rename, raises if exists .replace silently replace Names to be bikeshedded per usual.
Third, it would be nice if there was a "uid" and "gid" method for getting the numeric user and group IDs for a file,
+1
or alternatively a "numeric" argument for the "owner" and "group" methods.
-1 (see "Guido prefers" above)
-1 That seems weird in a library intended for the syntactic manipulation of uninterpreted paths (even though this is a semantic operation). TOOWTDI and EIBTI, as well. For backward compatibility, strict would have to default to False.
the example for the "parts" property should probably show at least one file with an extension,
+1 Steve

On Wed, Jan 25, 2017 at 12:25 AM, Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
The whole point of pathlib is to provide convenience functions for common path-related operations. It is full of methods and properties that could be implemented other ways. Dealing with multi-part extensions, at least for me, is extremely common. A ".tar.gz" file is not the same as a ".tar.bz2" or a ".svg.gz". When I want to find a ".tar.gz" file, having to deal with the ".tar" and ".gz" parts separately is nothing but a nuisance. If I want to find and extract ".rar" files, I don't want ".part1.rar" files, ".part2.rar" files, and so on. So for me dealing with the extension as a single unit, rather than individual parts, is the most common approach.
I intend it to behave as much as possible like the existing "stem" property, so a string.
It is intended to behave as much as possible like the existing "with_suffix" method, so a Path.
As Ed said, this can lead to race conditions. Something could happen after you check "exists". Also, the "mkdir" method already has an "exist_ok" argument, and the "open" function has the "x" flag to raise an exception if the file already exists. It seems like a major omission to me that there are safe ways to make files and safe ways to make directories, but no safe way to move files or directories.
File and directory handling is already full of flags like this. This argument was taken verbatim from the existing "mkdir" method for consistency.
First, these methods only exist for "concrete" paths, which are explicitly intended for use in I/O operations. Second, as before, this argument is taken from another method. In this case, the "resolve" method has a "strict" argument. Any other approach suffers from the same race conditions as "rename" and "replace", and again it seems weird that resolving a path can be done safely but testing it can't be. And yes, the argument would have to default to "False". All of my suggestions are intended to be completely backwards-compatible. I don't see that as a problem, though.

On 01/25/2017 04:04 PM, Todd wrote:
But what if the .tar.gz file is called "spam-4.2.5-final.tar.gz"? Existing tools like glob and endswith() can deal with the ".tar.gz" extension reliably, but "fullsuffix" would, arguably, not give the answers you want. Perhaps more specialized tools would be useful, though, for example: repacked_path = original_path.replace_suffix(".tar.gz", ".zip")

On Wed, Jan 25, 2017 at 10:18 AM, Petr Viktorin <encukou@gmail.com> wrote:
I wouldn't use it in that situation. The existing "suffix" and "stem" properties also only work reliably under certain situations.
Perhaps more specialized tools would be useful, though, for example: repacked_path = original_path.replace_suffix(".tar.gz", ".zip")
That is helpful if I want to rename, not if I want to (for example) uncompress a file.

On Wed, Jan 25, 2017, at 03:33 PM, Todd wrote:
I wouldn't use it in that situation.
You might not, but it seems like an attractive nuisance. You can't reliably use it as a test for .tar.gz files, but it would be easy to think that you can and write buggy code using it. And I can't currently think of a general example where it would be useful. I thought about suggesting a 'hassuffix' method, but it doesn't pass the 'one way to do it' test when you can do: p.name.endswith('.tar.gz')

On Wed, Jan 25, 2017 at 10:45 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
From my perspective at least, those arguments apply just as well to the existing "suffix" and "stem" properties.
Then why is there a "match" method? It doesn't seem like the "one way to do it test" is being used for pathlib, nor do I think it really applies for a module whose whole point is to provide convenience tools.

On Wed, Jan 25, 2017, at 03:58 PM, Todd wrote:
From my perspective at least, those arguments apply just as well to the existing "suffix" and "stem" properties.
To some extent it does. But the convention of looking at a single extension is common enough that there's a stronger case for providing easy access to that.
I thought about suggesting a 'hassuffix' method, but it doesn't pass the 'one way to do it' test when you can do:
p.name.endswith('.tar.gz')
Everything is trade-offs: if you can justify why a new thing is useful enough, that can override the 'one way to do it' consideration. That's why we now have four kinds of string formatting. But I don't think 'X got away with it so we should allow Y too' is a compelling argument.

Hi all, It seems to me that the correct algorithm to get the "full suffix" is not to take everything after the FIRST dot, but rather to: 1. Recognize that the last suffix is one of the UNIX-style compression tools .Z, .gz, ,bz2, .xz, .lzma (at least) 2. Then add the next-to-last suffix. So we can then determine that the suffix of order.for.tar.ps.gz is .ps.gz and the basename is order.for.tar . However, I am not sure if we want to hard-code a list of such suffixes in the standard library. (Even though it could be user-extensible.) Stephan 2017-01-25 16:33 GMT+01:00 Todd <toddrjen@gmail.com>:

On Wed, Jan 25, 2017, at 03:54 PM, Todd wrote:
Those [.tar.foo] are just examples that I encounter a lot, there can be other cases where multiple extensions are used.
The real issue is that there's no definition of what an extension is. You can have dots anywhere in a filename, and it's not at all unusual for them to be used before the bit we recognise as the extension. Almost every package on PyPI has files named like 'pip-9.0.1.tar.gz', but '.0.1.tar.gz' clearly doesn't make any sense as an extension. Without a good definition of what the 'full extension' is, we can't have code to find it. Thomas

On 25 January 2017 at 16:04, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
More precisely, we *can* have code to find it, but it's of necessity application-specific, and so not a good fit for a general library like the stdlib. One of the design principles for code in the stdlib is "does it solve a sufficiently general problem?" In this case, there's a general problem, which is "give me back what I think of as the suffix in this case" - but the proposed method doesn't solve that problem (because of the cases already quoted). Conversely, the problem which the proposed solution *does* solve ("give me the part of the filename after the first dot") isn't general enough to warrant going into the stdlib, because it's too often not what people actually want. Paul

On Wed, Jan 25, 2017 at 11:04 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Right, that is why we would have three properties:
1. suffix: gets the part after the last period as a string, including the period (already exists), so "spam.tar.gz" -> ".gz"
2. fullsuffix: gets the part after the first period as a string, including the period (this is what I am proposing), so "spam.tar.gz" -> ".tar.gz"
3. suffixes: gets the part after the first period as a list of strings split on the leading period, each including the leading period (already exists), so "spam.tar.gz" -> [".tar", ".gz"]
"suffix" is only useful if you are sure only the part after the last period is useful, "fullsuffix" is only useful if you are sure the entire part after the first period is useful, and "suffixes" is needed in more complicated situations. This is similar in principle to having "str.split", "str.rsplit", "str.partition", and "str.rpartition". pathlib currently has the equivalent of "str.split" (suffixes) and "str.rpartition" (suffix), but lacks the equivalent of "str.partition" (fullsuffix).
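For comparison, here is a rough sketch of the proposed "fullsuffix" and "nosuffix" next to the existing properties. Both functions are hypothetical stand-ins written on top of the current `suffixes` property; they are not pathlib API, and joining `suffixes` is only one possible way to define them.

```python
from pathlib import PurePosixPath


def fullsuffix(path):
    """Hypothetical: every suffix joined into one string."""
    return "".join(path.suffixes)


def nosuffix(path):
    """Hypothetical: the path with all suffixes removed from its name."""
    full = fullsuffix(path)
    if not full:
        return path
    return path.with_name(path.name[: len(path.name) - len(full)])


p = PurePosixPath("spam.tar.gz")
print(p.suffix)       # .gz               (existing)
print(p.suffixes)     # ['.tar', '.gz']   (existing)
print(fullsuffix(p))  # .tar.gz           (proposed)
print(nosuffix(p))    # spam              (proposed)

# The case raised elsewhere in the thread: version numbers contain dots too.
print(fullsuffix(PurePosixPath("pip-9.0.1.tar.gz")))  # .0.1.tar.gz
```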

On 01/25/2017 04:33 PM, Todd wrote:
Which situations do you mean? It works quite fine with multiple suffixes: The suffix of "pip-9.0.1.tar.gz" is ".gz", and sure enough, you can reasonably expect it's a gz-compressed file. If you uncompress it and strip the extension, you'll end up with a "pip-9.0.1.tar", where the suffix is ".tar" -- and humans would be surprised if it wasn't a tar archive. The function can't determine what a particular human would think of as the full (or "real") suffix in a particular situation -- but I wouldn't call it unreliable.
Something like this?
uncompressed = original_path.replace_suffix(".tar.gz", "")
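A method like that does not exist in pathlib today, but its behaviour could be approximated with a small function. The sketch below assumes that "replace_suffix" should leave the path untouched when the name does not end with the given suffix; that choice is a guess, since the message does not say.

```python
from pathlib import PurePosixPath


def replace_suffix(path, old, new):
    """Hypothetical stand-in for the suggested method: swap a trailing
    `old` for `new`, or return the path unchanged if `old` is absent."""
    name = path.name
    # Require something to be left over so the resulting name is never empty.
    if old and name.endswith(old) and len(name) > len(old):
        return path.with_name(name[: -len(old)] + new)
    return path


original_path = PurePosixPath("dist/pip-9.0.1.tar.gz")
uncompressed = replace_suffix(original_path, ".tar.gz", "")
print(uncompressed)  # dist/pip-9.0.1
```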

On 25Jan2017 0816, Petr Viktorin wrote:
It may be handy if suffixes was a reversed tuple of suffixes (or possibly a cumulative tuple):
Path('pip-9.0.1.tar.gz').suffixes
('.gz', '.tar', '.1', '.0')
This has a nice benefit for comparisons:
targzs = [f for f in all_files if f.suffixes[:2] == ('.gz', '.tar')]
It doesn't necessarily improve over .endswith(), but it has a slight convenience over .split() and arguably demonstrates intent more clearly. (Though my biggest issue with all of this is case-sensitivity, which probably means we need to add comparison functions to Path flavours in order to do this stuff properly.) The "cumulative tuple" version would be like this:
Path('pip-9.0.1.tar.gz').suffixes
('.gz', '.tar.gz', '.1.tar.gz', '.0.1.tar.gz')
This doesn't compare as nicely, since now we would use f.suffixes[1] which will raise if there is only one suffix (likely). But it does return a value which cannot be easily recreated using other functions. Cheers, Steve
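Both variants can be built from the existing `suffixes` list, which makes them easy to try out. The function names below are made up for illustration; the current property returns a list in left-to-right order, not either of these tuples.

```python
from pathlib import PurePosixPath


def reversed_suffixes(path):
    """Hypothetical: suffixes as a tuple, last suffix first."""
    return tuple(reversed(path.suffixes))


def cumulative_suffixes(path):
    """Hypothetical: progressively longer trailing suffix strings."""
    result = []
    accumulated = ""
    for suffix in reversed(path.suffixes):
        accumulated = suffix + accumulated
        result.append(accumulated)
    return tuple(result)


p = PurePosixPath("pip-9.0.1.tar.gz")
print(reversed_suffixes(p))    # ('.gz', '.tar', '.1', '.0')
print(cumulative_suffixes(p))  # ('.gz', '.tar.gz', '.1.tar.gz', '.0.1.tar.gz')

all_files = [p, PurePosixPath("logo.svg.gz"), PurePosixPath("notes.txt")]
targzs = [f for f in all_files if reversed_suffixes(f)[:2] == (".gz", ".tar")]
print(targzs)  # [PurePosixPath('pip-9.0.1.tar.gz')]
```

Slicing with `[:2]` never raises, even for a file with a single suffix, which is the convenience over indexing the cumulative form with `[1]`.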

On Wed, Jan 25, 2017 at 11:16 AM, Petr Viktorin <encukou@gmail.com> wrote:
A ".tar.gz" is not the same as a ".svg.gz". The fact that they are both gzip-compressed is an implementation detail as far as most software I deal with is concerned. My unarchiver will extract a ".tar.gz" into a directory as if it was just a ".tar", while my image viewer will view a ".svg.gz" as a vector image as if it was just a ".svg". From a user-interaction standpoint, the ".gz" part is ignored.

Just to be sure we're on the same page:
- A .tar file is an uncompressed bundle of files.
- A .gz file is a compressed version of a single file.
- Technically, there's no such thing as a .tar.gz file.
"x.tar.gz" means that if you unwrap it with gunzip, you'll get a file called "x.tar", which you can then unpack with tar. "x.tar.gz" is not a tar file using the gzip compression. It's a gz file which unpacks to a tar file. Conceptually, your unarchiver does it in two separate steps.
Similarly, "x.svg.gz" is a gz file which unpacks to an svg file. Your viewer just knows to unzip it before use.
I don't wanna appear as a naysayer, so here's an alternative suggestion: A parameter for a collection of "extension suffixes". The function will try to eat extensions from the end until it finds one NOT on the list (or it runs out). The docs can recommend `('gz', 'xz', 'bz', 'bz2', ...)`. Maybe a later Python version can use that recommendation as the default.
IMO, ".part1" is not a part of the extension. You'd usually have "x.part1.rar" and "x.part2.rar" in the same folder, and it makes more sense that there are two files with base names "x.part1" and "x.part2" than to have two different files with the same base name and an extension which just keeps them ordered.
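A minimal sketch of that "eat extensions from the end" idea follows. It assumes that the first suffix NOT on the list is still counted as part of the extension (so "x.tar.gz" yields ".tar.gz"); the message does not spell this out. `WRAPPER_EXTENSIONS` and `split_extension` are hypothetical names.

```python
from pathlib import PurePosixPath

# Assumption: the recommended collection from the message, without the "...".
WRAPPER_EXTENSIONS = ("gz", "xz", "bz", "bz2")


def split_extension(path, wrappers=WRAPPER_EXTENSIONS):
    """Hypothetical helper: return (base name, extension) for a path."""
    suffixes = list(path.suffixes)
    taken = []
    # Eat listed suffixes from the end until an unlisted one (or none) remains.
    while suffixes and suffixes[-1][1:] in wrappers:
        taken.insert(0, suffixes.pop())
    # Assumed rule: the first unlisted suffix also belongs to the extension.
    if suffixes:
        taken.insert(0, suffixes.pop())
    extension = "".join(taken)
    base = path.name[: len(path.name) - len(extension)]
    return base, extension


print(split_extension(PurePosixPath("x.tar.gz")))          # ('x', '.tar.gz')
print(split_extension(PurePosixPath("x.part1.rar")))       # ('x.part1', '.rar')
print(split_extension(PurePosixPath("pip-9.0.1.tar.gz")))  # ('pip-9.0.1', '.tar.gz')
```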

How about adding a new argument to with_suffix?
Path.with_suffix(suffix: str, stripped: Union[int, str, Iterable[str]]=1)
stripped would either receive an int (in which case it will greedily strip up to that many suffixes), or an (optionally compound) suffix which would be stripped if present verbatim, or an iterable of suffix strings, in which case it would strip all suffixes in the iterable as many times as available.
Examples:
Path('flop.pkg.tar.gz').with_suffix('') → Path('flop.pkg.tar') # current behavior
Path('flop.pkg.tar.gz').with_suffix('', 2) → Path('flop.pkg') # you have to know what you’re doing. 3 would have stripped '.pkg' too
Path('flop.pkg.tar.gz').with_suffix('', '.tar.gz') → Path('flop.pkg')
Path('flop.pkg.tar.gz').with_suffix('', '.gz.tar') → Path('flop.pkg.tar.gz') # not stripped, the suffix doesn’t appear verbatim
Path('flop.pkg.tar.gz.tar').with_suffix('', ['.gz', '.tar']) → Path('flop.pkg') # all instances stripped. probably useless.
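The proposed "stripped" argument can be prototyped as a free function to check it against the examples in this message. This is a sketch of one reading of the proposal, not an existing pathlib signature; `with_suffix_stripped` is a hypothetical name.

```python
from pathlib import PurePosixPath


def with_suffix_stripped(path, suffix, stripped=1):
    """Hypothetical prototype of with_suffix(suffix, stripped)."""
    name = path.name
    if isinstance(stripped, int):
        # Greedily strip up to `stripped` suffixes.
        for _ in range(stripped):
            last = PurePosixPath(name).suffix
            if not last:
                break
            name = name[: -len(last)]
    elif isinstance(stripped, str):
        # Strip a (possibly compound) suffix only if it appears verbatim.
        if stripped and name.endswith(stripped) and len(name) > len(stripped):
            name = name[: -len(stripped)]
    else:
        # Strip any of the given suffixes for as long as they keep appearing.
        candidates = set(stripped)
        while True:
            last = PurePosixPath(name).suffix
            if not last or last not in candidates:
                break
            name = name[: -len(last)]
    return path.with_name(name + suffix)


p = PurePosixPath("flop.pkg.tar.gz")
print(with_suffix_stripped(p, ""))             # flop.pkg.tar
print(with_suffix_stripped(p, "", 2))          # flop.pkg
print(with_suffix_stripped(p, "", ".tar.gz"))  # flop.pkg
print(with_suffix_stripped(p, "", ".gz.tar"))  # flop.pkg.tar.gz
```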
participants (13)
- Chris Angelico
- Ed Kellett
- Franklin? Lee
- Paul Moore
- Petr Viktorin
- Philipp A.
- Ryan Gonzalez
- Stephan Houben
- Stephen J. Turnbull
- Steve Dower
- Thomas Kluyver
- Todd
- Vamsi Krishna Avula