I know enhancements to pathlib get brought up occasionally, but it doesn't look like anyone has been willing to take the initiative and see things through to completion. I am willing to keep the ball rolling here and even implement these myself. I have some suggestions and I would like to discuss them. I don't think any of them are significant enough to require a PEP. These can be split into independent threads if anyone prefers.

1. copy

The big one people keep bringing up that I strongly agree on is a "copy" method. This is really the only common file manipulation task that currently isn't possible. You can make files, read them, move them, delete them, create directories, even do less common operations like change owners or create symlinks or hard links.

A common objection is that pathlib doesn't work on multiple paths. But that isn't the case. There are a ton of methods that do that, including:

* symlink_to
* link_to
* rename
* replace
* glob
* rglob
* iterdir
* is_relative_to
* relative_to
* samefile

I think this is really the only common file operation that someone would need to switch to a different module to do, and it seems pretty strange to me to be able to make symbolic or hard links to a file but not straight up copy one.

2. recursive remove

This could be a "recursive" option to "rmdir" or a "rmtree" method (I prefer the option). The main reason for this is symmetry. It is possible to create a tree of folders (using "mkdir(parents=True)"), but once you do that you cannot remove it again in a straightforward way.

3. newline for write_text

This is the only relevant option that "Path.open" has but "Path.write_text" doesn't, and is a serious omission when dealing with multiple operating systems.

4. uid and gid

You can get the owner and group name of a file (with the "owner" and "group" methods), but there is no easy way to get the corresponding number.

5. Stem with no suffixes

The stem property only takes off the last suffix, but even in the example given ('my/library.tar.gz') it isn't really useful because the suffix has two parts ('.tar' and '.gz'). I suggest another property, probably called "rootstem" or "basestem", that takes off all the suffixes, using the same logic as the "suffixes" property. This is another symmetry issue: it is possible to extract all the suffixes, but not remove them.

6. with_suffixes

Equivalent to with_suffix, but replacing all suffixes. Again, this is a symmetry issue. It is hard to manipulate all the suffixes right now, as the example shows. You can add them or extract them, but not change them without doing several steps.

7. exist_ok for is_* methods

Currently all the is_* methods (such as is_file) return False if the file doesn't exist or if it is a broken symlink. This can be dangerous, since it is not trivially easy to tell if you are dealing with the wrong type of file vs. a missing file. And it isn't obvious behavior just from the method name. I suggest adding an "exist_ok" argument to all of these, with the default being "True" for backwards-compatibility. This argument name is already in use elsewhere in pathlib. If this is False and the file is not present, a "FileNotFoundError" is raised.
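To make items 5 and 6 concrete, here is a minimal sketch of what the two suggested additions might do. The function names `basestem` and `with_suffixes` are hypothetical stand-ins for the proposed property and method (they are not part of pathlib), and the sketch assumes that "all suffixes" means exactly what the existing `suffixes` property returns.

```python
from pathlib import PurePosixPath


def basestem(path):
    """Hypothetical helper: the final path component with ALL suffixes removed."""
    name = path.name
    all_suffixes = "".join(path.suffixes)  # e.g. ".tar.gz"
    # Slice the joined suffixes off the end of the name, if there are any.
    return name[: len(name) - len(all_suffixes)] if all_suffixes else name


def with_suffixes(path, new_suffixes):
    """Hypothetical helper: replace ALL suffixes with the string new_suffixes."""
    return path.with_name(basestem(path) + new_suffixes)


p = PurePosixPath("my/library.tar.gz")
current_stem = p.stem                 # only the last suffix is removed
all_parts = p.suffixes                # the suffixes can already be extracted
full_stem = basestem(p)               # the proposed behaviour
renamed = with_suffixes(p, ".zip")    # replaces ".tar.gz" in one step
```

With today's pathlib the same rename needs several steps, for example calling `with_suffix("")` once per suffix before adding the new one.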
I really like these ideas. Effectively, we can use pathlib.Path without ever needing to import shutil. We would also like copyfile from shutil for when we are only interested in copying the file data. How about adding append_text and append_bytes with a newline option similar to what you suggested?
On Sun, Nov 22, 2020 at 3:27 PM Abdulla Al Kathiri < alkathiri.abdulla@gmail.com> wrote:
There have been proposals to make pathlib provide ALL file-related operations, but that is not this proposal. shutil would still provide a lot of more advanced functionality. So I think just one operation is needed: copying the file in the least destructive manner possible (probably equivalent to copy2). I wouldn't add "append_text" or "append_bytes" since you can use "open" for something like that.

Sincerely,
Todd
On Mon, Nov 23, 2020 at 6:54 AM Todd <toddrjen@gmail.com> wrote:
I know enhancements to pathlib get brought up occasionally, but it doesn't look like anyone has been willing to take the initiative and see things through to completion. I am willing to keep the ball rolling here and even implement these myself. I have some suggestions and I would like to discuss them. I don't think any of them are significant enough to require a PEP. These can be split into independent threads if anyone prefers.
Keep 'em in one thread for now, but if any of them become too controversial, it's probably worth narrowing the scope and spinning off the debatable ones in their own threads. General principle, by the way: The operations that currently exist are the fundamental primitives, and you're asking for higher-level operations to be made available. That might be a good summary for the proposal. (For example, renaming one thing to another is a primitive, but copying a file generally means opening both names, reading and writing, and then closing.) A few specifics:
I don't think it's so very strange (see above about primitive vs high level), but it does seem a reasonable enhancement. (It'd need the same caveats as on shutil.copy.)
2. recursive remove
This could be a "recursive" option to "rmdir" or a "rmtree" method (I prefer the option). The main reason for this is symmetry. It is possible to create a tree of folders (using "mkdir(parents=True)"), but once you do that you cannot remove it again in a straightforward way.
Absolutely agree, but not for the same reason: pruning a branch off a directory tree is VERY easy to naively get wrong, and shutil.rmtree has a lot of code in it to protect itself.
4. uid and gid
You can get the owner and group name of a file (with the "owner" and "group" methods), but there is no easy way to get the corresponding number.
That does seem a strange omission. If the other proposals get bogged down in controversy, spin this one off as its own thread, as I think it shouldn't be difficult to add it.

It might be worth looking at this as "making shutil support Path objects", and then have the Path objects grow methods that delegate to shutil. That'd avoid duplicating logic, e.g. for rmtree and copyfile.

ChrisA
On Sun, Nov 22, 2020 at 5:46 PM Chris Angelico <rosuav@gmail.com> wrote:
I think even that is debatable. I would say "read_text", "read_bytes", "write_text", and "write_bytes" are higher-level operations on top of "open" in much the same way "copy" is. "glob" and "rglob" are also higher-level operations on top of "iterdir". And as far as I can see, the primitive vs. high-level distinction only really applies to "copy". "owner" and "group" are really higher-level routines on top of the primitive "uid" and "gid", and the rest are meant to be counterparts of operations that already exist.
As I said, I don't think this is any less primitive than "read_text", "read_bytes", "write_text", "write_bytes", "glob", or "rglob".
Another good point. The question is whether it should be its own method or an argument.
Sure.
shutil already supports Path objects. And yes, I was planning to delegate the logic to existing functions there or in "os".
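As a rough illustration of the delegation described here, the sketch below shows that the existing stdlib functions already accept Path objects, so a pathlib method could be a thin wrapper. The wrapper names `copy` and `uid_gid` are hypothetical, and using `shutil.copy2` follows the "probably equivalent to copy2" remark earlier in the thread rather than any settled design.

```python
import shutil
import tempfile
from pathlib import Path


def copy(src, dst):
    """Hypothetical Path.copy(): delegate to shutil.copy2, which accepts Path objects."""
    return Path(shutil.copy2(src, dst))


def uid_gid(path):
    """Hypothetical uid/gid accessors: the numbers are already in the stat result."""
    info = Path(path).stat()
    return info.st_uid, info.st_gid


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "a.txt"
    source.write_text("hello")

    target = copy(source, Path(tmp) / "b.txt")
    copied_text = target.read_text()
    uid, gid = uid_gid(source)
```

On Windows `st_uid` and `st_gid` are reported as 0, so the numeric accessors are mainly meaningful on POSIX systems.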
For Path.mkdir, exist_ok=True inhibits an error if a directory already exists. You're proposing that for Path.is_dir, exist_ok=True should inhibit an error if the directory does not exist. A parameter to enable that behavior sounds reasonable to me, but it definitely shouldn't have the name "exist_ok"; it does the opposite of what exist_ok does.
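The difference between the two meanings can be seen in a short sketch. `is_dir_strict` is a hypothetical helper that approximates the proposed "raise if missing" behaviour; it is not an existing pathlib method, and it simply relies on `Path.stat()` raising `FileNotFoundError`.

```python
import stat
import tempfile
from pathlib import Path


def is_dir_strict(path):
    """Hypothetical helper: like Path.is_dir(), but raise if nothing is there."""
    # Path.stat() raises FileNotFoundError for a missing path or a broken symlink.
    return stat.S_ISDIR(Path(path).stat().st_mode)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "folder"
    folder.mkdir()
    folder.mkdir(exist_ok=True)  # mkdir: no error even though it already exists

    regular_file = Path(tmp) / "file.txt"
    regular_file.write_text("x")
    missing = Path(tmp) / "missing"

    # Today both calls return False, for two different reasons.
    current_answers = (regular_file.is_dir(), missing.is_dir())

    strict_file = is_dir_strict(regular_file)  # exists, but is the wrong type
    try:
        is_dir_strict(missing)
        raised = False
    except FileNotFoundError:
        raised = True
```

In `mkdir` the flag tolerates a path that is present; in the proposed `is_*` variant it would tolerate a path that is absent, which is why a different name is being asked for.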
Hi Todd, my comments below. Also would offer my time for reviewing/testing if wanted. On 22.11.20 20:53, Todd wrote:
I really would appreciate that one. If I could throw in another detail which we needed a lot:

- atomic_copy or copy(atomic=True), whatever form you prefer

It is not as easy to achieve as it may look at first sight, especially when it comes to tempfiles and permissions. The use cases for atomic copy included scenarios with multiple parallel accesses to files, like caches in web development.
Importing shutil does not seem to be a big deal but I agree that it's somehow weird to be missing. Correct me if I'm wrong, but os.path somehow is closer to OS-level operations whereas shutil basically provides all the missing convenience features that sh provided. So, to me it boils down to the question if pathlib is a completely new paradigm. If so, then sure let's add it. Additionally, I like the "batteries included" theme of Python. Last but not least, I tend more towards the "rmtree" method just to make it crystal clear to everyone. Maybe docs could cross-refer both methods. Tree manipulations are inherently complicated and a lot can go wrong. Symmetry is not 100% given as you might delete more than what you've created (which was a single node path).
+1
+1
+1 Does anybody rely on this behavior of ".stem"? It always seemed odd to me, but that might be because of the use-cases I work with. So, another possibility would be to fix "stem" to do what makes sense. Maybe also rename the concept "suffix" to "final_suffix" (also more consistent with what the docs say: "The file extension of the final component, if any:"). To me that has always been the weirdest conceptual behavior of the lib. Not sure if that's possible to fix before people need time machines.
+1 Same comment as for basestem.
+1 Maybe missing_ok could help more to make people understand what the parameter actually does. exist_ok is used for creation methods (mkdir and touch). So, the name makes more sense in those contexts.

Best
Sven
Hi Sven, Thanks for your support and feedback. On Thu, Dec 31, 2020, 07:23 Sven R. Kunze <srkunze@mail.de> wrote:
Is there already support for atomic writes in the standard library? I am not planning on implementing anything new, only exposing existing functionality. Adding atomic operations to the stdlib would likely require a PEP and substantial discussion of API and implementation. I don't really have the background to do that.
A common objection is that pathlib doesn't work on multiple paths. But
Pathlib already has a number of higher-level operations besides what is in os,
Last but not least, I tend more towards the "rmtree" method just to make it
We already have tree removal functionality that this can use internally. As for the name, one thing to consider is that making a recursive tree uses an argument. And I think the argument would need to be keyword-only to avoid accidentally invoking it.
This is a backwards compatibility break and I don't want to get into the complications of doing that. There is really no benefit to breaking backwards compatibility. I would strongly suspect renaming a method then making a new, completely different method with the same name is not going to happen. The burden is just too high relative to the benefits.
Yes, you are right. Someone else pointed out this issue too.
I split my answers up to address different issues in different threads. On 31.12.20 15:32, Todd wrote:
So far I didn't find any of this implemented in the stdlib, but please correct me if I am wrong. As far as I know, one working pattern would be:

1. creating a file or the directory structure using tempfile
2. then setting permissions from the original directory object
3. and finally moving it to its final destination (path and name)

The last part is done atomically at least on Linux (rename) and Windows (ReplaceFile). Setting permissions in particular is an easy oversight which can cause issues, e.g. with xsendfile. What would be the steps to get it done?

Best
Sven
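The three steps above can be sketched for a single file as follows. This is only an illustration under assumptions: `atomic_copy` is a hypothetical name, the permission bits are taken from the source file via `shutil.copymode`, and `os.replace` is used for the final move. It does not cover directories, and it makes no claim about durability after a crash.

```python
import os
import shutil
import tempfile
from pathlib import Path


def atomic_copy(src, dst):
    """Hypothetical helper: copy src to dst so readers never see a partial file."""
    src, dst = Path(src), Path(dst)
    # 1. Create the temporary file next to the destination so the final
    #    rename stays on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dst.parent, prefix=dst.name + ".")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        shutil.copyfile(src, tmp)
        # 2. mkstemp creates the file readable only by its owner, so copy the
        #    permission bits from the source (the step that is easy to forget).
        shutil.copymode(src, tmp)
        # 3. Move the finished file into place, replacing any existing dst.
        os.replace(tmp, dst)
    except BaseException:
        tmp.unlink(missing_ok=True)  # do not leave the temporary file behind
        raise
    return dst


with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "cache.bin"
    original.write_bytes(b"payload")
    published = atomic_copy(original, Path(workdir) / "cache.published")
    published_bytes = published.read_bytes()
    leftover_names = sorted(p.name for p in Path(workdir).iterdir())
```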
On Thu, Jan 7, 2021, 10:54 Sven R. Kunze <srkunze@mail.de> wrote:
This is outside my area of expertise and is far outside the scope of my proposal, so it really needs its own thread. I think there have periodically been requests for atomic file operations on this mailing list, so the first step would be to search the mailing list and see what prevented those from going anywhere.
I am not much concerned about the internals; shutil.rmtree should work fine here. I am more concerned with the external interface (see also root/base/stem) and its impression on developers.
Why not add a new method? I am still not convinced from a safety perspective that adding a new meaning-changing argument to a removal function is such a good idea. Just consider the following example. When deleting /my/soon/to/be/deleted/dir, you DO NOT delete a simple directory as the method name "rmdir" would suggest:

- /my/soon/to/be/deleted/dir

Instead you delete something like this:

- /my/soon/to/be/deleted/dir/d1/f1
- /my/soon/to/be/deleted/dir/d1/d3/f1
- /my/soon/to/be/deleted/dir/d1/d3/f2
- /my/soon/to/be/deleted/dir/d1/d3
- /my/soon/to/be/deleted/dir/d1/f2
- /my/soon/to/be/deleted/dir/d1/f3
- /my/soon/to/be/deleted/dir/d1
- /my/soon/to/be/deleted/dir/d2/f1
- /my/soon/to/be/deleted/dir/d2/f2
- /my/soon/to/be/deleted/dir/d2
- /my/soon/to/be/deleted/dir

That is a completely different beast (a complete tree) and there is no way back once deleted. And there are a couple of other reasons when I look at the interface of shutil.rmtree and what "recursively" really means for pathlib.

Best
Sven
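The difference in scope between the two operations can be shown with a small sketch that uses only existing stdlib calls, on a throwaway tree shaped like part of the example above (the directory and file names are just placeholders).

```python
import shutil
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
target = base / "dir"
(target / "d1" / "d3").mkdir(parents=True)
(target / "d1" / "f1").write_text("data")
(target / "d1" / "d3" / "f1").write_text("data")

# Path.rmdir() only removes an EMPTY directory, so it refuses here.
try:
    target.rmdir()
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# shutil.rmtree() removes the directory and everything below it.
shutil.rmtree(target)
target_gone = not target.exists()

# The now-empty parent is exactly the case rmdir() is meant for.
base.rmdir()
```

Whether the recursive behaviour is spelled as a separate method or as a keyword-only argument, the second call is the one that cannot be undone.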
On 31.12.20 15:32, Todd wrote:
I disagree that breaking compatibility has no benefit. The benefit is always long-term, but I understand that you don't want to go through deprecating and re-adding the same name. One concern I have is that "rootstem" or "basestem" is not really a well-defined concept (as was "stem" when it was added - at least the multiple suffix concept was there but has little influence on the naming of the stem concept). We already have concepts like:

basename << rightmost part of a path
root << toplevel node of a tree; also filesystem
The burden is just too high relative to the benefits.
What exactly is the burden? Best Sven
Todd wrote: I'm in favor of most of these additions. I was a heavy user of path.py and I'm missing those "advanced" features in pathlib.
One remark about this: .tar.gz files are the exception rather than the rule, and AFAIK maybe the only one? It's pretty common to have dots in filenames instead of blanks, for example, and stem does the right thing here: '/data/my.little.file.txt'. There is also the case of hidden files on Linux: what do you expect for /home/toto/.program.cfg?
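For reference, this is what the existing properties return for the two paths mentioned, shown with `PurePosixPath` so the result does not depend on the platform. Only current pathlib behaviour is used here, nothing from the proposal.

```python
from pathlib import PurePosixPath

dotted = PurePosixPath("/data/my.little.file.txt")
hidden = PurePosixPath("/home/toto/.program.cfg")

# Dots used as word separators: stem keeps them, suffixes treats each as a suffix.
dotted_stem = dotted.stem
dotted_suffixes = dotted.suffixes

# Hidden file: the leading dot is not treated as the start of a suffix.
hidden_stem = hidden.stem
hidden_suffixes = hidden.suffixes
```

A property that strips everything in `suffixes` would turn the first name into `my`, which is the concern being raised.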
Joseph Martinot-Lagarde writes:
One remark about this: .tar.gz files are the exception rather than the rule, and AFAIK maybe the only one?
Not really. stem.ext -> stem.ext.zzz where zzz is a compression extension is a pretty common naming convention. For me ext == 'tar' is by far the most common case (74%), 'tis true, but 'patch' (10%), 'txt' (6%), 'tab', 'gml', 'xml', 'svg', 'pdf', 'ps', 'dvi', 'diff', 'pdb', 'cpp', 'el', and 'data' also exist somewhere under $HOME. I'll bet others show up if I search /usr, /var, and /opt.
On Sun, Jan 10, 2021 at 4:51 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Yep, and most of my man pages are compressed, so there's usr/share/man/man1/*.1.gz and friends. I'd say the most common case with multiple extensions is indeed precisely two, where the first one is the type of file (or in the case of man pages, the section), and the second is a compression format. But there'll be less common cases too. ChrisA
On 2021-01-10 at 05:03:08 +1100, Chris Angelico <rosuav@gmail.com> wrote:
I also have a pile of whatever-x.y.z.* files, where the * is some kind of compression extension and x.y.z is a major.minor.patch identifier. Most of the time, my brain is big enough to spot where x.y.z ends and the extension(s) begin(s), but throw in a version identifier like 4.3.beta, and all bets are off (unless I happen to know exactly what to look for, in which case I wouldn't bother with a general purpose library function that might make the wrong assumption).
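The version-number problem, and the "know exactly what to look for" alternative, can be sketched like this. `KNOWN_SUFFIXES` and `strip_known_suffixes` are hypothetical names, and the allow-list is only an example, not a proposed standard set.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list of archive/compression suffixes the caller cares about.
KNOWN_SUFFIXES = {".tar", ".gz", ".bz2", ".xz", ".zip"}


def strip_known_suffixes(filename):
    """Hypothetical helper: strip trailing suffixes only while they are in the allow-list."""
    name = PurePosixPath(filename).name
    while True:
        suffix = PurePosixPath(name).suffix
        if suffix.lower() not in KNOWN_SUFFIXES:  # also stops when suffix == ""
            return name
        name = name[: -len(suffix)]


release = "whatever-4.3.beta.tar.gz"
naive_suffixes = PurePosixPath(release).suffixes  # version parts count as suffixes
stripped = strip_known_suffixes(release)          # version parts are left alone
```

A general-purpose property based on `suffixes` would reduce this name to `whatever-4`, which is the wrong assumption described above.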
On my system:

% find ~ -name '*.*.*' | rev | cut -d. -f-2 | rev | sort | uniq -c | sort -nr | head -30
  17278 d.ts
  11314 js.map
   6600 symbolic.png
   4041 png.i
   3968 cpython-37.pyc
   2656 yarn-metadata.json
   2614 yarn-tarball.tgz
   2575 c.i
   2526 csv.gz
   1727 h.i
   1659 opt-1.pyc
   1590 opt-2.pyc
   1302 autogen.js
   1151 ts.map
   1148 js.flow
    854 svg.i
    852 min.js
    744 test.js
    651 travis.yml
    560 gif.i
    522 so.0
    403 indexeddb.leveldb
    384 pom.sha1
    368 ref.css
    367 0.0
    357 so.1
    311 event.jsonlz4
    283 xpm.i
    278 ref.ui
    275 am.i

Most of those I honestly have no idea what they are. That's just starting from $HOME. System wide, who knows.

On Sat, Jan 9, 2021 at 7:27 PM <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
-- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
For my entire filesystem:

 124920 cpython-38.pyc
  50034 html.gz
  31158 cpython-39.pyc
  31032 d.ts
  30415 cpython-37.pyc
  21473 cpython-36.pyc
  19000 js.map
   9888 symbolic.png
   5086 cpython-35.pyc
   5004 1.gz
   4657 cpython-38-x86_64-linux-gnu.so
   4261 pypy36.pyc
   4152 Debian.gz
   4041 png.i
   3534 cpython-33.pyc
   3421 cpython-34.pyc
   2950 min.js
   2880 cpython-34.pyo
   2668 unix.ip
   2668 unix.gid
   2668 rpcsec.init
   2668 rpcsec.context
   2656 yarn-metadata.json
   2615 csv.gz
   2614 yarn-tarball.tgz
   2575 c.i
   2442 3.gz
   2202 tar.bz2
   2128 so.0
   2124 ts.map

On Sat, Jan 9, 2021 at 9:37 PM David Mertz <mertz@gnosis.cx> wrote:
participants (10)
- 2QdxY4RzWzUUiLuE@potatochowder.com
- Abdulla Al Kathiri
- Chris Angelico
- David Mertz
- Joseph Martinot-Lagarde
- Matt Wozniski
- Random832
- Stephen J. Turnbull
- Sven R. Kunze
- Todd