I know enhancements to pathlib get brought up occasionally, but it doesn't look like anyone has been willing to take the initiative and see things through to completion. I am willing to keep the ball rolling here and even implement these myself. I have some suggestions and I would like to discuss them. I don't think any of them are significant enough to require a PEP. These can be split into independent threads if anyone prefers.
1. copy
The big one people keep bringing up that I strongly agree on is a "copy" method. This is really the only common file manipulation task that currently isn't possible. You can make files, read them, move them, delete them, create directories, even do less common operations like change owners or create symlinks or hard links.
A common objection is that pathlib doesn't work on multiple paths. But that isn't the case. There are a ton of methods that do that, including:
* symlink_to
* link_to
* rename
* replace
* glob
* rglob
* iterdir
* is_relative_to
* relative_to
* samefile
I think this is really the only common file operation that someone would need to switch to a different module to do, and it seems pretty strange to me to be able to make symbolic or hard links to a file but not straight up copy one.
2. recursive remove
This could be a "recursive" option to "rmdir" or a "rmtree" method (I prefer the option). The main reason for this is symmetry. It is possible to create a tree of folders (using "mkdir(parents=True)"), but once you do that you cannot remove it again in a straightforward way.
3. newline for write_text
This is the only relevant option that "Path.open" has but "Path.write_text" doesn't, and is a serious omission when dealing with multiple operating systems.
4. uid and gid
You can get the owner and group name of a file (with the "owner" and "group" methods), but there is no easy way to get the corresponding number.
5. Stem with no suffixes
The stem property only takes off the last suffix, but even in the example given ('my/library.tar.gz') it isn't really useful because the suffix has two parts ('.tar' and '.gz'). I suggest another property, probably called "rootstem" or "basestem", that takes off all the suffixes, using the same logic as the "suffixes" property. This is another symmetry issue: it is possible to extract all the suffixes, but not remove them.
6. with_suffixes
Equivalent to with_suffix, but replacing all suffixes. Again, this is a symmetry issue. It is hard to manipulate all the suffixes right now, as the example shows. You can add them or extract them, but not change them without doing several steps.
7. exist_ok for is_* methods
Currently all the is_* methods (such as is_file) return False if the file doesn't exist or if it is a broken symlink. This can be dangerous, since it is not trivially easy to tell if you are dealing with the wrong type of file vs. a missing file. And it isn't obvious behavior just from the method name. I suggest adding an "exist_ok" argument to all of these, with the default being "True" for backwards-compatibility. This argument name is already in use elsewhere in pathlib. If this is False and the file is not present, a "FileNotFoundError" is raised.
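To make items 5 and 6 a little more concrete, here is a rough sketch of what the two suffix helpers might do, written as plain functions rather than pathlib members. The names "basestem" and "with_suffixes" are only the working names floated above, not existing pathlib API, and the logic simply leans on the existing "suffixes" property.

```python
from pathlib import PurePath, PurePosixPath


def basestem(path: PurePath) -> str:
    """Return the final component with all suffixes removed (hypothetical helper)."""
    name = path.name
    all_suffixes = "".join(path.suffixes)
    # Guard the empty case: slicing with a zero-length tail must keep the name.
    return name[: len(name) - len(all_suffixes)] if all_suffixes else name


def with_suffixes(path: PurePath, suffixes: str) -> PurePath:
    """Return a new path with every existing suffix replaced (hypothetical helper)."""
    return path.with_name(basestem(path) + suffixes)


# PurePosixPath keeps the printed separators the same on every platform.
p = PurePosixPath("my/library.tar.gz")
print(p.stem)                    # library.tar
print(basestem(p))               # library
print(with_suffixes(p, ".zip"))  # my/library.zip
```

Whether these would be better as properties and methods on the path classes is exactly the API question under discussion; the sketch only shows the intended results.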
I really like these ideas. Effectively, we could use pathlib.Path without ever needing to import shutil. We would also like copyfile from shutil for cases where we are only interested in copying the file data. How about adding append_text and append_bytes, with a newline option similar to what you suggested?
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/YEN7UA... Code of Conduct: http://python.org/psf/codeofconduct/
On Sun, Nov 22, 2020 at 3:27 PM Abdulla Al Kathiri < alkathiri.abdulla@gmail.com> wrote:
I really like these ideas. Effectively, we could use pathlib.Path without ever needing to import shutil. We would also like copyfile from shutil for cases where we are only interested in copying the file data. How about adding append_text and append_bytes, with a newline option similar to what you suggested?
There have been proposals to make pathlib provide ALL file-related operations, but that is not this proposal. shutil would still provide a lot of more advanced functionality. So I think just one operation is enough: copying the file in the least destructive manner possible (probably equivalent to copy2). I wouldn't add "append_text" or "append_bytes", since you can use "open" for something like that.
Sincerely,
Todd
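A minimal sketch of what such a single copy operation might look like, assuming it simply wraps "shutil.copy2" (which already accepts Path objects). The function name "copy_path" is hypothetical and not an existing pathlib method.

```python
import shutil
import tempfile
from pathlib import Path


def copy_path(src: Path, dst: Path) -> Path:
    """Copy src to dst, preserving metadata where possible (hypothetical helper)."""
    # shutil.copy2 returns the path of the newly created file.
    return Path(shutil.copy2(src, dst))


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "a.txt"
    source.write_text("hello", encoding="utf-8")
    target = copy_path(source, Path(tmp) / "b.txt")
    print(target.read_text(encoding="utf-8"))  # hello
```

The same caveats that the shutil documentation lists for its copy functions would apply here, since the helper only delegates.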
On Mon, Nov 23, 2020 at 6:54 AM Todd <toddrjen@gmail.com> wrote:
I know enhancements to pathlib gets brought up occasionally, but it doesn't look like anyone has been willing to take the initiative and see things through to completion. I am willing to keep the ball rolling here and even implement these myself. I have some suggestions and I would like to discuss them. I don't think any of them are significant enough to require a pep. These can be split it into independent threads if anyone prefers.
Keep 'em in one thread for now, but if any of them become too controversial, it's probably worth narrowing the scope and spinning off the debatable ones in their own threads.
General principle, by the way: The operations that currently exist are the fundamental primitives, and you're asking for higher-level operations to be made available. That might be a good summary for the proposal. (For example, renaming one thing to another is a primitive, but copying a file generally means opening both names, reading and writing, and then closing.)
A few specifics:
1. copy
The big one people keep bringing up that I strongly agree on is a "copy" method. This is really the only common file manipulation task that currently isn't possible. You can make files, read them, move them, delete them, create directories, even do less common operations like change owners or create symlinks or hard links.
A common objection is that pathlib doesn't work on multiple paths. But that isn't the case. There are a ton of methods that do that, including:
* symlink_to * link_to * rename * replace * glob * rglob * iterdir * is_relative_to * relative_to * samefile
I think this is really the only common file operation that someone would need to switch to a different module to do, and it seems pretty strange to me to be able to make symbolic or hard links to a file but not straight up copy one.
I don't think it's so very strange (see above about primitive vs high level), but it does seem a reasonable enhancement. (It'd need the same caveats as on shutil.copy.)
2. recursive remove
This could be a "recursive" option to "rmdir" or a "rmtree" method (I prefer the option). The main reason for this is symmetry. It is possible to create a tree of folders (using "mkdir(parents=True)"), but once you do that you cannot remove it again in a straightforward way.
Absolutely agree, but not for the same reason: pruning a branch off a directory tree is VERY easy to naively get wrong, and shutil.rmtree has a lot of code in it to protect itself.
4. uid and gid
You can get the owner and group name of a file (with the "owner" and "group" methods), but there is no easy way to get the corresponding number.
That does seem a strange omission. If the other proposals get bogged down in controversy, spin this one off as its own thread, as I think it shouldn't be difficult to add it.
It might be worth looking at this as "making shutil support Path objects", and then have the Path objects grow methods that delegate to shutil. That'd avoid duplicating logic eg for rmtree and copyfile.
ChrisA
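On the uid and gid point, the numbers are already reachable through "Path.stat()", so a sketch of the proposed accessors is very short. The function names "uid" and "gid" below are hypothetical stand-ins for whatever the methods would be called.

```python
from pathlib import Path


def uid(path: Path) -> int:
    """Numeric user id of the file owner (hypothetical helper)."""
    return path.stat().st_uid


def gid(path: Path) -> int:
    """Numeric group id of the file (hypothetical helper)."""
    return path.stat().st_gid


here = Path(".")
# The values depend on the machine; on Windows both are typically 0.
print(uid(here), gid(here))
```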
On Sun, Nov 22, 2020 at 5:46 PM Chris Angelico <rosuav@gmail.com> wrote:
On Mon, Nov 23, 2020 at 6:54 AM Todd <toddrjen@gmail.com> wrote:
I know enhancements to pathlib gets brought up occasionally, but it doesn't look like anyone has been willing to take the initiative and see things through to completion. I am willing to keep the ball rolling here and even implement these myself. I have some suggestions and I would like to discuss them. I don't think any of them are significant enough to require a pep. These can be split it into independent threads if anyone prefers.
Keep 'em in one thread for now, but if any of them become too controversial, it's probably worth narrowing the scope and spinning off the debatable ones in their own threads.
General principle, by the way: The operations that currently exist are the fundamental primitives, and you're asking for higher-level operations to be made available. That might be a good summary for the proposal. (For example, renaming one thing to another is a primitive, but copying a file generally means opening both names, reading and writing, and then closing.)
I think even that is debatable. I would say "read_text", "read_bytes", "write_text", and "write_bytes" are higher-level operations on top of "open" in much the same way "copy" is. "glob" and "rglob" are also higher-level operations on top of "iterdir". And as far as I can see, the "higher-level" description only really applies to "copy": "owner" and "group" are really higher-level routines on top of the primitive "uid" and "gid", and the rest are meant to be counterparts of operations that already exist.
A few specifics:
1. copy
The big one people keep bringing up that I strongly agree on is a "copy" method. This is really the only common file manipulation task that currently isn't possible. You can make files, read them, move them, delete them, create directories, even do less common operations like change owners or create symlinks or hard links.
A common objection is that pathlib doesn't work on multiple paths. But that isn't the case. There are a ton of methods that do that, including:
* symlink_to * link_to * rename * replace * glob * rglob * iterdir * is_relative_to * relative_to * samefile
I think this is really the only common file operation that someone would need to switch to a different module to do, and it seems pretty strange to me to be able to make symbolic or hard links to a file but not straight up copy one.
I don't think it's so very strange (see above about primitive vs high level), but it does seem a reasonable enhancement. (It'd need the same caveats as on shutil.copy.)
As I said, I don't think this is any less primitive than "read_text", "read_bytes", "write_text", "write_bytes", "glob", or "rglob".
2. recursive remove
This could be a "recursive" option to "rmdir" or a "rmtree" method (I prefer the option). The main reason for this is symmetry. It is possible to create a tree of folders (using "mkdir(parents=True)"), but once you do that you cannot remove it again in a straightforward way.
Absolutely agree, but not for the same reason: pruning a branch off a directory tree is VERY easy to naively get wrong, and shutil.rmtree has a lot of code in it to protect itself.
Another good point. The question is whether it should be its own method or an argument.
4. uid and gid
You can get the owner and group name of a file (with the "owner" and "group" methods), but there is no easy way to get the corresponding number.
That does seem a strange omission. If the other proposals get bogged down in controversy, spin this one off as its own thread, as I think it shouldn't be difficult to add it.
Sure.
It might be worth looking at this as "making shutil support Path objects", and then have the Path objects grow methods that delegate to shutil. That'd avoid duplicating logic eg for rmtree and copyfile.
shutil already supports Path objects. And yes, I was planning to delegate the logic to existing functions there or in "os".
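As an illustration of the delegation idea, here is a sketch of a recursive remove that hands the hard part to "shutil.rmtree". It is written as a free function named "rmdir" with a keyword-only "recursive" flag; both the name and the flag are assumptions taken from the proposal, not existing pathlib behavior.

```python
import shutil
import tempfile
from pathlib import Path


def rmdir(path: Path, *, recursive: bool = False) -> None:
    """Remove a directory; with recursive=True remove its contents too (sketch)."""
    if recursive:
        # Delegate to shutil so its safety checks are reused, not duplicated.
        shutil.rmtree(path)
    else:
        # Existing behavior: fails with OSError if the directory is not empty.
        path.rmdir()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "a"
    (root / "b" / "c").mkdir(parents=True)
    rmdir(root, recursive=True)
    print(root.exists())  # False
```

Making the flag keyword-only means a stray positional argument cannot turn a plain "rmdir" into a tree removal by accident.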
On Mon, Nov 23, 2020 at 11:37 AM Todd <toddrjen@gmail.com> wrote:
It might be worth looking at this as "making shutil support Path objects", and then have the Path objects grow methods that delegate to shutil. That'd avoid duplicating logic eg for rmtree and copyfile.
shutil already supports Path objects. And yes, I was planning to delegate the logic to existing functions there or in "os".
Oh, okay. That should make this a lot easier then. +1. I wasn't sure. ChrisA
I suggest adding an "exist_ok" argument to all of these, with the default being "True" for backwards-compatibility. This argument name is already in use elsewhere in pathlib. If this is False and the file is not present, a "FileNotFoundError" is raised.
For Path.mkdir, exist_ok=True inhibits an error if a directory already exists. You're proposing that for Path.is_dir, exist_ok=True should inhibit an error if the directory does not exist. A parameter to enable that behavior sounds reasonable to me, but it definitely shouldn't have the name "exist_ok"; it does the opposite of what exist_ok does.
On Sun, Nov 22, 2020 at 9:49 PM Matt Wozniski <godlygeek@gmail.com> wrote:
I suggest adding an "exist_ok" argument to all of these, with the default being "True" for backwards-compatibility. This argument name is already in use elsewhere in pathlib. If this is False and the file is not present, a "FileNotFoundError" is raised.
For Path.mkdir, exist_ok=True inhibits an error if a directory already exists. You're proposing that for Path.is_dir, exist_ok=True should inhibit an error if the directory does not exist.
A parameter to enable that behavior sounds reasonable to me, but it definitely shouldn't have the name "exist_ok"; it does the opposite of what exist_ok does.
Good point, perhaps "missing_ok" then.
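A sketch of how a "missing_ok" flag on an is_* check could behave, written as a standalone function. The name "is_file" and its signature here are hypothetical; note also that this version only handles "FileNotFoundError", whereas the real pathlib methods quietly ignore a few more error conditions.

```python
import stat
import tempfile
from pathlib import Path


def is_file(path: Path, *, missing_ok: bool = True) -> bool:
    """Like Path.is_file, but can raise when nothing exists at path (sketch)."""
    try:
        # stat() follows symlinks, so a broken symlink also counts as missing.
        mode = path.stat().st_mode
    except FileNotFoundError:
        if missing_ok:
            return False
        raise
    return stat.S_ISREG(mode)


with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "nope.txt"
    print(is_file(missing))  # False
    try:
        is_file(missing, missing_ok=False)
    except FileNotFoundError:
        print("missing file reported as an error")
```

With the default "missing_ok=True" the result matches today's behavior, which is the backwards-compatibility property asked for in the proposal.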
Does anyone have any further thoughts on these? Should I split these into separate threads?
On Sun, Nov 22, 2020 at 10:05 PM Todd <toddrjen@gmail.com> wrote:
Good point, perhaps "missing_ok" then.
Hi Todd,
my comments below. I would also offer my time for reviewing/testing if wanted.
On 22.11.20 20:53, Todd wrote:
I know enhancements to pathlib gets brought up occasionally, but it doesn't look like anyone has been willing to take the initiative and see things through to completion. I am willing to keep the ball rolling here and even implement these myself. I have some suggestions and I would like to discuss them. I don't think any of them are significant enough to require a pep. These can be split it into independent threads if anyone prefers.
1. copy
The big one people keep bringing up that I strongly agree on is a "copy" method. This is really the only common file manipulation task that currently isn't possible. You can make files, read them, move them, delete them, create directories, even do less common operations like change owners or create symlinks or hard links.
I really would appreciate that one. If I could throw in another detail which we needed a lot:
- atomic_copy or copy(atomic=True), whatever form you prefer
It is not as easy to achieve as it may look at first sight, especially when it comes to tempfiles and permissions. The use cases for atomic copy included scenarios with multiple parallel accesses to files, like caches in web development.
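For readers unfamiliar with the pattern, one common way to approximate an atomic copy is to copy into a temporary file next to the destination and then swap it in with "os.replace". The sketch below assumes that approach; the name "atomic_copy" is hypothetical, no data is flushed to disk explicitly, and the rename is only guaranteed to be atomic on POSIX systems.

```python
import contextlib
import os
import shutil
import tempfile
from pathlib import Path


def atomic_copy(src: Path, dst: Path) -> None:
    """Copy src to dst so readers see the old or the new file, never half (sketch)."""
    # Create the temp file in the destination directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(dst.parent), prefix=dst.name + ".")
    os.close(fd)
    try:
        # copy2 also copies permission bits, replacing mkstemp's restrictive mode.
        shutil.copy2(src, tmp_name)
        os.replace(tmp_name, dst)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "cache.src"
    source.write_text("payload", encoding="utf-8")
    atomic_copy(source, Path(tmp) / "cache.dst")
    print(sorted(p.name for p in Path(tmp).iterdir()))  # ['cache.dst', 'cache.src']
```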
A common objection is that pathlib doesn't work on multiple paths. But that isn't the case. There are a ton of methods that do that, including:
* symlink_to * link_to * rename * replace * glob * rglob * iterdir * is_relative_to * relative_to * samefile
I think this is really the only common file operation that someone would need to switch to a different module to do, and it seems pretty strange to me to be able to make symbolic or hard links to a file but not straight up copy one.
2. recursive remove
This could be a "recursive" option to "rmdir" or a "rmtree" method (I prefer the option). The main reason for this is symmetry. It is possible to create a tree of folders (using "mkdir(parents=True)"), but once you do that you cannot remove it again in a straightforward way.
Importing shutil does not seem to be a big deal, but I agree that it's somehow weird for this to be missing.
Correct me if I'm wrong, but os.path somehow is closer to OS-level operations whereas shutil basically provides all the missing convenience features that sh provided.
So, to me it boils down to the question of whether pathlib is a completely new paradigm. If so, then sure, let's add it. Additionally, I like the "batteries included" theme of Python.
Last but not least, I tend more towards the "rmtree" method, just to make it crystal clear to everyone. Maybe the docs could cross-reference both methods. Tree manipulations are inherently complicated and a lot can go wrong. Symmetry is not 100% given, as you might delete more than what you've created (which was a single node path).
3. newLine for write_text
This is the only relevant option that "Path.open" has but "Path.write_text" doesn't, and is a serious omission when dealing with multiple operating systems.
+1
4. uid and gid
You can get the owner and group name of a file (with the "owner" and "group" methods), but there is no easy way to get the corresponding number.
+1
5. Stem with no suffixes
The stem property only takes off the last suffix, but even in the example given ('my/library.tar.gz') it isn't really useful because the suffix has two parts ('.tar' and '.gz'). I suggest another property, probably called "rootstem" or "basestem", that takes off all the suffixes, using the same logic as the "suffixes" property. This is another symmetry issue: it is possible to extract all the suffixes, but not remove them.
+1
Does anybody rely on this behavior of ".stem"? It always seemed odd to me, but that might be because of the use cases I work with. So, another possibility would be to fix "stem" to do what makes sense, and maybe also rename the concept "suffix" to "final_suffix" (which would also be more consistent with what the docs say: "The file extension of the final component, if any:"). To me that has always been the weirdest conceptual behavior of the lib. Not sure if that's possible to fix before people need time machines.
6. with_suffixes
Equivalent to with_suffix, but replacing all suffixes. Again, this is a symmetry issue. It is hard to manipulate all the suffixes right now, as the example show. You can add them or extract them, but not change them without doing several steps.
+1
Same comment as for basestem.
7. exist_ok for is_* methods
Currently all the is_* methods (such as is_file) return False if the file doesn't exist or if it is a broken symlink. This can be dangerous, since it is not trivially easy to tell if you are dealing with the wrong type of file vs. a missing file. And it isn't obvious behavior just from the method name. I suggest adding an "exist_ok" argument to all of these, with the default being "True" for backwards-compatibility. This argument name is already in use elsewhere in pathlib. If this is False and the file is not present, a "FileNotFoundError" is raised.
+1
Maybe missing_ok would do more to help people understand what the parameter actually does. exist_ok is used for creation methods (mkdir and touch), so the name makes more sense in those contexts.
Best
Sven
Hi Sven,
Thanks for your support and feedback.
On Thu, Dec 31, 2020, 07:23 Sven R. Kunze <srkunze@mail.de> wrote:
Hi Todd,
my comments below. Also would offer my time for reviewing/testing if wanted.
On 22.11.20 20:53, Todd wrote:
I know enhancements to pathlib gets brought up occasionally, but it doesn't look like anyone has been willing to take the initiative and see things through to completion. I am willing to keep the ball rolling here and even implement these myself. I have some suggestions and I would like to discuss them. I don't think any of them are significant enough to require a pep. These can be split it into independent threads if anyone prefers.
1. copy
The big one people keep bringing up that I strongly agree on is a "copy" method. This is really the only common file manipulation task that currently isn't possible. You can make files, read them, move them, delete them, create directories, even do less common operations like change owners or create symlinks or hard links.
I really would appreciate that one. If I could throw in another detail which we needed a lot:
- atomic_copy or copy(atomic=True), whatever form you prefer
It is not as easy to achieve as it may look at first sight, especially when it comes to tempfiles and permissions. The use cases for atomic copy included scenarios with multiple parallel accesses to files, like caches in web development.
Is there already support for atomic writes in the standard library? I am not planning on implementing anything new, only exposing existing functionality. Adding atomic operations to the stdlib would likely require a PEP and substantial discussion of API and implementation. I don't really have the background to do that.
A common objection is that pathlib doesn't work on multiple paths. But that isn't the case. There are a ton of methods that do that, including:
* symlink_to
* link_to
* rename
* replace
* glob
* rglob
* iterdir
* is_relative_to
* relative_to
* samefile
I think this is really the only common file operation that someone would need to switch to a different module to do, and it seems pretty strange to me to be able to make symbolic or hard links to a file but not straight up copy one.
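For reference, this is roughly what the existing functionality looks like when wrapped for Path objects. It is only a sketch: copy_to is a hypothetical name, and the choice of shutil.copy2 (rather than shutil.copy or shutil.copyfile) is an assumption, not something settled in this thread.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to(src, dst):
    """Hypothetical stand-in for a Path copy method, built on shutil.copy2."""
    # shutil.copy2 copies the contents and tries to preserve file metadata;
    # if dst is an existing directory, the file is copied into it.
    return Path(shutil.copy2(src, dst))


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.txt"
    source.write_text("hello")
    target = copy_to(source, Path(tmp) / "target.txt")
    print(target.name, target.read_text())
```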
2. recursive remove
This could be a "recursive" option to "rmdir" or a "rmtree" method (I prefer the option). The main reason for this is symmetry. It is possible to create a tree of folders (using "mkdir(parents=True)"), but once you do that you cannot remove it again in a straightforward way.
Importing shutil does not seem to be a big deal but I agree that it's somehow weird to be missing.
Correct me if I'm wrong, but os.path somehow is closer to OS-level operations whereas shutil basically provides all the missing convenience features that sh provided.
So, to me it boils down to the question if pathlib is a completely new paradigm. If so, then sure let's add it. Additionally, I like the "batteries included" theme of Python.
Pathlib already has a number of higher-level operations besides what is in os.
Last but not least, I tend more towards the "rmtree" method just to make it crystal clear to everyone. Maybe docs could cross-reference both methods. Tree manipulations are inherently complicated and a lot can go wrong. Symmetry is not 100% given as you might delete more than what you've created (which was a single node path).
We already have tree removal functionality that this can use internally. As for the name, one thing to consider is that making a recursive tree uses an argument. And I think the argument would need to be keyword-only to avoid accidentally invoking it.
5. Stem with no suffixes
The stem property only takes off the last suffix, but even in the example given ('my/library.tar.gz') it isn't really useful because the suffix has two parts ('.tar' and '.gz'). I suggest another property, probably called "rootstem" or "basestem", that takes off all the suffixes, using the same logic as the "suffixes" property. This is another symmetry issue: it is possible to extract all the suffixes, but not remove them.
+1
Does anybody rely on this behavior of ".stem"? It always seemed odd to me, but that might be because of the use cases I work with.
So, another possibility would be to fix "stem" to do what makes sense.
This is a backwards compatibility break and I don't want to get into the complications of doing that. There is really no benefit to breaking backwards compatibility. I would strongly suspect renaming a method then making a new, completely different method with the same name is not going to happen. The burden is just too high relative to the benefits.
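A sketch of the non-breaking alternative, as a free function: "stem" stays as it is, and a separate helper removes everything that the "suffixes" property reports. The name basestem is one of the two names floated above and is purely hypothetical here.

```python
from pathlib import PurePosixPath


def basestem(path):
    """Hypothetical helper: the final path component with every suffix removed."""
    path = PurePosixPath(path)
    all_suffixes = "".join(path.suffixes)
    # Slice by length so that a name without suffixes is returned unchanged.
    return path.name[: len(path.name) - len(all_suffixes)]


for example in ("my/library.tar.gz", "/data/my.little.file.txt", "README"):
    p = PurePosixPath(example)
    print(example, "| stem:", p.stem, "| basestem:", basestem(p))
```

Because it follows "suffixes" literally, the helper also treats every dot-separated part of a name such as my.little.file.txt as a suffix.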
7. exist_ok for is_* methods
Currently all the is_* methods (such as is_file) return False if the file doesn't exist or if it is a broken symlink. This can be dangerous, since it is not trivially easy to tell if you are dealing with the wrong type of file vs. a missing file. And it isn't obvious behavior just from the method name. I suggest adding an "exist_ok" argument to all of these, with the default being "True" for backwards-compatibility. This argument name is already in use elsewhere in pathlib. If this is False and the file is not present, a "FileNotFoundError" is raised.
+1
Maybe missing_ok could help more to make people understand what the parameter actually does.
exist_ok is used for creation methods (mkdir and touch), so the name makes more sense in those contexts.
Yes, you are right. Someone else pointed out this issue too.
I split my answers up to address different issues in different threads.
On 31.12.20 15:32, Todd wrote:
Is there already support for atomic writes in the standard library? I am not planning on implementing anything new, only exposing existing functionality. Adding atomic operations to the stdlib would likely require a PEP and substantial discussion of API and implementation. I don't really have the background to do that.
So far I didn't find any of this implemented in the stdlib, but please correct me if I am wrong.
As far as I know, one working pattern would be
1. creating a file or the directory structure using tempfile
2. then setting permissions from the original directory object
3. and finally moving it to its final destination (path and name)
The last part is done atomically at least in Linux (rename) and Windows (ReplaceFile).
Setting permissions in particular is an easy oversight which can cause issues, e.g. with xsendfile.
What would be the steps to get it done?
Best
Sven
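The three steps above can be sketched for a single file as follows. This is an illustration under stated assumptions, not a vetted implementation: atomic_copy is a hypothetical name, the permission bits are taken from the source file (one reading of step 2), and the final move uses os.replace, which the Python documentation describes as atomic on POSIX. Durability (fsync) and directory trees are not handled.

```python
import contextlib
import os
import shutil
import tempfile
from pathlib import Path


def atomic_copy(src, dst):
    """Hypothetical helper: copy src to dst so readers never see a partial file."""
    src, dst = Path(src), Path(dst)
    # 1. Create the temporary file next to the destination, so the final
    #    rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=dst.parent, prefix=dst.name + ".")
    os.close(fd)
    try:
        shutil.copyfile(src, tmp_name)
        # 2. mkstemp creates the file readable only by its owner; copy the
        #    permission bits from the source instead (an assumption here).
        shutil.copymode(src, tmp_name)
        # 3. Move into place; os.replace overwrites an existing destination.
        os.replace(tmp_name, dst)
    except BaseException:
        # Do not leave the temporary file behind if anything failed.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
    return dst


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "cache.src"
    source.write_text("payload")
    print(atomic_copy(source, Path(tmp) / "cache.dst").read_text())
```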
On Thu, Jan 7, 2021, 10:54 Sven R. Kunze <srkunze@mail.de> wrote:
What would be the steps to get it done?
This is outside my area of expertise and is far outside the scope of my proposal, so it really needs its own thread. I think there have periodically been requests for atomic file operations on this mailing list, so the first step would be to search the mailing list and see what prevented those from going anywhere.
Last but not least, I tend more towards the "rmtree" method just to make it crystal clear to everyone. Maybe docs could cross-refer both methods. Tree manipulations are inherently complicated and a lot can go wrong. Symmetry is not 100% given as you might delete more than what you've created (which was a single node path).
We already have tree removal functionality that this can use internally.
I am not much concerned about the internals; shutil.rmtree should work fine here. I am more concerned with the external interface (see also root/base/stem) and its impression on developers.
As for the name, one thing to consider is that making a recursive tree uses an argument.
And I think the argument would need to be keyword-only to avoid accidentally invoking it.
Why not add a new method? I am still not convinced from a safety perspective that adding a new meaning-changing argument to a removal function is such a good idea. Just consider the following example.
When deleting /my/soon/to/be/deleted/dir, you DO NOT delete a simple directory as the method name "rmdir" would suggest:
- /my/soon/to/be/deleted/dir
instead you delete something like this:
- /my/soon/to/be/deleted/dir/d1/f1
- /my/soon/to/be/deleted/dir/d1/d3/f1
- /my/soon/to/be/deleted/dir/d1/d3/f2
- /my/soon/to/be/deleted/dir/d1/d3
- /my/soon/to/be/deleted/dir/d1/f2
- /my/soon/to/be/deleted/dir/d1/f3
- /my/soon/to/be/deleted/dir/d1
- /my/soon/to/be/deleted/dir/d2/f1
- /my/soon/to/be/deleted/dir/d2/f2
- /my/soon/to/be/deleted/dir/d2
- /my/soon/to/be/deleted/dir
That is a completely different beast (a complete tree) and there is no way back once deleted. And there are a couple of other reasons when I look at the interface of shutil.rmtree and what "recursively" really means for pathlib.
Best
Sven
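For comparison, here is a sketch of the keyword-only option discussed above, written as a free function so it runs against current pathlib. The function and its recursive parameter are hypothetical; the recursive branch simply delegates to shutil.rmtree.

```python
import shutil
import tempfile
from pathlib import Path


def rmdir(path, *, recursive=False):
    """Hypothetical stand-in for the proposed rmdir(recursive=...) option."""
    path = Path(path)
    if recursive:
        shutil.rmtree(path)  # removes the directory and everything below it
    else:
        path.rmdir()         # raises OSError if the directory is not empty


root = Path(tempfile.mkdtemp())
(root / "d1" / "d3").mkdir(parents=True)
(root / "d1" / "f1").write_text("data")

try:
    rmdir(root, True)        # rejected: recursive can only be passed by keyword
except TypeError as exc:
    print("TypeError:", exc)

try:
    rmdir(root)              # non-recursive removal refuses a non-empty tree
except OSError as exc:
    print("not removed:", type(exc).__name__)

rmdir(root, recursive=True)
print("still exists:", root.exists())
```

The keyword-only marker prevents an accidental positional True, but it does not address the naming concern: the call that deletes a whole tree is still spelled "rmdir".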
On 31.12.20 15:32, Todd wrote:
5. Stem with no suffixes
The stem property only takes off the last suffix, but even in the example given ('my/library.tar.gz') it isn't really useful because the suffix has two parts ('.tar' and '.gz'). I suggest another property, probably called "rootstem" or "basestem", that takes off all the suffixes, using the same logic as the "suffixes" property. This is another symmetry issue: it is possible to extract all the suffixes, but not remove them.
+1
Does anybody rely on this behavior of ".stem"? It always seemed odd to me, but that might be because of the use cases I work with.
So, another possibility would be to fix "stem" to do what makes sense.
This is a backwards compatibility break and I don't want to get into the complications of doing that. There is really no benefit to breaking backwards compatibility. I would strongly suspect renaming a method then making a new, completely different method with the same name is not going to happen.
I disagree that breaking compatibility has no benefit. The benefit is always long-term, but I understand that you don't want to go through deprecating and re-adding the same name.
One concern I have is that "rootstem" or "basestem" is not really a well-defined concept (as was "stem" when it was added - at least the multiple suffix concept was there but has little influence on the naming of the stem concept). We already have concepts like:
basename << rightmost part of a path
root << toplevel node of a tree; also filesystem
The burden is just too high relative to the benefits.
What exactly is the burden?
Best
Sven
Todd wrote: I'm in favor of most of these additions. I was a heavy user of path.py and I'm missing those "advanced" features in pathlib.
Stem with no suffixes
The stem property only takes off the last suffix, but even in the example given ('my/library.tar.gz') it isn't really useful because the suffix has two parts ('.tar' and '.gz'). I suggest another property, probably called "rootstem" or "basestem", that takes off all the suffixes, using the same logic as the "suffixes" property. This is another symmetry issue: it is possible to extract all the suffixes, but not remove them.
One remark about this: .tar.gz files are the exception rather than the rule, and AFAIK maybe the only one? It's pretty common to have dots in filenames instead of blanks, for example, and stem does the right thing here: '/data/my.little.file.txt'. There is also the case of hidden files on Linux: what do you expect for /home/toto/.program.cfg?
Joseph Martinot-Lagarde writes:
One remark about this : .tar.gz files are the exception rather than the rule, and AFAIK maybe the only one ?
Not really. stem.ext -> stem.ext.zzz where zzz is a compression extension is a pretty common naming convention. For me ext == 'tar' is by far the most common case (74%), 'tis true, but 'patch' (10%), 'txt' (6%), 'tab', 'gml', 'xml', 'svg', 'pdf', 'ps', 'dvi', 'diff', 'pdb', 'cpp', 'el', and 'data' also exist somewhere under $HOME. I'll bet others show up if I search /usr, /var, and /opt.
On Sun, Jan 10, 2021 at 4:51 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Not really. stem.ext -> stem.ext.zzz where zzz is a compression extension is a pretty common naming convention. For me ext == 'tar' is by far the most common case (74%), 'tis true, but 'patch' (10%), 'txt' (6%), 'tab', 'gml', 'xml', 'svg', 'pdf', 'ps', 'dvi', 'diff', 'pdb', 'cpp', 'el', and 'data' also exist somewhere under $HOME. I'll bet others show up if I search /usr, /var, and /opt.
Yep, and most of my man pages are compressed, so there's usr/share/man/man1/*.1.gz and friends. I'd say the most common case with multiple extensions is indeed precisely two, where the first one is the type of file (or in the case of man pages, the section), and the second is a compression format. But there'll be less common cases too.
ChrisA
On 2021-01-10 at 05:03:08 +1100, Chris Angelico <rosuav@gmail.com> wrote:
Yep, and most of my man pages are compressed, so there's usr/share/man/man1/*.1.gz and friends.
I'd say the most common case with multiple extensions is indeed precisely two, where the first one is the type of file (or in the case of man pages, the section), and the second is a compression format. But there'll be less common cases too.
I also have a pile of whatever-x.y.z.* files, where the * is some kind of compression extension and x.y.z is a major.minor.patch identifier. Most of the time, my brain is big enough to spot where x.y.z ends and the extension(s) begin(s), but throw in a version identifier like 4.3.beta, and all bets are off (unless I happen to know exactly what to look for, in which case I wouldn't bother with a general purpose library function that might make the wrong assumption).
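A short sketch of why version numbers defeat a general-purpose "strip all suffixes" rule, alongside the "know exactly what to look for" alternative. The KNOWN_SUFFIXES set and the strip_known_suffixes helper are assumptions made up for this illustration.

```python
from pathlib import PurePosixPath

# Assumption: a hand-picked set of archive/compression suffixes to peel off.
KNOWN_SUFFIXES = {".tar", ".gz", ".bz2", ".xz", ".zip"}


def strip_known_suffixes(name):
    """Hypothetical helper: remove trailing suffixes only while they are known."""
    path = PurePosixPath(name)
    while path.suffix in KNOWN_SUFFIXES:
        path = path.with_suffix("")
    return path.name


for name in ("whatever-1.2.3.tar.gz", "whatever-4.3.beta.tar.gz"):
    # pathlib cannot tell version parts from extensions: both show up here.
    print(name, PurePosixPath(name).suffixes, strip_known_suffixes(name))
```

The "suffixes" property reports the version components as suffixes, while the allow-list version stops at the first part it does not recognize.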
On my system:

% find ~ -name '*.*.*' | rev | cut -d. -f-2 | rev | sort | uniq -c | sort -nr | head -30
  17278 d.ts
  11314 js.map
   6600 symbolic.png
   4041 png.i
   3968 cpython-37.pyc
   2656 yarn-metadata.json
   2614 yarn-tarball.tgz
   2575 c.i
   2526 csv.gz
   1727 h.i
   1659 opt-1.pyc
   1590 opt-2.pyc
   1302 autogen.js
   1151 ts.map
   1148 js.flow
    854 svg.i
    852 min.js
    744 test.js
    651 travis.yml
    560 gif.i
    522 so.0
    403 indexeddb.leveldb
    384 pom.sha1
    368 ref.css
    367 0.0
    357 so.1
    311 event.jsonlz4
    283 xpm.i
    278 ref.ui
    275 am.i

Most of those I honestly have no idea what they are. That's just starting from $HOME. System wide, who knows.
-- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
For my entire filesystem:

 124920 cpython-38.pyc
  50034 html.gz
  31158 cpython-39.pyc
  31032 d.ts
  30415 cpython-37.pyc
  21473 cpython-36.pyc
  19000 js.map
   9888 symbolic.png
   5086 cpython-35.pyc
   5004 1.gz
   4657 cpython-38-x86_64-linux-gnu.so
   4261 pypy36.pyc
   4152 Debian.gz
   4041 png.i
   3534 cpython-33.pyc
   3421 cpython-34.pyc
   2950 min.js
   2880 cpython-34.pyo
   2668 unix.ip
   2668 unix.gid
   2668 rpcsec.init
   2668 rpcsec.context
   2656 yarn-metadata.json
   2615 csv.gz
   2614 yarn-tarball.tgz
   2575 c.i
   2442 3.gz
   2202 tar.bz2
   2128 so.0
   2124 ts.map
On Fri, Jan 8, 2021, at 15:47, Joseph Martinot-Lagarde wrote:
One remark about this: .tar.gz files are the exception rather than the rule, and AFAIK maybe the only one? It's pretty common to have dots in filenames instead of blanks, for example, and stem does the right thing here: '/data/my.little.file.txt'. There is also the case of hidden files on Linux: what do you expect for /home/toto/.program.cfg?
Hidden files are already treated specially - a file called simply ".whatever" is considered to be the stem, not the suffix.
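A quick check of the existing behavior for the two kinds of dot-file names mentioned here, using PurePosixPath so that it runs the same way on any platform:

```python
from pathlib import PurePosixPath

bare = PurePosixPath("/home/toto/.whatever")
with_ext = PurePosixPath("/home/toto/.program.cfg")

# A leading dot does not start a suffix, so the whole name is the stem.
print(repr(bare.stem), repr(bare.suffix), bare.suffixes)

# Only the part after the later dot is treated as a suffix.
print(repr(with_ext.stem), repr(with_ext.suffix), with_ext.suffixes)
```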
participants (10)
- 2QdxY4RzWzUUiLuE@potatochowder.com
- Abdulla Al Kathiri
- Chris Angelico
- David Mertz
- Joseph Martinot-Lagarde
- Matt Wozniski
- Random832
- Stephen J. Turnbull
- Sven R. Kunze
- Todd