Rewriting file - pythonic way

Hi all,

I am new to Python (I am moving over from the Perl world), but I have always loved Python for its high-level, beautiful, and clean syntax. Now I have a question/idea about working with files. In my opinion this is a very popular use case:

1. Open a file (for reading and writing).
2. Read data from the file.
3. Modify the data.
4. Rewrite the file with the modified data.

But at the moment it does not look very pythonic:

    with open(filename, 'r+') as file:
        data = file.read()
        data = data.replace('old', 'new')
        file.seek(0)
        file.write(data)
        file.truncate()

or something like this:

    with open(filename) as file:
        data = file.read()
    data = data.replace('old', 'new')
    with open(filename, 'w') as file:
        file.write(data)

I think the best way would be something like this:

    with open(filename, 'r+') as file:
        data = file.read()
        data = data.replace('old', 'new')
        file.rewrite(data)

but for this, io.BufferedIOBase would need to have a rewrite method. What do you think about this?

15.04.18 11:57, Alexey Shrub wrote:
What do you mean by calling this not pythonic?
If the problem is that you want to use a single line instead of three lines, you can add a function:

    def file_rewrite(file, data):
        file.seek(0)
        file.write(data)
        file.truncate()

and use it. This looks pretty pythonic to me.
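A minimal, self-contained sketch of how such a helper could be used is given here. The temporary file and its contents are assumptions made purely for illustration; only `file_rewrite` itself comes from the message above.

```python
import os
import tempfile


def file_rewrite(file, data):
    """Replace the whole contents of a file opened in 'r+' mode."""
    file.seek(0)      # return to the start after the earlier read()
    file.write(data)  # overwrite from the beginning
    file.truncate()   # drop any leftover tail if the new data is shorter


# Illustrative setup: a throwaway temporary file with sample text.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write("old value, old habits\n")

with open(path, "r+") as f:
    data = f.read().replace("old", "new")
    file_rewrite(f, data)

with open(path) as f:
    result = f.read()

os.remove(path)
```

Note that this rewrites the file in place, so an interruption between `write()` and `truncate()` can leave the file in a mixed state.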

On Sunday, 15 Apr 2018 at 12:40, Serhiy Storchaka <storchaka@gmail.com> wrote:
If the problem is that you want to use a single line instead of three lines, you can add a function
Yes, I think that a single line with the word 'rewrite' is much more readable than those three lines. And yes, I can write my own function, but this is a typical task. Maybe it should be in the standard library?

15.04.18 12:49, Alexey Shrub wrote:
Not every three lines of code need to be a function in the standard library. And these three lines don't look common enough. Actually, reliable code should write to a separate file and replace the original file with the new one only if the write succeeds. Or back up the old file and restore it if the write fails. Or do both. And handle hard and soft links if necessary. And use file locks if needed to prevent race conditions when the file is read and written by different processes. Depending on the specifics of the application you may need different code. Your three lines are enough for a one-time script if the risk of a power blackout or disk space exhaustion is insignificant or if the data is not critical.
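The first strategy mentioned above (write to a separate file, then replace the original only on success) could be sketched roughly as follows. The function name `rewrite_via_temp` and its `transform` parameter are hypothetical, and the sketch deliberately ignores links, locking, and the permissions of the new file, which are exactly the complications listed in the message.

```python
import os
import tempfile


def rewrite_via_temp(path, transform):
    """Write transformed contents to a temp file, then swap it into place.

    Hypothetical helper for illustration; not a stdlib API.
    """
    with open(path) as src:
        data = transform(src.read())

    # Keep the temp file in the same directory so that os.replace()
    # does not have to cross a filesystem boundary.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the swap
        os.replace(tmp_path, path)
    except BaseException:
        # Writing or replacing failed; the original file was not modified.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


# Demonstration in a throwaway directory (illustrative names only).
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "example.txt")
    with open(target, "w") as f:
        f.write("old text\n")
    rewrite_via_temp(target, lambda text: text.replace("old", "new"))
    with open(target) as f:
        result = f.read()
    leftovers = os.listdir(workdir)
```

Because `tempfile.mkstemp` creates a new file with its own owner and mode, the replaced file may end up with different metadata than the original; a real implementation would have to decide how to handle that.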

On 15 April 2018 at 11:22, Elazar <elazarg@gmail.com> wrote:
It certainly sounds like a good reason for someone to write a "safe file rewrite" library function. But I don't think that it's such a common need that it needs to be a stdlib function. It may well even be the case that there's such a function already available on PyPI - has anyone actually checked?

And if there isn't, then writing a module and publishing it there would seem like a *very* good starting point - as well as allowing the developer to thrash out the best API, it would also provide for lots of testing in unusual scenarios that the developer may not have thought about (Windows file locking is very different from Unix, what is an atomic operation differs between platforms, error handling and retries may be something to consider, etc). The result would be a useful package, and the download and activity stats for it would be a great indication of whether it's a frequent enough need to justify including in core Python.

IMO, it probably isn't. I suspect that most uses would be fine with the quoted 3-liner, but very few people would need the sort of robustness that Serhiy is describing (and that level of robustness *would* be needed for a stdlib implementation). So PyPI is likely a better home for the "bulletproof" version, and 3 lines of code is a perfectly acceptable and Pythonic solution for people with simpler needs.

Paul

On 15 April 2018 at 20:47, Paul Moore <p.f.moore@gmail.com> wrote:
There wasn't last time I checked (which admittedly was several years ago now).

The issue is that it's painfully difficult to write a robust cross-platform "atomic rewrite" operation that can cleanly handle a wide range of arbitrary use cases - instead, folks are more likely to write simpler alternatives that work well enough given whichever simplifying assumptions are applicable to their use case (which may even include "I don't care about atomicity, and am quite happy to let a poorly timed Ctrl-C or unexpected system shutdown corrupt the file I'm rewriting").

https://bugs.python.org/issue8604#msg174104 is the relevant tracker discussion (deliberately linking into the middle of it, since the early part is akin to this thread: reactions mostly along the lines of "that's easy, and doesn't need to be in the standard library").

It definitely *isn't* easy, but it's also challenging to publish on PyPI, since it's a quagmire of platform specific complexity and edge cases, if you mess it up you can cause significant data loss, and anyone that already knows they need atomic rewrites is likely to be able to come up with their own purpose specific implementation in less time than it would take them to assess the suitability of 3rd party alternatives.

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sunday, 15 Apr 2018 at 2:40, Nick Coghlan <ncoghlan@gmail.com> wrote:
https://bugs.python.org/issue8604#msg174104 is the relevant tracker discussion
Thanks all, I agree that a universal and absolutely safe solution is very difficult, but as an experiment I made a draft: https://github.com/worldmind/scripts/tree/master/filerewrite The main code is here: https://github.com/worldmind/scripts/blob/master/filerewrite/filerewrite.py#...

On Sun, Apr 15, 2018 at 05:15:55PM +0300, Alexey Shrub <ashrub@yandex.ru> wrote:
Good!
main code here https://github.com/worldmind/scripts/blob/master/filerewrite/filerewrite.py#...
May I recommend catching exceptions in `backuper.backup()`, cleaning up the backuper, and unlocking the locker?
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

Depending on how firm your requirements around locking are, you may find this code useful: https://github.com/mahmoud/boltons/blob/6b0721b6aeda6d3ec6f5d31be7c741bc7fcc... (docs here: http://boltons.readthedocs.io/en/latest/fileutils.html#atomic-file-saving ) Basically every operating system has _some_ way of doing an atomic file replacement, letting us guarantee that a file at a given location is always valid. atomic_save provides a unified interface to that cross-platform behavior. The code does not do locking, as neither I nor its other users have wanted it, but I'd be happy to extend it if there's a sensible default. On Sun, Apr 15, 2018 at 8:19 AM, Oleg Broytman <phd@phdru.name> wrote:

On Sun, Apr 15, 2018 at 09:10:57AM -0700, Mahmoud Hashemi <mahmoud@hatnote.com> wrote:
I don't like that it renames the file at the end. Renaming could lead to changed file ownership and permissions; restoring permissions is not always possible, and restoring ownership is almost never possible. Renaming is also not always possible due to restricted directory permissions.
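One way to avoid the rename, sketched here under stated assumptions, is to rewrite the original file in place and keep a backup copy that is copied back (not renamed back) if the write fails, so the original inode, owner, and mode are never swapped out. The function name, the `transform` parameter, and the `.bak` suffix are all hypothetical, and the backup still needs write access to the directory, so this does not address the restricted-directory case.

```python
import os
import shutil
import tempfile


def rewrite_in_place_with_backup(path, transform, backup_suffix=".bak"):
    """Rewrite a file in place, restoring its contents on failure.

    Hypothetical helper for illustration; not a stdlib API.
    """
    backup_path = path + backup_suffix
    shutil.copy2(path, backup_path)  # copy contents plus metadata
    try:
        with open(path, "r+") as f:
            data = transform(f.read())
            f.seek(0)
            f.write(data)
            f.truncate()
    except BaseException:
        # copyfile() writes into the existing file rather than replacing
        # it, so the original file object on disk is kept.
        shutil.copyfile(backup_path, path)
        os.remove(backup_path)
        raise
    else:
        os.remove(backup_path)


# Demonstration in a throwaway directory (illustrative names only).
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "example.txt")
    with open(target, "w") as f:
        f.write("old text\n")
    inode_before = os.stat(target).st_ino
    rewrite_in_place_with_backup(target, lambda t: t.replace("old", "new"))
    same_inode = os.stat(target).st_ino == inode_before
    with open(target) as f:
        result = f.read()
    leftovers = os.listdir(workdir)
```

The trade-off is that the rewrite is no longer atomic: readers can observe a partially written file, and a crash before the restore step leaves recovery to whoever finds the backup.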
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Sunday, 15 Apr 2018 at 6:19, Oleg Broytman <phd@phdru.name> wrote:
Can I recommend to catch exceptions in `backuper.backup()`, cleanup backuper and unlock locker?
Yes, thanks, I moved .backup() into the try block. As for other exceptions, I think they must be caught outside, because this module doesn't know what to do with such problems.

Hi, something similar already exists in the standard library: https://docs.python.org/3/library/fileinput.html fileinput.input(..., inplace=True) BR, George 2018-04-15 10:57 GMT+02:00 Alexey Shrub <ashrub@yandex.ru>:
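For reference, a small sketch of the `fileinput` approach is shown here. The temporary file and its contents are assumptions for illustration; `fileinput.input()` with `inplace=True` is the documented standard-library call.

```python
import fileinput
import os
import tempfile

# Illustrative setup: a throwaway temporary file with two sample lines.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write("old first line\nold second line\n")

# With inplace=True, standard output is redirected into the file while
# iterating, so print() emits the replacement for each line.
with fileinput.input(files=(path,), inplace=True) as lines:
    for line in lines:
        print(line.replace("old", "new"), end="")

with open(path) as f:
    result = f.read()

os.remove(path)
```

The processing here is inherently line by line, and anything printed inside the loop ends up in the file.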

On Sunday, 15 Apr 2018 at 10:47, George Fischhof <george@fischhof.hu> wrote:
Thanks, it works https://github.com/worldmind/scripts/blob/master/filerewrite/fileinputtest.p... but it looks like this approach is only for line-by-line processing.

On Sunday, 15 Apr 2018 at 10:47, George Fischhof <george@fischhof.hu> wrote:
https://pypi.python.org/pypi/in-place looks pretty good too.

On Monday, 16 Apr 2018 at 2:48, Alexey Shrub <ashrub@yandex.ru> wrote:
I like the in_place module https://github.com/worldmind/scripts/blob/master/filerewrite/inplacetest.py It fixes some strange features of the fileinput module. Maybe in_place should be in the standard library instead of fileinput?


participants (8)
- Alexey Shrub
- Elazar
- George Fischhof
- Mahmoud Hashemi
- Nick Coghlan
- Oleg Broytman
- Paul Moore
- Serhiy Storchaka