Saving a file "in the background" -- How?
Akira Li
4kir4.1i at gmail.com
Fri Oct 31 08:07:42 EDT 2014
Virgil Stokes <vs at it.uu.se> writes:
> While running a python program I need to save some of the data that is
> being created. I would like to save the data to a file on a disk
> according to a periodical schedule (e.g. every 10
> minutes). Initially, the amount of data is small (< 1 MB) but after
> sometime the amount of data can be >10MB. If a problem occurs during
> data creation, then the user should be able to start over from the
> last successfully saved data.
>
> For my particular application, no other file is being saved and the
> data should always replace (not be appended to) the previous data
> saved. It is important that the data be saved without any obvious
> distraction to the user who is busy creating more data. That is, I
> would like to save the data "in the background".
>
> What is a good method to perform this task using Python 2.7.8 on a
> Win32 platform?
There are several requirements:
- save data asynchronously -- "without any obvious distraction to the
  user"
- save data durably -- avoid corrupting previously saved data or
  writing only partial new data, e.g., in case of a power failure
- do it periodically -- handle drift/overlap gracefully in a documented
  way
A simple way to do asynchronous I/O on Python 2.7.8 on a Win32 platform
is to use threads:

  import threading

  t = threading.Thread(target=backup_periodically, kwargs=dict(period=600))
  t.daemon = True  # stop if the program exits
  t.start()

where backup_periodically() backs up the data every *period* seconds:

  import logging
  import time

  def backup_periodically(period, timer=time.time, sleep=time.sleep):
      start = timer()
      while True:
          try:
              backup()
          except Exception:  # log exceptions and continue
              logging.exception("backup failed")
          # lock with the timer
          sleep(period - (timer() - start) % period)

To keep the backup times from drifting, the sleep is locked to the
timer using the modulo operation. If backup() takes longer than *period*
seconds (unlikely for 10MB per 10 minutes) then the ticks that pass in
the meantime are skipped and the next backup starts on the following
multiple of *period*.
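
To make the modulo arithmetic concrete, here is a small sketch that
computes only the sleep duration. The helper name
delay_until_next_tick() and the sample timestamps are made up for
illustration; the expression is the same one passed to sleep() above.

```python
def delay_until_next_tick(now, start, period):
    """Seconds to sleep so that the wake-up lands on start + k*period."""
    return period - (now - start) % period

start = 1000.0   # hypothetical timer() value when the loop started
period = 600     # 10 minutes

# backup() finished 2 seconds after a tick: sleep the remaining 598 seconds
fast = delay_until_next_tick(1002.0, start, period)

# backup() took 610 seconds: the tick at start + 600 has already passed,
# so the next wake-up is at start + 1200, which is 590 seconds away
slow = delay_until_next_tick(1610.0, start, period)

print(fast, slow)
```

Because each wake-up is computed from the original start time rather
than from the end of the previous backup, small delays do not
accumulate.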
backup() makes sure that the data is saved and can be restored at any
time.

  def backup():
      with atomic_open('backup', 'w') as file:
          file.write(get_data())

where atomic_open() [1] tries to overcome multiple issues with saving
data reliably:
- write to a temporary file so that the old data is always available
- rename the file when all new data is written, handle cases such as:
  * "antivirus opens old file thus preventing me from replacing it"
Either the operation succeeds and 'backup' contains new data or it fails
and 'backup' contains untouched ready-to-restore old data -- nothing in
between.
[1]: https://github.com/mitsuhiko/python-atomicfile/blob/master/atomicfile.py
I don't know how ready atomicfile.py is, but you should be aware of the
issues it is trying to solve if you want a reliable backup solution.
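
For reference, the write-to-temporary-then-rename idea can be sketched
as below. This is not the code from atomicfile.py, and atomic_write()
is a name invented for this example. It relies on os.replace(), which
exists only on Python 3.3+; on Python 2.7 on Win32, os.rename() raises
an error when the destination already exists, which is one of the
issues a real solution has to handle differently. The sketch does not
retry if another process (e.g. an antivirus scanner) holds the old
file open.

```python
import os
import tempfile

def atomic_write(path, data):
    """Write bytes to path so that path holds either the old or the new data."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory as the target,
    # so the final rename does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix='.tmp-')
    try:
        with os.fdopen(fd, 'wb') as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to write the bytes to disk
        # Python 3.3+ only: replaces an existing file in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # The target is untouched; clean up the temporary file and re-raise.
        try:
            os.remove(tmp_path)
        except OSError:
            pass
        raise

if __name__ == '__main__':
    atomic_write('backup', b'example data')
```

If anything fails before the rename, 'backup' still contains the
previous data and only the temporary file is discarded.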
--
Akira