[CentralOH] Automated Patches

Mon Oct 29 21:53:34 CET 2012

On Thu, 25 Oct 2012 15:48:16 -0700, Austin Godber <godber at gmail.com> wrote:

> I am not sure if I have fully grokked your problem, ... 

I am archiving data files as originally received. 
It is a requirement to preserve the original files 
as received, regardless of how good or bad they are. 
They are all compressed. There are many files. 
Some of them are around 1/2 Gigabyte. 
We have enough data that files not likely to be needed 
have to be moved from production servers to off-line 
storage. 

Some of the data files have some bad content, 
and the corrections to the (uncompressed) content are tiny, 
so diffs are a nice way to keep track of corrections, 
using little extra storage while preserving the original 
(bad) files. 
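
To illustrate how such a correction could be recorded, here is a 
minimal sketch. The helper name write_patch is hypothetical, and the 
sketch assumes gzip-compressed text files and a patch stored beside 
the data file as filename + '.patch'. It never touches the original 
file. difflib holds both versions in memory, so for the largest files 
an external diff tool would likely be more practical.

```python
import difflib
import gzip


def write_patch(filename, corrected_lines, opener=gzip.open):
    """Write a unified diff to filename + '.patch'.

    write_patch is a hypothetical helper. The original (bad) file is
    only read, never modified. corrected_lines is a list of lines,
    each ending in a newline.
    """
    # Read the archived original as text, decompressing on the fly.
    with opener(filename, 'rt') as original:
        original_lines = original.readlines()

    # Only the changed lines plus a little context end up in the diff.
    diff = difflib.unified_diff(original_lines, corrected_lines,
                                fromfile=filename, tofile=filename)

    patchname = filename + '.patch'
    with open(patchname, 'w') as patch:
        patch.writelines(diff)
    return patchname
```

For a one-line correction, the resulting patch file holds only a 
handful of lines, however large the original is.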

Conventional version control systems don't seem to be a good 
fit. Many don't handle compressed files well. I don't know 
how well they would handle moving older data to off-line 
storage. 

I'm looking for a Pythonic way to automate the application of 
individual patches, when they exist, without modifying, creating 
or renaming any files[2]. Pipes are good, whether à la Unix, 
or within Python[1]. 

Instead of using open(filename), I would call open_patched(filename)
which would look for both filename and filename + '.patch'. 
If both files exist, open_patched would return a file-like object 
that would yield the output of something like[3]

   cat "$filename" | patcheroo "$filename.patch"

otherwise open_patched() would work just like open(), 
returning a file-like object for just filename. 
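
A rough sketch of how open_patched might be written in pure Python, 
with no temporary files, follows. Everything except the name 
open_patched is an assumption made for illustration: the helper 
apply_unified_diff, the opener parameter (pass gzip.open for 
compressed data), and the patch being a unified diff of a text file 
that ends with a newline. When a patch exists, the sketch returns an 
iterator of lines rather than a full file object, and it streams line 
by line instead of loading the whole file.

```python
import os
import re

# Matches hunk headers such as "@@ -12,7 +12,8 @@"; a count of 1 may be omitted.
HUNK_RE = re.compile(r'^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@')


def apply_unified_diff(original_lines, patch_lines):
    """Yield patched lines from original lines and a unified diff (a sketch)."""
    original = iter(original_lines)
    patch = iter(patch_lines)
    consumed = 0  # number of original lines read so far
    for line in patch:
        match = HUNK_RE.match(line)
        if match is None:
            continue  # skip the '---' and '+++' headers between hunks
        start = int(match.group(1))
        old_count = 1 if match.group(2) is None else int(match.group(2))
        new_count = 1 if match.group(4) is None else int(match.group(4))
        # Copy the untouched lines that precede this hunk.
        first = start if old_count == 0 else start - 1
        while consumed < first:
            yield next(original)
            consumed += 1
        old_seen = new_seen = 0
        while old_seen < old_count or new_seen < new_count:
            body = next(patch)
            tag, text = body[:1], body[1:]
            if tag == '+':
                new_seen += 1
                yield text  # line added by the patch
            elif tag in (' ', '-'):
                source = next(original)
                consumed += 1
                old_seen += 1
                if source != text:
                    raise ValueError('patch does not match original '
                                     'at line %d' % consumed)
                if tag == ' ':
                    new_seen += 1
                    yield source  # context line, kept as is
            else:
                raise ValueError('unexpected patch line: %r' % body)
    # Copy whatever follows the last hunk.
    for remaining in original:
        yield remaining


def _patched_lines(filename, patchname, opener):
    # Both files stay open only while the caller is iterating.
    with opener(filename, 'rt') as original, open(patchname, 'r') as patch:
        for line in apply_unified_diff(original, patch):
            yield line


def open_patched(filename, opener=open):
    """Like opener(filename), but apply filename + '.patch' if it exists."""
    patchname = filename + '.patch'
    if os.path.exists(patchname):
        return _patched_lines(filename, patchname, opener)
    return opener(filename, 'rt')
```

Usage would then look like "for line in open_patched(name, 
opener=gzip.open): ...", whether or not a patch exists.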

[1] http://mail.python.org/pipermail/centraloh/2012-August/001369.html
    Thanks again Neil. 
[2] Of course, I could copy or write the uncompressed file in some 
    temporary directory, modify it, consume it, then delete it. 
    It'd be nice to avoid having temporary files by using 
    pipes or pipe-like goodness instead. 
    The patch command might require temp files. 
[3] I use the fictitious patcheroo which uses stdin for 
    the unpatched data, and stdout for the patched data, 
    to avoid dealing with patch's grammar and behavior. 
[4] And now for something completely different. NATs are good. 
    http://www.youtube.com/watch?v=v26BAlfWBm8 
    Thanks Rick.