[CentralOH] Automated Patches

jep200404 at columbus.rr.com
Sun Nov 4 23:15:59 CET 2012


Brace yourself for some hurlingly bad code below. 

On Thu, 25 Oct 2012 18:26:32 -0400, jep200404 at columbus.rr.com wrote:

> Does a library exist for something like what I describe below? 
> If so, I'd prefer to use it and avoid re-inventing the wheel 
> and NIH[1]. 
> 
> I'm dealing with many input files, some of which have some bad 
> data. Coding one-off ad-hoc workarounds is bad for production. 
> I'm tempted to write a file-like class for reading the input 
> files and automatically finding and applying corresponding 
> patch files. The patch files would be UNIX patch files and 
> their names would be the input file name or sha1sum of the 
> input file, with '.patch' appended.
> 
> The generation of the patch files would be manual. 
> That hassle would not be eliminated, 
> but the application of them would be automated, 
> and the input files would be maintained in their original 
> state, so one wouldn't have to worry about whether one 
> had the original or corrected version of the input file. 
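
The lookup described in the quoted text could be sketched roughly as 
follows. This is only an illustration: patch_candidates and find_patch 
are made-up names, and putting the sha1-named patch in the same 
directory as the input file is an assumption, not something decided 
above. 

```python
import hashlib
import os


def patch_candidates(input_filename, data, suffix='.patch'):
    """Return the patch file names to try for one input file.

    data is the raw bytes of the input file. The first candidate is the
    input file name plus the suffix; the second is the sha1 hex digest
    of the contents plus the suffix (hypothetical naming scheme).
    """
    digest = hashlib.sha1(data).hexdigest()
    return [input_filename + suffix, digest + suffix]


def find_patch(input_filename, suffix='.patch'):
    """Return the first existing patch file name, or None if none exists."""
    with open(input_filename, 'rb') as f:
        data = f.read()
    by_name, by_digest = patch_candidates(input_filename, data, suffix)
    # Assumption: a sha1-named patch sits next to the input file.
    directory = os.path.dirname(input_filename)
    for candidate in (by_name, os.path.join(directory, by_digest)):
        if os.path.exists(candidate):
            return candidate
    return None
```

Naming a patch by sha1sum ties it to the exact contents of the input 
file, so it cannot be applied to a changed file of the same name. 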

I wrote something that works, but the code is quite disgusting. 
What improvements can you make to hurl.py while avoiding 
temporary files and changes to existing data files? 
Partial improvements are welcome. 

from subprocess import Popen, PIPE

from zipfile import ZipFile
import fnmatch
import pipes

def autopatch(filename, inside_filename):
    patch_filename = '.'.join([filename, inside_filename, 'ed-patch'])
    # print 'patch_filename', repr(patch_filename)
    try:
        raw_patch_file = open(patch_filename, 'rU')
    except IOError:
        # No patch file exists, so just read inside_filename
        # print patch_filename, 'does not exist'
        patched_file = ZipFile(filename).open(inside_filename, 'rU')
    else:
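        # print patch_filename, 'exists'
        # Append ed's "1,$p" command to the patch script so that ed
        # prints the whole patched buffer to stdout when it finishes.
        patch = Popen(
            ['awk', '{print} END {print "1,$p"}'],
            stdin=raw_patch_file,
            stdout=PIPE,
            universal_newlines=True,).stdout
        # ed loads the zip member through its "!command" file argument
        # (the output of unzip -p), then runs the patch script from stdin.
        patched_file = Popen(
            ['ed', '-s', '!unzip -p ' + pipes.quote(filename) + ' ' +
            pipes.quote(inside_filename)],
            stdin=patch,
            stdout=PIPE,
            stderr=open('/dev/null', 'w'),
            universal_newlines=True,).stdout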
    return patched_file

filename='foo.zip'
if True:
        z = ZipFile(filename)
        for inside_filename in fnmatch.filter(z.namelist(), '*author.txt'):
            patched_file = autopatch(filename, inside_filename)
            for line in patched_file.readlines():
                print line,

With files from the attached file, foo.tgz, it should work as 
follows. 

[jep200404 at test ~]$ python hurl.py 
Aaron Aardvark
Boris Bouchet
Bjork (rhymes with jerk)
Calmus Chavet
[jep200404 at test ~]$ rm foo.zip.200707author.txt.ed-patch 
[jep200404 at test ~]$ python hurl.py 
Aaron Aardvark
Boris Bovchek
Calmus Chavet
[jep200404 at test ~]$ 

foo.zip.200707author.txt.ed-patch was generated by diff -e. 
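
One possible direction for avoiding the awk and ed subprocesses is to 
apply the diff -e script in Python. The following is a minimal sketch, 
not a tested replacement for hurl.py: apply_ed_script is a made-up 
name, only the a, c, and d commands are handled, and the escaping that 
diff -e uses for text lines consisting of a lone '.' is not handled. 
The example script at the bottom is a guess that is consistent with 
the output shown above; the actual patch in foo.tgz may differ. 

```python
import re


def apply_ed_script(lines, script):
    """Apply a diff -e style ed script to a list of lines.

    lines and script are sequences of newline-terminated strings.
    diff -e emits its commands from the end of the file toward the
    start, so applying them in order needs no line-number adjustment.
    Returns a new list; lines itself is left unmodified.
    """
    out = list(lines)
    script = iter(script)
    for command in script:
        command = command.rstrip('\n')
        if not command:
            continue
        m = re.match(r'^(\d+)(?:,(\d+))?([acd])$', command)
        if m is None:
            raise ValueError('unsupported ed command: %r' % command)
        first = int(m.group(1))
        last = int(m.group(2) or first)
        op = m.group(3)
        new_text = []
        if op in 'ac':
            # Text for a and c runs until a line holding only '.'.
            for text in script:
                if text.rstrip('\n') == '.':
                    break
                new_text.append(text)
        if op == 'a':
            out[first:first] = new_text          # insert after line `first`
        elif op == 'c':
            out[first - 1:last] = new_text       # replace lines first..last
        else:
            out[first - 1:last] = []             # delete lines first..last
    return out


# Hypothetical data, modeled on the output shown above.
original = ['Aaron Aardvark\n', 'Boris Bovchek\n', 'Calmus Chavet\n']
script = ['2c\n', 'Boris Bouchet\n', 'Bjork (rhymes with jerk)\n', '.\n']
patched = apply_ed_script(original, script)
```

The lines could come from ZipFile(filename).open(inside_filename), so 
neither temporary files nor changes to the data files would be needed. 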

-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo.tgz
Type: application/x-gzip
Size: 969 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/centraloh/attachments/20121104/e33d479b/attachment.bin>

