[Tutor] code review

Adam Gold awg1 at gmx.com
Wed Jun 11 12:43:33 CEST 2014


>>> Thanks for the reply Steven.  It's no more than 100 lines at a
>>> guess
> 
>> In that case just copy and paste it into a message and send it to
>> the group. Anyone with time available can then take a peek.
> 
> One way noobs anywhere can learn is by listening in to other people's
> conversations - it's called lurking, I believe.
> 
> So I would say, please do this on the list, and many more people than
> Adam may benefit. Others can ignore the thread if they wish.
> 
> Bob

Oke doke, here it is below.  Just for convenience's sake, I'm going to
repeat what the basic steps are.  It's a backup script for certain xen
virtual machines ("VM") running on my server.  Each VM runs on its own
logical volume (as opposed to a file-based loop device).  From my own
(bitter) experience, the absolutely best way to back up a VM running on
a logical volume is to clone it to an image file using dd.  I'm aware
that a separate discussion could be had around this (on a different
mailing list) but, unless someone thinks this is a horribly flawed
approach, it may be best to assume this approach is 'fine' so as not to
distract from the code review!!

Here are the steps:
1) create snapshots of the xen logical volumes using the built-in
snapshot feature of LVM2 (this way I can back up each logical volume
without having to shut down the VM)
2) dd and bzip2 (using a pipe) the snapshots to .img.bz2 files for
storage on the same server
3) gpg encrypt the same files and upload them to Amazon s3
4) remove the logical volume snapshots (because they accumulate disk
space and I'm doing this daily) and the .gpg files
5) delete files in the s3 directory which are older than X days
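
Steps 1 and 2 can be sketched as small helpers that only build the names
and command lists, so they can be checked without touching LVM.  This is
a minimal sketch, not the script itself: the function names
(snapshot_name, lvcreate_command, dd_command) are hypothetical, and the
lvcreate and dd options are the ones the script uses.

```python
import datetime


def snapshot_name(lv_name, now=None):
    """Return a snapshot name such as 2014-06-10T01-00-01.vm1."""
    now = now or datetime.datetime.now()
    return now.strftime("%Y-%m-%dT%H-%M-%S") + "." + lv_name


def lvcreate_command(vg_name, lv_name, snap_name, size="1G"):
    """Build the argument list for an LVM2 snapshot of /dev/<vg>/<lv>."""
    origin = "/dev/%s/%s" % (vg_name, lv_name)
    return ["lvcreate", "-s", "-L" + size, origin, "-n", snap_name]


def dd_command(snapshot_path):
    """Build the dd argument list that reads a snapshot device."""
    return ["dd", "if=" + snapshot_path, "bs=4k",
            "conv=noerror,notrunc,sync"]


if __name__ == "__main__":
    # Take one timestamp up front so every snapshot in a run shares it.
    stamp = datetime.datetime.now()
    for vg, lv in [("vgxen", "vm1"), ("vg_data", "vm1")]:
        snap = snapshot_name(lv, stamp)
        print(lvcreate_command(vg, lv, snap))
        print(dd_command("/dev/%s/%s" % (vg, snap)))
```

Passing argument lists like these to subprocess avoids building a
command string and splitting it again with shlex.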

As I've mentioned, I'm a real noob, so I'm still mastering some basic
stuff.  The script works fine for my purposes, but I'm keen to understand
where it could be improved from a python pov.  Finally, yes I could have
written this in bash but I prefer python!

P.S. I think some of the comments have been wrapped onto more than one
line by my email client; I hope this doesn't cause too much inconvenience.
====================================

#!/usr/bin/python3

############################################
## XEN VIRTUAL MACHINE BACKUP SCRIPT
##
## Copyright (C) 2014 Adam Gold
##


## This program is free software: you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or (at
## your option) any later version.
##
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
## the GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program. If not see <http://gnu.org/licenses/>
##

## Version: 0.4
## 2014-06-10

############################################

import datetime, time, subprocess, shlex, os, gnupg, glob, shutil

# logical volumes exist in two different volume groups, vgxen and vg_data
# hence two lists of vms
vgxenList = ['vm1', 'vm2', 'vm3', 'vm4', 'vm5', 'vm6' ]
vg_dataList = ['vm1', 'vm2']
backupList = [ ]
snapNameList = [ ]


# create snapshot names like the following: 2014-06-10T01-00-01.vm1.img.bz2
for i in vgxenList:
    DATE = datetime.datetime.now().strftime("%Y-%m-%d" + "T" + "%H-%M-%S")
    vgxenName = "/dev/vgxen/"
    lvName = i
    origName = vgxenName + lvName
    snapName= DATE + "." + lvName
    snapNameList.append(snapName)
    backupList.append(vgxenName + snapName)
    subprocess.call(['lvcreate', '-s', '-L1G', origName, '-n', snapName])


for h in vg_dataList:
    DATE = datetime.datetime.now().strftime("%Y-%m-%d" + "T" +  "%H-%M-%S")
    vg_dataName = "/dev/vg_data/"
    lvName = h
    origName = vg_dataName + lvName
    snapName = DATE + "." + lvName
    snapNameList.append(snapName)
    backupList.append(vg_dataName + snapName)
    subprocess.call(['lvcreate', '-s', '-L1G', origName, '-n', snapName])


# backupList holds the full path of each snapshot and snapNameList the
# matching snapshot name; walk both lists in step
for backupPath, backupSnapshot in zip(backupList, snapNameList):
    # run dd and pipe to bz2 file using subprocess module
    ddIf = shlex.split("dd if=%s bs=4k conv=noerror,notrunc,sync" %
                       (backupPath))
    compress = "pbzip2"
    filename = "/home/files/temp/%s.img.bz2" % (backupSnapshot)
    p1 = subprocess.Popen(ddIf, stdout=subprocess.PIPE)
    with p1.stdout as fin, open(filename, "wb") as fout:
        p2 = subprocess.Popen(compress, stdin=fin, stdout=fout)
    ret1 = p1.wait()
    ret2 = p2.wait()


# create list of files to be encrypted with full path names
# start with list of unencrypted files
cryptDir = '/home/files/temp/'
unencrypted = [u for u in os.listdir(cryptDir)]
# join absolute path to file names to create new list (list comprehension)
cryptDir_unencrypted = [ os.path.join(cryptDir, s) for s in unencrypted ]


# encrypt files
for G in cryptDir_unencrypted:
    gpg = gnupg.GPG(gnupghome='/root/.gnupg')
    # HORRIBLE SECURITY, I KNOW!  The script is running as a cronjob so I
    # can't interactively enter the passphrase.  Suggestions are welcome.
    phrase = "passphrase"
    cipher = "AES256"
    with open(G, 'rb') as f:
        status = gpg.encrypt_file(f, None, armor=False,
                                  passphrase=phrase, symmetric=cipher.upper(),
                                  output=G + '.gpg')


# move unencrypted files out of temp directory
for data in glob.glob(cryptDir + '*.bz2'):
    shutil.move(data,'/home/files/')


# delete snapshots; backupList already holds the full /dev/<vg>/<snapshot>
# path of each one, so every snapshot is removed from its own volume group
for r in backupList:
    subprocess.call(['lvremove', '-f', r])


# create list of file names to be uploaded (list comprehension)
uploads = [y for y in os.listdir(cryptDir)]
# join absolute path to file names to create new list (list comprehension)
cryptDir_uploads = [ os.path.join(cryptDir, t) for t in uploads ]

# upload to Amazon s3
for d in cryptDir_uploads:
    s3Upload = 's3cmd put ' + d + ' s3://bucket/dir/'
    subprocess.call(shlex.split(s3Upload))
    subprocess.call(shlex.split('rm ' + d))


# move working path to list of unencrypted vm backup files
path = '/home/files/'
os.chdir(path)

# build list of unencrypted vm backup files (list comprehension)
fileNames = [ u for u in os.listdir(path) if os.path.isfile(u) ]

# build list of unencrypted vm backup files with .gpg
# this will mirror the list of files in s3 (list comprehension)
fileNames_gpg = [ p + '.gpg' for p in fileNames ]

# NOTE: I tried to collapse the previous two list comprehensions into one
# as it would seem to be possible - the contents of fileNames_gpg should be
# an os.listdir() of the directory they are in.  For some reason I couldn't
# do this so I did the above hack by appending the .gpg extension to a
# different list

# calculate age of unencrypted vm backup files and
# hence encrypted files on s3 (list comprehension)
# NOTE: I have to use the unencrypted files on the server
# as the gpg files get deleted after each upload to s3
ageList = [ round((time.time() - os.stat(n).st_mtime)/60/60/24)
            for n in fileNames ]

# delete files older than 'age' variable; age of file in ageList compared
# against name in fileNames_gpg
age = 7
for x, y in zip(fileNames_gpg, ageList):
    if y > age:
        subprocess.call(shlex.split('s3cmd del s3://bucket/dir/' + x))
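
On the NOTE above about collapsing the list comprehensions: os.listdir()
returns bare file names, so os.path.isfile(u) only works relative to the
current directory, which may be part of why a single pass was awkward.
Below is a minimal sketch of step 5 done in one pass with full paths and
no os.chdir().  The function name expired_s3_names is hypothetical, and
it compares the unrounded age in days, which differs slightly from the
round() used in the script.

```python
import os
import time


def expired_s3_names(directory, max_age_days, now=None):
    """Return '<name>.gpg' for each regular file older than max_age_days."""
    now = time.time() if now is None else now
    names = []
    for entry in sorted(os.listdir(directory)):
        full = os.path.join(directory, entry)
        if not os.path.isfile(full):
            continue  # skip subdirectories such as temp/
        age_days = (now - os.stat(full).st_mtime) / 86400
        if age_days > max_age_days:
            names.append(entry + ".gpg")
    return names


if __name__ == "__main__":
    # Print the s3 object names that would be deleted; the directory
    # must exist for this to run.
    for name in expired_s3_names("/home/files/", 7):
        print("s3://bucket/dir/" + name)
```

Each returned name could then be passed to
subprocess.call(['s3cmd', 'del', 's3://bucket/dir/' + name]).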

