[Tutor] code review

Thu Jun 12 01:38:49 CEST 2014

On 11/06/14 11:43, Adam Gold wrote:

> # create snapshot names like the following: 2014-06-10T01-00-01.vm1.img.bz2
> for i in vgxenList:
>      DATE = datetime.datetime.now().strftime("%Y-%m-%d" + "T" + "%H-%M-%S")

Why use addition? You could just insett the literal T...

DATE = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")

Also using all caps for the name suggests a constant by convention.

>      vgxenName = "/dev/vgxen/"
>      lvName = i

why not just say

for lvname in vxgenlist

and avoid the extra assignment?

>      origName = vgxenName + lvName
>      snapName= DATE + "." + lvName

Again you could have done that in the original strftime() call.

>      snapNameList.append(snapName)
>      backupList.append(vgxenName + snapName)
>      subprocess.call(['lvcreate', '-s', '-L1G', origName, '-n', snapName])
>
>
> for h in vg_dataList:
>      DATE = datetime.datetime.now().strftime("%Y-%m-%d" + "T" +  "%H-%M-%S")

see above

>      vg_dataName = "/dev/vg_data/"
>      lvName = h

again, why not just use the meaningful name in the for loop?

>      origName = vg_dataName + lvName
>      snapName = DATE + "." + lvName
>      snapNameList.append(snapName)
>      backupList.append(vg_dataName + snapName)
>      subprocess.call(['lvcreate', '-s', '-L1G', origName, '-n', snapName])

In fact these loops are so similar you should probably make
a function with a couple of parameters and call it from both loops.
The only things different are the list you loop over and the path. Make 
those params and you get something like this (untested):

def create_volumes(volList, path):
     for name in volList:
        dt = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S."+name)
        dataName = path
        origName = dataName + name
        snapName = DATE + "." + name
        snapNameList.append(snapName)
        backupList.append(dataName + snapName)
        subprocess.call(['lvcreate', '-s', '-L1G',
                          origName, '-n', snapName])

> # backupPath is list of full paths of each snapshot
> # the string is extacted from backupList using 'join'
> backupPath = ' '.join(backupList)
> for j, k in zip(backupList, snapNameList):
>      backupPath = j
>      backupSnapshot = k
> # run dd and pipe to bz2 file using subprocess module
>      ddIf = shlex.split("dd if=%s bs=4k conv=noerror,notrunc,sync" %
> (backupPath))
>      compress = "pbzip2"
>      filename = "/home/files/temp/%s.img.bz2" % (backupSnapshot)
>      p1 = subprocess.Popen(ddIf, stdout=subprocess.PIPE)
>      with p1.stdout as fin, open(filename, "w") as fout:
>          p2 = subprocess.Popen(compress, stdin=fin, stdout=fout)

Take note of the warning in the subprocess documentation about
using stdin/stdout directly. You should ideally access them
via Popen.communicate().

>      ret1 = p1.wait()
>      ret2 = p2.wait()

I may just be missing it but I don't see you do anything with these 
ret1,ret2 variables?

> # create list of files to be encrypted with full path names
> # start with list of unencrypted files
> cryptDir = '/home/files/temp/'
> unencrypted = [u for u in os.listdir(cryptDir)]

This is a redundant comprehension. listdir() returns the list for you.

Any time you see

foo = [item for item in a_list]
it probably should just be:

foo = a_list

Or if a_list is really an iterator it could be

foo = list(an_iter)

Comprehensions are good when you have a filter condition on the end
or if you are returning a function/expression involving item.

> # join absolute path to file names to create new list (list comprehension)
> cryptDir_unencrypted = [ os.path.join(cryptDir, s) for s in unencrypted ]

Like that one for example...

> # encrypt files
> for G in cryptDir_unencrypted:

You seem to like single letter loop variables. Its better to use 
something more meaningful. Say:

for filename in cryptDir_unencrypted:

>      gpg = gnupg.GPG(gnupghome='/root/.gnupg')
>      phrase = "passphrase"  # HORRIBLE SECURITY, I KNOW!  The script is
> running as a cronjob so I can't interactively enter the passphrase.
> Suggestions are welcome.

Use a config file that you can edit when needed. Read the passwords from 
that. It could even be encrypted...

>      cipher = "AES256"
>      with open(G, 'rb') as f:
>          status = gpg.encrypt_file(f, None, armor=False,
> passphrase=phrase, symmetric=cipher.upper(), output=G + '.gpg')
>
>
> # move unencypted files out of temp directory
> for data in glob.glob(cryptDir + '*.bz2'):

You should really use os.join() instead of string addition...

>      shutil.move(data,'/home/files/')
>
>
> # delete snapshots
> for r in snapNameList:
>      removeSnapshots1 = 'lvremove -f ' + vgxenName + r
>      subprocess.call(shlex.split(removeSnapshots1))
>      removeSnapshots2 = 'lvremove -f ' + vg_dataName + r
>      subprocess.call(shlex.split(removeSnapshots2))
>
>
> # create list of file names to be uploaded (list comprehension)
> uploads = [y for y in os.listdir(cryptDir)]

see previous comment

> # join absolute path to file names to create new list (list comprehension)
> cryptDir_uploads = [ os.path.join(cryptDir, t) for t in uploads ]
>
> # upload to Amazon s3
> for d in cryptDir_uploads:
>      s3Upload = 's3cmd put ' + d + ' s3://bucket/dir/'
>      subprocess.call(shlex.split(s3Upload))
>      subprocess.call(shlex.split('rm ' + d))

I'm not sure the shlex.split calls add much value. You could just
hard code the command lists in this case.

> # move working path to list of unencrypted vm backup files
> path = '/home/files/'
> os.chdir(path)
>
> # build list of unencrypted vm backup files (list comprehension)
> fileNames = [ u for u in os.listdir(path) if os.path.isfile(u) ]
>
> # build list of	unencrypted vm backup files with .gpg
> # this will mirror the list of files in s3 (list comprehension)
> fileNames_gpg = [ p + '.gpg' for p in fileNames ]
>
> # NOTE: I tried to collapse the previous two list comprehensions into one
> # as it would seem to be possible

Why not:

fileNames_gpg = [ u+'.gpg' for u in os.listdir(path)
                   if os.path.isfile(u) ]

> # NOTE: I have to use the unencrypted files on the server
> # as the gpg files get deleted after each upload to s3
> ageList = [ round((time.time() - os.stat(n).st_mtime)/60/60/24) for n in
> fileNames ]

This seems to give the lie to the statement that you wanted to
combine the comprehensions... you are using the intermediate
list here...

> # delete files older than 'age' variable; age of file in ageList compared
> # against name in fileNames_gpg
> age = 7
> for x, y in zip(fileNames_gpg, ageList):
>      if y > age:

This is probably neater done using datetime and timedelta objects.

>          subprocess.call(shlex.split('s3cmd del s3://bucket/dir/' + x))

Again, I'm not sure shlex adds anything here.

HTH
-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos