[Tutor] Script Feedback

Tue Mar 30 19:41:34 CEST 2010

On Tue, 30 Mar 2010 10:27:43 -0400
Damon Timm <damontimm at gmail.com> wrote:

> As a self-taught Python user I am still looking for insight on the
> most pythonic and programmatically-friendly way of accomplishing a
> given task. In this case, I have written a script that will perform a
> “clean bzip2″ of a directory (or directories). Mac OS X (via AFP and
> netatalk, in my case) tends leaves a bunch of ugly files/directories
> hanging around and I would rather not include them in my compressed
> tar file.
> 
> In writing the script, though, I ran into some questions and I am not
> sure what the recommended approach would be. The script works, as it
> is, but I feel its a little hacked together and also a little limited
> in its application.   There is something to be said for programs that
> "just work" (this does) but I want to take it a little further as an
> educational endeavor and would like it to appear robust,
> future-thinking, and pythonic.

Very good point of view, imo.
Be aware that some of the notes below are personal opinions.

> My initial questions are:
> 
> 1. Is there a better way to implement a --quiet flag?

I don't think so.

> 2. I am not very clear on the use of Exceptions (or even if I am using
> it in a good way here) — is what I have done the right approach?

You can consider exceptions, in general, as a kind of intra-program (interruption-) signal system. When an exception occurs, the normal execution flow is stopped, and the program branches to a special routine designed to cope with the signal -- if any. Else it stops with a message on the terminal (actually on stderr).
There are python builtin exception types, most errors; if you don't catch them, then the user gets the kind of error message you're probably familiar with ;-) If you do, by wrapping the potentially interrupting bit of code into a try...except construct, then the occurrence of an error lets the program jump into the except routine: this is a sophisticated and specialised goto.
You can design your own exception types that work the same way. Just record needed info in __init__ and write a meaningful message in __str__ (see below). Custom exception types can be used for (1) real errors =  "abnormal" cases (2) exceptional, but "normal", cases (like in your case) (3) any need for signaling between separate parts of your code.

> 3. Finally, in general: any feedback on how to improve this? (I am
> thinking, just now, that the script is only suitable for a command
> line usage, and couldn’t be imported by another script, for example.)

Yo. Just do it! (the "if __name__=="__main__:" part won't be executed on import -- write a func, eg clean(), that can be called from importing code -- then when it runs stand-alone the main part could just call it after parsing the command-line args)

> Any feedback is greatly appreciated. Writing a script like this is a
> good learning tool (for me, at least).
> 
> I have posted this email online if you want to see the script with
> pretty code formatting:
> http://blog.damontimm.com/python-script-clean-bzip/
> 
> Thanks for any insight you may provide.
> 
> Damon
> 
> Script follows
> ************************
> 
> #! /usr/bin/env python
> 
> '''Script to perform a "clean" bzip2 on a directory (or directories).  Removes
> extraneous files that are created by Apple/AFP/netatalk before compressing.
> '''
> 
> import os
> import tarfile
> from optparse import OptionParser
> 
> IGNORE_DIRS = ( '.AppleDouble', )
> IGNORE_FILES = ('.DS_Store', )

+++ for using constants

> class DestinationTarFileExists(Exception):
>     '''If the destination tar.bz2 file already exists.'''
    def __init__(self, filename):
        self.filename = filename
    def __str__(self, filename):
        ''' (just an example) '''
        return 'Destination tar file "%s" already exists.' %(self.filename)
By defining the "magic" methods above, you store relevant info about an exception, and get an output format for it. __str__ is doubly magic for an exception, because (in addition to define its str(), which can often be useful), it's also automatically output when the exception falls through uncaught (written after the exception type).

> def ignore_walk(directory):
>     '''Ignore defined files and directories when doing the walk.'''
>     for dirpath, dirnames, filenames in os.walk(directory):
>         dirnames[:] = [ dn for dn in dirnames if dn not in IGNORE_DIRS ]
>         filenames[:] = [ fn for fn in filenames if fn not in IGNORE_FILES ]
>         yield dirpath, dirnames, filenames

First, it seems you use [:] only to preserves the object identity so that it remains a generator. But it may be better (at least clearer for me) to filter and transform the generation process so as to get what you actually need, I guess: iterating on (dirpath,filename) pairs. If I'm right on this, maybe try to figure out how do that.
I would call the func eg "filtered_dir_walk" or "relevant_dir_walk".

> def tar_bzip2_directories(directories):
>     for directory in directories:
>         file_name = '-'.join(directory.split(' '))
>         tar_name = file_name.replace('/','').lower() + ".tar.bz2"
> 
>         if os.path.exists(tar_name):
>             raise DestinationTarFileExists()
> 
>         if not options.quiet:
>             print 'Compressing files into: ' + tar_name
> 
>         tar = tarfile.open(tar_name, 'w:bz2')
> 
>         for dirpath, dirnames, filenames in ignore_walk(directory):
>             for file in filenames:
>                 if not options.quiet:
>                     print os.path.join(dirpath, file)
> 
>                 tar.add(os.path.join(dirpath, file))

--> see remark above. Could be:

          for (dirpath, filename) in ignore_walk(directory):
               if not options.quiet:
                   print os.path.join(dirpath, filename)
               tar.add(os.path.join(dirpath, filename))

>         tar.close()
> 
> if __name__ == "__main__":
> 
>     parser = OptionParser(usage="%prog [options: -q ] [directory]")
>     parser.add_option("-q", "--quiet", action="store_true", dest="quiet")
>     options, args = parser.parse_args()
> 
>     directories = []
> 
>     for arg in args:
>         if os.path.isdir(arg):
>             directories.append(arg)
>         else:
>             print "Ingoring: %s (it's not a directory)." % arg
> 
>     try:
>         tar_bzip2_directories(directories)
>     except DestinationTarFileExists:
>         print "A tar file already exists this this directory name."
>         print "Move or rename it and try again."

Here you can use exception info and/or str().
Maybe let a choice for overwrite/skip/stop for any existing name.

Denis
________________________________

vit esse estrany ☣

spir.wikidot.com