Help making this script better

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Thu Aug 6 21:24:22 EDT 2009


En Thu, 06 Aug 2009 11:50:07 -0300, jakecjacobson  
<jakecjacobson at gmail.com> escribió:

> After much Google searching and trial & error, I was able to write a
> Python script that posts XML files to a REST API using HTTPS and
> passing PEM cert & key file.  It seems to be working but would like
> some pointers on how to handle errors.  I am using Python 2.4, I don't
> have the capability to upgrade even though I would like to.  I am very
> new to Python so help will be greatly appreciated and I hope others
> can use this script.

Just a few remarks, mostly on style rather than functionality:

> #!/usr/bin/python
> #################################################################
> # catalog_feeder.py
> #################################################################
> # This sciript will process a directory of XML files and push them to
> the Enterprise Catalog.
> #  You configure this script by using a configuration file that
> describes the required variables.
> #  The path to this file is either passed into the script as a command
> line argument or hard coded
> #  in the script.  The script will terminate with an error if it can't
> process the XML file.
> #################################################################

Note that Python has "docstrings" - the __doc__ attribute attached to  
every module, class and function. The very first string in the  
module/function/class becomes its docstring. The interactive help system  
-and other tools like pydoc- can inspect and show such info.
The above comment could serve perfectly as this module's docstring - just  
remove all the #'s and enclose the whole text in """triple quotes"""  
(required as it spawns many lines).
By example, in that case you could print the text in your usage() function  
like this:
print __doc__

> 	try:
> 		# Process the XML conf file ....
> 		xmldoc = minidom.parse(c)
> 		catalog_host = readConfFile(xmldoc, 'catalog_host')
> 		catalog_port = int(readConfFile(xmldoc, 'catalog_port'))
> 		catalog_path = readConfFile(xmldoc, 'catalog_path')
> 		collection_name = readConfFile(xmldoc, 'collection_name')
> 		cert_file = readConfFile(xmldoc, 'cert_file')
> 		key_file = readConfFile(xmldoc, 'key_file')
> 		log_file = readConfFile(xmldoc, 'log_file')
> 		input_dir = readConfFile(xmldoc, 'input_dir')
> 		archive_dir = readConfFile(xmldoc, 'archive_dir')
> 		hold_dir = readConfFile(xmldoc, 'hold_dir')
> 	except Exception, inst:
> 		# I had an error so report it and exit script
> 		print "Unexpected error opening %s: %s" % (c, inst)
> 		sys.exit(1)

Ok, an unexpected error: but *which* one? doing exactly *what*? You're  
hiding important information (the error type, the full traceback, the  
source line that raised the error...) that's very valuable when something  
goes wrong and you have to determine what happened.
In this case, you're adding a bit of information: the name of the file  
being processed. That's good. But at the same time, hiding all the other  
info, and that's not so good. What about this:

	except Exception:
		print >>sys.stderr, "Unexpected error opening %s" % c
		raise

The final raise will propagate the exception; by default, Python will  
print the exception type, the exception value, the full traceback  
including source lines, and exit the script with a status code of 1. The  
same effect that you intended, but more complete.

In other cases, where you don't have anything to add to the default  
exception handling, the best thing to do is: nothing. That is, don't catch  
an exception unless you have something good to do with it. (Exceptions  
aren't Pokémon: you don't have to "catch 'em all!")

> 	# Log Starting
> 	logOut = verifyLogging(log_file)
> 	if logOut:
> 		log(logOut, "Processing Started ...")

I would move the decision into the log function (that is, just write  
log("something") and make the log function decide whether to write to file  
or not). For your next script, look at the logging module:  
http://docs.python.org/library/logging.html

> # Get an arry of files from the input_dir
> def getFiles2Post(d):
> 	return (os.listdir(d))

Note that return isn't a function but a statement. You don't need the  
outer (). Also, using a docstring instead of a comment:

def getFiles2Post(input_dir):
     "Return the list of files in input_dir to process"
     return os.listdir(input_dir)

> # Read out the conf file and set the needed global variable
> def readConfFile(xmldoc, tag):
> 	return (xmldoc.getElementsByTagName(tag)[0].firstChild.data)

Same as above. Which "needed global variable"?

> def cleanup(logOut):
>       [...]	sys.exit(0)

Exiting the script from everywhere makes it harder to reuse some of its  
functions later. Just return the desired status code to the caller, which  
in turn will return to main(), until it gets to the outermost level, which  
is the only place I'd use sys.exit()

-- 
Gabriel Genellina




More information about the Python-list mailing list