[Tutor] Parsing data from a set of files iteratively
Steven D'Aprano
steve at pearwood.info
Wed May 30 09:16:16 CEST 2012
On Wed, May 30, 2012 at 07:00:30AM +0100, Spyros Charonis wrote:
> FINAL SOLUTION:
Not quite. You are making a mistake common among newbies: treating
Python exceptions as a problem to be covered up and hidden, instead of
as a useful source of information.
To quote Chris Smith:
"I find it amusing when novice programmers believe their main
job is preventing programs from crashing. ... More experienced
programmers realize that correct code is great, code that
crashes could use improvement, but incorrect code that doesn't
crash is a horrible nightmare."
-- http://cdsmith.wordpress.com/2011/01/09/an-old-article-i-wrote/
There is little as painful as a program which prints "An error occurred"
and then *keeps working*. What does this mean? Can I trust that the
program's final result is correct? How can it be correct if an error
occurred? What error occurred? How do I fix it?
Exceptions are your friend, not your enemy. An exception tells you that
there is a problem with your program that needs to be fixed. Don't
cover up exceptions unless you absolutely have to.
Sadly, your indentation is still being broken when you post. Please
ensure you include indentation, and disable HTML or "Rich Text" posting.
I have tried to guess the correct indentation below, and fix it in
place, but apologies if I get it wrong.
> ### LOOP OVER DIRECTORY
> location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels'
> zdata = []
> for filename in os.listdir(location):
>     filename = os.path.join(location, filename)
>     try:
>         zdata.extend(extract_zcoord(filename))
>     except NameError:
>         print "No such file!"
Incorrect. When a file is missing, you do not get NameError. This
except-clause merely disguises programming errors in favour of a
misleading and incorrect error message.
If you get a NameError, your program has a bug. Don't just hide the bug,
fix it.
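To see the difference for yourself, here is a minimal sketch. It is written in Python 3 syntax, where a missing file is reported as FileNotFoundError, a subclass of OSError (IOError is an alias of OSError there); in Python 2 you would see IOError. The helper names and the misspelled name extract_zcord are made up for illustration.

```python
import os
import tempfile

def classify(action):
    """Run action() and return the name of the exception class it raises, or None."""
    try:
        action()
    except Exception as err:
        return type(err).__name__
    return None

def open_missing_file():
    # A path inside a fresh, empty temporary directory cannot exist.
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "missing.pdb")).close()

def use_undefined_name():
    # Hypothetical typo for extract_zcoord: a bug in the program,
    # nothing to do with files on disk.
    return extract_zcord

missing_file_error = classify(open_missing_file)
undefined_name_error = classify(use_undefined_name)
print("missing file   ->", missing_file_error)
print("undefined name ->", undefined_name_error)
```

Only the second case is a NameError, and it can only be fixed by editing the program, not by supplying a file.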
>     except SyntaxError:
>         print "Check Your Syntax!"
This except-clause is even more useless. SyntaxErrors happen when the
code is compiled, not run, so by the time the for-loop is entered, the
code has already been compiled and cannot possibly raise SyntaxError.
Even if it could, what is the point of this? Instead of a useful
exception traceback, which tells you not only which line contains the
error, but even highlights the point of the error with a ^ caret, you
hide all the useful information and tease the user with a useless
message "Check Your Syntax!".
Again, if your program raises a SyntaxError, it has a bug. Don't hide
the bug, fix it.
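A rough way to watch this happen is to compile a small piece of source by hand. In this sketch (Python 3 syntax) the two-line program in the string is invented for illustration, and compile() is used only to make the compile step visible. The first line of that program is perfectly valid, yet it never runs, because the error on the second line is detected before anything is executed.

```python
# A hypothetical two-line program: line 1 is valid, line 2 is malformed.
source = "print('this line is valid')\nvalue = = 1\n"

ran_anything = False
error_line = None
try:
    code = compile(source, "<example>", "exec")  # SyntaxError is raised here
    exec(code)                                   # never reached
    ran_anything = True
except SyntaxError as err:
    error_line = err.lineno
    print("SyntaxError on line", err.lineno, "-", err.msg)

print("did any of the program run?", ran_anything)
```

The same thing applies to your script as a whole: if it contains a syntax error, Python reports it before the for-loop and its try block ever start.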
>     except IOError:
>         print "PDB file NOT FOUND!"
This, at least, is somewhat less useless than the others. At least it is
a valid exception, and if your intention is to skip missing files,
catching IOError is a reasonable way to do it.
But you get IOError not only for *missing* files, but also for
*unreadable* files, perhaps because you don't have permission to read
them, or perhaps because the file is corrupt and can't be read.
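If you do want to tell these cases apart, the exception carries an error code you can inspect. The following is only a sketch, in Python 3 syntax (where IOError is an alias of OSError); the function name why_unreadable and the file names are hypothetical.

```python
import errno
import os
import tempfile

def why_unreadable(path):
    """Return a short reason why path cannot be read, or None if it can."""
    try:
        with open(path, "rb") as f:
            f.read()
    except IOError as err:
        if err.errno == errno.ENOENT:
            return "missing"
        if err.errno == errno.EACCES:
            return "permission denied"
        # Anything else: fall back to the operating system's own description.
        return "unreadable: %s" % (err.strerror or err)
    return None

with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "present.pdb")
    with open(present, "w") as f:
        f.write("ATOM\n")
    reason_present = why_unreadable(present)
    reason_missing = why_unreadable(os.path.join(tmp, "absent.pdb"))

print("present.pdb:", reason_present)
print("absent.pdb: ", reason_missing)
```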
In any case, as usual, imagine yourself as the recipient of this
message: "PDB file NOT FOUND!" -- what do you expect to do about it?
Which file is missing or unreadable? How can you tell? Is this a
problem? Are your results still valid without that PDB file's data?
If this can be ignored, IGNORE IT! Don't bother the user with scary
messages that a problem occurred, if it isn't a problem! At *most*,
print a notice that you have skipped a file:

    print "Skipping file", filename

(perhaps giving the reason for skipping it). Or even just ignore it
completely:

    pass
>     else:
>         continue
This is pointless. All it does is what would have been done anyway: if
no exception occurs, it continues to the next iteration of the loop.
Get rid of it: your code will be shorter and neater without these two
unnecessary lines.
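Putting the points above together, the loop might end up looking something like the sketch below. It is in Python 3 syntax, so print is a function. I don't have your extract_zcoord, so the version here is a made-up stand-in that reads a simplified format (the last field of each ATOM line), not real PDB columns. The collect_zdata wrapper and the demo files are likewise hypothetical.

```python
import os
import tempfile

def extract_zcoord(filename):
    """Hypothetical stand-in: last field of each ATOM line, as a float."""
    zcoords = []
    with open(filename) as f:
        for line in f:
            if line.startswith("ATOM"):
                zcoords.append(float(line.split()[-1]))
    return zcoords

def collect_zdata(location):
    zdata = []
    for name in sorted(os.listdir(location)):
        filename = os.path.join(location, name)
        try:
            zdata.extend(extract_zcoord(filename))
        except IOError as err:
            # Say which file was skipped and why, then carry on.
            print("Skipping file", filename, "-", err.strerror)
        # No NameError or SyntaxError handlers, and no "else: continue":
        # bugs surface as tracebacks, and the loop moves on by itself.
    return zdata

# Demo: one readable file, plus a directory that cannot be opened as a file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.pdb"), "w") as f:
        f.write("ATOM 1.5\nATOM 2.5\n")
    os.mkdir(os.path.join(tmp, "b.pdb"))
    zdata = collect_zdata(tmp)

print(zdata)
```

Any other exception, such as a NameError from a typo, is deliberately left uncaught so that you get the full traceback.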
--
Steven