[Tutor] Parsing data from a set of files iteratively

Steven D'Aprano steve at pearwood.info
Wed May 30 09:16:16 CEST 2012


On Wed, May 30, 2012 at 07:00:30AM +0100, Spyros Charonis wrote:
> FINAL SOLUTION:

Not quite. You are making a mistake many newbies make: treating Python 
exceptions as a problem to be covered up and hidden, instead of as a 
useful source of information.

To quote Chris Smith:

    "I find it amusing when novice programmers believe their main 
    job is preventing programs from crashing. ... More experienced 
    programmers realize that correct code is great, code that 
    crashes could use improvement, but incorrect code that doesn't 
    crash is a horrible nightmare."
    -- http://cdsmith.wordpress.com/2011/01/09/an-old-article-i-wrote/


There is little as painful as a program which prints "An error occurred" 
and then *keeps working*. What does this mean? Can I trust that the 
program's final result is correct? How can it be correct if an error 
occurred? What error occurred? How do I fix it?

Exceptions are your friend, not your enemy. An exception tells you that 
there is a problem with your program that needs to be fixed. Don't 
cover up exceptions unless you absolutely have to.
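A minimal sketch of the "An error occurred" failure mode, using a hypothetical `average` function with a deliberate typo. It uses `from __future__ import print_function` so it runs under the Python 2.7 used in this thread as well as Python 3. The blanket `except Exception` turns a precise NameError into a vague message and a plausible-looking but wrong answer:

```python
from __future__ import print_function


def average(values):
    """Return the mean of values. Contains a deliberate typo (a bug)."""
    total = sum(values)
    return total / len(valuse)  # 'valuse' is undefined: raises NameError


def average_hidden(values):
    """Anti-pattern: swallow every exception and carry on."""
    try:
        return average(values)
    except Exception:
        print("An error occurred")
        return 0.0  # looks like a real answer, but it is wrong


result = average_hidden([2.0, 4.0])  # prints "An error occurred"
print(result)  # 0.0, not the correct mean of 3.0
```

Calling `average([2.0, 4.0])` directly instead raises NameError, with a traceback that points at the exact line and names `valuse`, which is everything needed to fix the bug.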

Sadly, your indentation is still being broken when you post. Please 
ensure you include indentation, and disable HTML or "Rich Text" posting.
I have tried to guess the correct indentation below, and fix it in 
place, but apologies if I get it wrong.

 
> ### LOOP OVER DIRECTORY
> location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels'
> zdata = []
> for filename in os.listdir(location):
>     filename = os.path.join(location, filename)
>     try:
>         zdata.extend(extract_zcoord(filename))
>     except NameError:
>         print "No such file!"

Incorrect. When a file is missing, you do not get NameError. This 
except-clause merely disguises programming errors in favour of a 
misleading and incorrect error message.

If you get a NameError, your program has a bug. Don't just hide the bug, 
fix it.
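A quick way to check which exception a missing file really produces (a sketch; the file name is made up and is placed inside a fresh temporary directory so it cannot exist). In Python 3, IOError is an alias of OSError and the specific subclass raised is FileNotFoundError, so `except IOError` still catches it:

```python
from __future__ import print_function  # lets this run on Python 2.7 as well as 3
import os
import tempfile

folder = tempfile.mkdtemp()  # new, empty directory
missing = os.path.join(folder, "no_such_file.pdb")

try:
    open(missing)
except NameError:
    caught = "NameError"  # never happens for a missing file
except IOError as err:
    caught = "IOError"
    print("Could not open", missing, "-", err.strerror)

os.rmdir(folder)  # clean up the empty temporary directory
print(caught)  # IOError
```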


>     except SyntaxError:
>         print "Check Your Syntax!"

This except-clause is even more useless. SyntaxErrors happen when the 
code is compiled, not when it is run, so by the time the for-loop is 
entered, the code has already been compiled and cannot raise SyntaxError 
(short of calling exec, eval or compile on a string, or importing a 
broken module, at run time).
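The point is easy to demonstrate with the built-in `compile()`, which performs the compilation step explicitly (the two-line source string is made up for illustration). The first line is valid and the second is not, yet the first line never runs, because the whole source is rejected before execution begins. The exception also records exactly where the problem is:

```python
from __future__ import print_function  # lets this run on Python 2.7 as well as 3

# Line 1 is valid Python; line 2 is a syntax error.
source = "executed = True\nx = = 1\n"
namespace = {}

try:
    code = compile(source, "<demo>", "exec")
    exec(code, namespace)  # never reached
except SyntaxError as err:
    error_line = err.lineno
    print("SyntaxError in", err.filename, "on line", err.lineno)

# Line 1 was never executed, so 'executed' was never assigned.
print("executed" in namespace)  # False
```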

Even if it could, what is the point of this? Instead of a useful 
exception traceback, which tells you not only which line contains the 
error, but even highlights the point of the error with a ^ caret, you 
hide all the useful information and tease the user with a useless 
message "Check Your Syntax!".

Again, if your program raises a SyntaxError, it has a bug. Don't hide 
the bug, fix it.


>     except IOError:
>         print "PDB file NOT FOUND!"

This one, at least, is somewhat less useless than the others. It is a 
valid exception, and if your intention is to skip missing files, 
catching IOError is a reasonable way to do it.

But you get IOError not just for *missing* files, but also for 
*unreadable* files, perhaps because you don't have permission to read 
them, or perhaps because the file is corrupt and can't be read.
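If the intention really is to skip only *missing* files, one option (a sketch, not the only approach) is to check the exception's `errno` attribute and re-raise anything else. `read_if_present` is a hypothetical helper; `errno.ENOENT` is the standard "no such file or directory" code:

```python
from __future__ import print_function  # lets this run on Python 2.7 as well as 3
import errno
import os
import tempfile


def read_if_present(path):
    """Return the text of path, or None if the file does not exist.

    Any other IOError (permission denied, path is a directory, a read
    failure) is re-raised, because that is a real problem to report.
    """
    try:
        with open(path) as f:
            return f.read()
    except IOError as err:
        if err.errno == errno.ENOENT:
            return None
        raise


folder = tempfile.mkdtemp()
print(read_if_present(os.path.join(folder, "missing.pdb")))  # None

try:
    read_if_present(folder)  # a directory: it exists, but can't be read as a file
except IOError as err:
    print("Not skipped:", err)
```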

In any case, as usual, imagine yourself as the recipient of this 
message: "PDB file NOT FOUND!" -- what do you expect to do about it? 
Which file is missing or unreadable? How can you tell? Is this a 
problem? Are your results still valid without that PDB file's data?

If this can be ignored, IGNORE IT! Don't bother the user with scary 
messages that a problem occurred, if it isn't a problem! At *most*, 
print a notice that you have skipped a file:

        print "Skipping file", filename

(perhaps giving the reason for skipping it). Or even just ignore it 
completely:

        pass
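Putting this together, the loop could look something like the sketch below. `extract_zcoord` here is a hypothetical stand-in (the original poster's function isn't shown) that returns one float per non-blank line; the temporary directory and its contents exist only to exercise the code. Only IOError is caught, each skipped file is named along with the reason, and any other exception, such as a ValueError from badly formatted data, still propagates with its traceback:

```python
from __future__ import print_function  # lets this run on Python 2.7 as well as 3
import os
import tempfile


def extract_zcoord(filename):
    """Hypothetical stand-in: return one float per non-blank line."""
    with open(filename) as f:
        return [float(line) for line in f if line.strip()]


def collect_zdata(location):
    zdata = []
    for name in sorted(os.listdir(location)):
        filename = os.path.join(location, name)
        try:
            zdata.extend(extract_zcoord(filename))
        except IOError as err:
            # Can't open or read this one: say which file and why, then move on.
            print("Skipping file", filename, "-", err.strerror)
    return zdata


# Demo data: one readable file and one subdirectory (which can't be opened as a file).
location = tempfile.mkdtemp()
with open(os.path.join(location, "model1.pdb"), "w") as f:
    f.write("1.5\n2.5\n")
os.mkdir(os.path.join(location, "subdir"))

print(collect_zdata(location))  # prints a skip notice for subdir, then [1.5, 2.5]
```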


>     else:
>         continue

This is pointless. All it does is what would have happened anyway: if 
no exception occurs, the loop continues to the next iteration. Get rid of 
it: your code will be shorter and neater without these two unnecessary 
lines.
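A small sketch confirming that a trailing `else: continue` changes nothing: the two hypothetical functions below differ only in that clause and return the same result. (They skip non-numeric strings purely to give the `try` statement something to do; that is not a recommendation to hide errors.)

```python
def parse_with_else(items):
    numbers = []
    for item in items:
        try:
            numbers.append(int(item))
        except ValueError:
            pass  # deliberately skip entries that aren't integers
        else:
            continue  # redundant: the loop goes on to the next item anyway
    return numbers


def parse_without_else(items):
    numbers = []
    for item in items:
        try:
            numbers.append(int(item))
        except ValueError:
            pass  # deliberately skip entries that aren't integers
    return numbers


data = ["1", "2", "three", "4"]
print(parse_with_else(data) == parse_without_else(data))  # True
```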



-- 
Steven


