[Tutor] Parsing data from a set of files iteratively

Spyros Charonis s.charonis at gmail.com
Wed May 30 10:37:52 CEST 2012

On Wed, May 30, 2012 at 8:16 AM, Steven D'Aprano <steve at pearwood.info> wrote:

> On Wed, May 30, 2012 at 07:00:30AM +0100, Spyros Charonis wrote:
> Not quite. You are making the mistake many newbies make: treating Python
> exceptions as a problem to be covered up and hidden, instead of as a
> useful source of information.
> To quote Chris Smith:
>    "I find it amusing when novice programmers believe their main
>    job is preventing programs from crashing. ... More experienced
>    programmers realize that correct code is great, code that
>    crashes could use improvement, but incorrect code that doesn't
>    crash is a horrible nightmare."
>    -- http://cdsmith.wordpress.com/2011/01/09/an-old-article-i-wrote/
> Ok, so basically wrong code beats useless code.
> There is little as painful as a program which prints "An error occurred"
> and then *keeps working*. What does this mean? Can I trust that the
> program's final result is correct? How can it be correct if an error
> occurred? What error occurred? How do I fix it?
My understanding is that an except clause will catch a relevant error and
raise an exception if there is one, discontinuing program execution.
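
A small sketch to test this understanding (the list of values is made up
for illustration): an except clause catches the exception and runs its
handler, and execution then continues after the try statement rather than
stopping. Only an exception that no clause catches ends the program with a
traceback.

```python
# Hypothetical data: the second item cannot be converted to a float.
values = ["1.5", "oops", "2.5"]
parsed = []
errors = []

for text in values:
    try:
        parsed.append(float(text))
    except ValueError as err:
        # The handler runs, then the loop carries on with the next item.
        errors.append(str(err))

print("parsed: %s" % parsed)
print("number of errors: %d" % len(errors))
```

The loop reaches the third item even though the second one failed, which
is the "keeps working" behaviour described in the quoted text above.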

> Exceptions are your friend, not your enemy. An exception tells you that
> there is a problem with your program that needs to be fixed. Don't
> cover-up exceptions unless you absolutely have to.

> Sadly, your indentation is still being broken when you post. Please
> ensure you include indentation, and disable HTML or "Rich Text" posting.
> I have tried to guess the correct indentation below, and fix it in
> place, but apologies if I get it wrong.
Yes, that is the way my code looks in the Python interpreter.

> > location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels'
> > zdata = []
> > for filename in os.listdir(location):
> >     filename = os.path.join(location, filename)
> >     try:
> >         zdata.extend(extract_zcoord(filename))
> >     except NameError:
> >         print "No such file!"
> Incorrect. When a file is missing, you do not get NameError. This
> except-clause merely disguises programming errors in favour of a
> misleading and incorrect error message.
> If you get a NameError, your program has a bug. Don't just hide the bug,
> fix it.
> >     except SyntaxError:
> >         print "Check Your Syntax!"
> This except-clause is even more useless. SyntaxErrors happen when the
> code is compiled, not run, so by the time the for-loop is entered, the
> code has already been compiled and cannot possibly raise SyntaxError.
What I meant was: check the syntax of my pathname specification, i.e. check
that I did not make a typo when writing the path of the directory I want to
scan over. I realize syntax has a much more specific meaning in the context
of programming - code syntax!
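
A minimal sketch of what a mistyped directory path looks like at run time,
assuming the typo is in the argument to os.listdir: the call raises OSError
(not SyntaxError), and the exception message already names the offending
path. The helper name list_directory and the example path are made up for
illustration.

```python
import os

def list_directory(location):
    """Return the entries of location, or None if it cannot be listed."""
    try:
        return os.listdir(location)
    except OSError as err:
        # A mistyped path ends up here, at run time.
        print("Cannot list %s: %s" % (location, err))
        return None

# Hypothetical mistyped path, for illustration only.
entries = list_directory("/no/such/directory/HomologyModels")
```

Whether to catch this at all is a judgment call; letting the OSError
propagate gives the same information plus a traceback.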

> Even if it could, what is the point of this? Instead of a useful
> exception traceback, which tells you not only which line contains the
> error, but even highlights the point of the error with a ^ caret, you
> hide all the useful information and tease the user with a useless
> message "Check Your Syntax!".
Ok, I didn't realize I was being so reckless - thanks for pointing that
out.

> Again, if your program raises a SyntaxError, it has a bug. Don't hide
> the bug, fix it.
> >     except IOError:
> >         print "PDB file NOT FOUND!"
> This, at least, is somewhat less useless than the others. At least it is
> a valid exception, and if your intention is to skip missing files,
> catching IOError is a reasonable way to do it.
> But you don't just get IOError for *missing* files, but also for
> *unreadable* files, perhaps because you don't have permission to read
> them, or perhaps because the file is corrupt and can't be read.
Understood, but given that the files I am reading and processing are
standard ASCII text files, there is no good reason (which I can think of)
that the files would be unreadable. I verified that I had read/write
permissions for all my files, which are the default access privileges
anyway (for the owner).
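
One way to tell the cases apart, sketched under the assumption that the
errno attribute of the exception is enough to distinguish them. The
function name describe_open_failure and the file name are made up for
illustration.

```python
import errno

def describe_open_failure(filename):
    """Return None if filename can be opened for reading, else a short reason."""
    try:
        with open(filename) as handle:
            handle.read(1)
    except IOError as err:
        if err.errno == errno.ENOENT:
            return "missing"
        if err.errno == errno.EACCES:
            return "permission denied"
        # Anything else: report the operating system's own description.
        return "unreadable (%s)" % err.strerror
    return None

# Hypothetical file name that is assumed not to exist.
reason = describe_open_failure("no_such_file_for_illustration.pdb")
```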

> In any case, as usual, imagine yourself as the recipient of this
> message: "PDB file NOT FOUND!" -- what do you expect to do about it?
> Which file is missing or unreadable? How can you tell? Is this a
> problem? Are your results still valid without that PDB file's data?
Perhaps because I was writing the program I didn't think that this message
would be confusing to others, but it did help in making clear that there
was a different error (in this case, the absence of
**filename = os.path.join(location, filename)** to join a filename to its
directory path). Without the PDB file's data, there would be no results,
because the program operates on each file of a directory successively (all
files are .pdb files) and uses data in the file to build a list. So, since
I was working on a directory with only PDB files, this error says it hasn't
found them, which points to a more basic error (the one mentioned above).
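
A sketch of the loop with a notice that names the skipped file and the
reason, along the lines suggested in the quoted text. The function name
collect_zcoords and its extract parameter are assumptions for illustration;
extract stands in for extract_zcoord, whose definition is not shown in this
message.

```python
import os

def collect_zcoords(location, extract):
    """Apply extract to every file in location; return (data, skipped)."""
    zdata = []
    skipped = []
    for name in sorted(os.listdir(location)):
        # Join the directory and the bare name so the file can be opened.
        filename = os.path.join(location, name)
        try:
            zdata.extend(extract(filename))
        except IOError as err:
            # Say which file was skipped and why, then move on.
            print("Skipping file %s: %s" % (filename, err))
            skipped.append(filename)
    return zdata, skipped
```

It would be called as collect_zcoords(location, extract_zcoord). Returning
the skipped names lets the caller decide whether the results are still
usable.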

> If this can be ignored, IGNORE IT! Don't bother the user with scary
> messages that a problem occurred, if it isn't a problem! At *most*,
> print a notice that you have skipped a file:
>        print "Skipping file", filename
> (perhaps giving the reason for skipping it). Or even just ignore it
> completely:
>        pass
> >     else:
> >         continue
> This is pointless. All it does is what would have been done anyway: if
> no exception occurs, it continues to the next loop. Get rid of it: your
> code will be shorter and neater without these two unnecessary lines.
Yes, I see what you mean. Thank you for all the corrections!
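
A quick check of that point, using made-up data and hypothetical function
names: the two loops differ only in the trailing else/continue and should
give the same result.

```python
def squares_with_else(values):
    result = []
    for value in values:
        try:
            result.append(int(value) ** 2)
        except ValueError:
            pass
        else:
            continue  # redundant: the loop would move on anyway
    return result

def squares_without_else(values):
    result = []
    for value in values:
        try:
            result.append(int(value) ** 2)
        except ValueError:
            pass
    return result

# Hypothetical sample: the middle item is not a number and is skipped.
sample = ["1", "x", "3"]
same = squares_with_else(sample) == squares_without_else(sample)
print("same result: %s" % same)
```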

