[Tutor] Parsing data from a set of files iteratively
Steven D'Aprano
steve at pearwood.info
Thu May 31 03:41:19 CEST 2012
Spyros Charonis wrote:
> On Wed, May 30, 2012 at 8:16 AM, Steven D'Aprano <steve at pearwood.info> wrote:
[...]
>> There is little as painful as a program which prints "An error occurred"
>> and then *keeps working*. What does this mean? Can I trust that the
>> program's final result is correct? How can it be correct if an error
>> occurred? What error occurred? How do I fix it?
>>
> My understanding is that an except clause will catch a relevant error and
> raise an exception if there is one, discontinuing program execution.
No, the opposite. An except clause will catch the exception and *continue*
execution past the end of the try...except block.
Python automatically raises exceptions and halts execution if you do nothing.
For example:
py> for x in (1, 0, 2):
...     print(1/x)
...
1.0
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: division by zero
Notice that the first time around the loop, 1.0 is printed. The second time,
an error occurs (1/0 is not defined), so Python raises an exception. Because I
don't catch that exception, it halts execution of the loop and prints the
traceback.
But a try...except clause will catch the exception and keep going:
py> for x in (1, 0, 2):
...     try:
...         print(1/x)
...     except ZeroDivisionError:
...         print('something bad happened')
...
1.0
something bad happened
0.5
There are good reasons for catching exceptions. Sometimes you can recover from
the error, or skip the bad data. Sometimes one calculation fails but you can
try another one. Or you might want to catch an exception of one type, and
replace it with a different exception with a more appropriate error message.
But all too often, I see beginners catching exceptions and just covering them
up, or replacing useful tracebacks which help with debugging with bland and
useless generic error messages like "An error occurred".
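
As a rough sketch of the last of those good reasons, here is one way to catch an exception and replace it with one carrying a more useful message. The function names parse_reading and mean_of_readings are made up for illustration. Note that the original exception is chained with "from", so its traceback is kept rather than covered up:

```python
def parse_reading(text):
    """Convert one line of text to a float (hypothetical helper)."""
    try:
        return float(text)
    except ValueError as err:
        # Replace the generic message with one that names the bad data,
        # and chain the original exception so its traceback is not lost.
        raise ValueError('bad reading %r: expected a number' % text) from err


def mean_of_readings(lines):
    """Average the readings, skipping blank lines but nothing else."""
    values = [parse_reading(line) for line in lines if line.strip()]
    return sum(values) / len(values)
```

Blank lines are skipped because that is a case the code knows how to recover from. Anything else still raises, and the program halts with a traceback that says which piece of data was bad.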
>>> except SyntaxError:
>>>     print "Check Your Syntax!"
>> This except-clause is even more useless. SyntaxErrors happen when the
>> code is compiled, not run, so by the time the for-loop is entered, the
>> code has already been compiled and cannot possibly raise SyntaxError.
>>
> What I meant was, check the syntax of my pathname specification, i.e. check
> that I did not make a typo when writing the path of the directory I want to
> scan over. I realize syntax has a much more specific meaning in the context
> of programming - code syntax!
That's not what SyntaxError does in Python. Python only understands one form
of syntax: *Python* syntax, not the syntax of pathnames to files. If you type
the wrong pathname:
pathname = "My Documents!letters!personal!letter to my mother+doc"
Python will not raise SyntaxError. It will try to open the file called
My Documents!letters!personal!letter to my mother+doc
*exactly* as you typed it, and either succeed (if by some unimaginable fluke
there happens to be a file of that name!) or fail. If it fails, you will get
an OSError or IOError, depending on the nature of the failure reported by the
operating system.
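
To illustrate, here is a small sketch that tries to open a file that cannot exist and reports which exception Python raised. It assumes Python 3.3 or later, where IOError is simply another name for OSError and a missing file gives the FileNotFoundError subclass. The helper try_open is invented for this example, and the filename is created inside an empty temporary directory so that it is certain not to exist:

```python
import os
import tempfile


def try_open(pathname):
    """Return the name of the exception class raised by open(), or None."""
    try:
        with open(pathname) as f:
            f.read()
    except OSError as err:  # IOError is an alias of OSError in Python 3.3+
        return type(err).__name__
    return None


# Use an empty temporary directory so the file is certain not to exist.
with tempfile.TemporaryDirectory() as tmpdir:
    pathname = os.path.join(tmpdir, "letter to my mother+doc")
    result = try_open(pathname)

print(result)
```

The exception is caught here only so that its class can be displayed. No SyntaxError is involved at any point.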
[...]
>> But you don't just get IOError for *missing* files, but also for
>> *unreadable* files, perhaps because you don't have permission to read
>> them, or perhaps because the file is corrupt and can't be read.
>>
> Understood, but given that the files I am reading and processing are
> standard ASCII text files, there is no good reason (which I can think of)
> that the files would be *unreadable*
*Any* file can be unreadable. The disk may develop a fault, and no longer be
able to read the file's data blocks. Or the file system may be corrupted and
the operating system can see that the file is there, but not where it is. If
the file is on a network share, the network may have gone down halfway through
reading the file. If it's on a USB stick or external hard drive, somebody
might have unplugged it, or the connector might be wobbly.
Normally, for a script like this, failure to read a file should be considered
a fatal error. If a file which *should* be there is no longer there, you
should report the problem and halt. I recommend that you don't catch the
exception at all, just let the traceback occur as normal.
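
A minimal sketch of what that looks like, assuming a script that counts the lines in each file of a directory (count_lines is a made-up name, not taken from your code). There is no try...except anywhere, so any failure to read a file halts the script with a full traceback:

```python
import os


def count_lines(directory):
    """Return {filename: number of lines} for each regular file in directory."""
    counts = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        # No try...except: if the file cannot be read, the exception
        # propagates, and Python prints a traceback and halts.
        with open(path) as f:
            counts[name] = sum(1 for line in f)
    return counts
```

If the directory itself is mistyped, os.listdir raises an OSError naming the path, which is already a better error report than "An error occurred".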
> I verified that I had read/write permissions for all my files, which are
> the default access privileges anyway (for the owner).
Fine, but I'm talking in general rather than specifically about your case. In
general, "file not found" is not the only error you can get. There is a
remarkably large number of things that can go wrong when reading files;
fortunately, most of them are very rare.
Consider what your code does. First, you ask the operating system for a list
of the files in a directory, using os.listdir. Then you expect that some of
those files might be missing, and try to catch the exception. Is this
reasonable? Do you actually expect the operating system will lie to you and
say that files are there that actually don't exist?
For a robust application that runs in an environment where it is possible that
files will routinely be created or destroyed at the same time that the
application is running, a more careful and paranoid approach is appropriate.
For a short script that you control, perhaps not so much.
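
For comparison, here is one possible shape of the more paranoid approach, again only a sketch. It assumes Python 3.3 or later (for FileNotFoundError), and read_listed is a hypothetical helper. It takes the names returned by an earlier os.listdir call and copes with files that were deleted in the meantime, while still letting every other error halt the program:

```python
import os


def read_listed(directory, names):
    """Read the named files from directory.

    Return (contents, vanished): a dict mapping name to file contents,
    and a list of names that no longer existed when we tried to open them.
    """
    contents = {}
    vanished = []
    for name in names:
        path = os.path.join(directory, name)
        try:
            with open(path) as f:
                contents[name] = f.read()
        except FileNotFoundError:
            # The file was deleted after it was listed. Record the name
            # rather than silently ignoring it.
            vanished.append(name)
        # Any other OSError (permissions, I/O fault) still propagates.
    return contents, vanished
```

You would call it as read_listed(directory, os.listdir(directory)) and then decide what to do about anything in the vanished list. Only the one error that is expected in that environment is caught.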
--
Steven