jaroslav.dobrek at gmail.com
Thu Jul 26 12:51:34 CEST 2012
> And the cool thing is: you can! :)
> In Python 2.6 and later, the new Py3 open() function is a bit more hidden,
> but it's still available:
> from io import open
>
> filename = "somefile.txt"
> try:
>     with open(filename, encoding="utf-8") as f:
>         for line in f:
>             process_line(line)  # actually, I'd use "process_file(f)"
> except IOError as e:
>     print("Reading file %s failed: %s" % (filename, e))
> except UnicodeDecodeError as e:
>     print("Some error occurred decoding file %s: %s" % (filename, e))
Thanks. I might use this in the future.
> > try:
> >     for line in f:  # here text is decoded implicitly
> >         do_something()
> > except UnicodeDecodeError():
> >     do_something_different()
> > This isn't possible for syntactic reasons.
> Well, you'd normally want to leave out the parentheses after the exception
> type, but otherwise, that's perfectly valid Python code. That's how these
> things work.
You are right. Of course this is syntactically possible. I was too
rash, sorry. I confused it with some other construction I once tried.
I can't remember it.
But the code above (without the brackets) is semantically bad: the
exception is not caught.
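
For comparison, here is a minimal sketch of the case where the decoding
really does happen inside the loop, so that the except clause (written
without parentheses) can catch the error. It assumes the file is opened
with io.open() and an explicit encoding, as in the quoted suggestion. The
function name read_lines and the Latin-1 fallback are illustrative
assumptions, not something proposed in this thread.

```python
import io


def read_lines(path, encoding="utf-8", fallback="latin-1"):
    """Return the decoded lines of `path` as a list.

    `read_lines` and the Latin-1 fallback are hypothetical choices made
    for this sketch only.
    """
    try:
        with io.open(path, encoding=encoding) as f:
            # Decoding happens lazily while iterating, so a bad byte
            # raises UnicodeDecodeError inside this try block.
            return [line for line in f]
    except UnicodeDecodeError:
        # "Do something different": start over with an encoding that
        # maps every byte to a character and therefore cannot fail.
        with io.open(path, encoding=fallback) as f:
            return [line for line in f]
```

If the file object was created elsewhere, or without an encoding, the
decoding may happen at a different place and a try block around the loop
would not see the exception.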
> > The problem is that vast majority of the thousands of files that I
> > process are correctly encoded. But then, suddenly, there is a bad
> > character in a new file. (This is so because most files today are
> > generated by people who don't know that there is such a thing as
> > encodings.) And then I need to rewrite my very complex program just
> > because of one single character in one single file.
> Why would that be the case? The places to change should be very local in
> your code.
This is the case in a program that has many different functions which
open and parse different types of files. When I read and parse a
directory with such different types of files, a program that does
    for line in f:
will not exit with any hint as to where the error occurred. It just
exits with a UnicodeDecodeError. That means I have to look at all
functions that have some variant of
    for line in f:
in them. And it is not sufficient to replace the "for line in f" part.
I would have to transform many functions that
work in terms of lines into functions that work in terms of decoded
That is why I usually solve the problem by moving files around until I
find the bad file. Then I recode or repair the bad file manually.
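
The manual search for the bad file could probably be automated. The
following is a rough sketch under some assumptions: all files are meant
to be UTF-8, and each file is small enough to be read into memory at
once. The function name find_undecodable_files is made up for
illustration. It reads each file as raw bytes and decodes it in one
step, so that the offset reported by the exception is a byte position
in the file.

```python
import io
import os


def find_undecodable_files(root, encoding="utf-8"):
    """Return (path, byte_offset, reason) for each file under `root`
    that cannot be decoded with `encoding`.

    Hypothetical helper: the name and the return format are assumptions
    made for this sketch.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Read raw bytes so nothing is decoded implicitly.
            with io.open(path, "rb") as f:
                data = f.read()
            try:
                data.decode(encoding)
            except UnicodeDecodeError as e:
                # e.start is the index of the first offending byte in
                # `data`, i.e. the offset within the file.
                bad.append((path, e.start, e.reason))
    return bad
```

Running this over the directory before the real parsing step would list
every offending file together with the position of its first bad byte,
without touching the functions that iterate over lines.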