catch UnicodeDecodeError

jaroslav.dobrek at jaroslav.dobrek at
Wed Jul 25 13:05:28 CEST 2012


Very often I have the following problem: I write a program that processes many files which it assumes to be encoded in UTF-8. Then, some day, there is a non-UTF-8 character in one of several hundred or thousand (new) files. The program exits with an error message like this:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 60: invalid continuation byte
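For reference, an error of this kind can be reproduced in a few lines. This is only a sketch, assuming Python 3; the sample bytes are made up for illustration and are not taken from the files in question. Byte 0xe4 is "ä" in Latin-1, but in UTF-8 it announces a three-byte sequence, so the decoder complains when the next byte is not a continuation byte.

```python
# Hypothetical sample: the Latin-1 bytes of "Mädchen" (0xe4 is "ä" in Latin-1).
raw = b"M\xe4dchen"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    # err.start is the offset of the offending byte within raw,
    # err.reason is the short explanation shown at the end of the message.
    bad_byte = raw[err.start]
    message = "byte 0x%02x in position %d: %s" % (bad_byte, err.start, err.reason)
```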

I usually solve the problem by moving files around and by recoding them.

What I really want to do is use something like

try:
    # open file, read line, or do something else, I don't care
except UnicodeDecodeError:
    sys.exit("Found a bad char in file " + file + " line " + str(line_number))

Yet, no matter where I put this try-except, it doesn't work.

How should I use try-except with UnicodeDecodeError?
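One possible approach, sketched here under the assumption of Python 3: a file opened in text mode is decoded while it is being read, not when open() is called, so a try-except that wraps only the open() call never sees the exception. Text mode also decodes in larger chunks, which makes the failing line hard to pin down. Reading the file in binary mode and decoding each line explicitly keeps the line number exact. The function name first_bad_line is invented for this example.

```python
def first_bad_line(path):
    """Return the 1-based number of the first line that is not valid UTF-8.

    Returns None if the whole file decodes cleanly.
    """
    # Binary mode: no decoding happens behind the scenes, so the only
    # place a UnicodeDecodeError can be raised is the decode() call below.
    with open(path, "rb") as handle:
        for line_number, raw_line in enumerate(handle, start=1):
            try:
                raw_line.decode("utf-8")
            except UnicodeDecodeError:
                return line_number
    return None
```

A caller could then pass the returned number to sys.exit() to produce a message of the form "Found a bad char in file ... line ...". Splitting the raw bytes on newlines is safe because the newline byte never occurs inside a multi-byte UTF-8 sequence.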


More information about the Python-list mailing list