[Tutor] encoding question
Steven D'Aprano
steve at pearwood.info
Sun Jan 5 00:52:44 CET 2014
On Sat, Jan 04, 2014 at 11:26:35AM -0800, Alex Kleider wrote:
> Any suggestions as to a better way to handle the problem of encoding in
> the following context would be appreciated.
Python gives you lots of useful information when errors occur, but
unfortunately your code throws that information away and replaces it
with a totally useless message:
> try:
>     country = item[9:].decode('utf-8')
> except:
>     print("Exception raised.")
Oh great. An exception was raised. What sort of exception? What error
message did it have? Why did it happen? Nobody knows, because you throw
it away.
Never, never, never do this. If you don't understand an exception, you
have no business covering it up and hiding that it took place. Never use
a bare try...except; always catch the *smallest* number of specific
exception types that make sense. Better is to avoid catching exceptions
at all: an exception (usually) means something has gone wrong. You
should aim to fix the problem *before* it blows up, not after.
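If you really do need to handle a decoding failure, here is a minimal
sketch of catching just the one exception you expect and keeping its
message. The function name decode_field and the choice to return None on
failure are my own assumptions for illustration, not anything taken from
your script:

```python
def decode_field(raw, encoding='utf-8'):
    """Decode bytes, reporting exactly what went wrong if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as err:
        # Catch only the exception we expect, and show its details
        # instead of replacing them with a generic message.
        print("Could not decode %r as %s: %s" % (raw, encoding, err))
        return None
```

Any other kind of exception (a TypeError because raw isn't bytes, say)
still propagates with its full traceback, which is what you want while
debugging.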
I'm reminded of a quote:
"I find it amusing when novice programmers believe their main job is
preventing programs from crashing. ... More experienced programmers
realize that correct code is great, code that crashes could use
improvement, but incorrect code that doesn't crash is a horrible
nightmare." -- Chris Smith
Your code is incorrect: it does the wrong thing, but it doesn't crash;
it just covers up the fact that an exception occurred.
> The output I get on an Ubuntu 12.4LTS system is as follows:
> alex@x301:~/Python/Parse$ ./IP_info.py3
> Exception raised.
> IP address is 201.234.178.62:
> Country: COLOMBIA (CO); City: b'Bogot\xe1'.
> Lat/Long: 10.4/-75.2833
>
>
> I would have thought that utf-8 could handle the 'a-acute'.
Of course it can:
py> 'Bogotá'.encode('utf-8')
b'Bogot\xc3\xa1'
py> b'Bogot\xc3\xa1'.decode('utf-8')
'Bogotá'
But you don't have UTF-8. You have something else, and trying to decode
it using UTF-8 fails.
py> b'Bogot\xe1'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 5: unexpected end of data
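As a quick experiment, the sketch below tries the same bytes with a
second codec. Latin-1 is only a guess on my part: it happens to map the
single byte 0xe1 to a-acute, but nothing in your output proves that is
the encoding your data source actually uses.

```python
raw = b'Bogot\xe1'

# Strict UTF-8 decoding fails: 0xe1 announces a three-byte sequence,
# but no continuation bytes follow it.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 maps every byte to the code point with the same number, so
# 0xe1 becomes U+00E1 (a-acute). This is a guess at the encoding.
guess = raw.decode('latin-1')
print(utf8_ok, guess)
```

Bear in mind that decoding as Latin-1 never raises an error, whatever the
bytes are, so getting a result back is not evidence that the guess was
right. You have to look at the text and judge whether it makes sense.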
More to follow...
--
Steven