UnicodeEncodeError when not running script from IDE
Magnus Pettersson
magpettersson at gmail.com
Tue Feb 12 19:40:46 EST 2013
Thanks a lot Steven, you gave me a good AHA experience! :)
Now I understand why I had to specify an encoding when calling urllib2! So basically Eclipse PyDev does this in the background for me, and its console supports UTF-8, so that's why I never had to think about it before (and why some scripts tend to fail with Unicode errors when run outside of the Eclipse IDE).
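A minimal sketch of that failure, written in Python 3 syntax rather than the Python 2 of this thread. The helper name write_text is made up for illustration, and the in-memory stream only stands in for a console: one that encodes as UTF-8 (like the PyDev console) and one that encodes as ASCII (like a badly configured terminal).

```python
import io


def write_text(text, encoding):
    """Write text through a stream that encodes the way a console with
    the given encoding would, and return the bytes that came out.
    (Hypothetical helper, for illustration only.)"""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


text = "Malm\u00f6"  # "Malmo" with o-umlaut (U+00F6), a non-ASCII character

# A UTF-8 "console" accepts the text and emits two bytes for the umlaut.
utf8_bytes = write_text(text, "utf-8")

# An ASCII "console" cannot represent U+00F6, so the write fails.
try:
    write_text(text, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

To see which encoding Python actually chose for the real console, sys.stdout.encoding can be printed both inside and outside the IDE.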
> Start here:
> "The Absolute Minimum Every Software Developer Absolutely, Positively Must
> Know About Unicode and Character Sets (No Excuses!)"
> http://www.joelonsoftware.com/articles/Unicode.html
> Basically, Unicode is an in-memory data format. Python knows about Unicode
> characters (to be technical: code points), but files on disk do not.
> Neither do network protocols, or terminals, or other simple devices. They
> only understand bytes.
> So when you have Unicode text, and you want to write it to a file on disk,
> or print it, or send it over the network to another machine, it has to be
> *encoded* into bytes, and then *decoded* back into Unicode when you read it
> from the file again. Sometimes the system will "helpfully" do that encoding
> and decoding automatically for you, which is fine when it works, but when it
> doesn't, it can be perplexing.
> There are many, many, many different *encoding schemes*. ASCII is one. UTF-8
> is another. And then there are about a bazillion legacy encodings which, if
> you are lucky, you will never need to care about. Only some encodings can
> deal with the entire range of Unicode characters; most can only deal with a
> (typically small) subset of possible characters. E.g. ASCII only knows
> about 128 characters out of the million-plus that Unicode deals with.
> Latin-1 can handle close to 256 different characters. If you have a say in
> the matter, always use UTF-8, since it can handle the full set of Unicode
> characters in the most efficient manner.
> --
> Steven
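The encode/decode round trip and the difference between encodings described above can be sketched as follows. This assumes Python 3, where str is Unicode text and bytes is encoded data. The sample string and the helper name can_encode are made up for illustration.

```python
# Sample text: "naive cafe" with accents, plus a euro sign.
# U+00EF and U+00E9 exist in Latin-1; U+20AC (euro sign) does not.
text = "na\u00efve caf\u00e9 \u20ac5"

# Encode Unicode text to bytes, then decode the bytes back to text.
data = text.encode("utf-8")
roundtrip = data.decode("utf-8")


def can_encode(s, encoding):
    """Return True if every character of s exists in the given encoding.
    (Hypothetical helper, for illustration only.)"""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Only UTF-8 can represent all three non-ASCII characters in the sample.
results = {enc: can_encode(text, enc) for enc in ("ascii", "latin-1", "utf-8")}
```

Without the euro sign, Latin-1 would succeed as well, which shows that a legacy encoding covers only a subset of what UTF-8 can represent.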