UnicodeEncodeError when not running script from IDE
Magnus Pettersson
magpettersson at gmail.com
Tue Feb 12 19:40:46 EST 2013
Thanks a lot Steven, you gave me a good AHA experience! :)
Now I understand why I had to specify an encoding when calling urllib2! So basically Eclipse PyDev does this in the background for me, and its console supports UTF-8, so that's why I never had to think about it before (and why some scripts tend to fail with Unicode errors when run outside of the Eclipse IDE).
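
A minimal sketch of that difference, written in Python 3 syntax (the thread itself concerns Python 2, where the same idea applies to unicode objects). The fake console, the sample string and the helper name write_to_console are assumptions made up for illustration: a stream that accepts UTF-8 stands in for the PyDev console, and an ASCII-only stream stands in for a plain terminal.

```python
import io

# Hypothetical sample text containing non-ASCII characters.
text = "Sm\u00f6rg\u00e5sbord"


def write_to_console(text, encoding):
    """Write text to a fake console that only accepts the given encoding.

    Returns the bytes that actually reached the underlying device.
    """
    raw = io.BytesIO()
    console = io.TextIOWrapper(raw, encoding=encoding)
    console.write(text)   # the text is encoded to bytes on its way out
    console.flush()
    return raw.getvalue()


# A UTF-8 console (like the one assumed for the IDE) accepts the text.
utf8_bytes = write_to_console(text, "utf-8")

# An ASCII-only console rejects it with UnicodeEncodeError.
try:
    write_to_console(text, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(utf8_bytes)
print("ASCII console failed:", ascii_failed)
```

If the terminal really can display UTF-8, setting the PYTHONIOENCODING environment variable to utf-8 before running the script is one way to get the IDE-like behaviour; otherwise, encoding explicitly before writing avoids relying on the console's default.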
cheers
Magnus
> Start here:
>
> "The Absolute Minimum Every Software Developer Absolutely, Positively Must
> Know About Unicode and Character Sets (No Excuses!)"
>
> http://www.joelonsoftware.com/articles/Unicode.html
>
> Basically, Unicode is an in-memory data format. Python knows about Unicode
> characters (to be technical: code points), but files on disk do not.
> Neither do network protocols, or terminals, or other simple devices. They
> only understand bytes.
>
> So when you have Unicode text, and you want to write it to a file on disk,
> or print it, or send it over the network to another machine, it has to be
> *encoded* into bytes, and then *decoded* back into Unicode when you read it
> from the file again. Sometimes the system will "helpfully" do that encoding
> and decoding automatically for you, which is fine when it works but when it
> doesn't it can be perplexing.
>
> There are many, many, many different *encoding schemes*. ASCII is one. UTF-8
> is another. And then there are about a bazillion legacy encodings which, if
> you are lucky, you will never need to care about. Only some encodings can
> deal with the entire range of Unicode characters; most can only deal with a
> (typically small) subset of possible characters. E.g. ASCII only knows
> about 128 characters out of the million-plus that Unicode deals with.
> Latin-1 can handle close to 256 different characters. If you have a say in
> the matter, always use UTF-8, since it can handle the full set of Unicode
> characters in the most efficient manner.
>
> --
> Steven
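
The point about encodings covering different subsets of Unicode can be checked with a small sketch. It is written in Python 3 syntax; the sample characters and the helper name encoded_length are assumptions chosen for illustration, not something from the thread. For each character it records how many bytes an encoding needs, or None when the encoding cannot represent the character at all.

```python
# Illustrative characters: plain ASCII, Latin-1 range, and beyond Latin-1.
samples = [
    "A",       # LATIN CAPITAL LETTER A
    "\u00e9",  # LATIN SMALL LETTER E WITH ACUTE
    "\u20ac",  # EURO SIGN
    "\u65e5",  # CJK ideograph meaning "day/sun"
]
encodings = ("ascii", "latin-1", "utf-8")


def encoded_length(ch, encoding):
    """Return the number of bytes needed, or None if ch cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


table = {
    ch: {enc: encoded_length(ch, enc) for enc in encodings}
    for ch in samples
}

for ch, row in table.items():
    print("U+%04X" % ord(ch), row)
```

Only UTF-8 has an entry for every sample; ASCII and Latin-1 give None as soon as a character falls outside their range, which is the situation that surfaces as a UnicodeEncodeError at print time.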