UnicodeEncodeError when not running script from IDE

Magnus Pettersson magpettersson at gmail.com
Wed Feb 13 01:40:46 CET 2013


Thanks a lot Steven, you gave me a good AHA experience! :)

Now I understand why I had to specify an encoding when calling urllib2! So basically Eclipse PyDev does this in the background for me, and its console supports UTF-8, so that's why I never had to think about it before (and why some scripts tend to fail with Unicode errors when run outside of the Eclipse IDE).
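
A minimal sketch of one way to guard against that outside the IDE, assuming Python 3 (the same calls exist on unicode objects in Python 2.7). The helper name encode_for_console and the sample text are made up for illustration; only sys.stdout.encoding and str.encode come from Python itself:

```python
import sys

def encode_for_console(text, encoding=None):
    """Hypothetical helper: encode text for a console without raising.

    Falls back to the encoding Python reports for stdout, or to ASCII
    when none is reported (e.g. when output is piped on Python 2).
    """
    encoding = encoding or sys.stdout.encoding or "ascii"
    # errors="replace" substitutes "?" for characters the encoding lacks
    return text.encode(encoding, errors="replace")

text = u"Gr\u00f6nk\u00f6ping"

print(sys.stdout.encoding)                  # depends on IDE / terminal
print(encode_for_console(text, "ascii"))    # unsupported characters become "?"
print(encode_for_console(text, "utf-8"))    # every character is preserved
```

An IDE console that reports UTF-8 never hits the replacement path, which would explain why the problem only shows up elsewhere.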

cheers
Magnus

> Start here:
>
> "The Absolute Minimum Every Software Developer Absolutely, Positively Must
> Know About Unicode and Character Sets (No Excuses!)"
>
> http://www.joelonsoftware.com/articles/Unicode.html
> 
> Basically, Unicode is an in-memory data format. Python knows about Unicode
> characters (to be technical: code points), but files on disk do not.
> Neither do network protocols, or terminals, or other simple devices. They
> only understand bytes.
> 
> So when you have Unicode text, and you want to write it to a file on disk,
> or print it, or send it over the network to another machine, it has to be
> *encoded* into bytes, and then *decoded* back into Unicode when you read it
> from the file again. Sometimes the system will "helpfully" do that encoding
> and decoding automatically for you, which is fine when it works but when it
> doesn't it can be perplexing.
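
The round trip described above can be sketched as follows, assuming Python 3; the sample word is only an illustration. The last step decodes with the wrong codec on purpose, to show the kind of silent mix-up that makes automatic conversion perplexing:

```python
text = u"sm\u00f6rg\u00e5sbord"      # Unicode text in memory (code points)

data = text.encode("utf-8")          # encode: text -> bytes, for disk or network
restored = data.decode("utf-8")      # decode: bytes -> text, same codec

print(len(text))                     # 11 code points
print(len(data))                     # 13 bytes: two characters need 2 bytes each
print(restored == text)              # True

# Decoding with a different codec does not fail here, it just yields wrong text.
garbled = data.decode("latin-1")
print(garbled == text)               # False
```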
>
> There are many, many, many different *encoding schemes*. ASCII is one. UTF-8
> is another. And then there are about a bazillion legacy encodings which, if
> you are lucky, you will never need to care about. Only some encodings can
> deal with the entire range of Unicode characters; most can only deal with a
> (typically small) subset of possible characters. E.g. ASCII only knows
> about 128 characters out of the million-plus that Unicode deals with.
> Latin-1 can handle close to 256 different characters. If you have a say in
> the matter, always use UTF-8, since it can handle the full set of Unicode
> characters in the most efficient manner.
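
A small sketch of that difference in coverage, assuming Python 3. The function name supported_encodings and the sample strings are invented for this example; the three codec names are standard Python codecs:

```python
def supported_encodings(text, candidates=("ascii", "latin-1", "utf-8")):
    """Hypothetical helper: list the candidate codecs able to encode text."""
    ok = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue                 # this codec lacks at least one character
        ok.append(name)
    return ok

print(supported_encodings(u"hello"))               # plain ASCII letters
print(supported_encodings(u"na\u00efve"))          # i with diaeresis
print(supported_encodings(u"\u20ac5"))             # euro sign
print(supported_encodings(u"\u65e5\u672c\u8a9e"))  # three CJK characters
```

Only UTF-8 appears in every result, which is the practical argument for choosing it when you have the choice.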
>
> -- 
> Steven



