inserting Unicode character in dictionary - Python

Joe Strout joe at strout.net
Fri Oct 17 15:38:23 EDT 2008


Thanks for the answers.  That clears things up quite a bit.

>> What if your source file is set to utf-8?  Do you then have a proper
>> UTF-8 string, but the problem is that none of the standard Python
>> library methods know how to properly interpret UTF-8?
>
> Well, the decode method knows how to decode that bytes into a `unicode`
> object if you call it with 'utf-8' as argument.

OK, good to know.
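
For the record, a minimal sketch of that decode call. It is written for Python 3, where the result is a `str`; in 2.x the same call gives a `unicode` object. The names `raw` and `text` are just for illustration.

```python
# UTF-8 bytes for the two-character text "2π"
# (π is U+03C0, which UTF-8 encodes as the two bytes 0xCF 0x80).
raw = b"2\xcf\x80"

# decode() interprets the bytes using the named encoding and returns text.
text = raw.decode("utf-8")

print(len(raw))   # number of bytes
print(len(text))  # number of characters
```

The byte count and the character count differ, which is exactly why the encoding has to be named somewhere.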

>> 4. In Python 3.0, this silliness goes away, because all strings are
>> Unicode by default.
>
> Yes and no.  The problem just shifts because at some point you get into
> similar troubles, just in the other direction.  Data enters the program
> as bytes and must leave it as bytes again, so you have to deal with
> encodings at those points.

Yes, but that's still much better than having to litter your code with  
'u' prefixes and .decode calls and so on.  If I'm using a UTF-8-savvy  
text editor (as we all should be doing in the 21st century!), and type  
"foo = '2π'", I should get a string containing a '2' and a pi  
character, and all the text operations (like counting characters,  
etc.) should Just Work.
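
Here is roughly what I mean, as a small sketch. It assumes Python 3 and a source file saved as UTF-8 (the default source encoding there); the variable name `foo` is from my example above.

```python
# A plain string literal, no 'u' prefix and no .decode call.
foo = "2π"

# Text operations work on characters, not on encoded bytes.
print(len(foo))            # counts characters
print(foo[1] == "\u03c0")  # the second character is GREEK SMALL LETTER PI
print(foo[::-1])           # reversing keeps the pi character intact

# The byte length only matters once the text is encoded.
print(len(foo.encode("utf-8")))
```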

When I read and write files or sockets or whatever, of course I'll  
have to think about what encoding the text should be... but internal  
to my own source code, I shouldn't have to.
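
A sketch of that boundary, assuming Python 3 and its built-in `open()` with the `encoding` argument. The scratch file path is hypothetical and created only for this illustration.

```python
import os
import tempfile

text = "2π"

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Leaving the program: the encoding is chosen when the text is written.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# On disk there are only bytes.
with open(path, "rb") as f:
    raw = f.read()

# Entering the program: the encoding is named again when reading back.
with open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()

print(raw)
print(round_tripped == text)
```

The encoding shows up only in the two `open()` calls; everything between them deals with text.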

I understand the need for a transition strategy, which is what we have  
in 2.x, and that's working well enough.  But I'll be glad when it's  
over.  :)

Cheers,
- Joe




