inserting Unicode character in dictionary - Python
Joe Strout
joe at strout.net
Fri Oct 17 15:38:23 EDT 2008
Thanks for the answers. That clears things up quite a bit.
>> What if your source file is set to utf-8? Do you then have a proper
>> UTF-8 string, but the problem is that none of the standard Python
>> library methods know how to properly interpret UTF-8?
>
> Well, the decode method knows how to decode that bytes into a
> `unicode` object if you call it with 'utf-8' as argument.
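Here is a minimal sketch of that decoding step. It is written for Python 3, where the byte type is `bytes` and the decoded text type is `str`. In Python 2.x, which this thread is about, the equivalents are `str` and `unicode`. The byte literal is an assumed example input.

```python
# UTF-8 encoded bytes for the text "2π" (π is U+03C0, encoded as 0xCF 0x80)
raw = b"2\xcf\x80"

# Decoding with the 'utf-8' codec yields a text string of code points
text = raw.decode("utf-8")

print(type(text).__name__, len(raw), len(text))  # prints "str 3 2"
```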
OK, good to know.
>> 4. In Python 3.0, this silliness goes away, because all strings are
>> Unicode by default.
>
> Yes and no. The problem just shifts because at some point you get
> into similar troubles, just in the other direction. Data enters the
> program as bytes and must leave it as bytes again, so you have to
> deal with encodings at those points.
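The "decode on the way in, encode on the way out" pattern can be sketched in Python 3 as follows. The in-memory `io.BytesIO` streams are assumed stand-ins for a real file or socket. The `upper()` call is only a placeholder for whatever text processing a program actually does.

```python
import io

# Stand-ins for an external byte source and sink (e.g. a file or socket)
source = io.BytesIO("café 2π".encode("utf-8"))
sink = io.BytesIO()

# Boundary 1: bytes arrive and are decoded into text
text = source.read().decode("utf-8")

# Inside the program, work with text (code points), not bytes
result = text.upper()

# Boundary 2: text is encoded back to bytes before it leaves
sink.write(result.encode("utf-8"))

print(sink.getvalue())
```

Only the two boundary lines mention an encoding. Everything between them deals purely in `str`.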
Yes, but that's still much better than having to litter your code with
'u' prefixes and .decode calls and so on. If I'm using a UTF-8-savvy
text editor (as we all should be doing in the 21st century!), and type
"foo = '2π'", I should get a string containing a '2' and a pi
character, and all the text operations (like counting characters,
etc.) should Just Work.
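Python 3 behaves this way. Source files are UTF-8 by default (PEP 3120), so a `'2π'` literal is a `str` of two code points, and `len()` and indexing count characters. A byte count appears only after the text is explicitly encoded.

```python
# In Python 3 this literal is text, with no u prefix or .decode() call needed
foo = '2π'

print(len(foo))                  # 2: number of characters (code points)
print(foo[1])                    # π
print(len(foo.encode('utf-8')))  # 3: number of bytes once encoded as UTF-8
```

One caveat: `len()` counts code points. A character written as a base letter plus a combining mark therefore counts as two.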
When I read and write files or sockets or whatever, of course I'll
have to think about what encoding the text should be... but internal
to my own source code, I shouldn't have to.
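For files, Python 3's `open()` takes an `encoding` argument and performs the conversion at the boundary, so the rest of the code only handles `str`. The scratch file in a temporary directory below is a hypothetical path chosen to keep the example self-contained.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Text mode with an explicit encoding: str goes in, UTF-8 bytes land on disk
with open(path, "w", encoding="utf-8") as f:
    f.write("2π\n")

# Reading back with the same encoding returns str again
with open(path, encoding="utf-8") as f:
    line = f.readline().rstrip("\n")

# Binary mode shows the raw bytes actually stored in the file
with open(path, "rb") as f:
    raw = f.read()

print(line, len(line), raw)
```

If `encoding` is omitted, `open()` falls back to a platform-dependent default. The same code can then behave differently on different systems, which is why it is passed explicitly here.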
I understand the need for a transition strategy, which is what we have
in 2.x, and that's working well enough. But I'll be glad when it's
over. :)
Cheers,
- Joe
More information about the Python-list mailing list