encoding problem
Joe Strout
joe at strout.net
Fri Dec 19 17:20:08 EST 2008
Marc 'BlackJack' Rintsch wrote:
>> And because strings in Python, unlike in (say) REALbasic, do not know
>> their encoding -- they're just a string of bytes. If they were a string
>> of bytes PLUS an encoding, then every string would know what it is, and
>> things like conversion to another encoding, or concatenation of two
>> strings that may differ in encoding, could be handled automatically.
>>
>> I consider this one of the great shortcomings of Python, but it's mostly
>> just a temporary inconvenience -- the world is moving to Unicode, and
>> with Python 3, we won't have to worry about it so much.
>
> I don't see the shortcoming in Python <3.0. If you want real strings
> with characters instead of just a bunch of bytes simply use `unicode`
> objects instead of `str`.
Fair enough -- that certainly is the best policy. But when you have to work
with some other encoding (sometimes necessary when interfacing with other
software), it's still a bit of a PITA.
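
To make the "use unicode objects" policy concrete, here is a minimal sketch of decoding at the boundary and encoding again on the way out. It is written with Python 3 names (`str` for text, `bytes` for raw data); in Python 2 the same pattern uses `unicode` and `str`. The Latin-1 choice and the helper names `from_external` and `to_external` are assumptions for illustration only.

```python
# Assumption: the other software speaks Latin-1; substitute whatever it really uses.
EXTERNAL_ENCODING = "latin-1"


def from_external(raw: bytes) -> str:
    """Decode bytes received from the other program into text."""
    return raw.decode(EXTERNAL_ENCODING)


def to_external(text: str) -> bytes:
    """Encode text back into the bytes the other program expects."""
    return text.encode(EXTERNAL_ENCODING)


raw = b"caf\xe9"              # "café" as Latin-1 bytes
text = from_external(raw)     # real characters from here on
greeting = text + " au lait"  # string operations work on characters, not bytes
out = to_external(greeting)   # back to bytes only when leaving the program
```

The inconvenience is that the program itself has to remember which encoding each byte string arrived in; nothing on the bytes records it.
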
> And does REALbasic really use byte strings plus an encoding!?
You betcha! Works like a dream.
> Sounds strange. When concatenating which encoding "wins"?
The one that is a superset of the other, or if neither is, then both are
converted to UTF-8 (which is the "standard" encoding in RB, though it
works comfily with any other too).
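
The concatenation rule above can be sketched in Python roughly as follows. This is not how REALbasic implements it, just the rule as described; the `TaggedString` class, the `is_superset` helper, and the tiny `SUPERSETS_OF` table are all hypothetical and cover only ASCII, Latin-1, and UTF-8.

```python
from dataclasses import dataclass

# Hypothetical table: for each encoding, the encodings that are
# byte-compatible supersets of it. Only ASCII is listed here.
SUPERSETS_OF = {"ascii": {"latin-1", "utf-8"}}


def is_superset(big: str, small: str) -> bool:
    """True if every string valid in `small` has the same bytes in `big`."""
    return big == small or big in SUPERSETS_OF.get(small, set())


@dataclass(frozen=True)
class TaggedString:
    """A string of bytes PLUS the encoding those bytes are in."""
    data: bytes
    encoding: str

    def convert(self, encoding: str) -> "TaggedString":
        # Round-trip through text to re-encode the bytes.
        text = self.data.decode(self.encoding)
        return TaggedString(text.encode(encoding), encoding)

    def __add__(self, other: "TaggedString") -> "TaggedString":
        if is_superset(self.encoding, other.encoding):
            target = self.encoding      # left side "wins"
        elif is_superset(other.encoding, self.encoding):
            target = other.encoding     # right side "wins"
        else:
            target = "utf-8"            # neither is a superset: both go to UTF-8
        joined = self.convert(target).data + other.convert(target).data
        return TaggedString(joined, target)


a = TaggedString(b"abc", "ascii")
b = TaggedString(b"caf\xe9", "latin-1")
c = TaggedString("na\u00efve".encode("utf-8"), "utf-8")

ab = a + b   # Latin-1 is a superset of ASCII, so the result is Latin-1
bc = b + c   # neither is a superset of the other, so the result is UTF-8
```
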
Cheers,
- Joe