encoding problem
Marc 'BlackJack' Rintsch
bj_666 at gmx.net
Fri Dec 19 18:02:08 EST 2008
On Fri, 19 Dec 2008 15:20:08 -0700, Joe Strout wrote:
> Marc 'BlackJack' Rintsch wrote:
>
>>> And because strings in Python, unlike in (say) REALbasic, do not know
>>> their encoding -- they're just a string of bytes. If they were a
>>> string of bytes PLUS an encoding, then every string would know what it
>>> is, and things like conversion to another encoding, or concatenation
>>> of two strings that may differ in encoding, could be handled
>>> automatically.
>>>
>>> I consider this one of the great shortcomings of Python, but it's
>>> mostly just a temporary inconvenience -- the world is moving to
>>> Unicode, and with Python 3, we won't have to worry about it so much.
>>
>> I don't see the shortcoming in Python <3.0. If you want real strings
>> with characters instead of just a bunch of bytes simply use `unicode`
>> objects instead of `str`.
>
> Fair enough -- that certainly is the best policy. But working with any
> other encoding (sometimes necessary when interfacing with any other
> software), it's still a bit of a PITA.
But it has to be. There is no automagic guessing possible.
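
A minimal sketch of why guessing cannot work, written in Python 3 syntax (where `bytes` and `str` play the roles that `str` and `unicode` play in Python <3.0). The byte values are an illustrative assumption, not taken from the thread: the same two bytes are valid in both encodings but decode to different text.

```python
# The same byte sequence is valid in more than one encoding,
# so nothing in the bytes themselves says which one was meant.
raw = b"\xc3\xa4"  # illustrative sample bytes (assumption)

as_utf8 = raw.decode("utf-8")      # one character, U+00E4
as_latin1 = raw.decode("latin-1")  # two characters, U+00C3 U+00A4

print(repr(as_utf8), len(as_utf8))
print(repr(as_latin1), len(as_latin1))
```

Both calls succeed, so only the caller can say which result is the right one.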
>> And does REALbasic really use byte strings plus an encoding!?
>
> You betcha! Works like a dream.
IMHO a strange design decision. It is a lot more hassle than an opaque
unicode string type which uses some internal encoding, so that getting a
character at a given index is easy and concatenating needs no
reencoding.
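
A rough sketch of that workflow, again in Python 3 syntax and with made-up sample inputs (both byte strings and their encodings are assumptions for illustration): decode once at the boundary, work on the opaque text type, encode once on the way out.

```python
# Two hypothetical inputs arriving in different encodings.
latin1_bytes = "Gr\u00fc\u00dfe".encode("latin-1")
utf8_bytes = " aus K\u00f6ln".encode("utf-8")

# Concatenating the raw bytes mixes encodings; the result is not valid UTF-8.
mixed = latin1_bytes + utf8_bytes
try:
    mixed.decode("utf-8")
    mixed_is_valid_utf8 = True
except UnicodeDecodeError:
    mixed_is_valid_utf8 = False

# Decode each input with its own encoding, then work on text only.
text = latin1_bytes.decode("latin-1") + utf8_bytes.decode("utf-8")
third_char = text[2]        # indexing yields a character, not a byte
out = text.encode("utf-8")  # encode once, at the output boundary

print(text, third_char, mixed_is_valid_utf8)
```

The encoding has to be named explicitly only at the two boundaries; everything in between is independent of it.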
Ciao,
Marc 'BlackJack' Rintsch