sjmachin at lexicon.net
Sat Dec 20 00:38:26 CET 2008
On Dec 20, 10:02 am, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
> On Fri, 19 Dec 2008 15:20:08 -0700, Joe Strout wrote:
> > Marc 'BlackJack' Rintsch wrote:
> >>> And because strings in Python, unlike in (say) REALbasic, do not know
> >>> their encoding -- they're just a string of bytes. If they were a
> >>> string of bytes PLUS an encoding, then every string would know what it
> >>> is, and things like conversion to another encoding, or concatenation
> >>> of two strings that may differ in encoding, could be handled
> >>> automatically.
> >>> I consider this one of the great shortcomings of Python, but it's
> >>> mostly just a temporary inconvenience -- the world is moving to
> >>> Unicode, and with Python 3, we won't have to worry about it so much.
> >> I don't see the shortcoming in Python <3.0. If you want real strings
> >> with characters instead of just a bunch of bytes simply use `unicode`
> >> objects instead of `str`.
> > Fair enough -- that certainly is the best policy. But working with any
> > other encoding (sometimes necessary when interfacing with any other
> > software), it's still a bit of a PITA.
> But it has to be. There is no automagic guessing possible.
> >> And does REALbasic really use byte strings plus an encoding!?
> > You betcha! Works like a dream.
> IMHO a strange design decision. A lot more hassle compared to an opaque
> unicode string type which uses some internal encoding that makes
> operations like getting a character at a given index easy or
> concatenating without the need to reencode.
In general I quite agree with you ... however, with Unicode "getting a
character at a given index" is fine unless and until you stray (or are
dragged!) outside the BMP and you have only a 16-bit Unicode
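
To illustrate the point about leaving the BMP, here is a minimal sketch. It assumes Python 3.3 or later, where `str` indexes by code point, and it simulates 16-bit storage by encoding to UTF-16; the helper name `utf16_code_units` is hypothetical and made up for this example.

```python
import struct

def utf16_code_units(text):
    """Return the 16-bit code units that UTF-16 storage would use for text."""
    raw = text.encode("utf-16-le")
    # Each UTF-16 code unit is two bytes; unpack them as unsigned shorts.
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

# 'a', then U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP), then 'b'
s = "a\U0001D11Eb"
units = utf16_code_units(s)

print(len(s))       # number of code points
print(len(units))   # number of 16-bit code units
print([hex(u) for u in units])
```

The string holds three code points but four 16-bit code units, because the non-BMP character is stored as a surrogate pair. With 16-bit storage, index 1 would return only the first half of that pair, and 'b' would sit at index 3 rather than index 2.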