Prothon should not borrow Python strings!

Michael Geary Mike at DeleteThis.Geary.com
Mon May 24 20:26:29 EDT 2004


Mark Hahn wrote:
> Wow, thanks, this is some really great stuff.  I'm going to have
> to go off and study up on it.
>
> This may be a stupid question, but couldn't I have many "types"
> of strings and some be 8-bits, some 16-bits, and some 32-bits?
> Couldn't normal method overloading handle the type conversion?
> Why is there all this confusion?  Isn't this what object-centric
> computing is designed for?

In fact, there are several different encodings for Unicode strings: UTF-8,
UTF-16, and UTF-32. UTF-16 and UTF-32 each come in big-endian and
little-endian variants; a Byte Order Mark (BOM) at the beginning of the
string can tell you which one you have.
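A minimal Python 3 sketch (not from the original post, and not Prothon code) makes this concrete. It encodes the same two characters in each UTF and byte order, then strips a leading BOM and decodes accordingly. The names `encode_all`, `decode_with_bom`, and `BOMS` are hypothetical helpers; the `codecs.BOM_*` constants and codec names come from Python's standard library.

```python
import codecs

# Order matters: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so the longer marks are tried first.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def encode_all(text: str) -> dict:
    """Encode text in each UTF and byte order, returning spaced hex strings."""
    forms = ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")
    return {name: text.encode(name).hex(" ") for name in forms}


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Strip a leading BOM and decode with the matching codec.

    Falls back to `default` when no BOM is present. Note the inherent
    ambiguity: UTF-16-LE text whose first character is U+0000 looks
    like it starts with a UTF-32-LE BOM.
    """
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode(default)


if __name__ == "__main__":
    for name, hex_bytes in encode_all("A\u00e9").items():
        print(f"{name:10} {hex_bytes}")
    # Python's "utf-16" codec writes a BOM in the machine's native byte
    # order; decode_with_bom recovers the text on either kind of machine.
    print(decode_with_bom("A\u00e9".encode("utf-16")))
```

For "A" followed by "é" (U+00E9), the loop shows `41 c3 a9` for UTF-8, `00 41 00 e9` and `41 00 e9 00` for the two UTF-16 byte orders, and `00 00 00 41 00 00 00 e9` and `41 00 00 00 e9 00 00 00` for the two UTF-32 byte orders.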

UTF-8 is pretty nice for a lot of purposes. It includes the 7-bit ASCII
character set unchanged, and because its code units are single bytes it has
no endianness problems. You could specify that Prothon source code uses
UTF-8, although you'd still want to support the other UTFs for data.
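Here is a rough Python 3 illustration of both UTF-8 properties. Pure ASCII text produces identical bytes in ASCII and UTF-8, and every byte of a non-ASCII character's multi-byte sequence is 0x80 or above, so a byte below 0x80 is always the ASCII character itself. `decode_prothon_source` is a hypothetical loader step (Prothon's real source handling isn't shown in this thread); it assumes the "UTF-8 source" rule and uses Python's `utf-8-sig` codec, which also accepts and drops an optional UTF-8 BOM.

```python
def utf8_bytes(ch: str) -> str:
    """Return the UTF-8 encoding of a single character as spaced hex."""
    return ch.encode("utf-8").hex(" ")


def decode_prothon_source(data: bytes) -> str:
    """Hypothetical loader step: treat source bytes as UTF-8.

    The "utf-8-sig" codec also removes an optional leading UTF-8 BOM,
    which some editors write at the start of files.
    """
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    ascii_src = "x = 1  # a comment"
    # Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
    print(ascii_src.encode("ascii") == ascii_src.encode("utf-8"))  # True

    # Non-ASCII characters become 2- to 4-byte sequences whose bytes are
    # all >= 0x80; there is no byte-order variant to choose between.
    for ch in ("\u00e9", "\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X} -> {utf8_bytes(ch)}")
```

The loop prints `U+00E9 -> c3 a9`, `U+20AC -> e2 82 ac`, and `U+1F600 -> f0 9f 98 80`. The bytes come out the same on big-endian and little-endian machines, which is why Python's codec has no `utf-8-be`/`utf-8-le` split.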

There's a good FAQ on the UTFs here:

http://www.unicode.org/unicode/faq/utf_bom.html

-Mike

More information about the Python-list mailing list