[Python-3000] characters data type

Talin talin at acm.org
Tue May 2 23:22:47 CEST 2006


Guido van Rossum <guido <at> python.org> writes:

> On 5/1/06, Talin <talin <at> acm.org> wrote:
> > Given that strings are going to be unicode, will there be a "characters"
> > data type to go along with the "bytes" data type?
> 
> No. I'm not sure what you mean by "characters" but the only characters
> that Python will support are Unicode characters. Python's 'str' and
> 'bytes' will be like String and byte[] in Java. But there won't be a
> separate "char" type to represent the elements of 'str' -- like
> before, a 1-char string will serve nicely to represent a "character".
> And a byte is represented by a Python int -- there won't be a separate
> int-ish type constrained to range(0, 256).

It appears that my question has been misunderstood by everyone; I'll try
to phrase it better:

The short version is: will there be a mutable character array type (which
I am calling "characters")?

First, I do use the array module, not often but occasionally. One common
use case is equivalent to the Java StringBuffer class - that is, a means
for building up strings a character at a time, which otherwise would be
expensive to do with immutable strings.
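
As a rough sketch of that use case (not a proposal for any actual API), the
pattern is: accumulate characters in a mutable buffer, then convert to an
immutable string once at the end. A plain list stands in for the buffer here,
and the name build_string is made up purely for illustration.

```python
def build_string(chars):
    """Accumulate characters in a mutable buffer, then join once at the end."""
    buf = []                      # mutable stand-in for a character buffer
    for ch in chars:
        if ch.isalnum():          # arbitrary per-character logic, for illustration
            buf.append(ch.upper())
    return "".join(buf)           # single conversion to an immutable str


result = build_string("a-b_c 1")
```

Appending to the buffer avoids creating a new string object on every step,
which is what repeated concatenation of immutable strings would do.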

Now, from the discussion of "bytes" I get the impression that it, too, is
a mutable type (someone said 'like list'). So given that 'characters' (i.e.
unicode characters) are now distinct from 'bytes', it makes sense to me to
declare a mutable character array. And to me, the most natural name for
such a type is 'characters', although I suppose you could also call it
"stringbuffer" or something.

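To make the idea concrete, here is a minimal sketch of what such a type might
look like. The class name Characters and all of its methods are hypothetical,
chosen only for illustration; it assumes a list of 1-character strings as the
underlying storage and says nothing about how a real implementation would
store its data.

```python
class Characters:
    """Hypothetical mutable sequence of unicode characters (illustration only)."""

    def __init__(self, initial=""):
        self._data = list(initial)      # one list element per character

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, ch):
        if not isinstance(ch, str) or len(ch) != 1:
            raise ValueError("expected a 1-character string")
        self._data[index] = ch          # in-place update, unlike str

    def append(self, ch):
        if not isinstance(ch, str) or len(ch) != 1:
            raise ValueError("expected a 1-character string")
        self._data.append(ch)

    def __str__(self):
        return "".join(self._data)      # convert to an immutable string


buf = Characters("hello")
buf[0] = "J"                            # mutate a single character in place
buf.append("!")
text = str(buf)
```

The point of the sketch is only the contrast with 'str': the same operations
on an immutable string would each have to build a new string.
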
BTW, is the internal encoding of unicode strings UTF-8, UTF-16, UCS-2, or
UTF-32? Just wondering...

-- Talin



