[Python-3000] Making more effective use of slice objects in Py3k

Guido van Rossum guido at python.org
Fri Sep 1 06:13:29 CEST 2006


On 8/31/06, Talin <talin at acm.org> wrote:
> > Here you are effectively voting against polymorphic strings. I believe
> > Fredrik has good reasons to doubt this assertion.
>
> Yes, that is correct. I'm just throwing it out there as a possibility,
> as it is by far the simplest solution. It's a question of trading memory
> use for simplicity of implementation. Having a single, flat, internal
> representation for all strings would be much less complex than having
> different string types.

I think you don't realize the significance of the immediate
enthusiastic +1 votes from several OSX developers.

These people are quite familiar with ObjectiveC. ObjectiveC has true
polymorphic strings, and the internal representation *can* be UTF-8.
These developers love that.

For most practical purposes the internal representation is abstracted
away from the application; *however* it is possible to go below this
level, especially for I/O (I believe). The net effect, if I understand
correctly, is that you can save yourself a lot of copying if you are
mostly just moving whole strings around and doing relatively little
slicing and dicing -- it avoids converting from UTF-8 (which is by far
the most common external representation) to UCS-2 or UCS-4 and back
again.
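
As a rough illustration of the copying in question, here is a minimal
sketch in present-day Python. The function names and sample data are
assumptions made for the example, not anything from the Objective-C or
Python APIs under discussion. It compares the size of the same text in
UTF-8 and in fixed-width forms, and shows the decode/encode round trip
that a program pays for even when it only moves a string from input to
output.

```python
def encoded_sizes(text):
    """Return the byte length of `text` in three encodings (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # UCS-2 sized for BMP text
        "utf-32": len(text.encode("utf-32-le")),  # UCS-4
    }


def passthrough(raw):
    """Move UTF-8 bytes from input to output via a decoded string.

    The bytes come out unchanged, but two conversions (and two copies)
    happen along the way. A string type that could keep UTF-8 internally
    would be able to skip both.
    """
    text = raw.decode("utf-8")   # external UTF-8 -> internal form
    return text.encode("utf-8")  # internal form -> external UTF-8


sizes = encoded_sizes("hello")
data = "caf\u00e9".encode("utf-8")
unchanged = passthrough(data) == data
```

For mostly-ASCII text the fixed-width forms are two to four times
larger than the UTF-8 the data arrived in.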

I don't think these advantages are maintained by your "narrowest
constant-width encoding that fits all the characters" proposal.
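
A minimal sketch of what such a proposal might compute, assuming the
candidate widths are 1, 2 and 4 bytes per character (Latin-1, UCS-2,
UCS-4); the thresholds and the helper name are assumptions for
illustration, not details taken from the proposal itself.

```python
def narrowest_width(text):
    """Bytes per character of the narrowest fixed-width form holding `text`.

    Assumed widths: 1 (code points below 0x100), 2 (below 0x10000), else 4.
    """
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1
    if highest < 0x10000:
        return 2
    return 4


# Even a 1-byte-wide string is not UTF-8 once it leaves ASCII, so
# writing it out as UTF-8 still requires a conversion.
word = "caf\u00e9"
internal = word.encode("latin-1")  # 1 byte per character
external = word.encode("utf-8")    # the accented letter takes 2 bytes
needs_conversion = internal != external
```

Only pure-ASCII strings would have identical internal and UTF-8 bytes
under this scheme.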

I'm not saying that we should definitely adopt this -- it may well be
that the ObjectiveC string API is significantly different from
Python's (e.g. it could have less emphasis on character indices and
character counts) so that the benefits would be lost in translation --
but I'm not sure that the added complexity of your proposal is
warranted if it still requires encoding and decoding on most I/O
operations.

BTW, in some sense Python 2.x *has* polymorphic strings -- str and
unicode have the same API (99% anyway) but different implementations,
and there's even a common abstract base class (basestring). But this
clearly isn't what the ObjectiveC folks want to see!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

