[Python-3000] Poll: Lazy Unicode Strings For Py3k

Josiah Carlson jcarlson at uci.edu
Wed Jan 31 23:00:50 CET 2007


"Jim Jewett" <jimjjewett at gmail.com> wrote:
> On 1/31/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > "Jim Jewett" <jimjjewett at gmail.com> wrote:
> > > On 1/31/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > > > ... I believe that the added complexity to the *base type*
> > > > is too much, while a wrapper object would be free to do just about
> > > > anything
> 
> > > ... string-like object which doesn't claim to inherit from str, and has a
> > > subtly different API?  That seems to get the bad from both choices,
> 
> > Do you remember my "string view" post from last September/October or so?
> > It implemented almost all of the string API exactly as the string API
> > did, except that rather than returning strings, it returned views.
> 
> So there would be places where you couldn't safely use it, even though
> it had all the required functionality.

Almost certainly, but the point is that you could get back to what you
wanted via str(obj), unicode(obj), etc., which would incur (in the worst
case) the overhead you saved before, or raise a MemoryError exception
(unless it's Linux, in which case you will likely segfault).
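
To make the view idea concrete, here is a minimal sketch in Python 3
syntax. It is not the Pyrex strview from the earlier post; the class name
StrView, its attributes, and the handful of methods shown are hypothetical
and chosen only for illustration. Slicing returns another view that shares
the original buffer, and str(view) is the point where the copy is paid for.

```python
class StrView:
    """Hypothetical read-only view onto a slice of an existing str."""

    def __init__(self, base, start=0, stop=None):
        self._base = base
        self._start = start
        self._stop = len(base) if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                raise ValueError("this sketch only supports step 1")
            stop = max(stop, start)
            # No characters are copied: the new view shares the same base.
            return StrView(self._base, self._start + start, self._start + stop)
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("StrView index out of range")
        return self._base[self._start + index]

    def find(self, sub):
        # Search only inside the viewed range, report a view-relative offset.
        pos = self._base.find(sub, self._start, self._stop)
        return -1 if pos == -1 else pos - self._start

    def __str__(self):
        # The copy that the view avoided earlier happens here.
        return self._base[self._start:self._stop]


text = "lazy unicode strings"
view = StrView(text)[5:12]   # still a view, nothing copied yet
copy = str(view)             # a real str, allocated now
```

Code that insists on a real str has to call str(view) first, which is the
"get back to what you wanted" step described above.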


> How would you feel if it also
> 
> (1)  Claimed to be a subclass of str (though it might not actually
> inherit anything)
> (2)  Implemented the rest of the methods by delegation.  (Call str on
> itself, switch its "real" object to the new string, and delegate to
> that.)

I'm not terribly concerned about the implementation details of an object
I don't need to use.  As long as it works, it is fine.  I am concerned
about the implementation details of objects I will use.
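
For readers following along, point (2) of the quoted proposal could look
roughly like the sketch below. The class name LazyConcat and its attributes
are hypothetical, and the sketch assumes Python 3 syntax. It covers only
the delegation part: the object builds the real string the first time a
str method is needed, swaps it in, and forwards to it. It does not attempt
point (1), so isinstance(obj, str) is still False here.

```python
class LazyConcat:
    """Hypothetical wrapper that defers concatenation until needed."""

    def __init__(self, *pieces):
        self._pieces = list(pieces)
        self._real = None  # the materialized str, once it has been built

    def __add__(self, other):
        # Concatenation only records the new piece; nothing is copied yet.
        return LazyConcat(*self._pieces, other)

    def _materialize(self):
        if self._real is None:
            self._real = "".join(str(p) for p in self._pieces)
            self._pieces = [self._real]  # switch to the "real" object
        return self._real

    def __str__(self):
        return self._materialize()

    def __getattr__(self, name):
        # Reached only for attributes not defined on LazyConcat itself.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._materialize(), name)


lazy = LazyConcat("lazy ", "unicode") + " strings"
shout = lazy.upper()   # delegated to the materialized str
```

Everything that is not defined on the wrapper goes through __getattr__, so
the first delegated call pays for the full join.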


> If these slight extensions are OK, then we agree that it should be
> possible to use multiple implementations of str, and I don't see the
> point of insisting that your strview isn't a real string.  (And then
> we move to the next step, where I ask that the API stop promising to
> preallocate the data in a particular place.)

I made the strview not a real string for convenience of implementation,
given what I knew about Pyrex at the time.  Whether or not some *other*
implementation of a concatenation or view wrapper claims to be a str
subclass is up to its author.

I believe the base type included with Python should allocate the memory
on creation.  Why?  Because the implementation is simple, and I believe
that a base type implementation should be as simple as possible.

    Simple is better than complex.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.

Are the proposed alternatives simple?  Hard or easy to explain?  I don't
know.  But views and lazy concatenations are *more complex* than what we
have, and are *harder to explain*.


 - Josiah


