[Python-3000] Lazy strings (was Re: Py3k release schedule worries)

Guido van Rossum guido at python.org
Sun Dec 31 19:43:07 CET 2006


On 12/30/06, Talin <talin at acm.org> wrote:
> Maybe this would be a good time to review, or at least restate, the
> specific plans for strings in Py3K? I know that there's been a great
> deal of discussion on this, but a lot of that discussion took place
> *before* Larry's work (specifically, before a number of people in this
> group drank the lazy-strings KoolAid.)

The *only* thing that I care about is that we end up with a single
string type named 'str' that has the same semantics (and some of the
same performance) as the current Unicode strings. (I mention
performance because s[i] should remain an O(1) operation.)

I don't particularly care about preserving the C API except from a
practical standpoint -- changing all uses of strings would require
modifying nearly every other line of the interpreter code, so a decent
amount of backward compatibility will help us meet the deadline (of an
alpha release by the end of Q2 '07).

> I'm specifically concerned about avoiding confusion over the "lazy"
> aspect of strings, because there are two kinds of "laziness" that have been
> discussed here: Lazy string manipulation (slice and join), and lazy
> format conversion (8-bit, 16-bit, 32-bit.) Both are, I think, desirable.
> They are also inter-related, in that the design of one likely affects
> the other, so I don't think it makes sense to discuss these issues in
> isolation.
>
> Is there a PEP which defines what is going to happen? I specifically
> refer to issues of:
>
>     -- Internal representation of varying-width string encodings
>     -- On-the-fly encoding changes
>     -- C API changes
>     -- String 'views'
>     -- Lazy slicing and concatenation.
>     -- Performance expectations for all of the above.

I tuned out the endless discussions about this, and I doubt that
anyone is really eager to repeat them. At this point I prefer to let
code speak. If someone thinks they can do better, let them produce a
competing patch, and a benchmark that we can agree on. It seems
unlikely that we'll be able to address all the issues you mention
without doing a lot of implementation work; while I don't recommend
jumping in without doing some serious thinking, I really don't want to
get back in the mode of endless discussion without a basis in fact.
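
To make the "lazy string manipulation" idea from the quoted text concrete,
here is a minimal sketch of deferred concatenation. It is not Larry's patch
and not the CPython implementation; the class name LazyConcat and its
methods are hypothetical, chosen only for illustration. The pieces are
joined on first use, and after that s[i] is an ordinary O(1) index into the
flattened string.

```python
class LazyConcat:
    """Hypothetical sketch: defer joining string pieces until needed."""

    def __init__(self, parts):
        self._parts = list(parts)  # pending pieces, not yet joined
        self._flat = None          # cached flat string after first render

    def __add__(self, other):
        # Copies only the list of piece references, not the character data.
        if isinstance(other, LazyConcat):
            other_parts = other._parts
        else:
            other_parts = [str(other)]
        return LazyConcat(self._parts + other_parts)

    def _render(self):
        # Join once, then cache so later accesses skip the join.
        if self._flat is None:
            self._flat = "".join(self._parts)
            self._parts = [self._flat]
        return self._flat

    def __len__(self):
        # Length is known without rendering.
        return sum(len(p) for p in self._parts)

    def __getitem__(self, i):
        # The first access pays for the join; later accesses are O(1).
        return self._render()[i]

    def __str__(self):
        return self._render()


s = LazyConcat(["ab"]) + "cd" + LazyConcat(["ef"])
print(len(s), s[2], str(s))
```

The trade-off in this sketch is that the cost of the join moves from the
concatenation to the first index or str() call, which is why an agreed
benchmark matters more than the design argument.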

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
