[Python-3000] Lazy strings (was Re: Py3k release schedule worries)

Guido van Rossum guido at python.org
Fri Jan 12 21:31:24 CET 2007


[Guido]
> > I don't understand what you mean by #3 and #4; change *which* length?
> > The phrasing of #3 using "hopefully-big-enough" and "odds" immediately
> > makes me think "buffer overflow attack" which is a non-starter.

On 1/12/07, Larry Hastings <larry at hastings.org> wrote:
> Change the length of the string.

But IIUC the string may already have been seen by other code, right?
This violates immutability, and that's not acceptable.

>  Example: our lazy string should be 4032 bytes long, but PyMem_NEW() returns
> NULL.  If I change the string's length to 0 and return a non-NULL string,
> then callers don't have to explicitly check for failure; all they need do to
> work correctly is ask for the length *after* asking for the pointer.  That
> way, they'll think the string is zero-length, they won't do any processing,
> and they'll exit normally.  (The "hopefully-big-enough" buffer was only in
> case they'd asked for the length *first*; it was just to give the caller
> something to examine.  If it's longer than the string they tried to render
> they won't crash.)  Truncating the string changes the meaning of the
> program, but I figured that's okay as we've already thrown a memory
> exception and the program has already failed.

That's not the right attitude towards memory exceptions. You're
supposed to be able to clean up in finally clauses and __del__
methods, and all that still requires that existing objects remain
intact.

You seem to be proposing to change the MemoryError exception into a
fatal error (which would mean *no* cleanup happens) but that seems a
real reduction in functionality.

>  Actually I think this approach could work pretty well if we were willing to
> change the API.

Changing the API is the only reasonable solution amongst all the
options I've seen.

> If PyUnicode_AS_UNICODE(self, p, len) gave you the
> Py_UNICODE * in p and the length in len, I could ensure that len was 0 if
> rendering failed.  The API's contract would require that if you call
> PyUnicode_AS_UNICODE(), you must use the length it gives you
> (previously-queried lengths might be "stale"), but this would be easy for
> callers to get right.  I can produce a variant of the patch that works this
> way if you think it'll help.
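
For illustration only, here is a minimal sketch of what such a combined
accessor could look like. It is not taken from the patch under discussion:
LazyStr, lazy_alloc, lazy_as_buffer and the simulate_oom switch are all
hypothetical names, plain char stands in for Py_UNICODE, and the int return
code is an added assumption so that callers can still report the failure.
The sketch leaves the stored length untouched when rendering fails; only
the length handed back through the out-parameter is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lazy concatenation object; NOT CPython's real layout. */
typedef struct {
    const char *left;    /* unrendered left operand */
    const char *right;   /* unrendered right operand */
    size_t length;       /* logical length, fixed at creation */
    char *rendered;      /* NULL until the string has been rendered */
} LazyStr;

/* Hypothetical allocator hook so that an allocation failure can be simulated. */
static int simulate_oom = 0;

static void *lazy_alloc(size_t n)
{
    return simulate_oom ? NULL : malloc(n);
}

static void lazy_init(LazyStr *s, const char *left, const char *right)
{
    s->left = left;
    s->right = right;
    s->length = strlen(left) + strlen(right);
    s->rendered = NULL;
}

static void lazy_free(LazyStr *s)
{
    free(s->rendered);
    s->rendered = NULL;
}

/*
 * Combined accessor: hands back pointer and length together.
 * On success returns 0 with *p pointing at the rendered text.
 * On allocation failure returns -1 with *p == "" and *len == 0,
 * and the object itself (including s->length) is left unchanged.
 */
static int lazy_as_buffer(LazyStr *s, const char **p, size_t *len)
{
    assert(s != NULL && p != NULL && len != NULL);
    if (s->rendered == NULL) {
        size_t left_len = strlen(s->left);
        char *buf = lazy_alloc(s->length + 1);
        if (buf == NULL) {
            *p = "";
            *len = 0;
            return -1;
        }
        memcpy(buf, s->left, left_len);
        memcpy(buf + left_len, s->right, s->length - left_len);
        buf[s->length] = '\0';
        s->rendered = buf;
    }
    *p = s->rendered;
    *len = s->length;
    return 0;
}
```

A caller that uses only the pointer and length returned by a single call
never acts on a stale length, and because the object is not modified on
failure, cleanup code that runs afterwards still sees the original string.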

> > Finally (unrelated to the memory problem) I'd like to see some benchmarks to
> > prove that this is really worth it.

>  I'll try to post some today.  For 8-bit strings, concatenation was about as
> fast as the array-append-then-"".join() idiom, and the StringSlicing
> benchmark in pystone was 65% faster.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

