<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Guido van Rossum wrote:
<blockquote
cite="midca471dc20701120716s7166f458le5eb632b1bf8345a@mail.gmail.com"
type="cite">
<blockquote type="cite">As discussed on that page, the current
version of the patch could cause
<br>
crashes in low-memory conditions. I welcome suggestions on how best to
<br>
resolve this problem. Apart from that fly in the ointment I'm pretty
<br>
happy with how it all turned out.
<br>
</blockquote>
What kind of crashes? The right thing to do is to raise MemoryError.
<br>
Is there anything besides sheer will power that prevents that?</blockquote>
Nothing *has* prevented that; the relevant code already calls
PyErr_NoMemory(). The problem is that *its* callers currently won't
notice, and continue on their merry way.<br>
<br>
My patch adds a new wrinkle to the API: PyUnicode_AS_UNICODE() can now
fail, and when it fails it currently returns NULL. (Why could it
fail? Under the covers, PyUnicode_AS_UNICODE() may attempt to allocate
memory.)<br>
<br>
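To make that concrete, here is a rough sketch of the shape of the
problem. The names are illustrative only (render_lazy_string is not a
function from my patch); the point is that a lazy string may not have
a rendered buffer yet, so asking for the raw Py_UNICODE pointer can
force an allocation, and that allocation can fail:<br>
<blockquote><pre>#include "Python.h"

/* Hypothetical helper, for illustration only: renders a lazy
   string into a freshly allocated Py_UNICODE buffer, returning
   NULL (with the error already set) if the allocation fails. */
static Py_UNICODE *render_lazy_string(PyUnicodeObject *u);

static Py_UNICODE *
sketch_as_unicode(PyUnicodeObject *u)
{
    if (u->str == NULL) {
        /* First access: force the lazy value into a real buffer.
           This is where the allocation happens, and where it can fail. */
        u->str = render_lazy_string(u);
        if (u->str == NULL)
            return NULL;   /* MemoryError is set, but will callers notice? */
    }
    return u->str;
}</pre></blockquote>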
Without the patch, PyUnicode_AS_UNICODE() always succeeds. Since no caller
ever expects it to fail, code looks like this:<br>
<blockquote><pre>static
int fixupper(PyUnicodeObject *self)
{
    Py_ssize_t len = self->length;
    Py_UNICODE *s = PyUnicode_AS_UNICODE(self);
    int status = 0;

    while (len-- > 0) {
        register Py_UNICODE ch;

        ch = Py_UNICODE_TOUPPER(*s);
        ...</pre></blockquote>
And there you are; when s is NULL, Python crashes.<br>
<br>
In the patch comments I proposed four possible solutions to this
problem, listed in order from least likely to most likely. I just came
up with a fifth one, and I'll include it here.<br>
<ol>
  <li>Redefine the API such that PyUnicode_AS_UNICODE() is allowed to
return NULL, and fix every place in the Python source tree that calls
it to check for a NULL return. Document this with strong language for
external C module authors. (A rough sketch of such a call site appears
below.)</li>
<li>Pre-allocate the str buffer used to render the lazy string
objects.
Update this buffer whenever the size of the string changes. That moves
the failure to a better place for error reporting; once again
PyUnicode_AS_UNICODE() can never fail. But this approach also negates
a
healthy chunk of what made the patch faster.</li>
<li>Change the length to 0 and return a constant empty string.
Suggest that users of the Unicode API ask for the pointer *first* and
the length *second*.</li>
<li>Change the length to 0 and return a previously-allocated buffer
of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that
even if the caller iterates over the buffer, odds are good they'll stop
before they hit the end. Again, suggest that users of the Unicode API
ask for the pointer *first* and the length *second*.</li>
<li>The patch is not accepted.</li>
</ol>
(You see what an optimist I am.)<br>
<br>
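To make the first option concrete, the change needed at each call site
is mechanical but pervasive. Here's a rough sketch of what the
fixupper() excerpt above would have to grow; the error-return
convention is illustrative, and the loop body is elided just as in the
excerpt:<br>
<blockquote><pre>static
int fixupper(PyUnicodeObject *self)
{
    Py_ssize_t len = self->length;
    Py_UNICODE *s = PyUnicode_AS_UNICODE(self);
    int status = 0;

    /* New under option 1: PyUnicode_AS_UNICODE() may now return NULL
       with a MemoryError already set, so every caller must check. */
    if (s == NULL)
        return -1;

    while (len-- > 0) {
        register Py_UNICODE ch;

        ch = Py_UNICODE_TOUPPER(*s);
        ...</pre></blockquote>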
I'm open to suggestions (and patches!) of other approaches to solve
this problem.<br>
<br>
Cheers,<br>
<br>
<br>
<i>larry</i><br>
</body>
</html>