<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Guido van Rossum wrote:
<blockquote
cite="midca471dc20701120716s7166f458le5eb632b1bf8345a@mail.gmail.com"
type="cite">
<blockquote type="cite">As discussed on that page, the current
version of the patch could cause
<br>
crashes in low-memory conditions. I welcome suggestions on how best to
<br>
resolve this problem. Apart from that fly in the ointment I'm pretty
<br>
happy with how it all turned out.
<br>
</blockquote>
What kind of crashes? The right thing to do is to raise MemoryError.
<br>
Is there anything besides sheer will power that prevents that?</blockquote>
Nothing *has* prevented that; the relevant code already calls
PyErr_NoMemory(). The problem is that *its* callers currently won't
notice, and continue on their merry way.<br>
<br>
My patch adds a new wrinkle to the API: PyUnicode_AS_UNICODE() can now
fail, and when it fails it currently returns NULL. (Why could it
fail? Under the covers, PyUnicode_AS_UNICODE() may attempt to allocate
memory.)<br>
<br>
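To make that concrete, here is a rough sketch of the shape of the
problem. The names are illustrative only (render_lazy_string is not a
function from my patch); the point is that a lazy string may not have
a rendered buffer yet, so asking for the raw Py_UNICODE pointer can
force an allocation, and that allocation can fail:<br>
<blockquote><pre>#include "Python.h"

/* Hypothetical helper, for illustration only: renders a lazy
   string into a freshly allocated Py_UNICODE buffer, returning
   NULL (with the error already set) if the allocation fails. */
static Py_UNICODE *render_lazy_string(PyUnicodeObject *u);

static Py_UNICODE *
sketch_as_unicode(PyUnicodeObject *u)
{
    if (u->str == NULL) {
        /* First access: force the lazy value into a real buffer.
           This is where the allocation happens, and where it can fail. */
        u->str = render_lazy_string(u);
        if (u->str == NULL)
            return NULL;   /* MemoryError is set, but will callers notice? */
    }
    return u->str;
}</pre></blockquote>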
Without the patch, PyUnicode_AS_UNICODE() always succeeds. Since no caller
ever expects it to fail, code looks like this:<br>
<blockquote><pre>static
int fixupper(PyUnicodeObject *self)
{
    Py_ssize_t len = self->length;
    Py_UNICODE *s = PyUnicode_AS_UNICODE(self);
    int status = 0;

    while (len-- > 0) {
        register Py_UNICODE ch;

        ch = Py_UNICODE_TOUPPER(*s);
        ...</pre></blockquote>
And there you are; when s is NULL, Python crashes.<br>
<br>
In the patch comments I proposed four possible solutions to this
problem, listed in order from least likely to most likely. I just came
up with a fifth one, and I'll include it here.<br>
<ol>
  <li>Redefine the API such that PyUnicode_AS_UNICODE() is allowed to
return NULL, and fix every place in the Python source tree that calls
it to check for a NULL return. Document this with strong language for
external C module authors. (A rough sketch of such a call site appears
below.)</li>
<li>Pre-allocate the str buffer used to render the lazy string
objects.
Update this buffer whenever the size of the string changes. That moves
the failure to a better place for error reporting; once again
PyUnicode_AS_UNICODE() can never fail. But this approach also negates
a
healthy chunk of what made the patch faster.</li>
<li>Change the length to 0 and return a constant empty string.
Suggest that users of the Unicode API ask for the pointer *first* and
the length *second*.</li>
<li>Change the length to 0 and return a previously-allocated buffer
of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that
even if the caller iterates over the buffer, odds are good they'll stop
before they hit the end. Again, suggest that users of the Unicode API
ask for the pointer *first* and the length *second*.</li>
<li>The patch is not accepted.</li>
</ol>
(You see what an optimist I am.)<br>
<br>
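To make the first option concrete, the change needed at each call site
is mechanical but pervasive. Here's a rough sketch of what the
fixupper() excerpt above would have to grow; the error-return
convention is illustrative, and the loop body is elided just as in the
excerpt:<br>
<blockquote><pre>static
int fixupper(PyUnicodeObject *self)
{
    Py_ssize_t len = self->length;
    Py_UNICODE *s = PyUnicode_AS_UNICODE(self);
    int status = 0;

    /* New under option 1: PyUnicode_AS_UNICODE() may now return NULL
       with a MemoryError already set, so every caller must check. */
    if (s == NULL)
        return -1;

    while (len-- > 0) {
        register Py_UNICODE ch;

        ch = Py_UNICODE_TOUPPER(*s);
        ...</pre></blockquote>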
I'm open to suggestions (and patches!) of other approaches to solve
this problem.<br>
<br>
Cheers,<br>
<br>
<br>
<i>larry</i><br>
</body>
</html>