[Python-Dev] [Python-checkins] cpython: pyexpat uses the new Unicode API

Tue Oct 4 11:48:42 CEST 2011

On 03/10/2011 11:10, Amaury Forgeot d'Arc wrote:
>> changeset:   72548:a1be34457ccf
>> user:        Victor Stinner<victor.stinner at haypocalc.com>
>> date:        Sat Oct 01 01:05:40 2011 +0200
>> summary:
>>    pyexpat uses the new Unicode API
>>
>> files:
>>    Modules/pyexpat.c |  12 +++++++-----
>>    1 files changed, 7 insertions(+), 5 deletions(-)
>>
>>
>> diff --git a/Modules/pyexpat.c b/Modules/pyexpat.c
>> --- a/Modules/pyexpat.c
>> +++ b/Modules/pyexpat.c
>> @@ -1234,11 +1234,13 @@
>>   static PyObject *
>>   xmlparse_getattro(xmlparseobject *self, PyObject *nameobj)
>>   {
>> -    const Py_UNICODE *name;
>> +    Py_UCS4 first_char;
>>       int handlernum = -1;
>>
>>       if (!PyUnicode_Check(nameobj))
>>           goto generic;
>> +    if (PyUnicode_READY(nameobj))
>> +        return NULL;
>
> Why is this PyUnicode_READY necessary?
> Can tp_getattro pass unfinished unicode objects?
> I hope we don't have to update all extension modules?

The Unicode API is supposed to deliver only ready strings. But all
extensions written for Python 3.2 use the "legacy" API
(PyUnicode_FromUnicode and PyUnicode_FromStringAndSize(NULL, size)), and
the strings created that way are not ready.

But *no*, you don't have to update an extension that only reads strings
to add a call to PyUnicode_READY. You only have to call PyUnicode_READY
if you use the new API (e.g. PyUnicode_READ_CHAR), that is, only if you
modify your code. Another extract of my commit (on pyexpat):

-    name = PyUnicode_AS_UNICODE(nameobj);
+    first_char = PyUnicode_READ_CHAR(nameobj, 0);

Victor