[Numpy-discussion] Segfault with python 3.2 structured array non-existent field

Tue Mar 15 20:55:32 EDT 2011

Hi,

On Tue, Mar 15, 2011 at 5:30 PM, Christoph Gohlke <cgohlke at uci.edu> wrote:
>
>
> On 3/15/2011 5:13 PM, Matthew Brett wrote:
>> Hi,
>>
>> On Tue, Mar 15, 2011 at 10:23 AM, Matthew Brett<matthew.brett at gmail.com>  wrote:
>>> Hi,
>>>
>>> On Tue, Mar 15, 2011 at 10:12 AM, Pauli Virtanen<pav at iki.fi>  wrote:
>>>> Tue, 15 Mar 2011 10:06:09 -0700, Matthew Brett wrote:
>>>>> Sorry to ask, and I ask partly because I'm in the middle of a py3k port,
>>>>> but is this the right fix to this problem?  I was confused by the
>>>>> presence of the old PyString_AsString function.
>>>>
>>>> It's not a correct fix. The original code seems also wrong ("index" can
>>>> either be Unicode or Bytes/String object), and will probably bomb when
>>>> indexing with Unicode strings on Python 2. The right thing to do is to
>>>> make it show the repr of the "index" object.
>>>
>>> OK - I realize I'm being very lazy here but, do you mean:
>>>
>>>         PyErr_Format(PyExc_ValueError,
>>>                      "field named %s not found.",
>>>                      PyString_AsString(PyObject_Repr(index)));
>>
>> Being less lazy, and having read the cpython source, and read
>> Christoph's mail more carefully, I believe Christoph's patch is
>> correct...
>>
>> Unicode indexing of structured array fields doesn't raise an error on
>> python 2.x; I assume because PyString_AsString is returning a char*
>> using the Unicode default encoding, as per the docs.
>>
>
> I think the patch is correct for Python 3 but, as Pauli pointed out, the
> original code can also crash under Python 2.x when indexing with a
> unicode string that contains non-ASCII characters, which seems much
> less likely and has apparently gone undetected for quite a while. For
> example, this crashes for me on Python 2.7:
>
> import numpy as np
> a = np.zeros((1,), dtype=[('f1', 'f')])
> a[u's'] = 1  # works
> a[u'µ'] = 1  # crash
>
> So, the proposed patch works for Python 3, but there could be a better
> patch that also fixes the corner case on Python 2.

Thanks.  How about something like the check further up the file:

        if (PyUnicode_Check(op)) {
            temp = PyUnicode_AsUnicodeEscapeString(op);
        }
        PyErr_Format(PyExc_ValueError,
                     "field named %s not found.",
                     PyBytes_AsString(temp));

? I'm happy to submit a pull request with tests if that kind of thing
looks sensible to y'all,

Matthew
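
P.S. For anyone following along without the C source open, here is a
rough Python sketch of what the escape step in that snippet is meant to
achieve. The helper name field_not_found_message is made up for
illustration and is not part of NumPy, and I'm assuming the Python-level
"unicode_escape" codec is a fair stand-in for
PyUnicode_AsUnicodeEscapeString.

```python
def field_not_found_message(index):
    """Build an ASCII-only error message for a missing field name.

    Hypothetical helper (not a NumPy function): it mimics escaping the
    field name before it is formatted with %s.
    """
    if isinstance(index, bytes):
        # Map each byte to the code point of the same value so that
        # byte-string names go through the same escaping path.
        index = index.decode("latin-1")
    # Non-ASCII characters become backslash escapes such as \xb5.
    escaped = index.encode("unicode_escape").decode("ascii")
    return "field named %s not found." % escaped


if __name__ == "__main__":
    print(field_not_found_message("s"))
    print(field_not_found_message("\u00b5"))  # the micro sign from the example
```

The point of escaping first is that the "%s" format only ever sees
plain ASCII, whatever the field name contained.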