[Numpy-discussion] Segfault with python 3.2 structured array non-existent field

Wed Mar 16 00:48:19 EDT 2011

Hi,

On Tue, Mar 15, 2011 at 5:55 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
> On Tue, Mar 15, 2011 at 5:30 PM, Christoph Gohlke <cgohlke at uci.edu> wrote:
>>
>>
>> On 3/15/2011 5:13 PM, Matthew Brett wrote:
>>> Hi,
>>>
>>> On Tue, Mar 15, 2011 at 10:23 AM, Matthew Brett<matthew.brett at gmail.com>  wrote:
>>>> Hi,
>>>>
>>>> On Tue, Mar 15, 2011 at 10:12 AM, Pauli Virtanen<pav at iki.fi>  wrote:
>>>>> Tue, 15 Mar 2011 10:06:09 -0700, Matthew Brett wrote:
>>>>>> Sorry to ask, and I ask partly because I'm in the middle of a py3k port,
>>>>>> but is this the right fix to this problem?  I was confused by the
>>>>>> presence of the old PyString_AsString function.
>>>>>
>>>>> It's not a correct fix. The original code seems also wrong ("index" can
>>>>> either be Unicode or Bytes/String object), and will probably bomb when
>>>>> indexing with Unicode strings on Python 2. The right thing to do is to
>>>>> make it show the repr of the "index" object.
>>>>
>>>> OK - I realize I'm being very lazy here but, do you mean:
>>>>
>>>>         PyErr_Format(PyExc_ValueError,
>>>>>>                       "field named %s not found.",
>>>>>>                       PyString_AsString(PyObject_Repr(index)));
>>>
>>> Being less lazy, and having read the cpython source, and read
>>> Christoph's mail more carefully, I believe Christoph's patch is
>>> correct...
>>>
>>> Unicode indexing of structured array fields doesn't raise an error on
>>> python 2.x; I assume because PyString_AsString is returning a char*
>>> using the Unicode default encoding, as per the docs.
>>>
>>
>> I think the patch is correct for Python 3 but, as Pauli pointed out, the
>> original code can crash also under Python 2.x when indexing with an
>> unicode string that contains non-ascii7 characters, which seems much
>> less likely and apparently has been undetected for quite a while. For
>> example, this crashes for me on Python 2.7:
>>
>> import numpy as np
>> a = np.zeros((1,), dtype=[('f1', 'f')])
>> a[u's'] = 1  # works
>> a[u'µ'] = 1  # crash
>>
>> So, the proposed patch works for Python 3, but there could be a better
>> patch fixing also the corner case on Python 2.
>
> Thanks.  How about something like the check further up the file:
>
>       if (PyUnicode_Check(op)) {
>            temp = PyUnicode_AsUnicodeEscapeString(op);
>        }
>        PyErr_Format(PyExc_ValueError,
>                     "field named %s not found.",
>                     PyBytes_AsString(temp));
>
> ? I'm happy to submit a pull request with tests if that kind of thing
> looks sensible to y'all,

That fix works for 2.6 but crashes 3.2 when it is rejecting (by design
I think) field indexing with byte strings.

I've put a test function in this branch which exercises the routes I
could think of.  It will crash current 2.x because of the unicode
error that Pauli pointed out, but I think it corresponds to the
expected behavior:

https://github.com/matthew-brett/numpy/compare/numpy:master...matthew-brett:struct-arr-fields

Do these tests look correct?  I'd be happy to hear why the code above
does not work, it wasn't obvious to me.

See you,

Matthew