[Numpy-discussion] Segfault with python 3.2 structured array non-existent field

Matthew Brett matthew.brett at gmail.com
Wed Mar 16 00:48:19 EDT 2011


Hi,

On Tue, Mar 15, 2011 at 5:55 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
> On Tue, Mar 15, 2011 at 5:30 PM, Christoph Gohlke <cgohlke at uci.edu> wrote:
>>
>>
>> On 3/15/2011 5:13 PM, Matthew Brett wrote:
>>> Hi,
>>>
>>> On Tue, Mar 15, 2011 at 10:23 AM, Matthew Brett<matthew.brett at gmail.com>  wrote:
>>>> Hi,
>>>>
>>>> On Tue, Mar 15, 2011 at 10:12 AM, Pauli Virtanen<pav at iki.fi>  wrote:
>>>>> Tue, 15 Mar 2011 10:06:09 -0700, Matthew Brett wrote:
>>>>>> Sorry to ask, and I ask partly because I'm in the middle of a py3k port,
>>>>>> but is this the right fix to this problem?  I was confused by the
>>>>>> presence of the old PyString_AsString function.
>>>>>
>>>>> It's not a correct fix. The original code seems also wrong ("index" can
>>>>> either be Unicode or Bytes/String object), and will probably bomb when
>>>>> indexing with Unicode strings on Python 2. The right thing to do is to
>>>>> make it show the repr of the "index" object.
>>>>
>>>> OK - I realize I'm being very lazy here but, do you mean:
>>>>
>>>>         PyErr_Format(PyExc_ValueError,
>>>>>>                       "field named %s not found.",
>>>>>>                       PyString_AsString(PyObject_Repr(index)));
>>>
>>> Being less lazy, and having read the cpython source, and read
>>> Christoph's mail more carefully, I believe Christoph's patch is
>>> correct...
>>>
>>> Unicode indexing of structured array fields doesn't raise an error on
>>> python 2.x; I assume because PyString_AsString is returning a char*
>>> using the Unicode default encoding, as per the docs.
>>>
>>
>> I think the patch is correct for Python 3 but, as Pauli pointed out, the
>> original code can crash also under Python 2.x when indexing with an
>> unicode string that contains non-ascii7 characters, which seems much
>> less likely and apparently has been undetected for quite a while. For
>> example, this crashes for me on Python 2.7:
>>
>> import numpy as np
>> a = np.zeros((1,), dtype=[('f1', 'f')])
>> a[u's'] = 1  # works
>> a[u'µ'] = 1  # crash
>>
>> So, the proposed patch works for Python 3, but there could be a better
>> patch fixing also the corner case on Python 2.
>
> Thanks.  How about something like the check further up the file:
>
>       if (PyUnicode_Check(op)) {
>            temp = PyUnicode_AsUnicodeEscapeString(op);
>        }
>        PyErr_Format(PyExc_ValueError,
>                     "field named %s not found.",
>                     PyBytes_AsString(temp));
>
> ? I'm happy to submit a pull request with tests if that kind of thing
> looks sensible to y'all,

That fix works for 2.6 but crashes 3.2 when it is rejecting (by design
I think) field indexing with byte strings.

I've put a test function in this branch which exercises the routes I
could think of.  It will crash current 2.x because of the unicode
error that Pauli pointed out, but I think it corresponds to the
expected behavior:

https://github.com/matthew-brett/numpy/compare/numpy:master...matthew-brett:struct-arr-fields

Do these tests look correct?  I'd be happy to hear why the code above
does not work, it wasn't obvious to me.

See you,

Matthew



More information about the NumPy-Discussion mailing list