[Numpy-discussion] Segfault with python 3.2 structured array non-existent field
Matthew Brett
matthew.brett at gmail.com
Tue Mar 15 20:55:32 EDT 2011
Hi,
On Tue, Mar 15, 2011 at 5:30 PM, Christoph Gohlke <cgohlke at uci.edu> wrote:
>
>
> On 3/15/2011 5:13 PM, Matthew Brett wrote:
>> Hi,
>>
>> On Tue, Mar 15, 2011 at 10:23 AM, Matthew Brett<matthew.brett at gmail.com> wrote:
>>> Hi,
>>>
>>> On Tue, Mar 15, 2011 at 10:12 AM, Pauli Virtanen<pav at iki.fi> wrote:
>>>> Tue, 15 Mar 2011 10:06:09 -0700, Matthew Brett wrote:
>>>>> Sorry to ask, and I ask partly because I'm in the middle of a py3k port,
>>>>> but is this the right fix to this problem? I was confused by the
>>>>> presence of the old PyString_AsString function.
>>>>
>>>> It's not a correct fix. The original code seems also wrong ("index" can
>>>> either be Unicode or Bytes/String object), and will probably bomb when
>>>> indexing with Unicode strings on Python 2. The right thing to do is to
>>>> make it show the repr of the "index" object.
>>>
>>> OK - I realize I'm being very lazy here but, do you mean:
>>>
>>> PyErr_Format(PyExc_ValueError,
>>>>> "field named %s not found.",
>>>>> PyString_AsString(PyObject_Repr(index)));
>>
>> Being less lazy, and having read the cpython source, and read
>> Christoph's mail more carefully, I believe Christoph's patch is
>> correct...
>>
>> Unicode indexing of structured array fields doesn't raise an error on
>> python 2.x; I assume because PyString_AsString is returning a char*
>> using the Unicode default encoding, as per the docs.
>>
>
> I think the patch is correct for Python 3 but, as Pauli pointed out, the
> original code can crash also under Python 2.x when indexing with an
> unicode string that contains non-ascii7 characters, which seems much
> less likely and apparently has been undetected for quite a while. For
> example, this crashes for me on Python 2.7:
>
> import numpy as np
> a = np.zeros((1,), dtype=[('f1', 'f')])
> a[u's'] = 1 # works
> a[u'µ'] = 1 # crash
>
> So, the proposed patch works for Python 3, but there could be a better
> patch fixing also the corner case on Python 2.
Thanks. How about something like the check further up the file:
if (PyUnicode_Check(op)) {
temp = PyUnicode_AsUnicodeEscapeString(op);
}
PyErr_Format(PyExc_ValueError,
"field named %s not found.",
PyBytes_AsString(temp));
? I'm happy to submit a pull request with tests if that kind of thing
looks sensible to y'all,
Matthew
More information about the NumPy-Discussion
mailing list