[Cython] Bug: Memoryview of struct with adjacent string fields does not detect string boundaries
Joshua Adelman
joshua.adelman at gmail.com
Thu Feb 6 16:21:04 CET 2014
This discussion started on the cython-users Google group, but I wanted to move the issue over to the dev list, as suggested on cython_trac.
Given a NumPy recarray containing two or more fixed-length string fields, Cython does not properly detect the boundary between string fields that are adjacent to one another. A concise test case demonstrating the problem is:
```cython
cimport numpy as np
cdef packed struct tstruct:
    np.float32_t a
    np.int16_t b
    char[6] c
    char[4] d

def test_struct(tstruct[:] x):
    pass
```
We then define some data on the Python side:
```python
import numpy as np
a = np.recarray(3, dtype=[('a', np.float32), ('b', np.int16), ('c', '|S6'), ('d', '|S4')])
a[0] = (1.1, 1, 'abcde', 'fgh')
a[1] = (2.1, 2, 'ijklm', 'nop')
a[2] = (3.1, 3, 'qrstu', 'vwx')
test_struct(a)
```
This results in the error:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-ac01118a36a7> in <module>()
----> 1 test_struct(a)
ValueError: Expected a dimension of size 6, got 10
```
If we swap the order of the fields in both the recarray and `tstruct` to (a, c, b, d), so that a numerical field sits between the string fields, then the function parses the memoryview correctly.
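One way to see what distinguishes the two layouts is to print the PEP 3118 format string that NumPy exports through the buffer protocol, which is what Cython's typed-memoryview code parses when `test_struct` is called. This is a minimal sketch, assuming NumPy is installed; it uses plain `np.zeros` arrays instead of the recarray, and the variable names are illustrative only.

```python
import numpy as np

# Same field layouts as the test case: strings adjacent vs. separated by int16.
adjacent = np.dtype([('a', np.float32), ('b', np.int16), ('c', 'S6'), ('d', 'S4')])
separated = np.dtype([('a', np.float32), ('c', 'S6'), ('b', np.int16), ('d', 'S4')])

# memoryview(...).format is the buffer format string NumPy hands to consumers.
fmt_adjacent = memoryview(np.zeros(3, dtype=adjacent)).format
fmt_separated = memoryview(np.zeros(3, dtype=separated)).format

print(fmt_adjacent)
print(fmt_separated)
```

In the adjacent layout the string codes `6s` and `4s` are separated only by the field label `:c:`, whereas in the swapped layout the `h` code for the int16 field sits between them. Byte-order prefixes on the numeric codes may differ by platform.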
The relevant line of code that catches the incorrect value of `enc_count` is:
https://github.com/cython/cython/blob/master/Cython/Utility/Buffer.c#L468
```c
if (ctx->enc_count != ctx->head->field->type->arraysize[0]) {
    PyErr_Format(PyExc_ValueError,
                 "Expected a dimension of size %zu, got %zu",
                 ctx->head->field->type->arraysize[0], ctx->enc_count);
    return -1;
}
```
My naive guess is that there is something going on in:
https://github.com/cython/cython/blob/master/Cython/Utility/Buffer.c#L738
since that appears to be the only place where `enc_count` is incremented. That seems like the place where a boundary between two string fields might not be handled properly (the comment on the line above, "Continue pooling same type", is suggestive).
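The suspected failure mode can be illustrated with a toy model. The sketch below is hypothetical and is not Cython's actual parser: `tokenize` and `pool` are made-up names, the format body is simplified (no `T{...}` wrapper, no byte-order prefixes), and the `flush_on_name` flag stands in for one possible fix, namely ending the current pool whenever a `:name:` field label is seen.

```python
import re

# Toy tokenizer for a simplified struct body such as "f:a:h:b:6s:c:4s:d:".
# Either an optional repeat count plus a type char, or a ":name:" field label.
TOKEN = re.compile(r"(\d*)([A-Za-z])|:(\w+):")


def tokenize(fmt):
    """Yield ('type', count, char) and ('name', name) tokens."""
    pos = 0
    while pos < len(fmt):
        m = TOKEN.match(fmt, pos)
        if m is None:
            raise ValueError(f"unexpected character {fmt[pos]!r} at {pos}")
        if m.group(3) is not None:
            yield ("name", m.group(3))
        else:
            yield ("type", int(m.group(1) or 1), m.group(2))
        pos = m.end()


def pool(fmt, flush_on_name):
    """Group consecutive same-type codes into (type_char, count) chunks."""
    chunks = []
    cur_type, cur_count = None, 0
    for tok in tokenize(fmt):
        if tok[0] == "name":
            # Hypothetical fix: a field label ends the current pool.
            if flush_on_name and cur_type is not None:
                chunks.append((cur_type, cur_count))
                cur_type, cur_count = None, 0
            continue
        _, count, char = tok
        if char == cur_type:
            cur_count += count  # "continue pooling same type"
        else:
            if cur_type is not None:
                chunks.append((cur_type, cur_count))
            cur_type, cur_count = char, count
    if cur_type is not None:
        chunks.append((cur_type, cur_count))
    return chunks


print(pool("f:a:h:b:6s:c:4s:d:", flush_on_name=False))
print(pool("f:a:h:b:6s:c:4s:d:", flush_on_name=True))
```

When pooling ignores field labels, `6s` and `4s` merge into a single 10-element `s` chunk, which matches the "Expected a dimension of size 6, got 10" error. With a numeric code between them (the (a, c, b, d) order), the type change alone splits the pools, consistent with the observed workaround.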
I'll cross-post this on the Cython Trac once I have access, and once I have the Trac issue number I will submit a pull request on GitHub with a test case.
Josh