[Cython] Bug: Memoryview of struct with adjacent string fields does not detect string boundaries

Joshua Adelman joshua.adelman at gmail.com
Thu Feb 6 16:21:04 CET 2014


This discussion started on the Cython users Google group, but I wanted to move the issue over to the dev list, as suggested on cython_trac.

When a NumPy recarray contains two or more fixed-length string fields that are adjacent to one another, Cython does not properly detect the boundary between them. A concise test case demonstrating the problem is:

```cython
cimport numpy as np

cdef packed struct tstruct:
    np.float32_t a
    np.int16_t b
    char[6] c
    char[4] d

def test_struct(tstruct[:] x):
    pass
```

We then define some data on the python side:

```python
import numpy as np

a = np.recarray(3, dtype=[('a', np.float32),  ('b', np.int16), ('c', '|S6'), ('d', '|S4')])
a[0] = (1.1, 1, 'abcde', 'fgh')
a[1] = (2.1, 2, 'ijklm', 'nop')
a[2] = (3.1, 3, 'qrstu', 'vwx')

test_struct(a)
```

This results in the error:

---------------------------------------------------------------------------
ValueError       Traceback (most recent call last)

<ipython-input-12-ac01118a36a7> in <module>()
----> 1 test_struct(a)

ValueError: Expected a dimension of size 6, got 10


If we swap the order of the fields in both the recarray and `tstruct` to (a, c, b, d), so that there is a numerical field between the string fields, then the function can parse the memoryview correctly.
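To make the two layouts concrete, here is a minimal sketch that describes both field orders with the standard-library `struct` module instead of NumPy or Cython. The format strings `=fh6s4s` and `=f6sh4s` are illustrative shorthand for the packed records (an assumption, not necessarily the exact buffer format that NumPy exports). The point is that both layouts occupy the same 16 bytes, and that the two adjacent string fields together span 6 + 4 = 10 bytes, the same number that appears in the error message.

```python
import struct

# Packed (unpadded) layouts written as struct-module format strings.
# "=" selects standard sizes with no alignment padding, similar in spirit
# to `cdef packed struct`. These strings are illustrative assumptions.
ORIGINAL = "=fh6s4s"   # a, b, c, d: the two string fields are adjacent
REORDERED = "=f6sh4s"  # a, c, b, d: an int16 sits between the strings

original_size = struct.calcsize(ORIGINAL)
reordered_size = struct.calcsize(REORDERED)

# Round-trip the first record of the test data through the original layout.
record = struct.pack(ORIGINAL, 1.1, 1, b"abcde", b"fgh")
a, b, c, d = struct.unpack(ORIGINAL, record)

# Combined width of the two adjacent string fields.
adjacent_string_bytes = len(c) + len(d)

print(original_size, reordered_size, adjacent_string_bytes)
```

Both orderings carry the same data in the same number of bytes, so the only difference between the failing and the working case is whether the two string fields sit next to each other.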

The relevant line of code that catches the incorrect value of `enc_count` is: 
https://github.com/cython/cython/blob/master/Cython/Utility/Buffer.c#L468 

```
if (ctx->enc_count != ctx->head->field->type->arraysize[0]) {
    PyErr_Format(PyExc_ValueError,
                 "Expected a dimension of size %zu, got %zu",
                 ctx->head->field->type->arraysize[0], ctx->enc_count);
    return -1;
}
```

My naive guess is that there is something going on in: 
https://github.com/cython/cython/blob/master/Cython/Utility/Buffer.c#L738 

since that appears to be the only place where `enc_count` is incremented. That seems like the place where a boundary between two string fields might not be handled properly (the comment on the line above, "Continue pooling same type", is suggestive).
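A toy model of this hypothesis, written in plain Python, is sketched below. It is not the logic in `Buffer.c`: the helper names `parse_fields` and `pool_same_type` are hypothetical, and the struct-style format strings are an assumed stand-in for the real buffer format. It only shows that if counts of consecutive entries with the same type code were pooled, the adjacent `6s` and `4s` fields would collapse into a single run of 10, while a numeric field between them would keep the runs separate.

```python
import re
from itertools import groupby


def parse_fields(fmt):
    """Split a struct-style format such as 'fh6s4s' into (count, typecode) pairs."""
    return [(int(count) if count else 1, code)
            for count, code in re.findall(r"(\d*)([a-zA-Z])", fmt)]


def pool_same_type(fields):
    """Toy model: merge runs of consecutive same-typecode fields by summing counts."""
    return [(sum(count for count, _ in run), code)
            for code, run in groupby(fields, key=lambda field: field[1])]


adjacent = pool_same_type(parse_fields("fh6s4s"))   # strings next to each other
separated = pool_same_type(parse_fields("f6sh4s"))  # int16 between the strings

print(adjacent)
print(separated)
```

In this model the adjacent layout yields one pooled string run of length 10 where a field of size 6 is expected, which matches the shape of the reported error. Whether the real parser behaves this way would need to be confirmed against the source.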

I'll cross-post this on the Cython Trac once I have access, and will then submit a pull request on GitHub with a test case once I have the Trac issue number.

Josh





