[Cython] Bug: Memoryview of struct with adjacent string fields does not detect string boundaries

Stefan Behnel stefan_ml at behnel.de
Sat Feb 8 19:58:31 CET 2014


Joshua Adelman, 06.02.2014 16:21:
> This discussion was initially started on the cython user google group, but I wanted to move the issue over to the dev list, as per the suggestion on cython_trac. 

Original discussion is here:

http://thread.gmane.org/gmane.comp.python.cython.user/10646


> Given a numpy recarray containing two or more fixed-length string fields, if those string fields are adjacent to one another cython does not properly detect the boundary between the string fields. A concise test case demonstrating the problem is:
> 
> ```cython
> cimport numpy as np
> 
> cdef packed struct tstruct:
>     np.float32_t a
>     np.int16_t b
>     char[6] c
>     char[4] d
> 
> def test_struct(tstruct[:] x):
>     pass
> ```
> 
> We then define some data on the python side:
> 
> ```python
> import numpy as np
> 
> a = np.recarray(3, dtype=[('a', np.float32),  ('b', np.int16), ('c', '|S6'), ('d', '|S4')])
> a[0] = (1.1, 1, 'abcde', 'fgh')
> a[1] = (2.1, 2, 'ijklm', 'nop')
> a[2] = (3.1, 3, 'qrstu', 'vwx')
> 
> test_struct(a)
> ```
> 
> This results in the error:
> 
> ---------------------------------------------------------------------------
> ValueError       Traceback (most recent call last)
> 
> <ipython-input-12-ac01118a36a7> in <module>()
> ----> 1 test_struct(a)
> 
> ValueError: Expected a dimension of size 6, got 10
> 
> 
> If we swap the order of the fields in the recarray and `tstruct` to (a,c,b,d) so that there is a numerical field between the string fields, then the function can parse the memory view correctly. 
> 
> The relevant line of code that catches the incorrect value of `enc_count` is: 
> https://github.com/cython/cython/blob/master/Cython/Utility/Buffer.c#L468 
> 
> ``` 
> if (ctx->enc_count != ctx->head->field->type->arraysize[0]) { 
>             PyErr_Format(PyExc_ValueError, 
>                          "Expected a dimension of size %zu, got %zu", 
>                          ctx->head->field->type->arraysize[0], ctx->enc_count); 
>             return -1; 
>         } 
> ``` 
> 
> My naive guess is that there is something going on in: 
> https://github.com/cython/cython/blob/master/Cython/Utility/Buffer.c#L738

i.e. this code:

"""
      case 'O': case 's': case 'p':
        if (ctx->enc_type == *ts && got_Z == ctx->is_complex &&
            ctx->enc_packmode == ctx->new_packmode) {
          /* Continue pooling same type */
          ctx->enc_count += ctx->new_count;
        } else {
          /* New type */
          if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL;
          ctx->enc_count = ctx->new_count;
          ctx->enc_packmode = ctx->new_packmode;
          ctx->enc_type = *ts;
          ctx->is_complex = got_Z;
        }
        ++ts;
        ctx->new_count = 1;
        got_Z = 0;
        break;
"""

> since that appears to be the only place where `enc_count` is being incremented. That would seem like the place where a boundary between two string fields might not be properly handled (the comment in the line above "Continue pooling same type" is suggestive.

Yes. That code is basically unchanged since Dag committed it back in 2009
(CC-ing him). However, this parser is fairly complex (not surprisingly, the
buffer type format *is* complicated), so I'm not sure about the right fix.
My guess is that the first part of the conditional belongs more in the
"case 'Z'" fall-through section that precedes the code above. Hard to say.
Even when I disable it ("if (0 &&...)"), I don't get any test failures in
the format tests.

In any case, I agree with you that the "pooling same size" is the bit that
goes wrong here.

I've attached a test case for now.

Stefan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: buffer_format_test.patch
Type: text/x-patch
Size: 665 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/cython-devel/attachments/20140208/fbba7a5b/attachment.bin>


More information about the cython-devel mailing list