[Cython] buffer syntax vs. memory view syntax

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Tue May 8 10:31:32 CEST 2012

On 05/08/2012 10:18 AM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 08.05.2012 09:57:
>> On 05/07/2012 11:21 PM, mark florisson wrote:
>>> On 7 May 2012 19:40, Dag Sverre Seljebotn wrote:
>>>> mark florisson wrote:
>>>>> On 7 May 2012 17:00, Dag Sverre Seljebotn wrote:
>>>>>> On 05/07/2012 04:16 PM, Stefan Behnel wrote:
>>>>>>> Stefan Behnel, 07.05.2012 15:04:
>>>>>>>> Dag Sverre Seljebotn, 07.05.2012 13:48:
>>>>>>>>> BTW, with the coming of memoryviews, me and Mark talked about just
>>>>>>>>> deprecating the "mytype[...]" meaning buffers, and rather treat it
>>>>>>>>> as np.ndarray, array.array etc. being some sort of "template types".
>>>>>>>>> That is,
>>>>>>>>> we disallow "object[int]" and require some special declarations in
>>>>>>>>> the relevant pxd files.
>>>>>>>> Hmm, yes, it's unfortunate that we have two different types of
>>>>>>>> syntax now,
>>>>>>>> one that declares the item type before the brackets and one that
>>>>>>>> declares it afterwards.
>>>>>>> Should we consider the
>>>>>>> buffer interface syntax deprecated and focus on the memory view
>>>>>>> syntax?
>>>>>> I think that's the very-long-term intention. Then again, it may be
>>>>>> too early
>>>>>> to really tell yet, we just need to see how the memory views play out
>>>>>> in
>>>>>> real life and whether they'll be able to replace np.ndarray[double]
>>>>>> among real users. We don't want to shove things down users' throats.
>>>>>> But the use of the trailing-[] syntax needs some cleaning up. Me and
>>>>>> Mark agreed we'd put this proposal forward when we got around to it:
>>>>>>    - Deprecate the "object[double]" form, where [dtype] can be stuck on
>>>>>>    any extension type
>>>>>>    - But, do NOT (for the next year at least) deprecate
>>>>>>    np.ndarray[double],
>>>>>>    array.array[double], etc. Basically, there should be a magic flag in
>>>>>>    extension type declarations saying "I can be a buffer".
>>>>>> For one thing, that is sort of needed to open up things for templated
>>>>>> cdef classes/fused types cdef classes, if that is ever implemented.
>>>>> Deprecating is definitely a good start. I think at least if you only
>>>>> allow two types as buffers it will be at least reasonably clear when
>>>>> one is dealing with fused types or buffers.
>>>>> Basically, I think memoryviews should live up to demands of the users,
>>>>> which would mean there would be no reason to keep the buffer syntax.
>>>> But they are different approaches -- use a different type/API, or just
>>>> try to speed up parts of NumPy..
>>>>> One thing to do is make memoryviews coerce cheaply back to the
>>>>> original objects if wanted (which is likely). Writing
>>>>> np.asarray(mymemview) is kind of annoying.
>>>> It is going to be very confusing to have type(mymemview),
>>>> repr(mymemview), and so on come out as NumPy arrays, but not have the
>>>> full API of NumPy. Unless you auto-convert on getattr to...
>>> Yeah, the idea is very simple, as you mention, just keep the object
>>> around cached, and when you slice construct one lazily.
>>>> If you want to eradicate the distinction between the backing array and
>>>> the memory view and make it transparent, I really suggest you kick back
>>>> alive np.ndarray (it can exist in some 'unrealized' state with delayed
>>>> construction after slicing, and so on). Implementation much the same
>>>> either way, it is all about how it is presented to the user.
>>> You mean the buffer syntax?
>>>> Something like mymemview.asobject() could work though, and while not
>>>> much shorter, it would have some polymorphism that np.asarray does not
>>>> have (based probably on some custom PEP 3118 extension)
>>> I was thinking you could allow the user to register a callback, and
>>> use that to coerce from a memoryview back to an object (given a
>>> memoryview object). For numpy this would be np.asarray, and the
>>> implementation is allowed to cache the result (which it will).
>>> It may be too magicky though... but it will be convenient. The
>>> memoryview will act as a subclass, meaning that any of its methods
>>> will override methods of the converted object.
>> My point was that this seems *way* too magicky.
>> Beyond "confusing users" and so on that are sort of subjective, here's a
>> fundamental problem for you: We're making it very difficult to type-infer
>> memoryviews. Consider:
>> cdef double[:] x = ...
>> y = x
>> print y.shape
>> Now, because y is not typed, you're semantically throwing in a conversion
>> on line 2, so that line 3 says that you want the attribute access to be
>> invoked on "whatever object x coerced back to". And we have no idea what
>> kind of object that is.
>> If you don't transparently convert to object, it'd be safe to automatically
>> infer y as a double[:].
> Why can't y be inferred as the type of x due to the assignment?
>> On a related note, I've said before that I dislike the notion of
>> cdef double[:] mview = obj
>> I'd rather like
>> cdef double[:] mview = double[:](obj)
> Why? We currently allow
>      cdef char* s = some_py_bytes_string
> Auto-coercion is a serious part of the language, and I don't see the
> advantage of requiring the redundancy in the case above. It's clear enough
> to me what the typed assignment is intended to mean: get me a buffer view
> on the object, regardless of what it is.

Good point. I admit defeat.

There's a slight difference in that there's more of a 1:1 between a bytes
and a char*, whereas there's a many:1 for buffers. But it doesn't seem 
to matter, since "char*" doesn't coerce back to object automatically. 
(Though that fact is an argument against letting memoryviews coerce to 
objects automatically)
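
To make the many:1 point concrete, here is a rough sketch in plain Python. It uses the builtin memoryview from the standard library, not Cython's typed memoryviews, so it is only an analogy: several unrelated exporters look identical through the view, and getting "the object" back means asking the view for its exporter (the builtin exposes this as the .obj attribute).

```python
import array

# Three unrelated object types that all export a PEP 3118 buffer.
sources = [b"abc", bytearray(b"abc"), array.array("B", [97, 98, 99])]

# Acquire a view on each one; the acquisition looks the same for all.
views = [memoryview(src) for src in sources]

# Seen through the view, the three are indistinguishable.
contents = [v.tobytes() for v in views]

# "Coercing back" has to consult the exporter, and its type varies.
origin_types = [type(v.obj).__name__ for v in views]

print(contents)
print(origin_types)
```

The view alone does not say what kind of object it came from, which is the reason an automatic coercion back to object is harder to reason about than bytes to char*.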

(Also I happen to not like this part of the language -- I think it's 
making us be further from Python than we would need to -- but that's not 
relevant in this thread at all, but rather in some pure Python mode thread.)

>> I support Robert in that "np.ndarray[double]" is the syntax to use when you
>> want this kind of transparent "be an object when I need to and a memory
>> view when I need to".
>> Proposal:
>>   1) We NEVER deprecate "np.ndarray[double]", we commit to keeping that in
>> the language. It means exactly what you would like double[:] to mean, i.e.
>> a variable that is memoryview when you need to and an object otherwise.
>> When you use this type, you bear the consequences of early-binding things
>> that could in theory be overridden.
>>   2) double[:] is for when you want to access data of *any* Python object in
>> a generic way. Raw PEP 3118. In those situations, access to the underlying
>> object is much less useful.
>>    2a) Therefore we require that you do "mview.asobject()" manually; doing
>> "mview.foo()" is a compile-time error
> Sounds good. I think that would clean up the current syntax overlap very
> nicely.
>>    2b) To drive the point home among users, and aid type inference and
>> overall language clarity, we REMOVE the auto-acquisition and require that
>> you do
>>      cdef double[:] mview = double[:](obj)
> I don't see the point, as noted above. Either "obj" is statically typed and
> the bare assignment becomes a no-op, or it's not typed and the assignment
> coerces by creating a view. As with all other typed assignments.

>>    2c) Perhaps: Do not even coerce to a Python memoryview and disallow
>> "print mview"; instead require that you do "print mview.asmemoryview()" or
>> "print memoryview(mview)" or somesuch.
> This seems to depend on 2b.
>> (A related proposal that's been up earlier has been that a variable can be
>> annotated with many interfaces; e.g.
>> cdef A|B|C obj
>> ...and then when you do "obj.method", it is first looked up in C, then B,
>> then A, then Python getattr. Not sure if we want to reopen that can of
>> worms...)
> Different topic - new thread?

It's very related, since np.ndarray[double] would essentially be 
"np.ndarray | double[:]".
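
As a hedged sketch of that lookup order (last-declared interface first, then the earlier ones, then a plain Python getattr), the following plain-Python model may help. The Interface class, the resolve function and the member sets are hypothetical and exist only for illustration; nothing here is Cython API.

```python
class Interface:
    """Hypothetical stand-in for a statically known interface."""

    def __init__(self, name, members):
        self.name = name
        self.members = set(members)


def resolve(attr, interfaces, obj):
    """Report where attr would be bound for a variable typed A|B|C.

    The last-declared interface is searched first; if none of the
    interfaces knows the name, fall back to a dynamic getattr.
    """
    for iface in reversed(interfaces):
        if attr in iface.members:
            return iface.name
    getattr(obj, attr)  # raises AttributeError if the name is unknown
    return "getattr"


# Illustrative member sets only, not the real attribute lists.
ndarray = Interface("np.ndarray", {"shape", "sum"})
mview = Interface("double[:]", {"shape"})
declared = [ndarray, mview]  # models "np.ndarray | double[:]"

where_shape = resolve("shape", declared, [1.0, 2.0])
where_sum = resolve("sum", declared, [1.0, 2.0])
where_append = resolve("append", declared, [1.0, 2.0])
```

Under these assumptions "shape" binds early to the memoryview side, "sum" to the ndarray side, and anything else stays a late-bound attribute lookup.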

