[Cython] buffer syntax vs. memory view syntax

Tue May 8 11:21:04 CEST 2012

On 8 May 2012 09:49, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Dag Sverre Seljebotn, 08.05.2012 10:36:
>> On 05/08/2012 10:18 AM, Stefan Behnel wrote:
>>> Dag Sverre Seljebotn, 08.05.2012 09:57:
>>>> On 05/07/2012 11:21 PM, mark florisson wrote:
>>>>> On 7 May 2012 19:40, Dag Sverre Seljebotn wrote:
>>>>>> mark florisson wrote:
>>>>>>> On 7 May 2012 17:00, Dag Sverre Seljebotn wrote:
>>>>>>>> On 05/07/2012 04:16 PM, Stefan Behnel wrote:
>>>>>>>>> Stefan Behnel, 07.05.2012 15:04:
>>>>>>>>>> Dag Sverre Seljebotn, 07.05.2012 13:48:
>>>>>>>>>>> BTW, with the coming of memoryviews, me and Mark talked about just
>>>>>>>>>>> deprecating the "mytype[...]" meaning buffers, and rather treat it
>>>>>>>>>>> as np.ndarray, array.array etc. being some sort of "template types".
>>>>>>>>>>> That is, we disallow "object[int]" and require some special
>>>>>>>>>>> declarations in the relevant pxd files.
>>>>>>>>>>
>>>>>>>>>> Hmm, yes, it's unfortunate that we have two different types of
>>>>>>>>>> syntax now, one that declares the item type before the brackets and
>>>>>>>>>> one that declares it afterwards.
>>>>>>>>> Should we consider the buffer interface syntax deprecated and focus
>>>>>>>>> on the memory view syntax?
>>>>>>>>
>>>>>>>> I think that's the very-long-term intention. Then again, it may be
>>>>>>>> too early to really tell yet, we just need to see how the memory
>>>>>>>> views play out in real life and whether they'll be able to replace
>>>>>>>> np.ndarray[double] among real users. We don't want to shove things
>>>>>>>> down users' throats.
>>>>>>>>
>>>>>>>> But the use of the trailing-[] syntax needs some cleaning up. Me and
>>>>>>>> Mark agreed we'd put this proposal forward when we got around to it:
>>>>>>>>
>>>>>>>>    - Deprecate the "object[double]" form, where [dtype] can be stuck on
>>>>>>>>    any extension type
>>>>>>>>
>>>>>>>>    - But, do NOT (for the next year at least) deprecate
>>>>>>>>    np.ndarray[double], array.array[double], etc. Basically, there
>>>>>>>>    should be a magic flag in extension type declarations saying
>>>>>>>>    "I can be a buffer".
>>>>>>>>
>>>>>>>> For one thing, that is sort of needed to open up things for templated
>>>>>>>> cdef classes/fused types cdef classes, if that is ever implemented.
>>>>>>>
>>>>>>> Deprecating is definitely a good start. I think at least if you only
>>>>>>> allow two types as buffers it will be at least reasonably clear when
>>>>>>> one is dealing with fused types or buffers.
>>>>>>>
>>>>>>> Basically, I think memoryviews should live up to demands of the users,
>>>>>>> which would mean there would be no reason to keep the buffer syntax.
>>>>>>
>>>>>> But they are different approaches -- use a different type/API, or just
>>>>>> try to speed up parts of NumPy..
>>>>>>
>>>>>>> One thing to do is make memoryviews coerce cheaply back to the
>>>>>>> original objects if wanted (which is likely). Writing
>>>>>>> np.asarray(mymemview) is kind of annoying.
>>>>>>
>>>>>> It is going to be very confusing to have type(mymemview),
>>>>>> repr(mymemview), and so on come out as NumPy arrays, but not have the
>>>>>> full API of NumPy. Unless you auto-convert on getattr to...
>>>>>
>>>>> Yeah, the idea is very simple, as you mention: just keep the object
>>>>> around cached, and when you slice, construct one lazily.
>>>>>
>>>>>> If you want to eradicate the distinction between the backing array and
>>>>>> the memory view and make it transparent, I really suggest you kick back
>>>>>> alive np.ndarray (it can exist in some 'unrealized' state with delayed
>>>>>> construction after slicing, and so on). Implementation much the same
>>>>>> either way, it is all about how it is presented to the user.
>>>>>
>>>>> You mean the buffer syntax?
>>>>>
>>>>>> Something like mymemview.asobject() could work though, and while not
>>>>>> much shorter, it would have some polymorphism that np.asarray does not
>>>>>> have (based probably on some custom PEP 3118 extension)
>>>>>
>>>>> I was thinking you could allow the user to register a callback, and
>>>>> use that to coerce from a memoryview back to an object (given a
>>>>> memoryview object). For numpy this would be np.asarray, and the
>>>>> implementation is allowed to cache the result (which it will).
>>>>> It may be too magicky though... but it will be convenient. The
>>>>> memoryview will act as a subclass, meaning that any of its methods
>>>>> will override methods of the converted object.
>>>>
>>>> My point was that this seems *way* too magicky.
>>>>
>>>> Beyond "confusing users" and similar concerns, which are somewhat
>>>> subjective, here's a fundamental problem for you: We're making it very
>>>> difficult to type-infer memoryviews. Consider:
>>>>
>>>> cdef double[:] x = ...
>>>> y = x
>>>> print y.shape
>>>>
>>>> Now, because y is not typed, you're semantically throwing in a conversion
>>>> on line 2, so that line 3 says that you want the attribute access to be
>>>> invoked on "whatever object x coerced back to". And we have no idea what
>>>> kind of object that is.
>>>>
>>>> If you don't transparently convert to object, it'd be safe to automatically
>>>> infer y as a double[:].
>>>
>>> Why can't y be inferred as the type of x due to the assignment?
>>>
>>>
>>>> On a related note, I've said before that I dislike the notion of
>>>>
>>>> cdef double[:] mview = obj
>>>>
>>>> I'd rather like
>>>>
>>>> cdef double[:] mview = double[:](obj)
>>>
>>> Why? We currently allow
>>>
>>>      cdef char* s = some_py_bytes_string
>>>
>>> Auto-coercion is a serious part of the language, and I don't see the
>>> advantage of requiring the redundancy in the case above. It's clear enough
>>> to me what the typed assignment is intended to mean: get me a buffer view
>>> on the object, regardless of what it is.
>>>
>>>
>>>> I support Robert in that "np.ndarray[double]" is the syntax to use when you
>>>> want this kind of transparent "be an object when I need to and a memory
>>>> view when I need to".
>>>>
>>>> Proposal:
>>>>
>>>>   1) We NEVER deprecate "np.ndarray[double]", we commit to keeping that in
>>>> the language. It means exactly what you would like double[:] to mean, i.e.
>>>> a variable that is memoryview when you need to and an object otherwise.
>>>> When you use this type, you bear the consequences of early-binding things
>>>> that could in theory be overridden.
>>>>
>>>>   2) double[:] is for when you want to access data of *any* Python
>>>> object in a generic way. Raw PEP 3118. In those situations, access to
>>>> the underlying object is much less useful.
>>>>
>>>>    2a) Therefore we require that you do "mview.asobject()" manually; doing
>>>> "mview.foo()" is a compile-time error
>>>
>>> Sounds good. I think that would clean up the current syntax overlap very
>>> nicely.
>>>
>>>
>>>>    2b) To drive the point home among users, and aid type inference and
>>>> overall language clarity, we REMOVE the auto-acquisition and require that
>>>> you do
>>>>
>>>>      cdef double[:] mview = double[:](obj)
>>>
>>> I don't see the point, as noted above. Either "obj" is statically typed and
>>> the bare assignment becomes a no-op, or it's not typed and the assignment
>>> coerces by creating a view. As with all other typed assignments.
>>>
>>>
>>>>    2c) Perhaps: Do not even coerce to a Python memoryview and disallow
>>>> "print mview"; instead require that you do "print mview.asmemoryview()" or
>>>> "print memoryview(mview)" or somesuch.
>>>
>>> This seems to depend on 2b.
>>
>> This I don't understand. The question of 2c) is the analogue to
>> auto-coercion of "char*" to bytes; approving 2c) would put memoryviews in
>> line with char*.
>>
>> Then again, we could in future auto-coerce char* to a ctypes pointer, and
>> in that case, coercing a memoryview to an object representing that
>> memoryview would be OK.
>>
>> Either way, you would never get back the same object that you coerced from!
>
> Ah, that's what you meant. I thought you were referring to getting a
> memoryview from an object.
>
> I agree that a buffer view shouldn't auto-coerce back to its owner (or to a
> Python object in general), that's the whole point of the syntax cleanup.
>
> In simple cases, buffer.obj would be the thing to talk to, except for
> memory views, where only the view knows the mapped memory layout but the
> underlying exporter has the methods to deal with the buffer. In that case,
> we may really want to leave it to the user to handle this. I don't think
> the compiler can do the right thing in all cases, and the user is really
> the only one who knows what kind of object should be used or even
> instantiated to wrap a buffer. Nothing we can do is shorter or more clearly
> readable than np.asarray() or whatever function a specific library has for
> this.
>
> So, what about just keeping buffer.obj visible and leaving everything else
> to users?

buffer.base gets you the original object.
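
For comparison, plain Python's built-in memoryview keeps the same kind of
back-reference in its .obj attribute. Below is a minimal sketch using only the
standard library as a stand-in for a Cython double[:]; the names owner, view,
part and rebuilt are made up for illustration, and this is not how Cython
implements it.

```python
from array import array

# Exporter: any object that implements the buffer protocol (PEP 3118).
owner = array("d", [1.0, 2.0, 3.0, 4.0])

# Plain Python memoryview, used here as a stand-in for a Cython double[:].
view = memoryview(owner)
part = view[1:3]  # slicing yields a new view onto the same memory, no copy

# The view remembers its exporter; slices point back at the same object.
same_owner = view.obj is owner and part.obj is owner

# Writes through the view are visible in the exporter.
part[0] = 20.0

# Getting a "rich" object back is an explicit step chosen by the user.
# This particular call copies the two viewed items into a new array.
rebuilt = array(part.format, part)
```

The last line is where the user decides what kind of object to build from the
view. With NumPy one would presumably call np.asarray(part) there instead.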

> Stefan