[Cython] buffer syntax vs. memory view syntax

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Tue May 8 11:47:26 CEST 2012

On 05/08/2012 11:30 AM, Dag Sverre Seljebotn wrote:
> On 05/08/2012 11:22 AM, mark florisson wrote:
>> On 8 May 2012 09:36, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
>>> On 05/08/2012 10:18 AM, Stefan Behnel wrote:
>>>> Dag Sverre Seljebotn, 08.05.2012 09:57:
>>>>> On 05/07/2012 11:21 PM, mark florisson wrote:
>>>>>> On 7 May 2012 19:40, Dag Sverre Seljebotn wrote:
>>>>>>> mark florisson wrote:
>>>>>>>> On 7 May 2012 17:00, Dag Sverre Seljebotn wrote:
>>>>>>>>> On 05/07/2012 04:16 PM, Stefan Behnel wrote:
>>>>>>>>>> Stefan Behnel, 07.05.2012 15:04:
>>>>>>>>>>> Dag Sverre Seljebotn, 07.05.2012 13:48:
>>>>>>>>>>>> BTW, with the coming of memoryviews, Mark and I talked about
>>>>>>>>>>>> just deprecating the "mytype[...]" meaning buffers, and rather
>>>>>>>>>>>> treating it as np.ndarray, array.array etc. being some sort of
>>>>>>>>>>>> "template types". That is, we disallow "object[int]" and
>>>>>>>>>>>> require some special declarations in the relevant pxd files.
>>>>>>>>>>> Hmm, yes, it's unfortunate that we have two different types of
>>>>>>>>>>> syntax now,
>>>>>>>>>>> one that declares the item type before the brackets and one that
>>>>>>>>>>> declares it afterwards.
>>>>>>>>>> Should we consider the
>>>>>>>>>> buffer interface syntax deprecated and focus on the memory view
>>>>>>>>>> syntax?
>>>>>>>>> I think that's the very-long-term intention. Then again, it may be
>>>>>>>>> too early to really tell yet; we just need to see how the memory
>>>>>>>>> views play out in real life and whether they'll be able to replace
>>>>>>>>> np.ndarray[double] among real users. We don't want to shove things
>>>>>>>>> down users' throats.
>>>>>>>>> But the use of the trailing-[] syntax needs some cleaning up.
>>>>>>>>> Me and
>>>>>>>>> Mark agreed we'd put this proposal forward when we got around
>>>>>>>>> to it:
>>>>>>>>> - Deprecate the "object[double]" form, where [dtype] can be stuck
>>>>>>>>> on
>>>>>>>>> any extension type
>>>>>>>>> - But, do NOT (for the next year at least) deprecate
>>>>>>>>> np.ndarray[double],
>>>>>>>>> array.array[double], etc. Basically, there should be a magic flag
>>>>>>>>> in
>>>>>>>>> extension type declarations saying "I can be a buffer".
>>>>>>>>> For one thing, that is sort of needed to open up things for
>>>>>>>>> templated
>>>>>>>>> cdef classes/fused types cdef classes, if that is ever
>>>>>>>>> implemented.
>>>>>>>> Deprecating is definitely a good start. I think if you only allow
>>>>>>>> two types as buffers it will be at least reasonably clear when one
>>>>>>>> is dealing with fused types or buffers.
>>>>>>>> Basically, I think memoryviews should live up to demands of the
>>>>>>>> users,
>>>>>>>> which would mean there would be no reason to keep the buffer
>>>>>>>> syntax.
>>>>>>> But they are different approaches -- use a different type/API, or
>>>>>>> just
>>>>>>> try to speed up parts of NumPy..
>>>>>>>> One thing to do is make memoryviews coerce cheaply back to the
>>>>>>>> original objects if wanted (which is likely). Writing
>>>>>>>> np.asarray(mymemview) is kind of annoying.
>>>>>>> It is going to be very confusing to have type(mymemview),
>>>>>>> repr(mymemview), and so on come out as NumPy arrays, but not have
>>>>>>> the
>>>>>>> full API of NumPy. Unless you auto-convert on getattr to...
>>>>>> Yeah, the idea is very simple, as you mention: just keep the object
>>>>>> around cached, and when you slice, construct one lazily.
>>>>>>> If you want to eradicate the distinction between the backing
>>>>>>> array and
>>>>>>> the memory view and make it transparent, I really suggest you
>>>>>>> kick back
>>>>>>> alive np.ndarray (it can exist in some 'unrealized' state with
>>>>>>> delayed
>>>>>>> construction after slicing, and so on). Implementation much the same
>>>>>>> either way, it is all about how it is presented to the user.
>>>>>> You mean the buffer syntax?
>>>>>>> Something like mymemview.asobject() could work though, and while not
>>>>>>> much shorter, it would have some polymorphism that np.asarray
>>>>>>> does not
>>>>>>> have (based probably on some custom PEP 3118 extension)
>>>>>> I was thinking you could allow the user to register a callback, and
>>>>>> use that to coerce from a memoryview back to an object (given a
>>>>>> memoryview object). For numpy this would be np.asarray, and the
>>>>>> implementation is allowed to cache the result (which it will).
>>>>>> It may be too magicky though... but it will be convenient. The
>>>>>> memoryview will act as a subclass, meaning that any of its methods
>>>>>> will override methods of the converted object.
>>>>> My point was that this seems *way* too magicky.
>>>>> Beyond "confusing users" and so on that are sort of subjective,
>>>>> here's a
>>>>> fundamental problem for you: We're making it very difficult to
>>>>> type-infer
>>>>> memoryviews. Consider:
>>>>> cdef double[:] x = ...
>>>>> y = x
>>>>> print y.shape
>>>>> Now, because y is not typed, you're semantically throwing in a
>>>>> conversion
>>>>> on line 2, so that line 3 says that you want the attribute access
>>>>> to be
>>>>> invoked on "whatever object x coerced back to". And we have no idea
>>>>> what
>>>>> kind of object that is.
>>>>> If you don't transparently convert to object, it'd be safe to
>>>>> automatically
>>>>> infer y as a double[:].
>>>> Why can't y be inferred as the type of x due to the assignment?
>>>>> On a related note, I've said before that I dislike the notion of
>>>>> cdef double[:] mview = obj
>>>>> I'd rather like
>>>>> cdef double[:] mview = double[:](obj)
>>>> Why? We currently allow
>>>> cdef char* s = some_py_bytes_string
>>>> Auto-coercion is a serious part of the language, and I don't see the
>>>> advantage of requiring the redundancy in the case above. It's clear
>>>> enough
>>>> to me what the typed assignment is intended to mean: get me a buffer
>>>> view
>>>> on the object, regardless of what it is.
>>>>> I support Robert in that "np.ndarray[double]" is the syntax to use
>>>>> when you want this kind of transparent "be an object when I need to
>>>>> and a memory view when I need to".
>>>>> Proposal:
>>>>> 1) We NEVER deprecate "np.ndarray[double]", we commit to keeping that
>>>>> in the language. It means exactly what you would like double[:] to
>>>>> mean, i.e. a variable that is a memoryview when you need it to be and
>>>>> an object otherwise. When you use this type, you bear the consequences
>>>>> of early-binding things that could in theory be overridden.
>>>>> 2) double[:] is for when you want to access data of *any* Python
>>>>> object in a generic way. Raw PEP 3118. In those situations, access to
>>>>> the underlying object is much less useful.
>>>>> 2a) Therefore we require that you do "mview.asobject()" manually;
>>>>> doing
>>>>> "mview.foo()" is a compile-time error
>>>> Sounds good. I think that would clean up the current syntax overlap
>>>> very
>>>> nicely.
>>>>> 2b) To drive the point home among users, and aid type inference and
>>>>> overall language clarity, we REMOVE the auto-acquisition and
>>>>> require that
>>>>> you do
>>>>> cdef double[:] mview = double[:](obj)
>>>> I don't see the point, as noted above. Either "obj" is statically typed
>>>> and
>>>> the bare assignment becomes a no-op, or it's not typed and the
>>>> assignment
>>>> coerces by creating a view. As with all other typed assignments.
>>>>> 2c) Perhaps: Do not even coerce to a Python memoryview and disallow
>>>>> "print mview"; instead require that you do "print
>>>>> mview.asmemoryview()"
>>>>> or
>>>>> "print memoryview(mview)" or somesuch.
>>>> This seems to depend on 2b.
>>> This I don't understand. The question of 2c) is the analogue to
>>> auto-coercion of "char*" to bytes; approving 2c) would put
>>> memoryviews in
>>> line with char*.
>>> Then again, we could in future auto-coerce char* to a ctypes pointer,
>>> and in
>>> that case, coercing a memoryview to an object representing that
>>> memoryview
>>> would be OK.
>> Character pointers coerce to strings. Hell, even structs coerce to and
>> from python dicts, so disallowing the same for memoryviews would just
>> be inconsistent and inconvenient.
> OK, but even structs don't coerce back to some arbitrary type, it's
> always a dict. I don't necessarily oppose coercing memoryviews to some
> Python memoryview object (not necessarily the builtin).
> I agree that some mview.asobject() triggering a callback defined by some
> CEP 1xxx ("cross-language CEP") would be really useful; and that could
> form the basis of a new, improved np.ndarray[double] that allows fast
> slicing etc. (where that is used automatically whenever needed).
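
The callback idea discussed above (register a function that turns a view
back into an object, and cache the result) could be sketched roughly as
follows. This is plain Python, not Cython, and everything named here is
hypothetical: `register_coercer`, `TypedView` and `asobject` are made up
for illustration and are not an existing Cython or NumPy API. array.array
stands in for a NumPy array, where the registered callback would be
np.asarray.

```python
from array import array

# Hypothetical registry: exporter type -> callback that rebuilds an
# object of that type from a view. Not part of any real API.
_coercers = {}


def register_coercer(exporter_type, callback):
    """Register how views on `exporter_type` coerce back to an object."""
    _coercers[exporter_type] = callback


class TypedView:
    """Hypothetical stand-in for a Cython typed memoryview."""

    def __init__(self, obj):
        self._view = memoryview(obj)  # acquire a PEP 3118 buffer
        self._cached = None           # result of the coercion, if any

    def asobject(self):
        # Coerce lazily and cache, so repeated calls are cheap.
        if self._cached is None:
            # Fall back to a plain Python memoryview when no callback
            # is registered for the exporting type.
            callback = _coercers.get(type(self._view.obj), memoryview)
            self._cached = callback(self._view)
        return self._cached


# For array.array, rebuild an array with the same typecode and contents.
register_coercer(array, lambda view: array(view.format, view.tolist()))
```

With this, TypedView(array('d', [1.0, 2.0])).asobject() gives back an
array.array, while a view on a bytearray (no callback registered) just
comes back as a builtin memoryview.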

After some thinking I believe I can see more clearly where Mark is 
coming from. To sum up, it's either

A) Keep both np.ndarray[double] and double[:] around, with clearly 
defined and separate roles. np.ndarray[double] implementation is 
revamped to allow fast slicing etc., based on the double[:] implementation.

B) Deprecate np.ndarray[double] sooner rather than later, but make 
double[:] have functionality that is *really* close to what 
np.ndarray[double] currently does. In most cases one should be able to 
basically replace np.ndarray[double] with double[:] and the code should 
continue to work just like before; the difference is that if you pass in 
anything other than a NumPy array, it will likely fail with a runtime 
AttributeError at some point rather than failing a PyType_Check up front.
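
To make the difference in failure modes concrete, here is a minimal
pure-Python sketch (not Cython). It assumes array.array as a stand-in for
a NumPy array and the builtin memoryview as a stand-in for double[:]; the
function names `sum_strict` and `sum_generic` are made up for illustration.

```python
from array import array


def sum_strict(obj):
    """Flavour of np.ndarray[double]: check the type up front."""
    if not isinstance(obj, array):
        # Analogous to failing a PyType_Check at the function boundary.
        raise TypeError("expected array.array, got %s" % type(obj).__name__)
    view = memoryview(obj)
    return sum(view), obj.typecode


def sum_generic(obj):
    """Flavour of double[:]: accept any PEP 3118 buffer exporter."""
    view = memoryview(obj)
    total = sum(view)  # buffer access works for any exporter
    # Using an attribute of the original object only works if the caller
    # happened to pass the expected type; otherwise AttributeError here.
    return total, view.obj.typecode
```

Both functions behave the same for an array.array argument. Given a
bytearray, sum_strict raises TypeError immediately, while sum_generic gets
as far as summing the bytes and only then raises AttributeError on the
missing `typecode` attribute.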

Between those two I believe it's a matter of design taste, not so much 
rational argument, and I don't know where I stand yet. And I'm going to 
stop thinking about it until I see what Robert says...

