[Cython] Fwd: Re: [cython-users] checking for "None" in nogil function

mark florisson markflorisson88 at gmail.com
Mon May 7 18:03:43 CEST 2012


On 7 May 2012 12:51, Dag Sverre Seljebotn <d.s.seljebotn at astro.uio.no> wrote:
> On 05/07/2012 01:48 PM, Dag Sverre Seljebotn wrote:
>>
>> On 05/07/2012 01:10 PM, Stefan Behnel wrote:
>>>
>>> Dag Sverre Seljebotn, 07.05.2012 12:40:
>>>>
>>>> moving to dev list
>>>
>>>
>>> Makes sense.
>>>
>>>> On 05/07/2012 11:17 AM, Stefan Behnel wrote:
>>>>>
>>>>> Dag Sverre Seljebotn, 07.05.2012 10:44:
>>>>>>
>>>>>> On 05/07/2012 07:48 AM, Stefan Behnel wrote:
>>>>>>>
>>>>>>> I wonder why a memory view should be allowed to be None in the first
>>>>>>> place.
>>>>>>> Buffer arguments aren't (because they get unpacked on entry), so why
>>>>>>> should memory views?
>>>>>>
>>>>>>
>>>>>> ? At least when I implemented it, buffers get unpacked but the case
>>>>>> of a
>>>>>> None buffer is treated specially, and you're fully allowed (and
>>>>>> segfault if
>>>>>> you [] it).
>>>>>
>>>>>
>>>>> Hmm, ok, maybe I just got confused by the code then.
>>>>>
>>>>> I think the docs should state that buffer arguments are best used
>>>>> together
>>>>> with the "not None" declaration then.
>>>
>>>
>>> ... which made me realise that that wasn't even supported. I can't
>>> believe
>>> no-one ever reported that as a bug...
>>>
>>>
>>> https://github.com/cython/cython/commit/f2de49fd0ac82a02a070b931bf4d2dab47135d0b
>>>
>>>
>>> It's still not supported for memory views.
>>>
>>> BTW, is there a reason why we shouldn't allow a "not None" declaration
>>> for
>>> cdef functions? Obviously, the caller would have to do the check in that
>>> case. Hmm, maybe it's not that important, because None checks are best
>>> done
>>> at entry points from user code, which usually means Python code. It seems
>>> like "not None" is not supported on cpdef functions, though.
>>>
>>>
>>>> I use them with "=None" default values all the time... then do a
>>>> None-check manually.
>>>
>>>
>>> Interesting. Could you give an example? What's the advantage over
>>> letting
>>> Cython raise an error for you? And, since you are using it as a default
>>> argument, why would someone want to call your code entirely without a
>>> buffer argument?
>>
>>
>> Here you go:
>>
>> def foo(np.ndarray[double] a, np.ndarray[double] out=None):
>>     if out is None:
>>         out = np.empty_like(a)
>>     # compute result in out
>>     return out
>>
>> The pattern of handing in the memory area to write to is one of the
>> fundamental basics of numerical computing; you often just can't
>> implement an algorithm if the called function returns the result in a
>> newly-allocated array. I can explain why that is in detail, but I'd
>> rather you just trusted the testimony of somebody doing numerical
>> computation...
>>
>> It's just a convenience, but often (in particular when testing) it's
>> incredibly convenient to not have to bother with allocating the output
>> array.
>>
>> Another pattern is:
>>
>> def do_something(np.ndarray[double] a,
>>                  np.ndarray[double] sin_of_a=None):
>>     ...
>>
>> so if your caller happened to already have computed something, the
>> function uses it, but OTOH the "something" is a function of the inputs
>> and can be computed on the fly. AND, sometimes it can be computed on the
>> fly in ways more efficient than what the caller could have done, because
>> of memory bus issues etc. etc.
>>
>> Both of these can be "fixed" by a) not allowing the convenient
>> shorthand, or b) declare the argument "object" first and then type it
>> after the "preamble".
>>
>> So the REAL reason I'm arguing this case is consistency with cdef classes.
>>
>>
>>
>>>
>>>
>>>> It's really no different from cdef classes.
>>>
>>>
>>> I find it at least a bit more surprising because a buffer unpacking
>>> argument is a rather strong hint that you expect something that supports
>>> this protocol. The fact that you type your function argument with it
>>> hints
>>> at the intention to properly unpack it on entry. I'm sure there are
>>> lots of
>>> users who were or will be surprised when they realise that that doesn't
>>> exclude None values.
>>
>>
>> Whereas I think there would be more users surprised by the opposite.
>>
>> So there -- we won't know who's right without actually finding some
>> users. And chances are we are both right, since users are different from
>> one another.
>>
>>>
>>>
>>>>> And I remember that we wanted to change the default settings for
>>>>> extension
>>>>> type arguments from "or None" to "not None" years ago but never
>>>>> actually
>>>>> did it.
>>>>
>>>>
>>>> I remember that there was such a debate, but I certainly don't remember
>>>> that this was the conclusion :-)
>>>
>>>
>>> Maybe not, yes.
>>>
>>>
>>>> I didn't agree with that view then and
>>>> I don't now. I don't remember what Robert's view was...
>>>>
>>>> As far as I can remember (which might be biased towards my personal
>>>> view), the conclusion was that we left the current semantics in place,
>>>> relying on better control flow analysis to make None-checks cheaper, and
>>>> when those are cheap enough, make the nonecheck directive default to
>>>> True
>>>
>>>
>>> At least for buffer arguments, it silently corrupts data or segfaults in
>>> the current state of affairs, as you pointed out. Not exactly ideal.
>>
>>
>> No different than writing to a field in a cdef class...
>
>
> Also, I believe that in the strided case, the strides are all set to 0, and
> the data-pointer is NULL, so you will never corrupt data, you will always
> try to access *NULL and segfault.
>
> Though if you put mode='c' and a very high index you'll corrupt data.
>
> Dag
>

If you have bounds checking on, you'll get an out-of-bounds error,
which is pretty weird :)

>>
>>>
>>> That's another reason why I see a difference between the behaviour of
>>> extension types and that of buffer arguments. Buffer indexing is also way
>>> more performance critical than the average method call or attribute
>>> access
>>> on a cdef class.
>>
>>
>> Perhaps, but that's a bit hand-wavy to turn into a principle of language
>> design? "This is performance critical, so therefore we suddenly invert
>> the normal rule"?
>>
>> I just think we should be consistent, not have more special rules for
>> buffers than we need to.
>>
>> The intention all the time was that "np.ndarray[double]" is just a
>> glorified "np.ndarray". People expect it to behave like an optimized
>> "np.ndarray". If "np.ndarray" can be None, why can't "np.ndarray[double]"?
>>
>> BTW, with the coming of memoryviews, me and Mark talked about just
>> deprecating the "mytype[...]" meaning buffers, and rather treat it as
>> np.ndarray, array.array etc. being some sort of "template types". That
>> is, we disallow "object[int]" and require some special declarations in
>> the relevant pxd files.
>>
>>>> (Java is sort of prior art that this can indeed be done?).
>>>
>>>
>>> Java was designed to have a JIT compiler underneath which handles
>>> external
>>> parameters, and its compilers are way smarter than Cython. I agree that
>>> there is still a lot we can do based on better static analysis, but there
>>> will always be limits.
>>
>>
>> Any static analysis will be able to get you to the point of "not None"
>> if the user has a manual test. And the Python way is often to just spell
>> things out rather than aim for brevity; I think an explicit if-test is much more
>> newbie friendly than "not None", "or None", etc.
>>
>> Performance beyond that is rather theoretical for the moment.
>>
>> I agree that for memoryviews that can be passed in acquired-state to
>> cdef functions there is the question of eliminating an extra branch or
>> so, but that is still far-fetched, and I'd rather Mark raise the issue
>> if it becomes an issue than the two of us bikeshedding over it.
>>
>> I'll try to make this my last post to this thread, I feel we're slipping
>> into Dag-and-Stefan-endless-thread territory...
>>
>> Dag
>
>
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel
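
For reference, the "out=None" pattern from Dag's foo() example quoted above
can be sketched in plain Python. This is only an illustration under some
assumptions: it uses the standard-library array module in place of NumPy
buffers, and the name sin_into and the length check are made up for the
example rather than taken from the thread.

```python
from array import array
import math


def sin_into(a, out=None):
    """Store sin(x) for every x in `a` into `out` and return `out`.

    `out` is optional: when the caller passes None (the default), a new
    destination array is allocated, mirroring the manual None check in
    the quoted foo() example.
    """
    if out is None:
        # Convenience path: the caller did not hand in a destination.
        out = array("d", [0.0]) * len(a)
    elif len(out) != len(a):
        # Hypothetical sanity check, not part of the quoted example.
        raise ValueError("out must have the same length as a")

    # Compute the result in place in the destination array.
    for i, x in enumerate(a):
        out[i] = math.sin(x)
    return out
```

Called as sin_into(a), the function allocates the result itself, which is
the convenient case for testing; called as sin_into(a, buf), it writes into
and returns the caller's own buffer, so no new array is created.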

