[Cython] memoryview slices can't be None?

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Sat Feb 4 20:39:29 CET 2012


On 02/03/2012 07:26 PM, mark florisson wrote:
> On 3 February 2012 18:15, Dag Sverre Seljebotn
> <d.s.seljebotn at astro.uio.no>  wrote:
>> On 02/03/2012 07:07 PM, mark florisson wrote:
>>>
>>> On 3 February 2012 18:06, mark florisson<markflorisson88 at gmail.com>
>>>   wrote:
>>>>
>>>> On 3 February 2012 17:53, Dag Sverre Seljebotn
>>>> <d.s.seljebotn at astro.uio.no>    wrote:
>>>>>
>>>>> On 02/03/2012 12:09 AM, mark florisson wrote:
>>>>>>
>>>>>>
>>>>>> On 2 February 2012 21:38, Dag Sverre Seljebotn
>>>>>> <d.s.seljebotn at astro.uio.no>      wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 02/02/2012 10:16 PM, mark florisson wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2 February 2012 12:19, Dag Sverre Seljebotn
>>>>>>>> <d.s.seljebotn at astro.uio.no>        wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I just realized that
>>>>>>>>>
>>>>>>>>> cdef int[:] a = None
>>>>>>>>>
>>>>>>>>> raises an exception; even though I'd argue that 'a' is of the
>>>>>>>>> "reference" kind of type where Cython usually allows None (i.e.,
>>>>>>>>> "cdef MyClass b = None" is allowed even if type(None) is NoneType).
>>>>>>>>> Is this a bug or not, and is it possible to do something about it?
>>>>>>>>>
>>>>>>>>> Dag Sverre
>>>>>>>>> _______________________________________________
>>>>>>>>> cython-devel mailing list
>>>>>>>>> cython-devel at python.org
>>>>>>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Yeah I disabled that quite early. It was supposed to be working but
>>>>>>>> gave a lot of trouble in cases (segfaults, mainly). At the time I was
>>>>>>>> trying to get rid of all the segfaults and get the basic
>>>>>>>> functionality
>>>>>>>> working, so I disabled it. Personally, I have never liked how things
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Well, you can segfault quite easily with
>>>>>>>
>>>>>>> cdef MyClass a = None
>>>>>>> print a.field
>>>>>>>
>>>>>>> so it doesn't make sense to treat slices differently from cdef
>>>>>>> classes IMO.
>>>>>>>
>>>>>>>
>>>>>>>> can be None unchecked. I personally prefer to write
>>>>>>>>
>>>>>>>> cdef foo(obj=None):
>>>>>>>>      cdef int[:] a
>>>>>>>>      if obj is None:
>>>>>>>>          obj = ...
>>>>>>>>      a = obj
>>>>>>>>
>>>>>>>> Often you forget to write 'not None' when declaring the parameter
>>>>>>>> (and apparently that is only allowed for 'def' functions).
>>>>>>>>
>>>>>>>> As such, I never bothered to re-enable it. However, it does support
>>>>>>>> control flow with uninitialized slices, and will raise an error if it
>>>>>>>> is uninitialized. Do we want this behaviour (e.g. for consistency)?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> When in doubt, go for consistency. So +1 for that reason. I do believe
>>>>>>> that setting stuff to None is rather vital in Python.
>>>>>>>
>>>>>>> What I typically do is more like this:
>>>>>>>
>>>>>>> def f(double[:] input, double[:] out=None):
>>>>>>>     if out is None:
>>>>>>>         out = np.empty_like(input)
>>>>>>>     ...
>>>>>>>
>>>>>>> Having to use another variable name is a bit of a pain. (Come on -- do
>>>>>>> you use "a" in real code? What do you actually call "the other obj"? I
>>>>>>> sometimes end up with "out_" and so on, but it creates smelly code
>>>>>>> quite quickly.)
>>>>>>
>>>>>>
>>>>>>
>>>>>> No, it was just a contrived example.
>>>>>>
>>>>>>> It's easy to segfault with cdef classes anyway, so decent nonechecking
>>>>>>> should be implemented at some point, and then memoryviews would use
>>>>>>> the same mechanisms. Java has decent null-checking...
>>>>>>>
>>>>>>
>>>>>> The problem with none checking is that it has to occur at every point.
>>>>>
>>>>>
>>>>>
>>>>> Well, using control flow analysis etc. it doesn't really. E.g.,
>>>>>
>>>>> for i in range(a.shape[0]):
>>>>>     print i
>>>>>     a[i] *= 3
>>>>>
>>>>> can be unrolled and none-checks inserted as
>>>>>
>>>>> print 0
>>>>> if a is None: raise ....
>>>>> a[0] *= 3
>>>>> for i in range(1, a.shape[0]):
>>>>>     print i
>>>>>     a[i] *= 3 # no need for none-check
>>>>>
>>>>> It's very similar to what you'd want to do to pull boundschecking out of
>>>>> the loop...
>>>>>
>>>>
>>>> Oh, definitely. Both optimizations may not always be possible to do,
>>>> though. The optimization (for boundschecking) is easier for prange()
>>>> than range(), as you can immediately raise an exception as the
>>>> exceptional condition may be issued at any iteration.  What do you do
>>>> with bounds checking when some accesses are in-bound, and some are
>>>> out-of-bound? Do you immediately raise the exception? Are we fine with
>>>> aborting (like Fortran compilers do when you ask them for bounds
>>>> checking)? And how do you detect that the code doesn't already raise
>>>> an exception or break out of the loop itself to prevent the
>>>> out-of-bound access? (Unless no exceptions are propagating and no
>>>> break/return is used, but exceptions are so very common).
>>>>
>>>>>> With initialized slices the control flow knows when the slices are
>>>>>> initialized, or when they might not be (and it can raise a
>>>>>> compile-time or runtime error, instead of a segfault if you're lucky).
>>>>>> I'm fine with implementing the behaviour, I just always left it at the
>>>>>> bottom of my todo list.
>>>>>
>>>>>
>>>>>
>>>>> Wasn't saying you should do it, just checking.
>>>>>
>>>>> I'm still not sure about this. I think what I'd really like is
>>>>>
>>>>>   a) Stop cdef classes from being None as well
>>>>>
>>>>>   b) Sort-of deprecate cdef in favor of cast/assertion type statements
>>>>> that help the type inference:
>>>>>
>>>>> def f(arr):
>>>>>     if arr is None:
>>>>>         arr = ...
>>>>>     arr = int[:](arr) # equivalent to "cdef int[:] arr = arr", but
>>>>>                       # acts as statement, with a specific point
>>>>>                       # for the none-check
>>>>>     ...
>>>>>
>>>>> or even:
>>>>>
>>>>> def f(arr):
>>>>>     if arr is None:
>>>>>         return 'foo'
>>>>>     else:
>>>>>         arr = int[:](arr) # takes effect *here*, does none-check
>>>>>         ...
>>>>>     # arr still typed as int[:] here
>>>>>
>>>>> If we can make this work well enough with control flow analysis I'd
>>>>> never cdef declare local vars again :-)
>>>>
>>>>
>>>> Hm, what about the following?
>>>>
>>>> def f(arr):
>>>>     if arr is None:
>>>>         return 'foo'
>>>>
>>>>     cdef int[:] arr # arr may not be None
>>>
>>>
>>> The above would work in general: until the declaration is lexically
>>> encountered, the object is typed as object.
>>
>>
>> This was actually going to be my first proposal :-) That would finally
>> define how "cdef" inside of if-statements etc. behave too (simply use
>> control flow analysis and treat it like a statement).
>
> Block-local declarations are definitely something we want, although I
> think it would require some more (non-trivial) changes to the
> compiler.

Note that my proposal was actually not about block-local declarations.

Block-local:

{
    int x = 4;
}
/* x not available here */

My idea was much more like hints to control flow analysis. That is, I 
wanted to have this raise an error:

x = 'adf'
if foo():
    cdef int x = y
print x # type of x not known

This is OK:

if foo():
    cdef int x = y
else:
    cdef int x = 4
print x # ok, type the same anyway -- so type "escapes" block

And I would allow

cdef str x = y
if foo():
    cdef int x = int(x)
    return g(x) # x must be int
print x # x must be str at this point


The reason for this madness is simply that control statements do NOT 
create blocks in Python, and making it so in Cython is just confusing. 
It would bring too much of C into the language for my taste.
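
To make the scoping point concrete, here is a minimal plain-Python sketch (the function name "classify" is made up for illustration). A name bound inside an if-branch is an ordinary function-local name and is still visible after the branch ends:

```python
def classify(flag):
    # Hypothetical example function, not part of Cython.
    if flag:
        x = 4          # bound inside the "if" body
    else:
        x = "four"     # bound to a different type in the other branch
    # No block scope in Python: x is still visible here.
    return x


result_true = classify(True)
result_false = classify(False)
```

Which object x refers to after the if depends only on the path taken, which is why I'd rather tie the type to the control flow state than to a block.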

I think that in my Cython-utopia, Symtab.py is only responsible for 
resolving the scope of *names*, and types of things are not bound to 
blocks, just to the state at control flow points.
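
A rough sketch of what "types bound to the state at control flow points" could mean, in plain Python. This is not how Cython's compiler works; the "join" helper, the UNKNOWN marker and the use of strings as type names are all assumptions made up for illustration. Each path carries a name-to-type map, and at a join point a name keeps its type only if all incoming paths agree:

```python
UNKNOWN = "<unknown>"  # hypothetical marker: incoming paths disagree


def join(*envs):
    """Merge the name -> type maps arriving at a control flow join point."""
    merged = {}
    names = set().union(*envs)  # every name seen on any incoming path
    for name in names:
        types = {env.get(name, UNKNOWN) for env in envs}
        # Keep the type only when every incoming path agrees on it.
        merged[name] = types.pop() if len(types) == 1 else UNKNOWN
    return merged


# x = 'adf'; if foo(): cdef int x = y   -> paths disagree after the if
after_first = join({"x": "int"}, {"x": "object"})

# if foo(): cdef int x = y / else: cdef int x = 4   -> both paths agree
after_second = join({"x": "int"}, {"x": "int"})

# cdef str x = y; if foo(): cdef int x = int(x); return g(x)
# The int path returns, so only the fall-through state reaches the print.
after_third = join({"x": "str"})
```

The three calls mirror the three examples above: the first would be the error case, the second lets the type "escape" the if/else, and the third leaves x as str because the int path never reaches the print.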

Of course, implementing this would be a nightmare.

> Maybe the cleanup code from functions, as well as the temp handling
> etc could be re-factored to a BlockNode, that all block nodes could
> subclass. They'd have to instantiate new symbol table environments as
> well. I'm not yet entirely sure what else would be involved in the
> implementation of that.
>
>> But I like int[:] as a way of making it pure Python syntax compatible as
>> well. Perhaps the two are orthogonal -- a) make variable declaration a
>> statement, b) make cython.int[:](x) do, essentially, a cdef declaration, for
>> Python compatibility.
>>
>
> Don't we have cython.declare() for that? e.g.
>
>      arr = cython.declare(cython.int[:])
>
> That would also be treated as a statement like normal declarations (if
> and when implemented).

This was what I said, but it wasn't what I meant. Sorry. I'll try to 
explain better:

1)  There's no way to have the above actually do the right thing in 
Python. With "arr = cython.int[:](arr)" one could actually return a 
NumPy or NumPy-like array that works in Python (since "arr" might not 
have the "shape" attribute before the conversion, all we know is that it 
exports the buffer interface...).

2) I don't like the fact that we overload the assignment operator to 
acquire a view. "cdef np.ndarray[int] x = y" is fine since if you do 
"x.someattr" then a NumPy subclass could provide someattr and it works 
fine. Acquiring a view is just something different.

3) Hence I guess I like "arr = int[:](arr)" better both for Cython and 
Python; at least if "arr" is always type-inferred to be int[:], even if 
arr was an "object" further up in the code (really, if you do "x = f(x)" 
at the top-level of the function, then x can just take the identity of 
another variable from that point on -- I don't know if the current 
control flow analysis and type inference do this, though?)
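
To illustrate point 1 with something runnable in plain Python: the standard library's array.array exports the buffer interface but has no "shape" attribute, while a view taken of it does. The "as_view" helper below is a hypothetical stand-in for "int[:](arr)", and the built-in memoryview stands in for a Cython memoryview slice; neither is what Cython actually does.

```python
from array import array


def as_view(obj):
    # Hypothetical stand-in for "int[:](obj)" in pure Python mode:
    # wrap any buffer exporter and reject None at this exact point.
    if obj is None:
        raise TypeError("cannot take a view of None")
    return memoryview(obj)


arr = array("i", [1, 2, 3])         # exports the buffer interface
had_shape = hasattr(arr, "shape")   # array.array has no "shape"
arr = as_view(arr)                  # from here on, arr names the view
shape = arr.shape                   # now available
```

Because the conversion is an ordinary call, the none-check happens at one well-defined statement, and rebinding "arr" is the "x = f(x)" pattern from point 3.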

Dag Sverre

