[Cython] memoryview slices can't be None?

Sun Feb 5 22:56:18 CET 2012

On 4 February 2012 19:39, Dag Sverre Seljebotn
<d.s.seljebotn at astro.uio.no> wrote:
> On 02/03/2012 07:26 PM, mark florisson wrote:
>>
>> On 3 February 2012 18:15, Dag Sverre Seljebotn
>> <d.s.seljebotn at astro.uio.no>  wrote:
>>>
>>> On 02/03/2012 07:07 PM, mark florisson wrote:
>>>>
>>>>
>>>> On 3 February 2012 18:06, mark florisson <markflorisson88 at gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 3 February 2012 17:53, Dag Sverre Seljebotn
>>>>> <d.s.seljebotn at astro.uio.no>    wrote:
>>>>>>
>>>>>>
>>>>>> On 02/03/2012 12:09 AM, mark florisson wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2 February 2012 21:38, Dag Sverre Seljebotn
>>>>>>> <d.s.seljebotn at astro.uio.no>      wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 02/02/2012 10:16 PM, mark florisson wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2 February 2012 12:19, Dag Sverre Seljebotn
>>>>>>>>> <d.s.seljebotn at astro.uio.no>        wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I just realized that
>>>>>>>>>>
>>>>>>>>>> cdef int[:] a = None
>>>>>>>>>>
>>>>>>>>>> raises an exception, even though I'd argue that 'a' is of the
>>>>>>>>>> "reference" kind of type where Cython usually allows None (i.e.,
>>>>>>>>>> "cdef MyClass b = None" is allowed even if type(None) is NoneType).
>>>>>>>>>> Is this a bug or not, and is it possible to do something about it?
>>>>>>>>>>
>>>>>>>>>> Dag Sverre
>>>>>>>>>> _______________________________________________
>>>>>>>>>> cython-devel mailing list
>>>>>>>>>> cython-devel at python.org
>>>>>>>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yeah I disabled that quite early. It was supposed to be working but
>>>>>>>>> gave a lot of trouble in some cases (segfaults, mainly). At the time
>>>>>>>>> I was trying to get rid of all the segfaults and get the basic
>>>>>>>>> functionality working, so I disabled it. Personally, I have never
>>>>>>>>> liked how things
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Well, you can segfault quite easily with
>>>>>>>>
>>>>>>>> cdef MyClass a = None
>>>>>>>> print a.field
>>>>>>>>
>>>>>>>> so it doesn't make sense to treat slices differently from cdef
>>>>>>>> classes IMO.
>>>>>>>>
>>>>>>>>
>>>>>>>>> can be None unchecked. I personally prefer to write
>>>>>>>>>
>>>>>>>>> cdef foo(obj=None):
>>>>>>>>>     cdef int[:] a
>>>>>>>>>     if obj is None:
>>>>>>>>>         obj = ...
>>>>>>>>>     a = obj
>>>>>>>>>
>>>>>>>>> Often you forget to write 'not None' when declaring the parameter
>>>>>>>>> (and apparently that is only allowed for 'def' functions).
>>>>>>>>>
>>>>>>>>> As such, I never bothered to re-enable it. However, it does support
>>>>>>>>> control flow with uninitialized slices, and will raise an error if
>>>>>>>>> it is uninitialized. Do we want this behaviour (e.g. for consistency)?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> When in doubt, go for consistency. So +1 for that reason. I do
>>>>>>>> believe that setting stuff to None is rather vital in Python.
>>>>>>>>
>>>>>>>> What I typically do is more like this:
>>>>>>>>
>>>>>>>> def f(double[:] input, double[:] out=None):
>>>>>>>>    if out is None:
>>>>>>>>        out = np.empty_like(input)
>>>>>>>>    ...
>>>>>>>>
>>>>>>>> Having to use another variable name is a bit of a pain. (Come on --
>>>>>>>> do you use "a" in real code? What do you actually call "the other
>>>>>>>> obj"? I sometimes end up with "out_" and so on, but it creates smelly
>>>>>>>> code quite quickly.)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> No, it was just a contrived example.
>>>>>>>
>>>>>>>> It's easy to segfault with cdef classes anyway, so decent
>>>>>>>> nonechecking should be implemented at some point, and then
>>>>>>>> memoryviews would use the same mechanisms. Java has decent
>>>>>>>> null-checking...
>>>>>>>>
>>>>>>>
>>>>>>> The problem with none checking is that it has to occur at every
>>>>>>> point.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Well, using control flow analysis etc. it doesn't really. E.g.,
>>>>>>
>>>>>> for i in range(a.shape[0]):
>>>>>>    print i
>>>>>>    a[i] *= 3
>>>>>>
>>>>>> can be unrolled and none-checks inserted as
>>>>>>
>>>>>> print 0
>>>>>> if a is None: raise ....
>>>>>> a[0] *= 3
>>>>>> for i in range(1, a.shape[0]):
>>>>>>    print i
>>>>>>    a[i] *= 3 # no need for none-check
>>>>>>
>>>>>> It's very similar to what you'd want to do to pull boundschecking out
>>>>>> of the loop...
>>>>>>
>>>>>
>>>>> Oh, definitely. Both optimizations may not always be possible to do,
>>>>> though. The optimization (for boundschecking) is easier for prange()
>>>>> than range(), as you can immediately raise an exception as the
>>>>> exceptional condition may be issued at any iteration.  What do you do
>>>>> with bounds checking when some accesses are in-bound, and some are
>>>>> out-of-bound? Do you immediately raise the exception? Are we fine with
>>>>> aborting (like Fortran compilers do when you ask them for bounds
>>>>> checking)? And how do you detect that the code doesn't already raise
>>>>> an exception or break out of the loop itself to prevent the
>>>>> out-of-bound access? (Unless no exceptions are propagating and no
>>>>> break/return is used, but exceptions are so very common).
>>>>>
>>>>>>> With initialized slices the control flow knows when the slices are
>>>>>>> initialized, or when they might not be (and it can raise a
>>>>>>> compile-time or runtime error, instead of a segfault if you're
>>>>>>> lucky). I'm fine with implementing the behaviour, I just always left
>>>>>>> it at the bottom of my todo list.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Wasn't saying you should do it, just checking.
>>>>>>
>>>>>> I'm still not sure about this. I think what I'd really like is
>>>>>>
>>>>>>  a) Stop cdef classes from being None as well
>>>>>>
>>>>>>  b) Sort-of deprecate cdef in favor of cast/assertion type statements
>>>>>> that help the type inference:
>>>>>>
>>>>>> def f(arr):
>>>>>>    if arr is None:
>>>>>>        arr = ...
>>>>>>    arr = int[:](arr) # equivalent to "cdef int[:] arr = arr", but
>>>>>>                      # acts as statement, with a specific point
>>>>>>                      # for the none-check
>>>>>>    ...
>>>>>>
>>>>>> or even:
>>>>>>
>>>>>> def f(arr):
>>>>>>    if arr is None:
>>>>>>        return 'foo'
>>>>>>    else:
>>>>>>        arr = int[:](arr) # takes effect *here*, does none-check
>>>>>>        ...
>>>>>>    # arr still typed as int[:] here
>>>>>>
>>>>>> If we can make this work well enough with control flow analysis I'd
>>>>>> never cdef declare local vars again :-)
>>>>>
>>>>>
>>>>>
>>>>> Hm, what about the following?
>>>>>
>>>>> def f(arr):
>>>>>    if arr is None:
>>>>>        return 'foo'
>>>>>
>>>>>    cdef int[:] arr # arr may not be None
>>>>
>>>>
>>>>
>>>> The above would work in general: until the declaration is lexically
>>>> encountered, the object is typed as object.
>>>
>>>
>>>
>>> This was actually going to be my first proposal :-) That would finally
>>> define how "cdef" inside of if-statements etc. behaves too (simply use
>>> control flow analysis and treat it like a statement).
>>
>>
>> Block-local declarations are definitely something we want, although I
>> think it would require some more (non-trivial) changes to the
>> compiler.
>
>
> Note that my proposal was actually not about block-local declarations.
>
> Block-local:
>
> {
>   int x = 4;
> }
> /* x not available here */
>
> My idea was much more like hints to control flow analysis. That is, I wanted
> to have this raise an error:
>
> x = 'adf'
> if foo():
>    cdef int x = y
> print x # type of x not known
>
> This is OK:
>
> if foo():
>    cdef int x = y
> else:
>    cdef int x = 4
> print x # ok, type the same anyway -- so type "escapes" block

Seeing that it doesn't work that way in any language with block
scopes, I find that pretty surprising behaviour. Why would you not
simply mandate that the user declares 'x' outside of the blocks?
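
For reference, the quoted rule (a type "escapes" a branch only when every
path agrees on it) can be sketched in plain Python. This is only an
illustration under assumptions: the function name `join` and the string type
tags are hypothetical and are not part of Cython's compiler.

```python
def join(*envs):
    """Merge name -> type maps at a point where control-flow paths meet.

    A name keeps its type only if every incoming path gives it the same one;
    otherwise it maps to None, meaning "type not known here".
    """
    names = set().union(*envs)
    merged = {}
    for name in names:
        types = {env.get(name) for env in envs}
        merged[name] = types.pop() if len(types) == 1 else None
    return merged


# "if foo(): cdef int x = y" with no else: the fall-through path still has
# x as a plain object, so the type after the if is unknown (an error case).
mismatch = join({"x": "int"}, {"x": "object"})

# Both branches declare "cdef int x": the type is the same on every path.
agree = join({"x": "int"}, {"x": "int"})
```

Declaring 'x' once outside the blocks, as suggested above, would make the
merge step unnecessary for this case.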

> And I would allow
>
> cdef str x = y
> if foo:
>    cdef int x = int(x)
>    return g(x) # x must be int
> print x # x must be str at this point
>
>
> The reason for this madness is simply that control statements do NOT create
> blocks in Python, and making it so in Cython is just confusing. It would
> bring too much of C into the language for my taste.

And yet it can be very useful and intuitive in several contexts, just
not for objects (which aren't typed anyway!). Block-local declarations
are useful when a variable is only used in the block and it can be
useful to make variables private in the cython.parallel context
("assignment makes private" is really not as intuitive). It's not a
very important feature though, and it's indeed more a thing from
static languages than Python.
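
A minimal plain-Python sketch of the point under discussion (control
statements do not open a new scope); the function names are made up for
illustration:

```python
def branch_binding(flag):
    if flag:
        x = 4          # bound inside the "if" body
    else:
        x = "four"     # bound inside the "else" body
    return x           # still visible: "if" did not create a block scope


def one_sided_binding(flag):
    if flag:
        x = 4
    try:
        return x       # x is function-local, but may never have been bound
    except UnboundLocalError:
        return "x was never bound"
```

With block-local declarations, 'x' in either function would not be visible
after the if-statement at all.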

> I think that in my Cython-utopia, Symtab.py is only responsible for
> resolving the scope of *names*, and types of things are not bound to blocks,
> just to the state at control flow points.
>
> Of course, implementing this would be a nightmare.
>
>
>> Maybe the cleanup code from functions, as well as the temp handling
>> etc could be re-factored to a BlockNode, that all block nodes could
>> subclass. They'd have to instantiate new symbol table environments as
>> well. I'm not yet entirely sure what else would be involved in the
>> implementation of that.
>>
>>> But I like int[:] as a way of making it pure Python syntax compatible as
>>> well. Perhaps the two are orthogonal -- a) make variable declaration a
>>> statement, b) make cython.int[:](x) do, essentially, a cdef declaration,
>>> for Python compatibility.
>>>
>>
>> Don't we have cython.declare() for that? e.g.
>>
>>     arr = cython.declare(cython.int[:])
>>
>> That would also be treated as a statement like normal declarations (if
>> and when implemented).
>
>
> This was what I said, but it wasn't what I meant. Sorry. I'll try to explain
> better:
>
> 1)  There's no way to have the above actually do the right thing in Python.
> With "arr = cython.int[:](arr)" one could actually return a NumPy or
> NumPy-like array that works in Python (since "arr" might not have the
> "shape" attribute before the conversion, all we know is that it exports the
> buffer interface...).

Right, but the same thing goes for other types as well. E.g. I can
type something int with cython.declare() and then use strings instead.

> 2) I don't like the fact that we overload the assignment operator to acquire
> a view. "cdef np.ndarray[int] x = y" is fine since if you do "x.someattr"
> then a NumPy subclass could provide someattr and it works fine. Acquiring a
> view is just something different.

Yeah it's kind of overloaded, but in a good way :) It's the language
that does the overloading, which means it's not very surprising. And
the memoryview slices coerce to numpy-like (although somewhat
incapable) objects and support some of their attributes. I like the
simplicity of assignment here, you don't really care that it takes a
view, you just want to access and operate on the data.

What do you think of allowing the user to register a
conversion-to-object function? And perhaps the default should be that
if a view was never sliced, it just returns the original object
(although that might mean you get back objects with incompatible
interfaces...).
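
Roughly what I have in mind, as a plain-Python sketch rather than anything
that exists in Cython: `SliceView` and its `to_object` hook are hypothetical
names, and the built-in memoryview stands in for a typed memoryview slice.

```python
from array import array


class SliceView:
    """Hypothetical stand-in for a memoryview slice with a conversion hook."""

    def __init__(self, obj, to_object=None):
        self._origin = obj              # the object the view was taken from
        self._view = memoryview(obj)
        self._sliced = False
        self._to_object = to_object     # user-registered converter, if any

    def __getitem__(self, index):
        if isinstance(index, slice):
            new = SliceView.__new__(SliceView)
            new._origin = self._origin
            new._view = self._view[index]
            new._sliced = True
            new._to_object = self._to_object
            return new
        return self._view[index]

    def to_object(self):
        if self._to_object is not None:
            return self._to_object(self._view)   # registered conversion wins
        if not self._sliced:
            return self._origin                  # never sliced: original object
        return self._view                        # fallback: the view itself


data = array("i", [1, 2, 3, 4])
unsliced = SliceView(data).to_object()
converted = SliceView(data, to_object=lambda mv: mv.tolist())[::2].to_object()
```

Here `unsliced` is the original array, while `converted` goes through the
registered function because the view was sliced.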

> 3) Hence I guess I like "arr = int[:](arr)" better both for Cython and
> Python; at least if "arr" is always type-inferred to be int[:], even if arr
> was an "object" further up in the code (really, if you do "x = f(x)" at the
> top-level of the function, then x can just take the identity of another
> variable from that point on -- I don't know if the current control flow
> analysis and type inference do this though?)
>
>
> Dag Sverre