[Cython] Use of long type for intermediate integral variables

Stefan Behnel stefan_ml at behnel.de
Thu Jul 2 08:30:15 CEST 2015


Robert McGibbon wrote on 01.07.2015 at 11:12:
> I noticed a problem on Windows while debugging an issue in scipy
> <https://github.com/scipy/scipy/issues/4907>, but I think it might be a
> little more general.  In some places in the generated code, it looks like
> intermediate integral variables are declared with type long, even when long
> is too small to hold the necessary value. For example, with the code pasted
> below, the value n+1 is stored in a variable of type long (using Cython
> 0.22.1) before being supplied to F.__getitem__.
> 
> This is especially pertinent on Windows (32-bit and 64-bit) and on 32-bit
> Linux, where long is only 32 bits, so you get an overflow for a program like
> the example below. The result is that it prints 1 instead of the expected
> value, 2**53+1 = 9007199254740993. But this same issue comes up basically
> whenever you do arithmetic on an array index on 64-bit Windows, for indices
> larger than 2**31-1, since sizeof(long) << sizeof(void*).
> 
> ```
> from libc.stdint cimport int64_t
> 
> class F(object):
>     def __getitem__(self, i):
>         print(i)
> 
> cdef int64_t n = 2**53
> f = F()
> f[n+1]
> ```

Thanks for the report and the investigation. I can imagine why this is the
case. "libc.stdint.int64_t" is hand-wavingly declared as "long" and the
literal 1 is also of type "long" in Cython, so it infers that using "long"
is good enough to hold the result of the sum.
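
Just to make that concrete, here is your snippet again with the literal
cast to the typedef; this is only a sketch of the workaround I mention
below, nothing beyond your own code:

    from libc.stdint cimport int64_t

    class F(object):
        def __getitem__(self, i):
            print(i)

    cdef int64_t n = 2**53
    f = F()
    # casting the literal should make the intermediate sum use int64_t
    # instead of long
    f[n + <int64_t>1]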

Casting the 1 to <int64_t>, as in the sketch above, works around this,
but it's clumsy and error-prone. The problem is that Cython doesn't know
the exact type behind a typedef at translation time; only the C compilers
will eventually know, and they might have diverging ideas about it. Your
specific issue could be helped by preferring typedefs over standard
integer types when deciding which type to use for arithmetic expressions,
but that would then break the case where the typedef-ed type happens to
be smaller than the standard one, e.g.

    cdef extern from "...":
        ctypedef long long SomeInt  # declared large enough, just in case

    def test(SomeInt x):
        cdef long long y = 1
        return x + y

If Cython preferred the typedef and inferred "SomeInt" for the type of the
result, the generated C code would be correct if sizeof(SomeInt) >=
sizeof(long long), but not if it's smaller.

Also, what should happen in expressions that use two different
user-provided typedefs of the same declared base type? The decision here
must necessarily be arbitrary.
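
To give a (made-up) example of what I mean, assume two typedefs that
Cython sees with the same declared base type but whose real sizes in the
actual C build may differ (the names and header are hypothetical):

    cdef extern from "someheader.h":
        ctypedef long TypeA   # might really be int in C
        ctypedef long TypeB   # might really be long long in C

    def add(TypeA a, TypeB b):
        # Which of the two typedefs should the intermediate result use?
        # Cython cannot tell which one is wider in the generated C code.
        return a + b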

So, I agree that what you have found is a problem. Fixing it by
preferring typedefs feels like it would generally be a good idea, but on
the other hand, it might break existing code (which usually means that it
*will* break someone's code), and it would not fix all possible
problematic cases.

Not an easy decision...

Stefan


