From szport at gmail.com Fri Mar 1 07:31:14 2013 From: szport at gmail.com (ZS) Date: Fri, 1 Mar 2013 09:31:14 +0300 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: <512FC919.4010702@behnel.de> References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> Message-ID: 2013/3/1 Stefan Behnel : > ZS, 28.02.2013 21:07: >> 2013/2/28 Stefan Behnel: >>>> This allows writing unicode text parsing code almost at C speed >>>> mostly in python (+ .pxd definitions). >>> >>> I suggest simply adding a constant flag argument to the existing function >>> that states if checking should be done or not. Inlining will let the C >>> compiler drop the corresponding code, which may or may not make it a little >>> faster. >> >> static inline Py_UCS4 unicode_char2(PyObject* ustring, Py_ssize_t i, int flag) { >> Py_ssize_t length; >> #if CYTHON_PEP393_ENABLED >> if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1; >> #endif >> if (flag) { >> length = __Pyx_PyUnicode_GET_LENGTH(ustring); >> if ((0 <= i) & (i < length)) { >> return __Pyx_PyUnicode_READ_CHAR(ustring, i); >> } else if ((-length <= i) & (i < 0)) { >> return __Pyx_PyUnicode_READ_CHAR(ustring, i + length); >> } else { >> PyErr_SetString(PyExc_IndexError, "string index out of range"); >> return (Py_UCS4)-1; >> } >> } else { >> return __Pyx_PyUnicode_READ_CHAR(ustring, i); >> } >> } > > I think you could even pass in two flags, one for wraparound and one for > boundscheck, and then just evaluate them appropriately in the existing "if" > tests above. That should allow both features to be supported independently > in a fast way. > > >> Here are timings: >> >> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from >> mytests.unicode_index import test_1" "test_1()" >> 50 loops, best of 5: 152 msec per loop >> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from >> mytests.unicode_index import test_2" "test_2()" >> 50 loops, best of 5: 86.5 msec per loop >> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from >> mytests.unicode_index import test_3" "test_3()" >> 50 loops, best of 5: 86.5 msec per loop >> >> So your suggestion would be preferable. > > Nice. Yes, looks like it's worth it. > Sure, the same could be applied to unicode slicing too. Zaur Shibzukhov From szport at gmail.com Fri Mar 1 07:43:34 2013 From: szport at gmail.com (ZS) Date: Fri, 1 Mar 2013 09:43:34 +0300 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: <512FC919.4010702@behnel.de> References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> Message-ID: 2013/3/1 Stefan Behnel : > ZS, 28.02.2013 21:07: >> 2013/2/28 Stefan Behnel: >>>> This allows writing unicode text parsing code almost at C speed >>>> mostly in python (+ .pxd definitions). >>> >>> I suggest simply adding a constant flag argument to the existing function >>> that states if checking should be done or not. Inlining will let the C >>> compiler drop the corresponding code, which may or may not make it a little >>> faster. 
>> >> static inline Py_UCS4 unicode_char2(PyObject* ustring, Py_ssize_t i, int flag) { >> Py_ssize_t length; >> #if CYTHON_PEP393_ENABLED >> if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1; >> #endif >> if (flag) { >> length = __Pyx_PyUnicode_GET_LENGTH(ustring); >> if ((0 <= i) & (i < length)) { >> return __Pyx_PyUnicode_READ_CHAR(ustring, i); >> } else if ((-length <= i) & (i < 0)) { >> return __Pyx_PyUnicode_READ_CHAR(ustring, i + length); >> } else { >> PyErr_SetString(PyExc_IndexError, "string index out of range"); >> return (Py_UCS4)-1; >> } >> } else { >> return __Pyx_PyUnicode_READ_CHAR(ustring, i); >> } >> } > > I think you could even pass in two flags, one for wraparound and one for > boundscheck, and then just evaluate them appropriately in the existing "if" > tests above. That should allow both features to be supported independently > in a fast way. > Interesting, can C compilers in optimization mode eliminate unused evaluation paths in nested if statements with constant conditional expressions? From stefan_ml at behnel.de Fri Mar 1 07:46:30 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 01 Mar 2013 07:46:30 +0100 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> Message-ID: <51304EC6.9050300@behnel.de> ZS, 01.03.2013 07:43: > 2013/3/1 Stefan Behnel: >> ZS, 28.02.2013 21:07: >>> 2013/2/28 Stefan Behnel: >>>>> This allows writing unicode text parsing code almost at C speed >>>>> mostly in python (+ .pxd definitions). >>>> >>>> I suggest simply adding a constant flag argument to the existing function >>>> that states if checking should be done or not. Inlining will let the C >>>> compiler drop the corresponding code, which may or may not make it a little >>>> faster. >>> >>> static inline Py_UCS4 unicode_char2(PyObject* ustring, Py_ssize_t i, int flag) { >>> Py_ssize_t length; >>> #if CYTHON_PEP393_ENABLED >>> if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1; >>> #endif >>> if (flag) { >>> length = __Pyx_PyUnicode_GET_LENGTH(ustring); >>> if ((0 <= i) & (i < length)) { >>> return __Pyx_PyUnicode_READ_CHAR(ustring, i); >>> } else if ((-length <= i) & (i < 0)) { >>> return __Pyx_PyUnicode_READ_CHAR(ustring, i + length); >>> } else { >>> PyErr_SetString(PyExc_IndexError, "string index out of range"); >>> return (Py_UCS4)-1; >>> } >>> } else { >>> return __Pyx_PyUnicode_READ_CHAR(ustring, i); >>> } >>> } >> >> I think you could even pass in two flags, one for wraparound and one for >> boundscheck, and then just evaluate them appropriately in the existing "if" >> tests above. That should allow both features to be supported independently >> in a fast way. 
>>> >> Interesting, can C compilers in optimization mode eliminate unused >> evaluation paths in nested if statements with constant conditional >> expressions? > > They'd be worthless if they didn't do that. (Even Cython does it, BTW.) > Then it can simplify writing utility code to support different optimization flags in other cases too. From robertwb at gmail.com Fri Mar 1 08:25:09 2013 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 28 Feb 2013 23:25:09 -0800 Subject: [Cython] Be more forgiving about memoryview strides In-Reply-To: References: <1362064397.2663.14.camel@sebastian-laptop> Message-ID: On Thu, Feb 28, 2013 at 11:12 AM, Nathaniel Smith wrote: > On Thu, Feb 28, 2013 at 5:50 PM, Robert Bradshaw wrote: >> On Thu, Feb 28, 2013 at 7:13 AM, Sebastian Berg >> wrote: >>> Hey, >>> >>> Maybe someone here already saw it (I don't have a trac account, or I >>> would just create a ticket), but it would be nice if Cython was more >>> forgiving about contiguous requirements on strides. In the future this >>> would make it easier for numpy to go forward with changing the >>> contiguous flags to be more reasonable for its purpose, and second also >>> to allow old (and maybe for the moment remaining) corner cases in numpy >>> to slip past (as well as possibly the same for other programs...). An >>> example is (see also https://github.com/numpy/numpy/issues/2956 and the >>> PR linked there for more details): >>> >>> def add_one(array): >>> cdef double[::1] a = array >>> a[0] += 1. >>> return array >>> >>> giving: >>> >>>>>> add_one(np.ascontiguousarray(np.arange(10.)[::100])) >>> ValueError: Buffer and memoryview are not contiguous in the same >>> dimension. >>> >>> This could easily be changed if MemoryViews check the strides as "can be >>> interpreted as contiguous". That means that if shape[i] == 1, then >>> strides[i] are arbitrary (you can just change them if you like). This is >>> also the case for 0-sized arrays, which are arguably always contiguous, >>> no matter what their strides are! >> >> I was under the impression that the primary value for contiguous is >> that a foo[::1] can be interpreted as a foo*. Letting strides be >> arbitrary completely breaks this, right? > > Nope. The natural definition of "C contiguous" is "the array entries > are arranged in memory in the same way they would be if they were a > multidimensional C array" (i.e., what you said.) But it turns out that > this is *not* the definition that numpy and cython use! > > The issue is that the above definition is a constraint on the actual > locations of items in memory, i.e., given a shape, it tells you that > for every index, > (a) sum(index * strides) == sum(index * cumprod(shape[::-1])[::-1] * itemsize) > Obviously this equality holds if > (b) strides == cumprod(shape[::-1])[::-1] * itemsize > (Or for F-contiguity, we have > (b') strides == cumprod(shape) * itemsize > ) > > (a) is the natural definition of "C contiguous". (b) is the definition > of "C contiguous" used by numpy and cython. (b) implies (a). But (a) > does not imply (b), i.e., there are arrays that are C-contiguous which > numpy and cython think are discontiguous. (Also in numpy there are > some weird cases where numpy accidentally uses the correct definition, > I think, which is the point of Sebastian's example.) 
> > In particular, if shape[i] == 1, then the value of stride[i] really > should be irrelevant to judging contiguity, because the only thing you > can do with strides[i] is multiply it by index[i], and if shape[i] == > 1 then index[i] is always 0. So an array of int8's with shape = (10, > 1), strides = (1, 73) is contiguous according to (a), but not > according to (b). Also if shape[i] is 0 for any i, then the entire > contents of the strides array becomes irrelevant to judging > contiguity; all zero-sized arrays are contiguous according to (a), but > not (b). Thanks for clarifying. Yes, I think it makes a lot of sense to loosen our definition for Cython. Internally, I think the only way we use this assumption is in not requiring that the first/final index be multiplied by the stride, which should be totally fine. But this merits closer inspection as there may be something else. > (This is really annoying for numpy because given, say, a column vector > with shape (n, 1), it is impossible to be both C- and F-contiguous > according to the (b)-style definition. But people expect > various operations to preserve C versus F contiguity, so there are > heuristics in numpy that try to guess whether various result arrays > should pretend to be C- or F-contiguous, and we don't even have a > consistent idea of what it would mean for this code to be working > correctly, never mind test it and keep it working. OTOH if we just fix > numpy to use the (a) definition, then it turns out a bunch of > third-party code breaks, like, for example, cython.) Can you give some examples? - Robert From szport at gmail.com Fri Mar 1 08:37:00 2013 From: szport at gmail.com (Zaur Shibzukhov) Date: Fri, 1 Mar 2013 10:37:00 +0300 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> Message-ID: 2013/3/1 ZS : > 2013/3/1 Stefan Behnel : >> ZS, 28.02.2013 21:07: >>> 2013/2/28 Stefan Behnel: >>>>> This allows writing unicode text parsing code almost at C speed >>>>> mostly in python (+ .pxd definitions). >>>> >>>> I suggest simply adding a constant flag argument to the existing function >>>> that states if checking should be done or not. Inlining will let the C >>>> compiler drop the corresponding code, which may or may not make it a little >>>> faster. >>> >>> static inline Py_UCS4 unicode_char2(PyObject* ustring, Py_ssize_t i, int flag) { >>> Py_ssize_t length; >>> #if CYTHON_PEP393_ENABLED >>> if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1; >>> #endif >>> if (flag) { >>> length = __Pyx_PyUnicode_GET_LENGTH(ustring); >>> if ((0 <= i) & (i < length)) { >>> return __Pyx_PyUnicode_READ_CHAR(ustring, i); >>> } else if ((-length <= i) & (i < 0)) { >>> return __Pyx_PyUnicode_READ_CHAR(ustring, i + length); >>> } else { >>> PyErr_SetString(PyExc_IndexError, "string index out of range"); >>> return (Py_UCS4)-1; >>> } >>> } else { >>> return __Pyx_PyUnicode_READ_CHAR(ustring, i); >>> } >>> } >> >> I think you could even pass in two flags, one for wraparound and one for >> boundscheck, and then just evaluate them appropriately in the existing "if" >> tests above. That should allow both features to be supported independently >> in a fast way. 
>> >> >>> Here are timings: >>> >>> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from >>> mytests.unicode_index import test_1" "test_1()" >>> 50 loops, best of 5: 152 msec per loop >>> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from >>> mytests.unicode_index import test_2" "test_2()" >>> 50 loops, best of 5: 86.5 msec per loop >>> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from >>> mytests.unicode_index import test_3" "test_3()" >>> 50 loops, best of 5: 86.5 msec per loop >>> >>> So your suggestion would be preferable. >> >> Nice. Yes, looks like it's worth it. >> > > Sure, the same could be applied to unicode slicing too. > I had to verify it myself first. So here is the test... unicode_slice.h --------------------- #include "unicodeobject.h" static inline PyObject* unicode_slice( PyObject* text, Py_ssize_t start, Py_ssize_t stop); /////////////// PyUnicode_Substring /////////////// /* CURRENT */ static inline PyObject* unicode_slice( PyObject* text, Py_ssize_t start, Py_ssize_t stop) { Py_ssize_t length; #if CYTHON_PEP393_ENABLED if (PyUnicode_READY(text) == -1) return NULL; length = PyUnicode_GET_LENGTH(text); #else length = PyUnicode_GET_SIZE(text); #endif if (start < 0) { start += length; if (start < 0) start = 0; } if (stop < 0) stop += length; else if (stop > length) stop = length; length = stop - start; if (length <= 0) return PyUnicode_FromUnicode(NULL, 0); #if CYTHON_PEP393_ENABLED return PyUnicode_FromKindAndData(PyUnicode_KIND(text), PyUnicode_1BYTE_DATA(text) + start*PyUnicode_KIND(text), stop-start); #else return PyUnicode_FromUnicode(PyUnicode_AS_UNICODE(text)+start, stop-start); #endif } static inline PyObject* unicode_slice2( PyObject* text, Py_ssize_t start, Py_ssize_t stop, int flag); /////////////// PyUnicode_Substring /////////////// /* CHANGED */ static inline PyObject* unicode_slice2( PyObject* text, Py_ssize_t start, Py_ssize_t stop, int flag) { Py_ssize_t length; #if CYTHON_PEP393_ENABLED if (PyUnicode_READY(text) == -1) return NULL; #endif if (flag) { #if CYTHON_PEP393_ENABLED length = PyUnicode_GET_LENGTH(text); #else length = PyUnicode_GET_SIZE(text); #endif if (start < 0) { start += length; if (start < 0) start = 0; } if (stop < 0) stop += length; else if (stop > length) stop = length; length = stop - start; if (length <= 0) return PyUnicode_FromUnicode(NULL, 0); } #if CYTHON_PEP393_ENABLED return PyUnicode_FromKindAndData(PyUnicode_KIND(text), PyUnicode_1BYTE_DATA(text) + start*PyUnicode_KIND(text), stop-start); #else return PyUnicode_FromUnicode(PyUnicode_AS_UNICODE(text)+start, stop-start); #endif } unicode_slice.pyx ------------------------ cdef extern from 'unicode_slice.h': inline unicode unicode_slice(unicode ustring, int start, int stop) inline unicode unicode_slice2(unicode ustring, int start, int stop, int flag) cdef unicode text = u"abcdefghigklmnopqrstuvwxyzabcdefghigklmnopqrstuvwxyz" cdef long f_1(unicode text): cdef int i, j cdef int n = len(text) cdef int val cdef long S = 0 for j in range(100000): for i in range(n): val = len(unicode_slice(text, 0, i)) S += val * j return S cdef long f_2(unicode text): cdef int i, j cdef int n = len(text) cdef int val cdef long S = 0 for j in range(100000): for i in range(n): val = len(unicode_slice2(text, 0, i, 0)) S += val * j return S def test_1(): f_1(text) def test_2(): f_2(text) Here are timings: (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from mytests.unicode_slice import test_1" "test_1()" 50 loops, best of 5: 534 msec per loop (py33) zbook:mytests $ 
python3.3 -m timeit -n 50 -r 5 -s "from mytests.unicode_slice import test_2" "test_2()" 50 loops, best of 5: 523 msec per loop Only 2% Zaur Shibzukhov From stefan_ml at behnel.de Fri Mar 1 08:56:21 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 01 Mar 2013 08:56:21 +0100 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> Message-ID: <51305F25.5040805@behnel.de> Zaur Shibzukhov, 01.03.2013 08:37: > unicode_slice.h > --------------------- > > #include "unicodeobject.h" > > static inline PyObject* unicode_slice( > PyObject* text, Py_ssize_t start, Py_ssize_t stop); > > /////////////// PyUnicode_Substring /////////////// > > /* CURRENT */ > > static inline PyObject* unicode_slice( > PyObject* text, Py_ssize_t start, Py_ssize_t stop) { > Py_ssize_t length; > #if CYTHON_PEP393_ENABLED > if (PyUnicode_READY(text) == -1) return NULL; > length = PyUnicode_GET_LENGTH(text); > #else > length = PyUnicode_GET_SIZE(text); > #endif > if (start < 0) { > start += length; > if (start < 0) > start = 0; > } > if (stop < 0) > stop += length; > else if (stop > length) > stop = length; > length = stop - start; > if (length <= 0) > return PyUnicode_FromUnicode(NULL, 0); > #if CYTHON_PEP393_ENABLED > return PyUnicode_FromKindAndData(PyUnicode_KIND(text), > PyUnicode_1BYTE_DATA(text) + start*PyUnicode_KIND(text), stop-start); > #else > return PyUnicode_FromUnicode(PyUnicode_AS_UNICODE(text)+start, stop-start); > #endif > } > > static inline PyObject* unicode_slice2( > PyObject* text, Py_ssize_t start, Py_ssize_t stop, int flag); > > /////////////// PyUnicode_Substring /////////////// > > /* CHANGED */ > > static inline PyObject* unicode_slice2( > PyObject* text, Py_ssize_t start, Py_ssize_t stop, int flag) { > Py_ssize_t length; > > #if CYTHON_PEP393_ENABLED > if (PyUnicode_READY(text) == -1) return NULL; > #endif > > if (flag) { > #if CYTHON_PEP393_ENABLED > length = PyUnicode_GET_LENGTH(text); > #else > length = PyUnicode_GET_SIZE(text); > #endif > if (start < 0) { > start += length; > if (start < 0) > start = 0; > } > if (stop < 0) > stop += length; > else if (stop > length) > stop = length; > length = stop - start; > if (length <= 0) > return PyUnicode_FromUnicode(NULL, 0); > } > > #if CYTHON_PEP393_ENABLED > return PyUnicode_FromKindAndData(PyUnicode_KIND(text), > PyUnicode_1BYTE_DATA(text) + start*PyUnicode_KIND(text), stop-start); > #else > return PyUnicode_FromUnicode(PyUnicode_AS_UNICODE(text)+start, stop-start); > #endif > } > > unicode_slice.pyx > ------------------------ > > cdef extern from 'unicode_slice.h': > inline unicode unicode_slice(unicode ustring, int start, int stop) > inline unicode unicode_slice2(unicode ustring, int start, int > stop, int flag) > > cdef unicode text = u"abcdefghigklmnopqrstuvwxyzabcdefghigklmnopqrstuvwxyz" > > cdef long f_1(unicode text): > cdef int i, j > cdef int n = len(text) > cdef int val > cdef long S = 0 > > for j in range(100000): > for i in range(n): > val = len(unicode_slice(text, 0, i)) > S += val * j > > return S > > cdef long f_2(unicode text): > cdef int i, j > cdef int n = len(text) > cdef int val > cdef long S = 0 > > for j in range(100000): > for i in range(n): > val = len(unicode_slice2(text, 0, i, 0)) > S += val * j > > return S > > > def test_1(): > f_1(text) > > def test_2(): > f_2(text) > > Here are timings: > > (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from > mytests.unicode_slice import test_1" "test_1()" > 50 loops, best of 5: 534 msec 
per loop > (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from > mytests.unicode_slice import test_2" "test_2()" > 50 loops, best of 5: 523 msec per loop > > Only 2% That's to be expected. Creating a Unicode string object is by far the dominating operation here, including memory allocation, object type selection and what not. Stefan From stefan_ml at behnel.de Fri Mar 1 09:00:02 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 01 Mar 2013 09:00:02 +0100 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> <51304EC6.9050300@behnel.de> Message-ID: <51306002.8000701@behnel.de> Zaur Shibzukhov, 01.03.2013 07:54: >>>> I think you could even pass in two flags, one for wraparound and one for >>>> boundscheck, and then just evaluate them appropriately in the existing "if" >>>> tests above. That should allow both features to be supported independently >>>> in a fast way. >>>> >>> Interesting, can C compilers in optimization mode eliminate unused >>> evaluation paths in nested if statements with constant conditional >>> expressions? >> >> They'd be worthless if they didn't do that. (Even Cython does it, BTW.) >> > Then it can simplify writing utility code to support > different optimization flags in other cases too. Usually, yes. Look at the dict iteration code, for example, which makes pretty heavy use of it. This may not work in all cases, because the C compiler can decide to *not* inline a function, for example, or may not be capable of cutting down the code sufficiently in some specific cases. I agree in general, but I wouldn't say that it's worth changing existing (and working) code. Stefan From szport at gmail.com Fri Mar 1 09:07:42 2013 From: szport at gmail.com (Zaur Shibzukhov) Date: Fri, 1 Mar 2013 11:07:42 +0300 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: <51306002.8000701@behnel.de> References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> <51304EC6.9050300@behnel.de> <51306002.8000701@behnel.de> Message-ID: >> Then it can simplify writing utility code to support >> different optimization flags in other cases too. > > Usually, yes. Look at the dict iteration code, for example, which makes > pretty heavy use of it. > > This may not work in all cases, because the C compiler can decide to *not* > inline a function, for example, or may not be capable of cutting down the > code sufficiently in some specific cases. > > I agree in general, but I wouldn't say that it's worth changing existing > (and working) code. > So the preferred strategy is to specialize the code for the special cases, while keeping the existing code, which works well in general? From robertwb at gmail.com Fri Mar 1 09:25:15 2013 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 1 Mar 2013 00:25:15 -0800 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> <51304EC6.9050300@behnel.de> Message-ID: On Thu, Feb 28, 2013 at 10:54 PM, Zaur Shibzukhov wrote: >>>> >>>> I think you could even pass in two flags, one for wraparound and one for >>>> boundscheck, and then just evaluate them appropriately in the existing "if" >>>> tests above. That should allow both features to be supported independently >>>> in a fast way. 
>>>> >>> Interesting, can C compilers in optimization mode eliminate unused >>> evaluation paths in nested if statements with constant conditional >>> expressions? >> >> They'd be worthless if they didn't do that. (Even Cython does it, BTW.) >> > Then it can simplify writing utility code to support > different optimization flags in other cases too. The one thing you don't have much control over is whether the C compiler will actually inline the function (CYTHON_INLINE is just a hint). In particular, it may decide the function is too large to inline before realizing how small it would become given the constant arguments. I'm actually not sure how much of a problem this is in practice... - Robert From szport at gmail.com Fri Mar 1 10:46:39 2013 From: szport at gmail.com (Zaur Shibzukhov) Date: Fri, 1 Mar 2013 12:46:39 +0300 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: <512FC919.4010702@behnel.de> References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> Message-ID: 2013/3/1 Stefan Behnel : > ZS, 28.02.2013 21:07: >> 2013/2/28 Stefan Behnel: >>>> This allows writing unicode text parsing code almost at C speed >>>> mostly in python (+ .pxd definitions). >>> >>> I suggest simply adding a constant flag argument to the existing function >>> that states if checking should be done or not. Inlining will let the C >>> compiler drop the corresponding code, which may or may not make it a little >>> faster. >> >> static inline Py_UCS4 unicode_char2(PyObject* ustring, Py_ssize_t i, int flag) { >> Py_ssize_t length; >> #if CYTHON_PEP393_ENABLED >> if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1; >> #endif >> if (flag) { >> length = __Pyx_PyUnicode_GET_LENGTH(ustring); >> if ((0 <= i) & (i < length)) { >> return __Pyx_PyUnicode_READ_CHAR(ustring, i); >> } else if ((-length <= i) & (i < 0)) { >> return __Pyx_PyUnicode_READ_CHAR(ustring, i + length); >> } else { >> PyErr_SetString(PyExc_IndexError, "string index out of range"); >> return (Py_UCS4)-1; >> } >> } else { >> return __Pyx_PyUnicode_READ_CHAR(ustring, i); >> } >> } > > I think you could even pass in two flags, one for wraparound and one for > boundscheck, and then just evaluate them appropriately in the existing "if" > tests above. That should allow both features to be supported independently > in a fast way. > > >> Here are timings: >> >> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from >> mytests.unicode_index import test_1" "test_1()" >> 50 loops, best of 5: 152 msec per loop >> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from >> mytests.unicode_index import test_2" "test_2()" >> 50 loops, best of 5: 86.5 msec per loop >> (py33) zbook:mytests $ python3.3 -m timeit -n 50 -r 5 -s "from >> mytests.unicode_index import test_3" "test_3()" >> 50 loops, best of 5: 86.5 msec per loop >> >> So your suggestion would be preferable. > > Nice. Yes, looks like it's worth it. > Could I help in order to include this in 0.19? Zaur Shibzukhov From stefan_ml at behnel.de Fri Mar 1 11:47:59 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 01 Mar 2013 11:47:59 +0100 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> Message-ID: <5130875F.7090406@behnel.de> Zaur Shibzukhov, 01.03.2013 10:46: > Could I help in order to include this in 0.19? I like pull requests. ;) Stefan
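For illustration, here is a minimal sketch of the two-flag variant discussed in this thread. The name unicode_char3 and the exact signature are made up for this sketch; the __Pyx_* helpers are the ones from the snippet above, and both flags are expected to be compile-time constants so that an optimizing C compiler can drop the dead branches after inlining:

    static inline Py_UCS4 unicode_char3(PyObject* ustring, Py_ssize_t i,
                                        int wraparound, int boundscheck) {
        Py_ssize_t length;
    #if CYTHON_PEP393_ENABLED
        if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1;
    #endif
        if (wraparound | boundscheck) {
            length = __Pyx_PyUnicode_GET_LENGTH(ustring);
            if ((0 <= i) & (i < length)) {
                return __Pyx_PyUnicode_READ_CHAR(ustring, i);
            } else if (wraparound & ((-length <= i) & (i < 0))) {
                /* wraparound: fold a negative index once */
                return __Pyx_PyUnicode_READ_CHAR(ustring, i + length);
            } else if (boundscheck) {
                PyErr_SetString(PyExc_IndexError, "string index out of range");
                return (Py_UCS4)-1;
            }
            /* boundscheck disabled: fall through to the unchecked read */
        }
        return __Pyx_PyUnicode_READ_CHAR(ustring, i);
    }

With constant arguments, unicode_char3(u, i, 0, 0) should compile down to the bare READ_CHAR, while unicode_char3(u, i, 1, 1) keeps the full checking logic, so a single inlined helper can serve all four directive combinations.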
From szport at gmail.com Fri Mar 1 11:54:47 2013 From: szport at gmail.com (Zaur Shibzukhov) Date: Fri, 1 Mar 2013 13:54:47 +0300 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: <5130875F.7090406@behnel.de> References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> <5130875F.7090406@behnel.de> Message-ID: 2013/3/1 Stefan Behnel : > Zaur Shibzukhov, 01.03.2013 10:46: >> Could I help in order to include this in 0.19? > > I like pull requests. ;) > OK From sebastian at sipsolutions.net Fri Mar 1 16:56:39 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 16:56:39 +0100 Subject: [Cython] Be more forgiving about memoryview strides In-Reply-To: References: <1362064397.2663.14.camel@sebastian-laptop> Message-ID: <1362153399.13987.74.camel@sebastian-laptop> On Thu, 2013-02-28 at 23:25 -0800, Robert Bradshaw wrote: > On Thu, Feb 28, 2013 at 11:12 AM, Nathaniel Smith wrote: > > On Thu, Feb 28, 2013 at 5:50 PM, Robert Bradshaw wrote: > >> On Thu, Feb 28, 2013 at 7:13 AM, Sebastian Berg > >> wrote: > >>> Hey, > >>> > >>> Maybe someone here already saw it (I don't have a trac account, or I > >>> would just create a ticket), but it would be nice if Cython was more > >>> forgiving about contiguous requirements on strides. In the future this > >>> would make it easier for numpy to go forward with changing the > >>> contiguous flags to be more reasonable for its purpose, and second also > >>> to allow old (and maybe for the moment remaining) corner cases in numpy > >>> to slip past (as well as possibly the same for other programs...). An > >>> example is (see also https://github.com/numpy/numpy/issues/2956 and the > >>> PR linked there for more details): > >>> > >>> def add_one(array): > >>> cdef double[::1] a = array > >>> a[0] += 1. > >>> return array > >>> > >>> giving: > >>> > >>>>>> add_one(np.ascontiguousarray(np.arange(10.)[::100])) > >>> ValueError: Buffer and memoryview are not contiguous in the same > >>> dimension. > >>> > >>> This could easily be changed if MemoryViews check the strides as "can be > >>> interpreted as contiguous". That means that if shape[i] == 1, then > >>> strides[i] are arbitrary (you can just change them if you like). This is > >>> also the case for 0-sized arrays, which are arguably always contiguous, > >>> no matter what their strides are! > >> > >> I was under the impression that the primary value for contiguous is > >> that a foo[::1] can be interpreted as a foo*. Letting strides be > >> arbitrary completely breaks this, right? > > > > Nope. The natural definition of "C contiguous" is "the array entries > > are arranged in memory in the same way they would be if they were a > > multidimensional C array" (i.e., what you said.) But it turns out that > > this is *not* the definition that numpy and cython use! > > > > The issue is that the above definition is a constraint on the actual > > locations of items in memory, i.e., given a shape, it tells you that > > for every index, > > (a) sum(index * strides) == sum(index * cumprod(shape[::-1])[::-1] * itemsize) > > Obviously this equality holds if > > (b) strides == cumprod(shape[::-1])[::-1] * itemsize > > (Or for F-contiguity, we have > > (b') strides == cumprod(shape) * itemsize > > ) > > > > (a) is the natural definition of "C contiguous". (b) is the definition > > of "C contiguous" used by numpy and cython. (b) implies (a). But (a) > > does not imply (b), i.e., there are arrays that are C-contiguous which > > numpy and cython think are discontiguous. 
(Also in numpy there are > > some weird cases where numpy accidentally uses the correct definition, > > I think, which is the point of Sebastian's example.) > > > > In particular, if shape[i] == 1, then the value of stride[i] really > > should be irrelevant to judging contiguity, because the only thing you > > can do with strides[i] is multiply it by index[i], and if shape[i] == > > 1 then index[i] is always 0. So an array of int8's with shape = (10, > > 1), strides = (1, 73) is contiguous according to (a), but not > > according to (b). Also if shape[i] is 0 for any i, then the entire > > contents of the strides array becomes irrelevant to judging > > contiguity; all zero-sized arrays are contiguous according to (a), but > > not (b). > > Thanks for clarifying. > > Yes, I think it makes a lot of sense to loosen our definition for > Cython. Internally, I think the only way we use this assumption is in > not requiring that the first/final index be multiplied by the stride, > which should be totally fine. But this merits closer inspection as > there may be something else. The only problem I saw was code that used strides[-1] instead of the itemsize (e.g. using strides[i]/strides[-1] to then index the typed buffer instead of using strides[i]/itemsize). But that should be easy to check, numpy had two or so cases of that itself... > > > (This is really annoying for numpy because given, say, a column vector > > with shape (n, 1), it is impossible to be both C- and F-contiguous > > according to the (b)-style definition. But people expect > > various operations to preserve C versus F contiguity, so there are > > heuristics in numpy that try to guess whether various result arrays > > should pretend to be C- or F-contiguous, and we don't even have a > > consistent idea of what it would mean for this code to be working > > correctly, never mind test it and keep it working. OTOH if we just fix > > numpy to use the (a) definition, then it turns out a bunch of > > third-party code breaks, like, for example, cython.) > > Can you give some examples? > > Not sure for what :). Maybe this is an example: In [1]: a = np.asmatrix(np.arange(9).reshape(3,3).T) In [2]: a.flags.f_contiguous Out[2]: True In [3]: a[:,0].flags Out[3]: C_CONTIGUOUS : True F_CONTIGUOUS : False ... Where that view could just as well be F-contiguous, and the fact that numpy, when in doubt, prefers C-contiguous might be surprising. And since it would be less strict to begin with, numpy may save a copy here or there (without adding weird stride fixing code). Examples of code breakage would be this check, as well as scikit-learn and scipy, which in 3 or 4 places make the assumption above of itemsize == strides[-1] for C-contiguous arrays. 
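For illustration, the looser (a)-style test written out as plain C. This is a sketch of the definition only, not Cython's or numpy's actual code, and the function name is made up:

    #include <stddef.h>

    /* C-contiguous in the (a) sense: dimensions of extent 0 or 1 never
       constrain the strides, and any zero-sized array is contiguous. */
    static int is_c_contiguous_loose(size_t ndim, const ptrdiff_t *shape,
                                     const ptrdiff_t *strides,
                                     ptrdiff_t itemsize) {
        ptrdiff_t expected = itemsize;
        size_t i;
        for (i = ndim; i-- > 0; ) {
            if (shape[i] == 0)
                return 1;            /* empty array: strides are irrelevant */
            if (shape[i] != 1) {     /* extent-1 dimensions are skipped */
                if (strides[i] != expected)
                    return 0;
                expected *= shape[i];
            }
        }
        return 1;
    }

Under this check, the int8 example above with shape = (10, 1) and strides = (1, 73) passes, while a strict (b)-style comparison against cumprod-derived strides rejects it.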
> - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From robertwb at gmail.com Fri Mar 1 21:17:27 2013 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 1 Mar 2013 12:17:27 -0800 Subject: [Cython] Be more forgiving about memoryview strides In-Reply-To: <1362153399.13987.74.camel@sebastian-laptop> References: <1362064397.2663.14.camel@sebastian-laptop> <1362153399.13987.74.camel@sebastian-laptop> Message-ID: On Fri, Mar 1, 2013 at 7:56 AM, Sebastian Berg wrote: > On Thu, 2013-02-28 at 23:25 -0800, Robert Bradshaw wrote: >> On Thu, Feb 28, 2013 at 11:12 AM, Nathaniel Smith wrote: >> > On Thu, Feb 28, 2013 at 5:50 PM, Robert Bradshaw wrote: >> >> On Thu, Feb 28, 2013 at 7:13 AM, Sebastian Berg >> >> wrote: >> >>> Hey, >> >>> >> >>> Maybe someone here already saw it (I don't have a trac account, or I >> >>> would just create a ticket), but it would be nice if Cython was more >> >>> forgiving about contiguous requirements on strides. In the future this >> >>> would make it easier for numpy to go forward with changing the >> >>> contiguous flags to be more reasonable for its purpose, and second also >> >>> to allow old (and maybe for the moment remaining) corner cases in numpy >> >>> to slip past (as well as possibly the same for other programs...). An >> >>> example is (see also https://github.com/numpy/numpy/issues/2956 and the >> >>> PR linked there for more details): >> >>> >> >>> def add_one(array): >> >>> cdef double[::1] a = array >> >>> a[0] += 1. >> >>> return array >> >>> >> >>> giving: >> >>> >> >>>>>> add_one(np.ascontiguousarray(np.arange(10.)[::100])) >> >>> ValueError: Buffer and memoryview are not contiguous in the same >> >>> dimension. >> >>> >> >>> This could easily be changed if MemoryViews check the strides as "can be >> >>> interpreted as contiguous". That means that if shape[i] == 1, then >> >>> strides[i] are arbitrary (you can just change them if you like). This is >> >>> also the case for 0-sized arrays, which are arguably always contiguous, >> >>> no matter what their strides are! >> >> >> >> I was under the impression that the primary value for contiguous is >> >> that a foo[::1] can be interpreted as a foo*. Letting strides be >> >> arbitrary completely breaks this, right? >> > >> > Nope. The natural definition of "C contiguous" is "the array entries >> > are arranged in memory in the same way they would be if they were a >> > multidimensional C array" (i.e., what you said.) But it turns out that >> > this is *not* the definition that numpy and cython use! >> > >> > The issue is that the above definition is a constraint on the actual >> > locations of items in memory, i.e., given a shape, it tells you that >> > for every index, >> > (a) sum(index * strides) == sum(index * cumprod(shape[::-1])[::-1] * itemsize) >> > Obviously this equality holds if >> > (b) strides == cumprod(shape[::-1])[::-1] * itemsize >> > (Or for F-contiguity, we have >> > (b') strides == cumprod(shape) * itemsize >> > ) >> > >> > (a) is the natural definition of "C contiguous". (b) is the definition >> > of "C contiguous" used by numpy and cython. (b) implies (a). But (a) >> > does not imply (b), i.e., there are arrays that are C-contiguous which >> > numpy and cython think are discontiguous. (Also in numpy there are >> > some weird cases where numpy accidentally uses the correct definition, >> > I think, which is the point of Sebastian's example.) 
>> > >> > In particular, if shape[i] == 1, then the value of stride[i] really >> > should be irrelevant to judging contiguity, because the only thing you >> > can do with strides[i] is multiply it by index[i], and if shape[i] == >> > 1 then index[i] is always 0. So an array of int8's with shape = (10, >> > 1), strides = (1, 73) is contiguous according to (a), but not >> > according to (b). Also if shape[i] is 0 for any i, then the entire >> > contents of the strides array becomes irrelevant to judging >> > contiguity; all zero-sized arrays are contiguous according to (a), but >> > not (b). >> >> Thanks for clarifying. >> >> Yes, I think it makes a lot of sense to loosen our definition for >> Cython. Internally, I think the only way we use this assumption is in >> not requiring that the first/final index be multiplied by the stride, >> which should be totally fine. But this merits closer inspection as >> there may be something else. > > The only problem I saw was code that used strides[-1] instead of the > itemsize (e.g. using strides[i]/strides[-1] to then index the typed > buffer instead of using strides[i]/itemsize). But that should be easy to > check, numpy had two or so cases of that itself... I'd be surprised if we do that, but the only way to be sure would be to look at the code. >> > (This is really annoying for numpy because given, say, a column vector >> > with shape (n, 1), it is impossible to be both C- and F-contiguous >> > according to the (b)-style definition. But people expect >> > various operations to preserve C versus F contiguity, so there are >> > heuristics in numpy that try to guess whether various result arrays >> > should pretend to be C- or F-contiguous, and we don't even have a >> > consistent idea of what it would mean for this code to be working >> > correctly, never mind test it and keep it working. OTOH if we just fix >> > numpy to use the (a) definition, then it turns out a bunch of >> > third-party code breaks, like, for example, cython.) >> >> Can you give some examples? >> > > Not sure for what :). I meant examples of possible breakage. > Maybe this is an example: > > In [1]: a = np.asmatrix(np.arange(9).reshape(3,3).T) > > In [2]: a.flags.f_contiguous > Out[2]: True > > In [3]: a[:,0].flags > Out[3]: > C_CONTIGUOUS : True > F_CONTIGUOUS : False > ... > > Where that view could just as well be F-contiguous, and the fact that > numpy, when in doubt, prefers C-contiguous might be surprising. And > since it would be less strict to begin with, numpy may save a copy here > or there (without adding weird stride fixing code). > > Examples of code breakage would be this check, as well as scikit-learn > and scipy, which in 3 or 4 places make the assumption above of itemsize == > strides[-1] for C-contiguous arrays. Ah. So, assuming Cython itself isn't making such assumptions, what support do you want from Cython? I can see (1) accepting as C/F-contiguous arrays that meet this looser definition and (2) setting these flags in memoryviews we produce under this looser definition. Is there anything else? - Robert
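To make point (1) concrete, here is a sketch of the user-visible effect, reusing Sebastian's example from the top of the thread. This shows hypothetical behavior under the looser rule, not what current Cython does:

    import numpy as np

    def first(double[::1] a):   # declared C-contiguous
        return a[0]

    # shape (1,), strides (800,), flagged contiguous, as in Sebastian's report
    arr = np.ascontiguousarray(np.arange(10.)[::100])
    first(arr)   # raises ValueError today; accepted under the looser (a)-rule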
From sebastian at sipsolutions.net Fri Mar 1 22:14:53 2013 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 01 Mar 2013 22:14:53 +0100 Subject: [Cython] Be more forgiving about memoryview strides In-Reply-To: References: <1362064397.2663.14.camel@sebastian-laptop> <1362153399.13987.74.camel@sebastian-laptop> Message-ID: <1362172493.13987.137.camel@sebastian-laptop> On Fri, 2013-03-01 at 12:17 -0800, Robert Bradshaw wrote: > On Fri, Mar 1, 2013 at 7:56 AM, Sebastian Berg > wrote: > > On Thu, 2013-02-28 at 23:25 -0800, Robert Bradshaw wrote: > >> On Thu, Feb 28, 2013 at 11:12 AM, Nathaniel Smith wrote: > >> > On Thu, Feb 28, 2013 at 5:50 PM, Robert Bradshaw wrote: > >> >> On Thu, Feb 28, 2013 at 7:13 AM, Sebastian Berg > >> >> wrote: > >> >>> Hey, > >> >>> > >> >>> Maybe someone here already saw it (I don't have a trac account, or I > >> >>> would just create a ticket), but it would be nice if Cython was more > >> >>> forgiving about contiguous requirements on strides. In the future this > >> >>> would make it easier for numpy to go forward with changing the > >> >>> contiguous flags to be more reasonable for its purpose, and second also > >> >>> to allow old (and maybe for the moment remaining) corner cases in numpy > >> >>> to slip past (as well as possibly the same for other programs...). An > >> >>> example is (see also https://github.com/numpy/numpy/issues/2956 and the > >> >>> PR linked there for more details): > >> >>> > >> >>> def add_one(array): > >> >>> cdef double[::1] a = array > >> >>> a[0] += 1. > >> >>> return array > >> >>> > >> >>> giving: > >> >>> > >> >>>>>> add_one(np.ascontiguousarray(np.arange(10.)[::100])) > >> >>> ValueError: Buffer and memoryview are not contiguous in the same > >> >>> dimension. > >> >>> > >> >>> This could easily be changed if MemoryViews check the strides as "can be > >> >>> interpreted as contiguous". That means that if shape[i] == 1, then > >> >>> strides[i] are arbitrary (you can just change them if you like). This is > >> >>> also the case for 0-sized arrays, which are arguably always contiguous, > >> >>> no matter what their strides are! > >> >> > >> >> I was under the impression that the primary value for contiguous is > >> >> that a foo[::1] can be interpreted as a foo*. Letting strides be > >> >> arbitrary completely breaks this, right? > >> > > >> > Nope. The natural definition of "C contiguous" is "the array entries > >> > are arranged in memory in the same way they would be if they were a > >> > multidimensional C array" (i.e., what you said.) But it turns out that > >> > this is *not* the definition that numpy and cython use! > >> > > >> > The issue is that the above definition is a constraint on the actual > >> > locations of items in memory, i.e., given a shape, it tells you that > >> > for every index, > >> > (a) sum(index * strides) == sum(index * cumprod(shape[::-1])[::-1] * itemsize) > >> > Obviously this equality holds if > >> > (b) strides == cumprod(shape[::-1])[::-1] * itemsize > >> > (Or for F-contiguity, we have > >> > (b') strides == cumprod(shape) * itemsize > >> > ) > >> > > >> > (a) is the natural definition of "C contiguous". (b) is the definition > >> > of "C contiguous" used by numpy and cython. (b) implies (a). But (a) > >> > does not imply (b), i.e., there are arrays that are C-contiguous which > >> > numpy and cython think are discontiguous. 
(Also in numpy there are > >> > some weird cases where numpy accidentally uses the correct definition, > >> > I think, which is the point of Sebastian's example.) > >> > > >> > In particular, if shape[i] == 1, then the value of stride[i] really > >> > should be irrelevant to judging contiguity, because the only thing you > >> > can do with strides[i] is multiply it by index[i], and if shape[i] == > >> > 1 then index[i] is always 0. So an array of int8's with shape = (10, > >> > 1), strides = (1, 73) is contiguous according to (a), but not > >> > according to (b). Also if shape[i] is 0 for any i, then the entire > >> > contents of the strides array becomes irrelevant to judging > >> > contiguity; all zero-sized arrays are contiguous according to (a), but > >> > not (b). > >> > >> Thanks for clarifying. > >> > >> Yes, I think it makes a lot of sense to loosen our definition for > >> Cython. Internally, I think the only way we use this assumption is in > >> not requiring that the first/final index be multiplied by the stride, > >> which should be totally fine. But this merits closer inspection as > >> there may be something else. > > > > The only problem I saw was code that used strides[-1] instead of the > > itemsize (e.g. using strides[i]/strides[-1] to then index the typed > > buffer instead of using strides[i]/itemsize). But that should be easy to > > check, numpy had two or so cases of that itself... > > I'd be surprised if we do that, but the only way to be sure would be > to look at the code. > > >> > (This is really annoying for numpy because given, say, a column vector > >> > with shape (n, 1), it is impossible to be both C- and F-contiguous > >> > according to the (b)-style definition. But people expect > >> > various operations to preserve C versus F contiguity, so there are > >> > heuristics in numpy that try to guess whether various result arrays > >> > should pretend to be C- or F-contiguous, and we don't even have a > >> > consistent idea of what it would mean for this code to be working > >> > correctly, never mind test it and keep it working. OTOH if we just fix > >> > numpy to use the (a) definition, then it turns out a bunch of > >> > third-party code breaks, like, for example, cython.) > >> > >> Can you give some examples? > >> > > > > Not sure for what :). > > I meant examples of possible breakage. > > > Maybe this is an example: > > > > In [1]: a = np.asmatrix(np.arange(9).reshape(3,3).T) > > > > In [2]: a.flags.f_contiguous > > Out[2]: True > > > > In [3]: a[:,0].flags > > Out[3]: > > C_CONTIGUOUS : True > > F_CONTIGUOUS : False > > ... > > > > Where that view could just as well be F-contiguous, and the fact that > > numpy, when in doubt, prefers C-contiguous might be surprising. And > > since it would be less strict to begin with, numpy may save a copy here > > or there (without adding weird stride fixing code). > > > > Examples of code breakage would be this check, as well as scikit-learn > > and scipy, which in 3 or 4 places make the assumption above of itemsize == > > strides[-1] for C-contiguous arrays. > > Ah. > > So, assuming Cython itself isn't making such assumptions, what support > do you want from Cython? I can see (1) accepting as C/F-contiguous > arrays that meet this looser definition and (2) setting these flags in > memoryviews we produce under this looser definition. Is there anything > else? > Just accepting it would be cool. I am not aware that (2) would matter for numpy, so just do whatever you feel best. 
I doubt numpy will change them in a release version any time soon, but it will be nice to know that it can without breaking Cython-based code! I am wondering if there is a way to work around/warn users doing this (this is what sk-learn had): cdef np.ndarray[ndim=2, mode='c'] a = array step = a.strides[0]/a.strides[1] # Then using a.data[step] but I am not sure. I first thought that if it is easy, you could point a.strides to the buffer's strides, allowing numpy to fix those. But just realized that it would be weird since ndarray.strides is an attribute that can be set. And since, as I understand it, this is discouraged already, it is probably not worth it to think about it much. - Sebastian > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > From nikita at nemkin.ru Sat Mar 2 07:52:44 2013 From: nikita at nemkin.ru (Nikita Nemkin) Date: Sat, 02 Mar 2013 12:52:44 +0600 Subject: [Cython] Two minor bugs Message-ID: Hi, I'm new to this list and to Cython internals. Reporting two recently found bugs: 1. Explicit cast fails unexpectedly: ctypedef char* LPSTR cdef LPSTR c_str = b"ascii" <object>c_str # Failure: Python objects cannot be cast from pointers of primitive types The problem is CTypedefType not delegating can_coerce_to_pyobject() to the original type. (because BaseType.can_coerce_to_pyobject takes precedence over __getattr__). Patch + test case are attached. Interestingly, implicit casts use a different code path and are not affected. There is potential for similar bugs in the future, because __getattr__ delegation is inherently brittle in the presence of the base class (BaseType). 2. This recently added code does not compile with MSVC: https://github.com/cython/cython/blob/master/Cython/Utility/TypeConversion.c#L140-142 Interleaving declarations and statements is not allowed in C90... Best Regards, Nikita Nemkin -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Fixed-explicit-coercion-of-ctypedef-ed-C-types.patch Type: application/octet-stream Size: 2293 bytes Desc: not available URL: From stefan_ml at behnel.de Sat Mar 2 11:52:34 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 02 Mar 2013 11:52:34 +0100 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> <51304EC6.9050300@behnel.de> Message-ID: <5131D9F2.5070608@behnel.de> Robert Bradshaw, 01.03.2013 09:25: > On Thu, Feb 28, 2013 at 10:54 PM, Zaur Shibzukhov wrote: >>>>> I think you could even pass in two flags, one for wraparound and one for >>>>> boundscheck, and then just evaluate them appropriately in the existing "if" >>>>> tests above. That should allow both features to be supported independently >>>>> in a fast way. >>>>> >>>> Interesting, can C compilers in optimization mode eliminate unused >>>> evaluation paths in nested if statements with constant conditional >>>> expressions? >>> >>> They'd be worthless if they didn't do that. (Even Cython does it, BTW.) >>> >> Then it can simplify writing utility code to support >> different optimization flags in other cases too. > The one thing you don't have much control over is whether the C > compiler will actually inline the function (CYTHON_INLINE is just a > hint). In particular, it may decide the function is too large to > inline before realizing how small it would become given the constant > arguments. 
I'm actually not sure how much of a problem this is in > practice... I tried it out for the Get/Set/DelItemInt() utility functions and took a look at the generated assembly (gcc -O3). It does look as expected and sometimes also better than what we currently generate. So I think it's worth it. https://github.com/scoder/cython/commit/cc4f7daec3b1f19b5acaed7766e2b6f86902ad94 I'd be happy if someone else could give this change a review to make sure I got all conditions right. Stefan From stefan_ml at behnel.de Sat Mar 2 11:56:13 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 02 Mar 2013 11:56:13 +0100 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: <512FC919.4010702@behnel.de> References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> Message-ID: <5131DACD.6050402@behnel.de> Stefan Behnel, 28.02.2013 22:16: > ZS, 28.02.2013 21:07: >> 2013/2/28 Stefan Behnel: >>>> This allows writing unicode text parsing code almost at C speed >>>> mostly in python (+ .pxd definitions). >>> >>> I suggest simply adding a constant flag argument to the existing function >>> that states if checking should be done or not. Inlining will let the C >>> compiler drop the corresponding code, which may or may not make it a little >>> faster. >> >> static inline Py_UCS4 unicode_char2(PyObject* ustring, Py_ssize_t i, int flag) { >> Py_ssize_t length; >> #if CYTHON_PEP393_ENABLED >> if (PyUnicode_READY(ustring) < 0) return (Py_UCS4)-1; >> #endif >> if (flag) { >> length = __Pyx_PyUnicode_GET_LENGTH(ustring); >> if ((0 <= i) & (i < length)) { >> return __Pyx_PyUnicode_READ_CHAR(ustring, i); >> } else if ((-length <= i) & (i < 0)) { >> return __Pyx_PyUnicode_READ_CHAR(ustring, i + length); >> } else { >> PyErr_SetString(PyExc_IndexError, "string index out of range"); >> return (Py_UCS4)-1; >> } >> } else { >> return __Pyx_PyUnicode_READ_CHAR(ustring, i); >> } >> } > > I think you could even pass in two flags, one for wraparound and one for > boundscheck, and then just evaluate them appropriately in the existing "if" > tests above. That should allow both features to be supported independently > in a fast way. Done. https://github.com/scoder/cython/commit/cc4f7daec3b1f19b5acaed7766e2b6f86902ad94 Stefan From stefan_ml at behnel.de Sat Mar 2 12:15:50 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 02 Mar 2013 12:15:50 +0100 Subject: [Cython] To Add datetime.pxd to cython.cpython In-Reply-To: References: Message-ID: <5131DF66.6030403@behnel.de> Hi, the last pull request looks good to me now. https://github.com/cython/cython/pull/189 Any more comments on it? Stefan From szport at gmail.com Sat Mar 2 18:55:46 2013 From: szport at gmail.com (Zaur Shibzukhov) Date: Sat, 2 Mar 2013 20:55:46 +0300 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: <5131DACD.6050402@behnel.de> References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> <5131DACD.6050402@behnel.de> Message-ID: 2013/3/2 Stefan Behnel : >> I think you could even pass in two flags, one for wraparound and one for >> boundscheck, and then just evaluate them appropriately in the existing "if" >> tests above. That should allow both features to be supported independently >> in a fast way. > > https://github.com/scoder/cython/commit/cc4f7daec3b1f19b5acaed7766e2b6f86902ad94 Should the following directives be included at the beginning of the tests (which test indexing for lists, tuples and unicode): #cython: boundscheck=True #cython: wraparound=True as the default mode for testing? -- Zaur Shibzukhov
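For readers following along, here is a sketch of how the two directives being discussed combine at module and function level; the function names are illustrative only, not from the test suite:

    # cython: boundscheck=True
    # cython: wraparound=True
    cimport cython

    def char_at(unicode s, Py_ssize_t i):
        # module-level directives apply: negative indices wrap, bad ones raise
        return s[i]

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def char_at_unchecked(unicode s, Py_ssize_t i):
        # both checks disabled for this function only; with a flag-based
        # helper, the C compiler can then drop the corresponding branches
        return s[i]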
From stefan_ml at behnel.de Sat Mar 2 19:47:01 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 02 Mar 2013 19:47:01 +0100 Subject: [Cython] About IndexNode and unicode[index] In-Reply-To: References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de> <5131DACD.6050402@behnel.de> Message-ID: <51324925.50901@behnel.de> Zaur Shibzukhov, 02.03.2013 18:55: > 2013/3/2 Stefan Behnel: >>> I think you could even pass in two flags, one for wraparound and one for >>> boundscheck, and then just evaluate them appropriately in the existing "if" >>> tests above. That should allow both features to be supported independently >>> in a fast way. >> >> https://github.com/scoder/cython/commit/cc4f7daec3b1f19b5acaed7766e2b6f86902ad94 > > Should the following directives be included at the beginning of the > tests (which test indexing for lists, tuples and unicode): > > #cython: boundscheck=True > #cython: wraparound=True > > as the default mode for testing? Yes, although they would appear redundant here. Stefan From nikita at nemkin.ru Sun Mar 3 08:39:50 2013 From: nikita at nemkin.ru (Nikita Nemkin) Date: Sun, 03 Mar 2013 13:39:50 +0600 Subject: [Cython] Py_UNICODE* string support Message-ID: Hi, Please review my feature proposal to add Py_UNICODE* string support for better Windows interoperability: https://github.com/cython/cython/pull/191 This is motivated by my current work that involves calling lots of Windows APIs. If people are interested I can elaborate on some important points, like the choice of base type (Py_UNICODE vs wchar_t) or the nature of Py_UNICODE* literals or why this feature is necessary at all. Best regards, Nikita Nemkin From robertwb at gmail.com Sun Mar 3 08:45:53 2013 From: robertwb at gmail.com (Robert Bradshaw) Date: Sat, 2 Mar 2013 23:45:53 -0800 Subject: [Cython] Two minor bugs In-Reply-To: References: Message-ID: On Fri, Mar 1, 2013 at 10:52 PM, Nikita Nemkin wrote: > Hi, > > I'm new to this list and to Cython internals. > > Reporting two recently found bugs: > > 1. Explicit cast fails unexpectedly: > > ctypedef char* LPSTR > cdef LPSTR c_str = b"ascii" > <object>c_str # Failure: Python objects cannot be cast from pointers > of primitive types > > The problem is CTypedefType not delegating can_coerce_to_pyobject() to > the original type. > (because BaseType.can_coerce_to_pyobject takes precedence over > __getattr__). > Patch + test case are attached. Thanks! Applied. > Interestingly, implicit casts use a different code path and are not > affected. > > There is potential for similar bugs in the future, because __getattr__ > delegation is inherently brittle in the presence of the base class > (BaseType). Yes, very true. > 2. This recently added code does not compile with MSVC: > > https://github.com/cython/cython/blob/master/Cython/Utility/TypeConversion.c#L140-142 > Interleaving declarations and statements is not allowed in C90... Fixed https://github.com/cython/cython/commit/24f56e14194e14c706beb6d0ee58a58e77b0b03e - Robert From stefan_ml at behnel.de Sun Mar 3 08:52:49 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 03 Mar 2013 08:52:49 +0100 Subject: [Cython] Py_UNICODE* string support In-Reply-To: References: Message-ID: <51330151.5080300@behnel.de> Nikita Nemkin, 03.03.2013 08:39: > Please review my feature proposal to add Py_UNICODE* string support > for better Windows interoperability: > https://github.com/cython/cython/pull/191 > > This is motivated by my current work that involves calling lots of Windows APIs. 
> If people are interested I can elaborate on some important points,
> like the choice of base type (Py_UNICODE vs wchar_t), the nature of
> Py_UNICODE* literals, or why this feature is necessary at all.

Are you aware that Py_UNICODE is deprecated as of Py3.3?

http://docs.python.org/3.4/c-api/unicode.html

Your changes look a bit excessive for supporting something that's
inefficient in recent Python versions and basically "dead".

Stefan

From nikita at nemkin.ru  Sun Mar  3 09:25:33 2013
From: nikita at nemkin.ru (Nikita Nemkin)
Date: Sun, 03 Mar 2013 14:25:33 +0600
Subject: [Cython] Py_UNICODE* string support
In-Reply-To: <51330151.5080300@behnel.de>
References: <51330151.5080300@behnel.de>

On Sun, 03 Mar 2013 13:52:49 +0600, Stefan Behnel wrote:

> Are you aware that Py_UNICODE is deprecated as of Py3.3?
>
> http://docs.python.org/3.4/c-api/unicode.html
>
> Your changes look a bit excessive for supporting something that's
> inefficient in recent Python versions and basically "dead".

Yes, I'm well aware of the Py3.3 changes, but consider this:
1. _All_ system APIs on Windows, old, new and in-between, use UTF-16
   in the form of zero-terminated 2-byte wchar_t* strings (on Windows,
   Py_UNICODE is _always_ aliased to wchar_t specifically for this
   reason). Whatever happens to Python internals, the need to
   interoperate with UTF-16 based platforms won't go away.

2. The Py_UNICODE family of APIs remains the recommended way to
   interoperate with Windows. (So said the author of PEP 393 himself;
   I can find the relevant discussion in python-dev.)

3. It is not _that_ inefficient. Actually, it has the same efficiency
   as the UTF-8 related APIs (which have to be used on UTF-8 platforms
   like most *nix systems). UTF-8 allows sharing of the ASCII buffer
   and has to convert UCS2/UCS4; Py_UNICODE shares the UCS2 buffer
   (assuming a narrow build) and has to convert ASCII.

One alternative to Py_UNICODE that I have rejected is using Python's
wchar_t support. It's practically useless for these reasons:

1) wchar_t APIs do not exist in Py2 and have to be implemented for
   compatibility.
2) Implementing them brings in all the pain of the nonportable wchar_t
   type (on *nix systems in general), whereas its primary users would
   target Windows, where the (pretty horrible) wchar_t portability
   workarounds would be dead code.
3) wchar_t APIs do not offer a zero-copy option and do not manage the
   memory for us.

The changes are some 50 lines of code, not counting the tests. I
wouldn't call that excessive. And they mostly mirror existing code, no
trickery of any kind.

Inbuilt Py_UNICODE* support also means that users would be shielded
from the 3.3 changes and Cython is free to optimize string handling in
the future.

Believe me, nobody calls the Py_UNICODE APIs because they want to,
they just have to.

Best regards,
Nikita Nemkin

From stefan_ml at behnel.de  Sun Mar  3 10:32:36 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 03 Mar 2013 10:32:36 +0100
Subject: [Cython] Py_UNICODE* string support
References: <51330151.5080300@behnel.de>
Message-ID: <513318B4.3080803@behnel.de>

Nikita Nemkin, 03.03.2013 09:25:
> On Sun, 03 Mar 2013 13:52:49 +0600, Stefan Behnel wrote:
>> Are you aware that Py_UNICODE is deprecated as of Py3.3?
>> [...]
>
> Yes, I'm well aware of the Py3.3 changes, but consider this:
>
> 1. _All_ system APIs on Windows, old, new and in-between, use UTF-16
>    in the form of zero-terminated 2-byte wchar_t* strings [...]
>    Whatever happens to Python internals, the need to interoperate
>    with UTF-16 based platforms won't go away.

Ok, fine with me. Your changes look fairly reasonable, especially for
a first try. I have the following comments.

1) I would like to get rid of UnicodeConst. A Py_UNICODE* is not
different from any other C array, except that it can coerce to and
from Unicode strings. So the representation of a literal should be a
(properly reference counted) Python Unicode object, and users would be
allowed to cast them to <Py_UNICODE*>, just as we support it for
<char*> and bytes.

2) non-BMP literals should be supported by representing them as normal
Unicode strings and creating the Py_UNICODE representation at need
(i.e. explicitly through a cast, at runtime). Py_UNICODE[] literals
are simply not portable.

3) __Pyx_Py_UNICODE_strlen() is ok, but only for the special case that
all we have is a Py_UNICODE*. As long as we are dealing with Unicode
string objects, that won't be needed, so len() should be constant time
in the normal case instead of linear time.

4) most of the changes in PyrexTypes.py and ExprNodes.py look ok. I
would eventually like to see a couple of refactorings on these
sections (because the special cases add up over time), but that's not
required for this change.

So, the basic idea would be to use Unicode strings and their
(optional) internal representation as Py_UNICODE[] instead of making
Py_UNICODE[] a first class data type. And then go from there and
optimise certain things to use the unpacked array directly, so that
users won't need to put explicit C-API calls into their code.

Stefan

From szport at gmail.com  Sun Mar  3 10:49:11 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Sun, 3 Mar 2013 12:49:11 +0300
Subject: [Cython] About IndexNode and unicode[index]
In-Reply-To: <5131DACD.6050402@behnel.de>
References: <512FAF8C.7020008@behnel.de> <512FC919.4010702@behnel.de>
	<5131DACD.6050402@behnel.de>

2013/3/2 Stefan Behnel :
> Stefan Behnel, 28.02.2013 22:16:
>
> https://github.com/scoder/cython/commit/cc4f7daec3b1f19b5acaed7766e2b6f86902ad94

I tried to build with that change. The `unicode_indexing` and `index`
tests pass.

From nikita at nemkin.ru  Sun Mar  3 14:40:54 2013
From: nikita at nemkin.ru (Nikita Nemkin)
Date: Sun, 03 Mar 2013 19:40:54 +0600
Subject: [Cython] Py_UNICODE* string support
In-Reply-To: <513318B4.3080803@behnel.de>
References: <51330151.5080300@behnel.de> <513318B4.3080803@behnel.de>

On Sun, 03 Mar 2013 15:32:36 +0600, Stefan Behnel wrote:

> 1) I would like to get rid of UnicodeConst. A Py_UNICODE* is not
> different from any other C array, except that it can coerce to and
> from Unicode strings. So the representation of a literal should be a
> (properly reference counted) Python Unicode object, and users would
> be allowed to cast them to <Py_UNICODE*>, just as we support it for
> <char*> and bytes.

I understand the idea. Since Python unicode literals are implicitly
coercible to Py_UNICODE*, there appears to be no need for C-level
Py_UNICODE[] literals. Indeed, client code will look exactly (!) the
same whether they are supported or not.

Except when it comes to nogil. (For example, native callbacks are
almost guaranteed to be nogil.) Hiding Python operations in what
appears to be pure C-level code will break users' assumptions.
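For example, a minimal sketch of the situation I mean, assuming my
proposed Py_UNICODE* literal support and the Windows-only
OutputDebugStringW API (declared here to take Py_UNICODE* directly):

    cdef extern from "windows.h":
        void OutputDebugStringW(Py_UNICODE* message) nogil

    cdef void report_error() nogil:
        # with a C-level Py_UNICODE[] literal this stays pure C code;
        # if the literal had to coerce through a Python unicode
        # object, this call could not legally appear in nogil code
        OutputDebugStringW(u"parse error")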
This is the #1 reason why I went for C-level literals. The #2 reason
is efficiency on Py3.3. C-level literals don't need conversions and
don't call any conversion APIs.

> 2) non-BMP literals should be supported by representing them as
> normal Unicode strings and creating the Py_UNICODE representation at
> need (i.e. explicitly through a cast, at runtime). Py_UNICODE[]
> literals are simply not portable.

Py_UNICODE[] literals can be made fully portable if non-BMP ones are
wrapped like this:

    #ifdef Py_UNICODE_WIDE
    static const k_xxx[] = { <code points>, 0 };
    #else
    static const k_xxx[] = { <code points, non-BMP ones encoded as
                              surrogate pairs>, 0 };
    #endif

Literals containing only BMP chars are already portable and don't need
this wrapping.

> 3) __Pyx_Py_UNICODE_strlen() is ok, but only for the special case
> that all we have is a Py_UNICODE*. As long as we are dealing with
> Unicode string objects, that won't be needed, so len() should be
> constant time in the normal case instead of linear time.

len(Py_UNICODE*) simply mirrors len(char*). Its purpose is to provide
a platform-independent Py_UNICODE_strlen (which is Py3 only and
deprecated in 3.3).

> So, the basic idea would be to use Unicode strings and their
> (optional) internal representation as Py_UNICODE[] instead of making
> Py_UNICODE[] a first class data type. And then go from there and
> optimise certain things to use the unpacked array directly, so that
> users won't need to put explicit C-API calls into their code.

Please reconsider your decision wrt C-level literals. I believe that
nogil code and a bit of efficiency (on 3.3) justify their existence.
(char* literals do have C-level representations; Py_UNICODE* is in the
same basket when it comes to Windows code.) The code to support them
is also small and well-contained.

I've updated my pull request to fully support non-BMP Py_UNICODE[]
literals. If you are still not convinced, so be it, I'll drop C-level
literal support.

Best regards,
Nikita Nemkin

PS. I made a false claim in the previous mail: (some of) Python's
wchar_t APIs do exist in Py2. But they won't manage the memory
automatically anyway.

From szport at gmail.com  Sun Mar  3 15:52:10 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Sun, 3 Mar 2013 17:52:10 +0300
Subject: [Cython] To Add datetime.pxd to cython.cpython
In-Reply-To: <5131DF66.6030403@behnel.de>
References: <5131DF66.6030403@behnel.de>

2013/3/2 Stefan Behnel :
> Hi,
>
> the last pull request looks good to me now.
>
> https://github.com/cython/cython/pull/189
>
> Any more comments on it?

As was suggested earlier, I added an `import_datetime` inline function
to initialize the PyDateTime C API, instead of using the "non-native"
C macros from datetime.h directly. Now you call `import_datetime()`
first, in the same way as `import_array()` is used with `numpy`. This
approach looks natural in the light of the experience with numpy.

Zaur Shibzukhov

From szport at gmail.com  Sun Mar  3 20:11:42 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Sun, 3 Mar 2013 22:11:42 +0300
Subject: [Cython] To Add datetime.pxd to cython.cpython
References: <5131DF66.6030403@behnel.de>

2013/3/3 Zaur Shibzukhov :
> 2013/3/2 Stefan Behnel :
>> [...]
>
> As was suggested earlier, I added an `import_datetime` inline
> function to initialize the PyDateTime C API, instead of using the
> "non-native" C macros from datetime.h directly. Now you call
> `import_datetime()` first, in the same way as `import_array()` is
> used with `numpy`.
> This approach looks natural in the light of the experience with
> numpy.

I made some performance comparisons. Here is an example for dates.

# test_date.pyx
--------------------

Here is the test code:

from cpython.datetime cimport import_datetime, date_new, date

import_datetime()

from datetime import date as pydate

def test_date1():
    cdef list lst = []
    for year in range(1000, 2001):
        for month in range(1, 13):
            for day in range(1, 20):
                d = pydate(year, month, day)
                lst.append(d)
    return lst

def test_date2():
    cdef list lst = []
    for year in range(1000, 2001):
        for month in range(1, 13):
            for day in range(1, 20):
                d = date(year, month, day)
                lst.append(d)
    return lst

def test_date3():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1, 13):
            for day in range(1, 20):
                d = date_new(year, month, day)
                lst.append(d)
    return lst

def test1():
    l = test_date1()
    return l

def test2():
    l = test_date2()
    return l

def test3():
    l = test_date3()
    return l

Here are the timings:

(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from
mytests.test_date import test1" "test1()"
50 loops, best of 5: 83.2 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from
mytests.test_date import test2" "test2()"
50 loops, best of 5: 74.7 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from
mytests.test_date import test3" "test3()"
50 loops, best of 5: 20.9 msec per loop

OSX 10.6.8 64 bit python 3.2

Shibzukhov Zaur

From szport at gmail.com  Sun Mar  3 20:16:43 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Sun, 3 Mar 2013 22:16:43 +0300
Subject: [Cython] To Add datetime.pxd to cython.cpython
References: <5131DF66.6030403@behnel.de>

2013/3/3 Zaur Shibzukhov :
> 2013/3/3 Zaur Shibzukhov :
>> [...]
>
> I made some performance comparisons. Here is an example for dates.
> [... test code and timings snipped ...]

A more accurate test:

# coding: utf-8

from cpython.datetime cimport import_datetime, date_new, date

import_datetime()

from datetime import date as pydate

def test_date1():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1, 13):
            for day in range(1, 20):
                d = pydate(year, month, day)
                lst.append(d)
    return lst

def test_date2():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1, 13):
            for day in range(1, 20):
                d = date(year, month, day)
                lst.append(d)
    return lst

def test_date3():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1, 13):
            for day in range(1, 20):
                d = date_new(year, month, day)
                lst.append(d)
    return lst

def test1():
    l = test_date1()
    return l

def test2():
    l = test_date2()
    return l

def test3():
    l = test_date3()
    return l

Timings:

(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from
mytests.test_date import test1" "test1()"
50 loops, best of 5: 83.3 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from
mytests.test_date import test2" "test2()"
50 loops, best of 5: 74.6 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from
mytests.test_date import test3" "test3()"
50 loops, best of 5: 20.8 msec per loop

Shibzukhov Zaur

From stefan_ml at behnel.de  Sun Mar  3 20:41:04 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 03 Mar 2013 20:41:04 +0100
Subject: [Cython] Py_UNICODE* string support
References: <51330151.5080300@behnel.de> <513318B4.3080803@behnel.de>
Message-ID: <5133A750.5000203@behnel.de>

Nikita Nemkin, 03.03.2013 14:40:
> Please reconsider your decision wrt C-level literals. I believe that
> nogil code and a bit of efficiency (on 3.3) justify their existence.
> (char* literals do have C-level representations; Py_UNICODE* is in
> the same basket when it comes to Windows code.) The code to support
> them is also small and well-contained.
> I've updated my pull request to fully support non-BMP Py_UNICODE[]
> literals.

Ok, I think it's ok now.
I can accept the special casing of Py_UNICODE literals; it actually
adds value.

As one little nit-pick, may I ask you to rename the new name
references to "unicode" into "py_unicode" in your code? For example,
"is_unicode", "get_unicode_const", "unicode_const_index", etc. Given
that Py_UNICODE is no longer the native equivalent of Python's unicode
type in Py3.3, I'd like to avoid confusion in the code. The name
"unicode" is much more likely to refer to the builtin Python type than
to a native C type when it appears in Cython's sources.

Stefan

From stefan_ml at behnel.de  Sun Mar  3 20:56:59 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 03 Mar 2013 20:56:59 +0100
Subject: [Cython] Py_UNICODE* string support
In-Reply-To: <5133A750.5000203@behnel.de>
References: <51330151.5080300@behnel.de> <513318B4.3080803@behnel.de>
	<5133A750.5000203@behnel.de>
Message-ID: <5133AB0B.4060208@behnel.de>

Stefan Behnel, 03.03.2013 20:41:
> Nikita Nemkin, 03.03.2013 14:40:
>> Please reconsider your decision wrt C-level literals. [...]
>
> Ok, I think it's ok now. I can accept the special casing of
> Py_UNICODE literals; it actually adds value.
>
> As one little nit-pick, may I ask you to rename the new name
> references to "unicode" into "py_unicode" in your code? [...]

Oh, and yet another thing: could you write up some documentation for
this in docs/src/tutorial/strings.rst? Basically a Windows/wchar_t
related section that also warns about the inefficiency in Py3.3, so
that users don't accidentally assume it's efficient for anything that
needs to be portable.

Stefan

From szport at gmail.com  Mon Mar  4 07:24:30 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Mon, 4 Mar 2013 09:24:30 +0300
Subject: [Cython] To Add datetime.pxd to cython.cpython
References: <5131DF66.6030403@behnel.de>

2013/3/3 Zaur Shibzukhov :
> 2013/3/3 Zaur Shibzukhov :
>> [...]
>
> I made some performance comparisons. Here is an example for dates.
>> # test_date.pyx
>> [... test code and timings snipped ...]
>
> A more accurate test:
>
> [... revised test code and timings snipped ...]

Yet another performance comparison, for `time`:

# coding: utf-8

from cpython.datetime cimport import_datetime, time_new, time

import_datetime()

from datetime import time as pytime

def test_time1():
    cdef list lst = []
    cdef int hour, minute, second, microsecond
    for hour in range(0, 24):
        for minute in range(0, 60):
            for second in range(0, 60):
                for microsecond in range(0, 100000, 50000):
                    d = pytime(hour, minute, second, microsecond)
                    lst.append(d)
    return lst

def test_time2():
    cdef list lst = []
    cdef int hour, minute, second, microsecond
    for hour in range(0, 24):
        for minute in range(0, 60):
            for second in range(0, 60):
                for microsecond in range(0, 100000, 50000):
                    d = time(hour, minute, second, microsecond)
                    lst.append(d)
    return lst

def test_time3():
    cdef list lst = []
    cdef int hour, minute, second, microsecond
    for hour in range(0, 24):
        for minute in range(0, 60):
            for second in range(0, 60):
                for microsecond in range(0, 100000, 50000):
                    d = time_new(hour, minute, second, microsecond, None)
                    lst.append(d)
    return lst

def test1():
    l = test_time1()
    return l

def test2():
    l = test_time2()
    return l

def test3():
    l = test_time3()
    return l

Timings:

(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from
mytests.test_time import test1" "test1()"
50 loops, best of 5: 72.2 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from
mytests.test_time import test2" "test2()"
50 loops, best of 5: 64.7 msec per loop
(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from
mytests.test_time import test3" "test3()"
50 loops, best of 5: 13 msec per loop

Surely the same kind of results can be expected for `datetime` too.

Shibzukhov Zaur

From sturla at molden.no  Mon Mar  4 11:32:02 2013
From: sturla at molden.no (Sturla Molden)
Date: Mon, 4 Mar 2013 11:32:02 +0100
Subject: [Cython] PR on refcounting memoryview buffers
In-Reply-To: <15C80BD0-302E-4576-ACF3-C0FFD700569B@molden.no>
References: <512273C8.4000005@molden.no>
	<15C80BD0-302E-4576-ACF3-C0FFD700569B@molden.no>

On 20 Feb 2013, at 11:55, Sturla Molden wrote:

> On 18 Feb 2013, at 19:32, Sturla Molden wrote:
>
>> The problem this addresses is when GCC does not use atomic builtins
>> and emits __synch_fetch_and_add_4 and __synch_fetch_and_sub_4 when
>> Cython is internally refcounting memoryview buffers. For some reason
>> it can even happen on x86 and amd64.
>
> Specifically, atomic builtins are not used when compiling for i386,
> which is MinGW's default target architecture (unless we specify a
> different -march). GCC will always encounter this problem when
> targeting i386.
>
> Thus the correct fix is to use the fallback when GCC is targeting
> i386, not when GCC is targeting MS Windows.
>
> So I am closing this PR. But Mark's fix must be corrected, because it
> does not really address the problem (which is i386, not MinGW)!

Please consider this pull request:

https://github.com/cython/cython/pull/190

Sturla Molden

From nikita at nemkin.ru  Mon Mar  4 18:39:36 2013
From: nikita at nemkin.ru (Nikita Nemkin)
Date: Mon, 04 Mar 2013 23:39:36 +0600
Subject: [Cython] Py_UNICODE* string support
In-Reply-To: <5133AB0B.4060208@behnel.de>
References: <51330151.5080300@behnel.de> <513318B4.3080803@behnel.de>
	<5133A750.5000203@behnel.de> <5133AB0B.4060208@behnel.de>

On Mon, 04 Mar 2013 01:56:59 +0600, Stefan Behnel wrote:

> As one little nit-pick, may I ask you to rename the new name
> references to "unicode" into "py_unicode" in your code? For example,
> "is_unicode", "get_unicode_const", "unicode_const_index", etc. Given
> that Py_UNICODE is no longer the native equivalent of Python's
> unicode type in Py3.3, I'd like to avoid confusion in the code. The
> name "unicode" is much more likely to refer to the builtin Python
> type than to a native C type when it appears in Cython's sources.

Actually, "py_unicode" is even more likely to be mistaken for
Python-level unicode. There are already pairs of methods like
get_string_const (C-level) + get_py_string_const (Py-level).
I suggest one of "py_unicode_ptr", "py_unicode_str", "wstring",
"wide_string", "ustring" or "unicode_string" to unambiguously refer to
Py_UNICODE* variables and constants. Take your pick.

> Oh, and yet another thing: could you write up some documentation for
> this in docs/src/tutorial/strings.rst? Basically a Windows/wchar_t
> related section that also warns about the inefficiency in Py3.3, so
> that users don't accidentally assume it's efficient for anything that
> needs to be portable.

Sure, I'm writing the docs now.

Best regards,
Nikita Nemkin

From stefan_ml at behnel.de  Mon Mar  4 18:58:34 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 04 Mar 2013 18:58:34 +0100
Subject: [Cython] Py_UNICODE* string support
References: <51330151.5080300@behnel.de> <513318B4.3080803@behnel.de>
	<5133A750.5000203@behnel.de> <5133AB0B.4060208@behnel.de>
Message-ID: <5134E0CA.2060106@behnel.de>

Nikita Nemkin, 04.03.2013 18:39:
> On Mon, 04 Mar 2013 01:56:59 +0600, Stefan Behnel wrote:
>> As one little nit-pick, may I ask you to rename the new name
>> references to "unicode" into "py_unicode" in your code? [...]
>
> Actually, "py_unicode" is even more likely to be mistaken for
> Python-level unicode. There are already pairs of methods like
> get_string_const (C-level) + get_py_string_const (Py-level).

Agreed.

> I suggest one of "py_unicode_ptr", "py_unicode_str", "wstring",
> "wide_string", "ustring" or "unicode_string" to unambiguously refer
> to Py_UNICODE* variables and constants. Take your pick.

I think "pyunicode_ptr" or even just "pyunicode" makes it quite clear
what it's about, and specifically that "pyunicode" is actually a type
name, not a "py_something". Even "pyunicode_array" would work,
although it might suggest that we know more at compile time than we
do, such as the length. I'll let you choose between these three,
although I'm leaning slightly towards an order of preference as they
appear above.

>> Oh, and yet another thing: could you write up some documentation
>> for this in docs/src/tutorial/strings.rst? [...]
>
> Sure, I'm writing the docs now.

Nice.

Stefan

From szport at gmail.com  Tue Mar  5 07:21:21 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Tue, 5 Mar 2013 09:21:21 +0300
Subject: [Cython] nonecheck and as_none_safe_node method

In ExprNodes.py there are several places where the method
`as_none_safe_node` is applied in order to wrap a node in a
NoneCheckNode. I think it would be reasonable to apply that mostly
only in cases when nonecheck=True.

Here are the possible changes in ExprNodes.py:

https://github.com/intellimath/cython/commit/bd041680b78067007ad6b9894a2f2c18514e397c

Zaur Shibzukhov
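P.S. To illustrate what is at stake at the user level, a minimal
sketch (assuming the current semantics of the directive):

    cimport cython

    @cython.nonecheck(True)
    def first(list items):
        # with nonecheck enabled, this lookup first tests
        # "items is not None" and raises TypeError on None;
        # with nonecheck=False the generated code indexes the
        # list structure directly, and a None value would crash
        return items[0]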
From szport at gmail.com  Tue Mar  5 07:24:42 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Tue, 5 Mar 2013 09:24:42 +0300
Subject: [Cython] nonecheck and as_none_safe_node method

2013/3/5 Zaur Shibzukhov

> In ExprNodes.py there are several places where the method
> `as_none_safe_node` is applied in order to wrap a node in a
> NoneCheckNode. I think it would be reasonable to apply that mostly
> only in cases when nonecheck=True.
>
> Here are the possible changes in ExprNodes.py:
> https://github.com/intellimath/cython/commit/bd041680b78067007ad6b9894a2f2c18514e397c

This change would prevent the generation of None checks for objects
(lists, tuples, unicode) when nonecheck=True.

Any ideas?

Zaur Shibzukhov

From szport at gmail.com  Tue Mar  5 07:26:52 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Tue, 5 Mar 2013 09:26:52 +0300
Subject: [Cython] nonecheck and as_none_safe_node method

2013/3/5 Zaur Shibzukhov

> This change would prevent the generation of None checks for objects
> (lists, tuples, unicode) when nonecheck=True.

Sorry... when nonecheck=False.

> Any ideas?

Zaur Shibzukhov

From stefan_ml at behnel.de  Tue Mar  5 08:12:16 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 05 Mar 2013 08:12:16 +0100
Subject: [Cython] nonecheck and as_none_safe_node method
Message-ID: <51359AD0.50209@behnel.de>

Zaur Shibzukhov, 05.03.2013 07:21:
> In ExprNodes.py there are several places where the method
> `as_none_safe_node` is applied in order to wrap a node in a
> NoneCheckNode. I think it would be reasonable to apply that mostly
> only in cases when nonecheck=True.
>
> Here are the possible changes in ExprNodes.py:
> https://github.com/intellimath/cython/commit/bd041680b78067007ad6b9894a2f2c18514e397c

I consider the nonecheck option a quirk. In many, many cases, it's not
obvious to a user what constructs it applies to. For example, we use
it to guard against crashes when we optimise code, e.g. by inlining
parts of a C-API function, when iterating over builtins, etc. In most
of these cases, it depends on more than one parameter whether the
optimised code will be applied (and thus no None check) or the
fallback, which usually does its own complete set of safety checks. So
it's one of those options that may work safely in all unit tests and
then crash in production.

Remember, most cases where we leave a None check in the code are not
those where it's obvious that a variable cannot be None because it was
just assigned a non-None value. Most cases are about function
arguments, i.e. values that come from outside of the current function,
and thus are not "obviously" correct even for the human reader or
author of the code.

Also, I have yet to see a case where a None check really makes a
difference in performance.
Often enough, the C compiler will be able to move them out of loops or
drop them completely because it already saw a None check against the
same local variable earlier on. In those cases, it's just Cython not
being smart enough to drop them itself, but without an impact on
runtime performance. And even if the C compiler is not smart enough
either, at least the branch prediction of the processor will strike in
the relevant cases (i.e. inside of loops) and reduce the overhead to
"pretty much zero".

All of this makes me think that we should be very careful when we
consider this option for the generated code.

Stefan

From szport at gmail.com  Tue Mar  5 08:30:26 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Tue, 5 Mar 2013 10:30:26 +0300
Subject: [Cython] nonecheck and as_none_safe_node method
In-Reply-To: <51359AD0.50209@behnel.de>
References: <51359AD0.50209@behnel.de>

2013/3/5 Stefan Behnel

> I consider the nonecheck option a quirk. [...]
>
> All of this makes me think that we should be very careful when we
> consider this option for the generated code.

I agree that the nonecheck=False directive is dangerous in general.

This change mainly affects builtin objects (lists, tuples, dicts,
unicode) and some situations in function/method calls. And it assumes
that when you apply this directive, you know what you are doing and
why.

Note that Cython already sets nonecheck=False by default (while
boundscheck and wraparound are set to True in Options.py). But
currently that does not affect builtin types and some special cases.
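A local override could then look like this (a minimal sketch, assuming
the context-manager form of the directive):

    cimport cython

    def total_length(list rows):
        cdef Py_ssize_t n = 0
        for row in rows:
            # the developer asserts here that rows contains no Nones
            with cython.nonecheck(False):
                n += len(<list>row)
        return n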
Maybe the safer strategy is to set nonecheck=True by default and allow
applying nonecheck(False) locally, as sketched above, when the
developer believes it is necessary?

From szport at gmail.com  Tue Mar  5 11:24:33 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Tue, 5 Mar 2013 13:24:33 +0300
Subject: [Cython] nonecheck and as_none_safe_node method
References: <51359AD0.50209@behnel.de>

2013/3/5 Zaur Shibzukhov

> Maybe the safer strategy is to set nonecheck=True by default and
> allow applying nonecheck(False) locally when the developer believes
> it is necessary?

The strategy of making nonecheck=True the default and setting
nonecheck=False explicitly where necessary is more manageable, IMHO.
In the Cython sources one can explicitly (where necessary) ignore this
default setting, or set it explicitly to False in a concrete
context/environment.

Zaur Shibzukhov

From yury at shurup.com  Thu Mar  7 12:16:10 2013
From: yury at shurup.com (Yury V. Zaytsev)
Date: Thu, 07 Mar 2013 12:16:10 +0100
Subject: [Cython] Cython syntax to pre-allocate lists for performance
Message-ID: <1362654970.2849.9.camel@newpride>

Hi,

Is there any syntax that I can use to do something like this in
Cython:

    py_object_ = PyList_New(123);  ?

If not, do you think that this can be added in one way or another?

Unfortunately, I can't think of a non-disruptive way of doing it. For
instance, if this

    [None] * N

is given a completely new meaning, like make an empty list (of NULLs),
instead of making a real list of Nones, it will certainly break Python
code. Besides, it would probably still be faster than no
pre-allocation, but slower than an empty list with pre-allocation...

Maybe

    [NULL] * N ?

Any ideas?

--
Sincerely yours,
Yury V. Zaytsev

From stefan_ml at behnel.de  Thu Mar  7 12:21:39 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 07 Mar 2013 12:21:39 +0100
Subject: [Cython] Cython syntax to pre-allocate lists for performance
In-Reply-To: <1362654970.2849.9.camel@newpride>
References: <1362654970.2849.9.camel@newpride>
Message-ID: <51387843.602@behnel.de>

Yury V. Zaytsev, 07.03.2013 12:16:
> Is there any syntax that I can use to do something like this in
> Cython:
>
>     py_object_ = PyList_New(123);  ?

Note that Python has an algorithm for shrinking a list on appending,
so this might not be sufficient for your use case.

> If not, do you think that this can be added in one way or another?
>
> Unfortunately, I can't think of a non-disruptive way of doing it. For
> instance, if this
>
>     [None] * N
>
> is given a completely new meaning, like make an empty list (of
> NULLs), instead of making a real list of Nones, it will certainly
> break Python code. Besides, it would probably still be faster than no
> pre-allocation, but slower than an empty list with pre-allocation...
>
> Maybe
>
>     [NULL] * N ?

What do you need it for?

Won't list comprehensions work for you? They could potentially be
adapted to presize the list.

And why won't [None]*N help you out? It should be pretty cheap.

Stefan

From nikita at nemkin.ru  Thu Mar  7 12:59:22 2013
From: nikita at nemkin.ru (Nikita Nemkin)
Date: Thu, 07 Mar 2013 17:59:22 +0600
Subject: [Cython] Cython syntax to pre-allocate lists for performance
In-Reply-To: <51387843.602@behnel.de>
References: <1362654970.2849.9.camel@newpride>
	<51387843.602@behnel.de>

On Thu, 07 Mar 2013 17:16:10 +0600, Yury V.
Zaytsev wrote:

> Is there any syntax that I can use to do something like this in
> Cython:
>
>     py_object_ = PyList_New(123);  ?
>
> If not, do you think that this can be added in one way or another?
>
> [...]
>
> Maybe
>
>     [NULL] * N ?
>
> Any ideas?

I really like the "[NULL] * N" thing. Efficient empty list allocation
and filling is something I stumble upon quite often, especially in
binding code.

I doubt Cython will be able to automatically use PyList_SET_ITEM for
assignment to such a NULL list (it would require induction variable
analysis), but eliminating one extra pass over the list is already
helpful.

Implementation note (if this gets implemented): Cython's optimized
list assignment routine uses Py_DECREF; this will have to be changed
to Py_XDECREF, otherwise NULL list items won't be directly assignable
from Cython. (PyList_SetItem always uses Py_XDECREF on the old
element.)

> What do you need it for?
>
> Won't list comprehensions work for you? They could potentially be
> adapted to presize the list.

List comprehensions do not preallocate the list. If they did, the need
for the above would be somewhat diminished.

> And why won't [None]*N help you out? It should be pretty cheap.

[None] * N makes

From nikita at nemkin.ru  Thu Mar  7 13:02:17 2013
From: nikita at nemkin.ru (Nikita Nemkin)
Date: Thu, 07 Mar 2013 18:02:17 +0600
Subject: [Cython] Cython syntax to pre-allocate lists for performance
In-Reply-To: <51387843.602@behnel.de>
References: <1362654970.2849.9.camel@newpride>
	<51387843.602@behnel.de>

Sorry, accidental early send. Previous mail continued...

[None] * N makes an extra pass over the list to assign None to each
item (and also increfs None N times). This is useless extra work. The
larger the list, the worse it gets.

Best regards,
Nikita Nemkin

From szport at gmail.com  Thu Mar  7 14:39:33 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Thu, 7 Mar 2013 16:39:33 +0300
Subject: [Cython] Cython syntax to pre-allocate lists for performance
In-Reply-To: <51387843.602@behnel.de>
References: <1362654970.2849.9.camel@newpride>
	<51387843.602@behnel.de>

2013/3/7 Stefan Behnel

> Yury V. Zaytsev, 07.03.2013 12:16:
>> Is there any syntax that I can use to do something like this in
>> Cython:
>>
>>     py_object_ = PyList_New(123);  ?
>
> [...]
>
> What do you need it for?
>
> Won't list comprehensions work for you? They could potentially be
> adapted to presize the list.
I guess the problem is to construct a new (even empty) list with
memory pre-allocated for exactly N elements.

N*[NULL] changes semantics, because there can't be a list with N
elements filled with NULL.
N*[None] is more expensive for further assignments because of the
Py_DECREFs.

I suppose that N*[] could do the trick. It could be optimized so that
N*[] equals an empty list, but with memory preallocated for exactly N
elements. Could it be?

Zaur Shibzukhov

From szport at gmail.com  Thu Mar  7 15:39:32 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Thu, 7 Mar 2013 17:39:32 +0300
Subject: [Cython] Cython syntax to pre-allocate lists for performance
References: <1362654970.2849.9.camel@newpride>
	<51387843.602@behnel.de>

2013/3/7 Zaur Shibzukhov

> I guess the problem is to construct a new (even empty) list with
> memory pre-allocated for exactly N elements.
>
> N*[NULL] changes semantics, because there can't be a list with N
> elements filled with NULL.
> N*[None] is more expensive for further assignments because of the
> Py_DECREFs.
>
> I suppose that N*[] could do the trick. It could be optimized so
> that N*[] equals an empty list, but with memory preallocated for
> exactly N elements. Could it be?

Cython already optimizes PyList_Append very well. Therefore the
scenario where one first creates an empty list with memory
preallocated for exactly N elements, and then evaluates the elements
and adds them using plain list.append, could be optimized very well in
Cython too. As a result, the constructed list would hold memory for
exactly N elements. This avoids wasting memory when one needs to build
many lists of relatively big size.

Zaur Shibzukhov

From stefan_ml at behnel.de  Thu Mar  7 15:48:41 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 07 Mar 2013 15:48:41 +0100
Subject: [Cython] Cython syntax to pre-allocate lists for performance
References: <1362654970.2849.9.camel@newpride>
	<51387843.602@behnel.de>
Message-ID: <5138A8C9.3050605@behnel.de>

Zaur Shibzukhov, 07.03.2013 15:39:
> 2013/3/7 Zaur Shibzukhov
>> I guess the problem is to construct a new (even empty) list with
>> memory pre-allocated for exactly N elements.
>>
>> [...]
>>
>> I suppose that N*[] could do the trick.

That looks wrong to me.

>> It could be optimized so that N*[] equals an empty list, but with
>> memory preallocated for exactly N elements. Could it be?
>
> Cython already optimizes PyList_Append very well. Therefore the
> scenario where one first creates an empty list with memory
> preallocated for exactly N elements, and then evaluates the elements
> and adds them using plain list.append, could be optimized very well
> in Cython too. [...]

I prefer not adding any new syntax as long as we are not sure we can't
fix it by making list comprehensions smarter. I tried this a while ago
and have some initial code lying around somewhere in my patch queue. I
didn't have the time to make it usable, though, also because Cython
didn't have its own append() for list comprehensions at the time. It
does now, as you noted.
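Roughly, the direction would be to let a comprehension with a known
result size expand into something like the following (a hand-written
sketch of the idea, not the actual generated code):

    from cpython.list cimport PyList_New, PyList_SET_ITEM
    from cpython.ref cimport Py_INCREF

    cdef list squares(Py_ssize_t n):
        cdef list result = PyList_New(n)  # slots start out as NULL
        cdef Py_ssize_t i
        for i in range(n):
            value = i * i
            Py_INCREF(value)              # SET_ITEM steals a reference
            PyList_SET_ITEM(result, i, value)
        return result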
Stefan

From szport at gmail.com  Thu Mar  7 18:15:45 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Thu, 7 Mar 2013 20:15:45 +0300
Subject: [Cython] Add support for list/tuple slicing

Hello!

Currently Cython generates a generic PySequence_GetSlice/SetSlice call
for slicing a list/tuple. We could replace that with native calls to
Py{List|Tuple}_GetSlice and PyList_SetSlice for lists/tuples.

Here are the changes:

https://github.com/intellimath/cython/commit/27525a5dc9f6eba31b330a6ec04e7a105191d9f5

Zaur Shibzukhov

From robertwb at gmail.com  Thu Mar  7 19:07:53 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 7 Mar 2013 10:07:53 -0800
Subject: [Cython] Cython syntax to pre-allocate lists for performance
In-Reply-To: <5138A8C9.3050605@behnel.de>
References: <1362654970.2849.9.camel@newpride>
	<51387843.602@behnel.de> <5138A8C9.3050605@behnel.de>

On Thu, Mar 7, 2013 at 6:48 AM, Stefan Behnel wrote:
> Zaur Shibzukhov, 07.03.2013 15:39:
>> [...]
>
> I prefer not adding any new syntax as long as we are not sure we
> can't fix it by making list comprehensions smarter. [...]

There are several cases where we can get the size of the result list
upfront, so we can certainly do better here now.

I'm also -1 on adding special syntax for populating a list with NULL
values; if you really want to do this (and I doubt it really matters
in most cases), calling PyList_New is the "syntax" to use.

- Robert

From yury at shurup.com  Thu Mar  7 19:26:26 2013
From: yury at shurup.com (Yury V. Zaytsev)
Date: Thu, 07 Mar 2013 19:26:26 +0100
Subject: [Cython] Cython syntax to pre-allocate lists for performance
In-Reply-To: <51387843.602@behnel.de>
References: <1362654970.2849.9.camel@newpride>
	<51387843.602@behnel.de>
Message-ID: <1362680786.2664.12.camel@newpride>

Hi Stefan,

On Thu, 2013-03-07 at 12:21 +0100, Stefan Behnel wrote:

> Note that Python has an algorithm for shrinking a list on appending,
> so this might not be sufficient for your use case. What do you need
> it for?

W00t! I didn't know about that.

I'm wrapping a C++ code that should transmit large lists of objects to
Python, while these objects are stored in something vector-like, which
shouldn't get exposed directly.
In the past, they did something like

    obj = PyList_New(a.size());
    for (a.begin(); a.end(); ++a) PyList_SetItem(obj, ...)

I figured I can translate it into a while loop like

    obj = []
    while (it != a.end()):
        obj.append(it)
        inc(it)

but then I'm not using the information about the size of a that I
already have, and for huge lists this tends to be quite slow.

I think this must be quite a common use case for bindings...

> And why won't [None]*N help you out? It should be pretty cheap.

It probably will, at least a bit. It just struck me that if I'm going
to do something along the lines of

    idx = 0
    obj = [None] * a.size()
    while (it != a.end()):
        obj[idx] = it
        idx += 1
        inc(it)

I could also squeeze out the last bits of performance by avoiding the
creation of the Nones and subsequently populating the list with them.

If you say I have to use PyList_New directly, oh well... It's just
that now, since I'm rewriting the bindings in Cython anyway, I'm also
trying to avoid using the Python C API directly as much as possible.

--
Sincerely yours,
Yury V. Zaytsev

From robertwb at gmail.com  Thu Mar  7 19:44:48 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 7 Mar 2013 10:44:48 -0800
Subject: [Cython] Cython syntax to pre-allocate lists for performance
In-Reply-To: <1362680786.2664.12.camel@newpride>
References: <1362654970.2849.9.camel@newpride>
	<51387843.602@behnel.de> <1362680786.2664.12.camel@newpride>

On Thu, Mar 7, 2013 at 10:26 AM, Yury V. Zaytsev wrote:
> [... quoted message snipped ...]
>
> If you say I have to use PyList_New directly, oh well... It's just
> that now, since I'm rewriting the bindings in Cython anyway, I'm also
> trying to avoid using the Python C API directly as much as possible.

I would time the two approaches to see if it really matters.

>> Won't list comprehensions work for you? They could potentially be
>> adapted to presize the list.
>
> I guess not.

[o for o in a] is nice and clean. If a has a size method (a common STL
pattern), we could optimistically call that to do the pre-allocation.
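E.g. something along these lines could then get a presized result list
for free (a sketch, assuming a wrapped std::vector and Cython's C++
iteration support):

    # distutils: language = c++
    from libcpp.vector cimport vector

    cdef list to_list(vector[int]& a):
        # if the comprehension learns to call a.size() up front, the
        # result list can be allocated at its final length right away
        return [o for o in a]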
I don't know exactly what your use case is, but you might consider
simply exposing a list-like wrapper supporting __getitem__ and
iteration, rather than eagerly converting the entire thing to a list.

- Robert

From greg.ewing at canterbury.ac.nz  Fri Mar  8 00:19:53 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 08 Mar 2013 12:19:53 +1300
Subject: [Cython] Cython syntax to pre-allocate lists for performance
References: <1362654970.2849.9.camel@newpride>
	<51387843.602@behnel.de>
Message-ID: <51392099.7040205@canterbury.ac.nz>

Nikita Nemkin wrote:
> Sorry, accidental early send. Previous mail continued...
>
> [None] * N makes an extra pass over the list to assign None to each
> item (and also increfs None N times).

Maybe this could be optimised by adding N to the reference count
instead of incrementing it N times?

--
Greg

From robertwb at gmail.com  Fri Mar  8 00:41:54 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Thu, 7 Mar 2013 15:41:54 -0800
Subject: [Cython] Cython syntax to pre-allocate lists for performance
In-Reply-To: <51392099.7040205@canterbury.ac.nz>
References: <1362654970.2849.9.camel@newpride>
	<51387843.602@behnel.de> <51392099.7040205@canterbury.ac.nz>

On Thu, Mar 7, 2013 at 3:19 PM, Greg Ewing wrote:
> Maybe this could be optimised by adding N to the reference count
> instead of incrementing it N times?

I'd be surprised if the C compiler doesn't.

http://hg.python.org/cpython/file/1d4849f9e37d/Objects/listobject.c#l515

From szport at gmail.com  Fri Mar  8 08:49:47 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Fri, 8 Mar 2013 10:49:47 +0300
Subject: [Cython] Add support for list/tuple slicing

2013/3/7 Zaur Shibzukhov :
> Currently Cython generates a generic PySequence_GetSlice/SetSlice
> call for slicing a list/tuple. We could replace that with native
> calls to Py{List|Tuple}_GetSlice and PyList_SetSlice for
> lists/tuples.

There is an updated change that uses the utility functions
__Pyx_Py{List|Tuple}_GetSlice, because Py{List|Tuple}_GetSlice doesn't
support negative indices. In CPython that job is done by the
{list|tuple}_subscript function in the type object's slot, but it
handles both indices and slice objects, which adds overhead. That's
the reason why PySequence_GetSlice is slower: it creates a slice
object and falls through to {list|tuple}_subscript. Therefore I added
utility code.
From szport at gmail.com  Fri Mar  8 08:49:47 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Fri, 8 Mar 2013 10:49:47 +0300
Subject: [Cython] Add support for list/tuple slicing
In-Reply-To:
References:
Message-ID:

2013/3/7 Zaur Shibzukhov :
> Cython currently generates a generic PySequence_GetSlice/SetSlice call
> for slicing a list/tuple. We could replace that with a native call to
> Py{List|Tuple}_GetSlice and PyList_SetSlice for lists/tuples.

There is an updated change that uses the utility code
__Pyx_Py{List|Tuple}_GetSlice, because Py{List|Tuple}_GetSlice doesn't
support negative indices. In CPython, that job is done by the
{list|tuple} slicing function from the type object's slot
({list|tuple}_subscript), but it handles both indices and slice objects,
which adds overhead. That's the reason why PySequence_GetSlice is slower:
it creates a slice object and falls through to {list|tuple}_subscript.
Therefore I added utility code.

Here is the utility code:

/////////////// PyList_GetSlice.proto ///////////////

static PyObject* __Pyx_PyList_GetSlice(PyObject* lst, Py_ssize_t start, Py_ssize_t stop);

/////////////// PyList_GetSlice ///////////////

PyObject* __Pyx_PyList_GetSlice(PyObject* lst, Py_ssize_t start, Py_ssize_t stop) {
    Py_ssize_t i, length;
    PyListObject* np;
    PyObject **src, **dest;
    PyObject *v;

    length = PyList_GET_SIZE(lst);
    if (start < 0) {
        start += length;
        if (start < 0)
            start = 0;
    }
    if (stop < 0)
        stop += length;
    else if (stop > length)
        stop = length;
    length = stop - start;
    if (length <= 0)
        return PyList_New(0);

    np = (PyListObject*) PyList_New(length);
    if (np == NULL)
        return NULL;

    src = ((PyListObject*)lst)->ob_item + start;
    dest = np->ob_item;
    for (i = 0; i < length; i++) {
        v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }
    return (PyObject*)np;
}

/////////////// PyTuple_GetSlice.proto ///////////////

static PyObject* __Pyx_PyTuple_GetSlice(PyObject* ob, Py_ssize_t start, Py_ssize_t stop);

/////////////// PyTuple_GetSlice ///////////////

PyObject* __Pyx_PyTuple_GetSlice(PyObject* ob, Py_ssize_t start, Py_ssize_t stop) {
    Py_ssize_t i, length;
    PyTupleObject* np;
    PyObject **src, **dest;
    PyObject *v;

    length = PyTuple_GET_SIZE(ob);
    if (start < 0) {
        start += length;
        if (start < 0)
            start = 0;
    }
    if (stop < 0)
        stop += length;
    else if (stop > length)
        stop = length;
    length = stop - start;
    if (length <= 0)
        return PyTuple_New(0);

    np = (PyTupleObject *) PyTuple_New(length);
    if (np == NULL)
        return NULL;

    src = ((PyTupleObject*)ob)->ob_item + start;
    dest = np->ob_item;
    for (i = 0; i < length; i++) {
        v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }
    return (PyObject*)np;
}

Here is the testing code:

list_slice.pyx
--------------

from cpython.sequence cimport PySequence_GetSlice

cdef extern from "list_tuple_slices.h":
    inline object __Pyx_PyList_GetSlice(object ob, int start, int stop)
    inline object __Pyx_PyTuple_GetSlice(object ob, int start, int stop)

cdef list lst = list(range(10))
cdef list lst2 = list(range(7))

def get_slice1(list lst):
    cdef int i
    cdef list res = []
    for i in range(200000):
        res.append(PySequence_GetSlice(lst, 2, 8))
    return res

def get_slice2(list lst):
    cdef int i
    cdef list res = []
    for i in range(200000):
        res.append(__Pyx_PyList_GetSlice(lst, 2, 8))
    return res

def test_get_slice1():
    get_slice1(lst)

def test_get_slice2():
    get_slice2(lst)

tuple_slicing.pyx
-----------------

from cpython.sequence cimport PySequence_GetSlice

cdef extern from "list_tuple_slices.h":
    inline object __Pyx_PyList_GetSlice(object lst, int start, int stop)
    inline object __Pyx_PyTuple_GetSlice(object ob, int start, int stop)

cdef tuple lst = tuple(range(10))

def get_slice1(tuple lst):
    cdef int i
    cdef list res = []
    for i in range(200000):
        res.append(PySequence_GetSlice(lst, 2, 8))
    return res

def get_slice2(tuple lst):
    cdef int i
    cdef list res = []
    for i in range(200000):
        res.append(__Pyx_PyTuple_GetSlice(lst, 2, 8))
    return res

def test_get_slice1():
    get_slice1(lst)

def test_get_slice2():
    get_slice2(lst)

Here are timings:

for list:

(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.list_slice import test_get_slice1" "test_get_slice1()"
raw times: 10.2 10.3 10.4 10.1 10.2
100 loops, best of 5: 101 msec per loop
(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.list_slice import test_get_slice1" "test_get_slice1()"
raw times: 10.3 10.3 10.2 10.3 10.2
100 loops, best of 5: 102 msec per loop
(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.list_slice import test_get_slice2" "test_get_slice2()"
raw times: 8.16 8.19 8.17 8.2 8.16
100 loops, best of 5: 81.6 msec per loop
(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.list_slice import test_get_slice2" "test_get_slice2()"
raw times: 8.1 8.05 8.03 8.06 8.07
100 loops, best of 5: 80.3 msec per loop

for tuple:

(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.tuple_slice import test_get_slice1" "test_get_slice1()"
raw times: 7.2 7.16 7.16 7.18 7.17
100 loops, best of 5: 71.6 msec per loop
(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.tuple_slice import test_get_slice1" "test_get_slice1()"
raw times: 7.22 7.22 7.19 7.18 7.18
100 loops, best of 5: 71.8 msec per loop
(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.tuple_slice import test_get_slice2" "test_get_slice2()"
raw times: 9.23 5.2 4.95 4.96 4.98
100 loops, best of 5: 49.5 msec per loop
(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.tuple_slice import test_get_slice2" "test_get_slice2()"
raw times: 4.92 4.93 4.9 4.94 4.92
100 loops, best of 5: 49 msec per loop

This change doesn't contain list slice assignments, because previous
testing and timings showed that this needs more analysis. Maybe I'll make
a pull request with this change + tests?

Zaur Shibzukhov
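To make the intended effect of the patch concrete, this is the kind of
typed slicing it would accelerate. The comment describes the proposed
lowering, which was not in released Cython at this point:

# usage sketch for the proposed optimization
cdef list data = list(range(10))
cdef list middle = data[2:8]    # would lower to __Pyx_PyList_GetSlice(data, 2, 8)
cdef list tail = data[-3:]      # negative bounds are normalized by the utility code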
"test_get_slice2()" raw times: 8.16 8.19 8.17 8.2 8.16 100 loops, best of 5: 81.6 msec per loop (py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.list_slice import test_get_slice2" "test_get_slice2()" raw times: 8.1 8.05 8.03 8.06 8.07 100 loops, best of 5: 80.3 msec per loop for tuple (py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.tuple_slice import test_get_slice1" "test_get_slice1()" raw times: 7.2 7.16 7.16 7.18 7.17 100 loops, best of 5: 71.6 msec per loop (py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.tuple_slice import test_get_slice1" "test_get_slice1()" raw times: 7.22 7.22 7.19 7.18 7.18 100 loops, best of 5: 71.8 msec per loop (py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.tuple_slice import test_get_slice2" "test_get_slice2()" raw times: 9.23 5.2 4.95 4.96 4.98 100 loops, best of 5: 49.5 msec per loop (py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.tuple_slice import test_get_slice2" "test_get_slice2()" raw times: 4.92 4.93 4.9 4.94 4.92 100 loops, best of 5: 49 msec per loop This change dosn't contain list slice assignments because previous testing and timings showed that this need more analysis. Maybe I'l make pull request with this change + tests? Zaur Shibzukhov From ben.strulo at bt.com Fri Mar 8 10:25:10 2013 From: ben.strulo at bt.com (ben.strulo at bt.com) Date: Fri, 8 Mar 2013 09:25:10 +0000 Subject: [Cython] Probably Memory Leak Message-ID: Hi there, I think I may have found a memory leak in cpython.array. Or I may have screwed up: I have a test.pyx containing: #---------------------------- from cpython.array cimport array,clone cdef class Test(object): cdef int[:] myarr def __init__(self): cdef array templatei = array("i") self.myarr = clone(templatei,10000,True) #---------------------------- Then a test harness which is just: #---------------------------- import test i = 0 while True: print i i += 1 s = test.Test() #---------------------------- And this fills memory until I get a MemoryError exception. I'm using a fresh copy of Cython from Git (unless I messed that up :)) on Windows, compiling with MSVC 9. Not sure what diagnostics might help but it's a pretty simple test case. I haven't found a bug in the Cython source but this doesn't seem right. Hope this is of interest Ben Strulo -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Fri Mar 15 11:59:14 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 15 Mar 2013 11:59:14 +0100 Subject: [Cython] python-dev discussion on better CPython core-level parallelism Message-ID: <5142FF02.7030001@behnel.de> http://thread.gmane.org/gmane.comp.python.devel/137858/focus=137858 From stefan_ml at behnel.de Fri Mar 15 20:52:22 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 15 Mar 2013 20:52:22 +0100 Subject: [Cython] [cython] BUG: Avoid exporting symbols in MemoryView utility code. (#197) In-Reply-To: References: Message-ID: <51437BF6.9030106@behnel.de> Hi, this change revealed that the generated utility code functions are written into the C code a bit too unconditionally. 
From stefan_ml at behnel.de  Fri Mar 15 11:59:14 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 15 Mar 2013 11:59:14 +0100
Subject: [Cython] python-dev discussion on better CPython core-level parallelism
Message-ID: <5142FF02.7030001@behnel.de>

http://thread.gmane.org/gmane.comp.python.devel/137858/focus=137858

From stefan_ml at behnel.de  Fri Mar 15 20:52:22 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 15 Mar 2013 20:52:22 +0100
Subject: [Cython] [cython] BUG: Avoid exporting symbols in MemoryView utility code. (#197)
In-Reply-To:
References:
Message-ID: <51437BF6.9030106@behnel.de>

Hi,

this change revealed that the generated utility code functions are written
into the C code a bit too unconditionally. The numpy_memoryview test now
shows lots of C compiler warnings about unused dtype conversion functions:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/1165/warnings15Result/package.45/file.1478077899/

Stefan

From johntyree at gmail.com  Sat Mar 16 18:16:18 2013
From: johntyree at gmail.com (John Tyree)
Date: Sat, 16 Mar 2013 18:16:18 +0100
Subject: [Cython] Template functions
In-Reply-To:
References:
Message-ID: <20130316171618.GA15429@gmail.com>

There is currently a void in Cython's C++ support with respect to function
(not class) templates. It would be great to have such a thing, dangerous
or not, so I'm proposing something to get things rolling.

Given that function templates are 100% transparent to the caller, it seems
that the only barrier is Cython's type system. Even in the easiest case,
where the function returns a known primitive type for all input, we still
can't use it.

template <typename T>
std::string to_string(T a)

-------

from libcpp.string cimport string as cpp_string

cdef extern from "foo.h" namespace "std":

    cpp_string to_string(??? a, ??? b)

We can use fused types if we know that the function is restricted to
numeric types, for example, but in general this is not the case. The only
workaround I currently have is to declare the function N times for N
types. This isn't disastrous, but prevents sharing of code.

As an alternative, what about a dynamic ANY type that uses the fused type
machinery, but always succeeds when specializing? Or perhaps it just
shouldn't be type checked at all? There is always a backend that will
generate the type error, and this possibly gives us macro "functions" for
free in C.

cdef extern from "foo.h" namespace "std":

    cpp_string to_string(cython.any_t a, cython.any_t b)

Pros:
    Huge number of functions become accessible from Cython
    User explicitly states when a type should be unchecked
    Allows mixtures of typed and untyped parameters in a single call

Cons:
    Makes determining return types hard in some cases.
    Error messages might be difficult to interpret
    ?????
    I'm-sure-this-list-should-be-longer

I'll admit I haven't dug very deep as far as the implications of such a
thing. Is it a reasonable idea? What are the major issues with such an
approach?

-John
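For reference, the fused-types workaround John mentions (viable only when
the template's instantiations can be enumerated up front) might look
roughly like this. The particular type list is an illustrative assumption,
and how well extern declarations combine with fused types depends on the
Cython version in use:

# sketch: emulating a function template with an enumerated fused type
from libcpp.string cimport string as cpp_string

ctypedef fused numeric:
    int
    long
    double

cdef extern from "<string>" namespace "std":
    cpp_string to_string(numeric value)   # one declaration covers all three types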
From robertwb at gmail.com  Sat Mar 16 18:22:08 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sat, 16 Mar 2013 10:22:08 -0700
Subject: [Cython] Template functions
In-Reply-To: <20130316171618.GA15429@gmail.com>
References: <20130316171618.GA15429@gmail.com>
Message-ID:

On Sat, Mar 16, 2013 at 10:16 AM, John Tyree wrote:
> There is currently a void in Cython's C++ support with respect to function
> (not class) templates. It would be great to have such a thing, dangerous
> or not, so I'm proposing something to get things rolling.
>
> [...]
>
> cdef extern from "foo.h" namespace "std":
>
>     cpp_string to_string(cython.any_t a, cython.any_t b)
>
> [...]
>
> I'll admit I haven't dug very deep as far as the implications of such a
> thing. Is it a reasonable idea? What are the major issues with such an
> approach?

I was thinking of something along the lines of

cdef extern from ...:
    cpp_string to_string[T](T value)
    T my_func[T, S](T a, S b)
    ...

It's more a question of how to implement it.

- Robert
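If Robert's bracketed declarations were adopted, call sites could plausibly
name the instantiation explicitly, mirroring the existing indexing syntax
for templated C++ classes. The following is speculative; no such syntax
existed at the time of this exchange:

# speculative sketch of the proposed declaration and a possible call site
from libcpp.string cimport string as cpp_string

cdef extern from "<string>" namespace "std":
    cpp_string to_string[T](T value)

def demo(double x):
    return to_string[double](x)   # the caller picks the instantiation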
From nikita at nemkin.ru  Sat Mar 16 21:39:33 2013
From: nikita at nemkin.ru (Nikita Nemkin)
Date: Sun, 17 Mar 2013 02:39:33 +0600
Subject: [Cython] Minor bug: emitted junk line prevents compilation
Message-ID:

Hi,

I believe I have found a bit of broken/junk code. This line produces an
unpaired and unnecessary #if directive:

https://github.com/cython/cython/blob/master/Cython/Compiler/ModuleNode.py#L2423

The fix is to simply remove it.

In case you are interested in how to hit this line, declare in some .pxd:

cdef extern from "Python.h":
    ctypedef class __builtin__.BaseException [object PyBaseExceptionObject]:
        pass

and cimport it in another .pyx.

Best regards,
Nikita Nemkin

From stefan_ml at behnel.de  Sat Mar 16 21:59:12 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 16 Mar 2013 21:59:12 +0100
Subject: [Cython] Minor bug: emitted junk line prevents compilation
In-Reply-To:
References:
Message-ID: <5144DD20.3050804@behnel.de>

Nikita Nemkin, 16.03.2013 21:39:
> I believe I have found a bit of broken/junk code.
> This line produces an unpaired and unnecessary #if directive:
> https://github.com/cython/cython/blob/master/Cython/Compiler/ModuleNode.py#L2423
>
> The fix is to simply remove it.

Yes, it's correctly used further down in the code. Thanks!

> In case you are interested in how to hit this line, declare in some .pxd:
>
> cdef extern from "Python.h":
>     ctypedef class __builtin__.BaseException [object PyBaseExceptionObject]:
>         pass

Why would you need to do that in your code?

> and cimport it in another .pyx.

It's sad that the cross-module importing and C-API code is so badly
tested. Any help to improve this situation will be very warmly
appreciated.

Stefan

From nikita at nemkin.ru  Sat Mar 16 22:30:58 2013
From: nikita at nemkin.ru (Nikita Nemkin)
Date: Sun, 17 Mar 2013 03:30:58 +0600
Subject: [Cython] Minor bug: emitted junk line prevents compilation
In-Reply-To: <5144DD20.3050804@behnel.de>
References: <5144DD20.3050804@behnel.de>
Message-ID:

On Sun, 17 Mar 2013 02:59:12 +0600, Stefan Behnel wrote:
>> In case you are interested in how to hit this line, declare in some .pxd:
>>
>> cdef extern from "Python.h":
>>     ctypedef class __builtin__.BaseException [object PyBaseExceptionObject]:
>>         pass
>
> Why would you need to do that in your code?

It makes Cython treat BaseException as an extension type (Exception is
declared similarly) and allows for things like:

* Using "Exception" as a type for parameters, attributes, casts.
  All this with Cython-generated and optimized typechecking.
* Creating Exception subclasses as cdef classes.

It's a hack, but a very useful one.

Best regards,
Nikita Nemkin
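To illustrate the two bullet points, here is a rough sketch of what such a
declaration enables once Exception is visible to Cython as an extension
type. All names are invented for the example:

# hypothetical code made possible by the declaration trick above
cdef class ParseError(Exception):           # cdef subclass of Exception
    cdef readonly Py_ssize_t position       # C-level attribute on the exception

cdef int handle(ParseError err) except -1:  # typed parameter, fast typecheck
    print err.position
    return 0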
From pav at iki.fi  Sun Mar 17 17:15:51 2013
From: pav at iki.fi (Pauli Virtanen)
Date: Sun, 17 Mar 2013 18:15:51 +0200
Subject: [Cython] Refcount error with fused types in classes
Message-ID:

Hi,

Here's a snippet demonstrating a refcount error with fused types inside
classes:

---------8<---------
cimport cython

ctypedef fused some_t:
    int
    double

class Foo(object):
    def bar(self, some_t x):
        pass

cdef extern from "Python.h":
    int Py_REFCNT(object)

def main():
    x = Foo()
    print "before:", Py_REFCNT(x)
    x.bar(1.0)   # spuriously increments refcount of `x`
    print "after: ", Py_REFCNT(x)
---------8<---------

-- 
Pauli Virtanen

From markflorisson88 at gmail.com  Sun Mar 17 17:51:37 2013
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 17 Mar 2013 16:51:37 +0000
Subject: [Cython] Refcount error with fused types in classes
In-Reply-To:
References:
Message-ID:

On 17 March 2013 16:15, Pauli Virtanen wrote:
> Here's a snippet demonstrating a refcount error with fused types inside
> classes:
>
> [...]

Thanks, I pushed a fix here: https://github.com/markflorisson88/cython
(fd4853d202b13a92).

From pav at iki.fi  Sun Mar 17 18:18:07 2013
From: pav at iki.fi (Pauli Virtanen)
Date: Sun, 17 Mar 2013 19:18:07 +0200
Subject: [Cython] Refcount error with fused types in classes
In-Reply-To:
References:
Message-ID:

Hi,

17.03.2013 18:51, mark florisson kirjoitti:
[clip]
> Thanks, I pushed a fix here: https://github.com/markflorisson88/cython
> (fd4853d202b13a92).

Thanks. You beat me to this, I just arrived at the same fix :)

Cheers,
Pauli

From pav at iki.fi  Sun Mar 17 18:32:20 2013
From: pav at iki.fi (Pauli Virtanen)
Date: Sun, 17 Mar 2013 19:32:20 +0200
Subject: [Cython] Refcount error with fused types in classes
In-Reply-To:
References:
Message-ID:

17.03.2013 19:18, Pauli Virtanen kirjoitti:
> 17.03.2013 18:51, mark florisson kirjoitti:
> [clip]
>> Thanks, I pushed a fix here: https://github.com/markflorisson88/cython
>> (fd4853d202b13a92).
>
> Thanks. You beat me to this, I just arrived at the same fix :)

Note that the Py_XDECREF(self->__signatures__) needs to be removed from
_dealloc, though.

-- 
Pauli Virtanen

From johntyree at gmail.com  Sun Mar 17 20:15:17 2013
From: johntyree at gmail.com (John Tyree)
Date: Sun, 17 Mar 2013 20:15:17 +0100
Subject: [Cython] Template functions
In-Reply-To:
References:
Message-ID: <20130317191517.GA6530@gmail.com>

> I was thinking of something along the lines of
>
> cdef extern from ...:
>     cpp_string to_string[T](T value)
>     T my_func[T, S](T a, S b)
>     ...
>
> It's more a question of how to implement it.
>
> - Robert

Well, this closely matches the syntax used for classes and won't require
any type inference, since the user supplies the type at the call site (am
I reading that correctly?), so I'm not sure what about it will be
particularly challenging.

If it's done this way, the compiler could generate prototypes as necessary
in a preprocessing step, without inferring anything about the types until
later, when overloading is resolved. That feels kind of hacky to me, but
I've never written a compiler with the size and scope of Cython; maybe
it's not too bad. This is essentially what the user has to do already, and
it "works".

The biggest complaint I have about this method is that without inference
it looks like it could lead to a *lot* of extra writing out of types. I'm
dreading the thought of writing out nested template types when calling
factory functions like those in the thrust library, which was what
motivated this in the first place.

-John

From nikita at nemkin.ru  Mon Mar 18 14:17:13 2013
From: nikita at nemkin.ru (Nikita Nemkin)
Date: Mon, 18 Mar 2013 19:17:13 +0600
Subject: [Cython] Minor bug in compile-time constant handling
Message-ID:

Hi,

Here:

https://github.com/cython/cython/blob/master/Cython/Compiler/Parsing.py#L708-L711

compile-time unicode and bytes values should be wrapped with EncodedString
and BytesLiteral respectively:

    elif isinstance(value, _unicode):
        return ExprNodes.UnicodeNode(pos, value=EncodedString(value))
    elif isinstance(value, _bytes):
        return ExprNodes.BytesNode(pos, value=BytesLiteral(value))

Otherwise attempts to use compile-time strings in Python context result in
errors like "AttributeError: 'unicode' object has no attribute
'is_unicode'".

Best regards,
Nikita Nemkin
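A plausible minimal trigger for the bug Nikita describes would be a DEF
string constant used as a runtime value. This reproduction is an educated
guess based on the code path he cites, not something taken from his
report:

# guessed reproduction: a compile-time string constant used in Python context
DEF GREETING = u"hello"   # stored as a plain unicode object at compile time

def greet():
    return GREETING       # must be wrapped into a proper UnicodeNode here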
From robertwb at gmail.com  Mon Mar 18 17:43:14 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Mon, 18 Mar 2013 09:43:14 -0700
Subject: [Cython] Template functions
In-Reply-To: <20130317191517.GA6530@gmail.com>
References: <20130317191517.GA6530@gmail.com>
Message-ID:

On Sun, Mar 17, 2013 at 12:15 PM, John Tyree wrote:
>> I was thinking of something along the lines of
>>
>> cdef extern from ...:
>>     cpp_string to_string[T](T value)
>>     T my_func[T, S](T a, S b)
>>     ...
>>
>> It's more a question of how to implement it.
>
> [...]
>
> The biggest complaint I have about this method is that without inference
> it looks like it could lead to a *lot* of extra writing out of types. I'm
> dreading the thought of writing out nested template types when calling
> factory functions like those in the thrust library, which was what
> motivated this in the first place.

I think we need something to constrain the argument types (e.g. as they
relate to each other), as well as provide a return type. The "any" type
seems to lead way too easily to incorrect code, as well as surprises,
e.g. is the "object" type accepted?

(FWIW, I was thinking of allowing inference; that'll actually be pretty
easy once the rest is in place.)

- Robert

From stefan_ml at behnel.de  Wed Mar 20 07:47:11 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 20 Mar 2013 07:47:11 +0100
Subject: [Cython] Jenkins down
Message-ID: <51495B6F.5080607@behnel.de>

Hi,

just to let you know that sage.math (where our Jenkins instance runs) has
been down for a couple of days already and it's currently unclear when it
will be back. Most likely within the next few days, though. Note that this
means that we have lost all work spaces and git caches, so the first
builds will take somewhat longer than normal once we get it restarted
(assuming that everything comes up nicely in the first place...).

Stefan

From volker.mische at gmail.com  Fri Mar 22 14:47:52 2013
From: volker.mische at gmail.com (Volker Mische)
Date: Fri, 22 Mar 2013 14:47:52 +0100
Subject: [Cython] Constant pointers not working
Message-ID: <514C6108.5040709@gmail.com>

Hi all,

I was excited to see that 'const' is finally supported, but constant
pointers are not. Here's an example with the corresponding error:

Error compiling Cython file:
------------------------------------------------------------
...
cdef extern int foo(const int *const bar)
                                     ^
------------------------------------------------------------

const.pxd:1:37: Expected ')', found 'bar'

Cheers,
Volker
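As a stop-gap while the parser rejects const-qualified pointers: in C, a
top-level const on a parameter (the "*const" part here) is invisible to
callers and does not change the function's signature, so the declaration
can usually just drop it. A sketch, assuming nothing in the header forces
the qualifier to be repeated on the Cython side:

# workaround sketch: omit the pointer-level const in the declaration
cdef extern int foo(const int* bar)   # same C-level signature as the original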
From matej at laitl.cz  Sat Mar 23 23:57:51 2013
From: matej at laitl.cz (Matěj Laitl)
Date: Sat, 23 Mar 2013 23:57:51 +0100
Subject: [Cython] view[0].methodcall() produces invalid C when view is memoryview of extension type
Message-ID: <1401112.QzbjNdWC9X@edgy>

Hi,

the following test code produces C code that fails to compile:

> cdef class ExtensionType(object):
>     cdef public int dummy
>
>     def __init__(self, n):
>         self.dummy = n
>
>     cdef cfoo(self):
>         print self.dummy
>
> items = [ExtensionType(1), ExtensionType(2)]
> cdef ExtensionType[:] view = np.array(items, dtype=ExtensionType)
> view[0].cfoo()

with gcc error and relevant C file lines:

extension_type_memoryview.c:2604:94: error: ‘PyObject’ has no member named ‘__pyx_vtab’

2570: PyObject *__pyx_t_1 = NULL;
(...)
2601: __pyx_t_1 = (PyObject *) *((struct __pyx_obj_25extension_type_memoryview_ExtensionType * *)
      ( /* dim=0 */ (__pyx_v_25extension_type_memoryview_view.data + __pyx_t_2 *
      __pyx_v_25extension_type_memoryview_view.strides[0]) ));
2602: __Pyx_INCREF((PyObject*)__pyx_t_1);
2603: /* __pyx_t_4 allocated */
2604: __pyx_t_4 = ((struct __pyx_vtabstruct_25extension_type_memoryview_ExtensionType *)__pyx_t_1
      ->__pyx_vtab)->cfoo(__pyx_t_1); if (unlikely(!__pyx_t_4)) {__pyx_filename
      = __pyx_f[0]; __pyx_lineno = 69;

It seems that a generic PyObject* temporary is used for __pyx_t_1 here,
while a typed ExtensionType* temporary should have been used instead (as
suggested by the excess casting on line 2601).

I have the above test case (and a bit more) prepared as a patch that I'll
pull-request once this seemingly trivial bug is fixed in git.

Regards,
Matěj
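Until the temporary-typing bug is fixed, a workaround consistent with
Matěj's analysis is to bind the element to an explicitly typed variable
before the method call. This is an illustrative sketch, not something from
the report:

# workaround sketch: force a correctly typed temporary
cdef ExtensionType item = view[0]
item.cfoo()   # the vtable access now goes through ExtensionType*, not PyObject*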
From stefan_ml at behnel.de  Sun Mar 24 17:29:00 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 24 Mar 2013 17:29:00 +0100
Subject: [Cython] Jenkins down
In-Reply-To: <51495B6F.5080607@behnel.de>
References: <51495B6F.5080607@behnel.de>
Message-ID: <514F29CC.8030005@behnel.de>

Stefan Behnel, 20.03.2013 07:47:
> just to let you know that sage.math (where our Jenkins instance runs) has
> been down for a couple of days already and it's currently unclear when it
> will be back. Most likely within the next few days, though. Note that this
> means that we have lost all work spaces and git caches, so the first
> builds will take somewhat longer than normal once we get it restarted
> (assuming that everything comes up nicely in the first place...).

I started Jenkins on the boxen server (which the sage.math DNS entry
currently forwards to), running on a local disk AFAICT. It looks ok so
far, although I had to disable the 32bit tests due to missing Ubuntu
packages. It will also be generally slower than on sage.math, because we
no longer have a ramdisk to put the workspaces into. But I think it's most
important to have it back alive at all.

Stefan

From stefan_ml at behnel.de  Sun Mar 24 18:02:33 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 24 Mar 2013 18:02:33 +0100
Subject: [Cython] timeframe for 0.19?
Message-ID: <514F31A9.4050103@behnel.de>

Hi,

the current master has collected quite a number of improvements and I
think we should try to get them out of the door. Any objections to
starting with the preparations in the first week of April?

Stefan

From wstein at gmail.com  Sun Mar 24 19:21:23 2013
From: wstein at gmail.com (William Stein)
Date: Sun, 24 Mar 2013 11:21:23 -0700
Subject: [Cython] Jenkins down
In-Reply-To: <514F29CC.8030005@behnel.de>
References: <51495B6F.5080607@behnel.de> <514F29CC.8030005@behnel.de>
Message-ID:

On Sun, Mar 24, 2013 at 9:29 AM, Stefan Behnel wrote:
> I started Jenkins on the boxen server (which the sage.math DNS entry
> currently forwards to), running on a local disk AFAICT. It looks ok so
> far, although I had to disable the 32bit tests due to missing Ubuntu
> packages.

Which packages? I can install them easily.

> It will also be generally slower than on sage.math, because we no longer
> have a ramdisk to put the workspaces into. But I think it's most
> important to have it back alive at all.

-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org

From stefan_ml at behnel.de  Sun Mar 24 20:14:32 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 24 Mar 2013 20:14:32 +0100
Subject: [Cython] Jenkins down
In-Reply-To:
References: <51495B6F.5080607@behnel.de> <514F29CC.8030005@behnel.de>
Message-ID: <514F5098.8090100@behnel.de>

William Stein, 24.03.2013 19:21:
> On Sun, Mar 24, 2013 at 9:29 AM, Stefan Behnel wrote:
>> I started Jenkins on the boxen server (which the sage.math DNS entry
>> currently forwards to), running on a local disk AFAICT. It looks ok so
>> far, although I had to disable the 32bit tests due to missing Ubuntu
>> packages.
>
> Which packages? I can install them easily.

Thanks - I would have asked if I knew which ones. I couldn't look into
this yet, I just noticed that the 32bit builds didn't work "for some
reason". Basically, sage.math had a (mostly) working 32bit gcc build
environment, but I'll have to see what that included.

Stefan

From robertwb at gmail.com  Mon Mar 25 18:43:23 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Mon, 25 Mar 2013 10:43:23 -0700
Subject: [Cython] timeframe for 0.19?
In-Reply-To: <514F31A9.4050103@behnel.de>
References: <514F31A9.4050103@behnel.de>
Message-ID:

Sounds good to me.

On Sun, Mar 24, 2013 at 10:02 AM, Stefan Behnel wrote:
> the current master has collected quite a number of improvements and I
> think we should try to get them out of the door. Any objections to
> starting with the preparations in the first week of April?

From Martin.Fiers at intec.ugent.be  Tue Mar 26 10:52:02 2013
From: Martin.Fiers at intec.ugent.be (Martin Fiers)
Date: Tue, 26 Mar 2013 10:52:02 +0100
Subject: [Cython] Bug: Returning real value crashes the code, complex value does not
Message-ID: <51516FC2.1090802@intec.ugent.be>

Dear Cython developers,

I stumbled upon a strange error when using Cython. I made a minimal
working example; see the attachment for the two necessary files. (By the
way, I didn't find the e-mail address of Robert Bradshaw, so I could not
ask him for an account on the issue tracker. Is it possible to put the
bug on there?)

To reproduce the bug:
1) Reboot to Windows :) (the bug only appears on Windows)
2) Run compile_bug.py to generate the Cython extension
3) Try to run the my_func_exposed function:

python
>>> import complex_double
(does not crash)
>>> complex_double.my_func_exposed(1,1j)
(crashes)
>>> complex_double.my_func_exposed(1,1)

If I put a breakpoint in the code with gdb, jump in the code, and leave
the function again, it does not crash! Also, it is no problem on Linux.

It has to do with the fact that in the first case, a real value was used.
In the complex-value case, it does not crash. I went through the
generated cpp file and I don't see any issues there. (The reason I use
cpp is that it's in a big project that needs cpp enabled; it is further
linked and so on.)

gcc version used: 4.6.2 (mingw)
cython version used: 0.18 (I did pip install Cython)
python version used: python 2.7.3 (MSC v.1500 32 bit)

Looking forward to hearing from you!

With kind regards,
Martin

-- 
-----------------------------------------------------------
ir. Martin Fiers
Photonics Research Group
Universiteit Gent - Ghent University
Sint-Pietersnieuwstraat 41
9000 Gent - Belgium
T + 32 9 264 34 48
E martin.fiers at intec.ugent.be
W www.caphesim.com
-----------------------------------------------------------

-------------- next part --------------
A non-text attachment was scrubbed...
Name: compile_bug.py
Type: text/x-python
Size: 1087 bytes
Desc: not available

-------------- next part --------------
import cython

@cython.cdivision(True)
cdef public double complex my_func(int a, b):
    if a == 0:
        return 1
    else:
        return b

def my_func_exposed(int a, b):
    return my_func(a, b)
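The compile_bug.py attachment was scrubbed by the list software, so its
exact contents are unknown. A conventional build script for the .pyx file
above would look something like the following; this is a reconstruction
under that assumption, not Martin's actual file:

# hypothetical stand-in for the scrubbed compile_bug.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension('complex_double', ['complex_double.pyx'],
                           language='c++')],  # the report says C++ mode is required
)

It would be run as "python compile_bug.py build_ext --inplace".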
From robertwb at gmail.com  Tue Mar 26 18:48:43 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Tue, 26 Mar 2013 10:48:43 -0700
Subject: [Cython] Bug: Returning real value crashes the code, complex value does not
In-Reply-To: <51516FC2.1090802@intec.ugent.be>
References: <51516FC2.1090802@intec.ugent.be>
Message-ID:

On Tue, Mar 26, 2013 at 2:52 AM, Martin Fiers wrote:
> Dear Cython developers,
>
> I stumbled upon a strange error when using Cython. I made a minimal
> working example; see the attachment for the two necessary files. (By the
> way, I didn't find the e-mail address of Robert Bradshaw, so I could not
> ask him for an account on the issue tracker. Is it possible to put the
> bug on there?)

Sure. You should have my email now.

> To reproduce the bug:
> 1) Reboot to Windows :) (the bug only appears on Windows)
> 2) Run compile_bug.py to generate the Cython extension
> 3) Try to run the my_func_exposed function:
>
> [...]
>
> gcc version used: 4.6.2 (mingw)
> cython version used: 0.18 (I did pip install Cython)
> python version used: python 2.7.3 (MSC v.1500 32 bit)

Very strange. Does calling PyComplex_AsCComplex directly produce the same
crash? What about

cdef double complex x = 1.0

or

cdef object py_x = 1.0
cdef double complex x = py_x

?

- Robert
From Martin.Fiers at intec.ugent.be  Wed Mar 27 01:12:13 2013
From: Martin.Fiers at intec.ugent.be (Martin Fiers)
Date: Wed, 27 Mar 2013 01:12:13 +0100
Subject: [Cython] Bug: Returning real value crashes the code, complex value does not
In-Reply-To:
References: <51516FC2.1090802@intec.ugent.be>
Message-ID: <5152395D.4000408@intec.ugent.be>

On 3/26/2013 6:48 PM, Robert Bradshaw wrote:
> Sure. You should have my email now.

Thank you! I just sent a mail. Also, thanks for replying so quickly.
Replies follow inline.

> Very strange. Does calling PyComplex_AsCComplex directly produce the same
> crash?

I'm not sure how to call this directly. Do you mean by modifying the
generated cpp file and then manually building an extension module?

> cdef double complex x = 1.0

This one works.

> cdef object py_x = 1.0
> cdef double complex x = py_x

This one crashes!

Regards,
Martin

From robertwb at gmail.com  Wed Mar 27 22:36:12 2013
From: robertwb at gmail.com (Robert Bradshaw)
Date: Wed, 27 Mar 2013 14:36:12 -0700
Subject: [Cython] Bug: Returning real value crashes the code, complex value does not
In-Reply-To: <5152BF75.5030708@intec.ugent.be>
References: <51516FC2.1090802@intec.ugent.be> <5152395D.4000408@intec.ugent.be>
	<5152BF75.5030708@intec.ugent.be>
Message-ID:

On Wed, Mar 27, 2013 at 2:44 AM, Martin Fiers wrote:
> On 3/27/2013 3:54 AM, Robert Bradshaw wrote:
>> On Tue, Mar 26, 2013 at 5:12 PM, Martin Fiers wrote:
>>> [...]
>>>
>>>> cdef object py_x = 1.0
>>>> cdef double complex x = py_x
>>>
>>> This one crashes!
>>
>> Ah. Try
>>
>> from cpython.complex cimport Py_complex, PyComplex_AsCComplex
>> cdef Py_complex x = PyComplex_AsCComplex(py_x)
>> print x.real, x.imag
>
> Ok. I tried this, and it also crashes.
> Here's the modification:
>
> from cpython.complex cimport Py_complex
> from cpython.complex cimport PyComplex_AsCComplex
>
> @cython.cdivision(True)
> cdef public double complex my_func(int a, b):
>
>     cdef object py_x = 1.0
>
>     #cdef double complex x = 1.0                    # Does not crash
>     #cdef double complex x2 = py_x                  # Crashes for py_x = 1, not for py_x = 1j.
>     #cdef Py_complex x = PyComplex_AsCComplex(py_x) # Crashes, even for py_x = 1j
>     #print x.real, x.imag
>
> And as you can see, the PyComplex_AsCComplex also crashes (SIGSEGV).
> I tried to compile with debug information, as in the instructions in
> http://docs.cython.org/src/userguide/debugging.html
> But I cannot get the line numbers. Probably I need a debug-python
> version, but that seems to be very nontrivial on Windows.
>
> Not sure if I can think of other options to test it and/or track down
> the bug...
>
> Now it even crashes when py_x = 1j. So maybe there's something else
> going wrong here too.

I wonder if it's a compiler mismatch or something like that.

> Regards,
> Martin
>
> P.S. I only replied to you because you didn't put the
> cython-devel at python.org in the previous mail.

Oops. Unintentional oversight.

From dalcinl at gmail.com  Fri Mar 29 20:20:12 2013
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 29 Mar 2013 22:20:12 +0300
Subject: [Cython] Commit f2a4b09b broke petsc4py
Message-ID:

https://github.com/cython/cython/commit/f2a4b09b94dc0783625dc869af0880742c29f58d

I could not figure out how to fix it, but the following patch to the test
case reproduces the problem:

diff --git a/tests/run/tp_new_cimport.srctree b/tests/run/tp_new_cimport.srctree
index d60d712..632172c 100644
--- a/tests/run/tp_new_cimport.srctree
+++ b/tests/run/tp_new_cimport.srctree
@@ -42,7 +42,7 @@ def test_sub():

 ######## a.pxd ########

-cdef class ExtTypeA:
+cdef api class ExtTypeA[type ExtTypeA_Type, object ExtTypeAObject]:
     cdef readonly attrA

 ######## a.pyx ########

-- 
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

From stefan_ml at behnel.de  Fri Mar 29 21:23:43 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 29 Mar 2013 21:23:43 +0100
Subject: [Cython] Commit f2a4b09b broke petsc4py
In-Reply-To:
References:
Message-ID: <5155F84F.3000102@behnel.de>

Hi Lisandro!

Lisandro Dalcin, 29.03.2013 20:20:
> https://github.com/cython/cython/commit/f2a4b09b94dc0783625dc869af0880742c29f58d
>
> I could not figure out how to fix it, but the following patch to the test
> case reproduces the problem:
>
> diff --git a/tests/run/tp_new_cimport.srctree b/tests/run/tp_new_cimport.srctree
> index d60d712..632172c 100644
> --- a/tests/run/tp_new_cimport.srctree
> +++ b/tests/run/tp_new_cimport.srctree
> @@ -42,7 +42,7 @@ def test_sub():
>
>  ######## a.pxd ########
>
> -cdef class ExtTypeA:
> +cdef api class ExtTypeA[type ExtTypeA_Type, object ExtTypeAObject]:
>      cdef readonly attrA
>
>  ######## a.pyx ########

Hmm, yes, that's not obvious to me either. I pushed a quick fix, but I'm
sure there's a cleaner way to do this. (And if there isn't, there should
be one...)

https://github.com/cython/cython/commit/3257193a7865c1f45ac2479954be5569f0b8337e

Stefan