From arfrever.fta at gmail.com Sun Apr 1 20:23:27 2012 From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis) Date: Sun, 1 Apr 2012 20:23:27 +0200 Subject: [Cython] Cython 0.16 Release Candidate In-Reply-To: References: Message-ID: <201204012023.30671.Arfrever.FTA@gmail.com> All tests pass with Python 2.6 (2.6.7 release). All tests pass with Python 2.7 (snapshot of 2.7 branch, revision 3623c3e6c049). All tests pass with Python 3.1 (3.1.4 release). 4 failures with Python 3.2 (snapshot of 3.2 branch, revision 0a4a6f98bd8e). Failures with Python 3.2: ====================================================================== FAIL: NestedWith (withstat) Doctest: withstat.NestedWith ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/c/withstat.cpython-32.so", line unknown line number, in NestedWith ---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/c/withstat.cpython-32.so", line ?, in withstat.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat.pyx", line 183, in withstat.NestedWith.runTest (withstat.c:5574) File "withstat.pyx", line 222, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:8101) File "withstat.pyx", line 223, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:7989) File "withstat.pyx", line 224, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:7838) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 40: invalid start byte ====================================================================== FAIL: NestedWith (withstat) Doctest: withstat.NestedWith ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/cpp/withstat.cpython-32.so", line unknown line number, in NestedWith 
---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/cpp/withstat.cpython-32.so", line ?, in withstat.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat.pyx", line 183, in withstat.NestedWith.runTest (withstat.cpp:5574) File "withstat.pyx", line 222, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:8101) File "withstat.pyx", line 223, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:7989) File "withstat.pyx", line 224, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:7838) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 24: invalid continuation byte ====================================================================== FAIL: NestedWith (withstat_py) Doctest: withstat_py.NestedWith ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat_py.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/c/withstat_py.cpython-32.so", line unknown line number, in NestedWith ---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/c/withstat_py.cpython-32.so", line ?, in withstat_py.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat_py.py", line 250, in withstat_py.NestedWith.runTest (withstat_py.c:7262) File "withstat_py.py", line 289, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.c:9789) File "withstat_py.py", line 290, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.c:9677) File "withstat_py.py", line 291, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.c:9526) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 
25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte ====================================================================== FAIL: NestedWith (withstat_py) Doctest: withstat_py.NestedWith ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat_py.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/cpp/withstat_py.cpython-32.so", line unknown line number, in NestedWith ---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/cpp/withstat_py.cpython-32.so", line ?, in withstat_py.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat_py.py", line 250, in withstat_py.NestedWith.runTest (withstat_py.cpp:7262) File "withstat_py.py", line 289, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.cpp:9789) File "withstat_py.py", line 290, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.cpp:9677) File "withstat_py.py", line 291, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.cpp:9526) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 24: invalid start byte ---------------------------------------------------------------------- Ran 6475 tests in 2225.023s FAILED (failures=4) ALL DONE -- Arfrever Frehtes Taifersar Arahesis -------------- next part -------------- A non-text attachment was scrubbed... 
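All four failures share one mechanism, visible in the tracebacks: the test raises a DeprecationWarning, warnings.formatwarning() asks linecache for the offending source line, and for a compiled test module the recorded filename is the binary .so itself, which is not valid UTF-8. A minimal sketch of that mechanism, using a stand-in file rather than a real extension module:

    import linecache
    import tempfile

    # Stand-in for withstat.cpython-32.so: a first line that decodes as
    # UTF-8 (so encoding detection gets past it), then bytes that do not.
    with tempfile.NamedTemporaryFile(suffix=".so", delete=False) as f:
        f.write(b"fake shared object\n\xf8\xf0\xa0\x90\n")
        path = f.name

    # warnings.formatwarning() ends up in this call when quoting the
    # warning's source line; on Python 3.2 the UnicodeDecodeError
    # propagates into the doctest, exactly as in the failures above.
    lines = linecache.getlines(path)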
From stefan_ml at behnel.de Mon Apr 2 13:13:29 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 02 Apr 2012 13:13:29 +0200 Subject: [Cython] [cython-users] GSoC 2012 In-Reply-To: References: <4F58B3DA.5070602@behnel.de> <4F5C5D62.20003@behnel.de> Message-ID: <4F7989D9.5030508@behnel.de> Vitja Makarov, 11.03.2012 09:51: > 2012/3/11 Stefan Behnel: >> mark florisson, 11.03.2012 07:44: >>> - better type inference, that would be enabled by default and again >>> handle things like reassignments of variables and fallbacks to the >>> default object type. With entry caching Cython could build a database >>> of types ((extension) classes, functions, variables) used in the >>> modules and functions that are compiled (also def functions), and >>> infer the types used and specialize on those. Maybe a switch should be >>> added to cython to handle circular dependencies, or maybe with the >>> distutils preprocessing it can run all the type inference first and >>> keep track of unresolved entries, and try to fill those in after >>> building the database. For bonus points the user can be allowed to >>> write plugins to aid the process. >> >> That would be my favourite. We definitely need control flow driven type >> inference, local type specialisation, variable renaming, etc. Maybe even >> whole program (or at least module) analysis, like ShedSkin and PyPy do for >> their restricted Python dialects. Any serious step towards that goal would >> be a good outcome of a GSoC. > > I think we should be careful here and try to avoid making Cython code > more complicated. I agree that WPA is probably way out of scope. However, control flow driven type inference would allow us to infer the type of a variable in a given block, e.g. for code like this:

  if isinstance(x, list):
      ...
  else:
      ...

or handle cases like this:

  def test(x):
      x = list(x)
      # ... do read-only stuff with x below this point ...

Here, we currently infer that x is an unknown object that is being assigned to twice, even though it's obviously a list in all interesting parts of the function. Stefan From vitja.makarov at gmail.com Mon Apr 2 14:14:20 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Mon, 2 Apr 2012 16:14:20 +0400 Subject: [Cython] [cython-users] GSoC 2012 In-Reply-To: <4F7989D9.5030508@behnel.de> References: <4F58B3DA.5070602@behnel.de> <4F5C5D62.20003@behnel.de> <4F7989D9.5030508@behnel.de> Message-ID: 2012/4/2 Stefan Behnel : > Vitja Makarov, 11.03.2012 09:51: >> 2012/3/11 Stefan Behnel: >>> mark florisson, 11.03.2012 07:44: >>>> - better type inference, that would be enabled by default and again >>>> handle things like reassignments of variables and fallbacks to the >>>> default object type. With entry caching Cython could build a database >>>> of types ((extension) classes, functions, variables) used in the >>>> modules and functions that are compiled (also def functions), and >>>> infer the types used and specialize on those. Maybe a switch should be >>>> added to cython to handle circular dependencies, or maybe with the >>>> distutils preprocessing it can run all the type inference first and >>>> keep track of unresolved entries, and try to fill those in after >>>> building the database. For bonus points the user can be allowed to >>>> write plugins to aid the process. >>> >>> That would be my favourite.
We definitely need control flow driven type >>> inference, local type specialisation, variable renaming, etc. Maybe even >>> whole program (or at least module) analysis, like ShedSkin and PyPy do for >>> their restricted Python dialects. Any serious step towards that goal would >>> be a good outcome of a GSoC. >> >> I think we should be careful here and try to avoid making Cython code >> more complicated. > > I agree that WPA is probably way out of scope. However, control flow driven > type inference would allow us to infer the type of a variable in a given > block, e.g. for code like this:
>
>  if isinstance(x, list):
>      ...
>  else:
>      ...
>
> or handle cases like this:
>
>  def test(x):
>      x = list(x)
>      # ... do read-only stuff with x below this point ...
>
> Here, we currently infer that x is an unknown object that is being assigned > to twice, even though it's obviously a list in all interesting parts of the > function. > What to do if an entry is of PyObject type in some block and of some C-type in another? Should it be split into two different entries? -- vitja. From stefan_ml at behnel.de Mon Apr 2 14:23:22 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 02 Apr 2012 14:23:22 +0200 Subject: [Cython] [cython-users] GSoC 2012 In-Reply-To: References: <4F58B3DA.5070602@behnel.de> <4F5C5D62.20003@behnel.de> <4F7989D9.5030508@behnel.de> Message-ID: <4F799A3A.80006@behnel.de> Vitja Makarov, 02.04.2012 14:14: > 2012/4/2 Stefan Behnel: >> Vitja Makarov, 11.03.2012 09:51: >>> 2012/3/11 Stefan Behnel: >>>> mark florisson, 11.03.2012 07:44: >>>>> - better type inference, that would be enabled by default and again >>>>> handle things like reassignments of variables and fallbacks to the >>>>> default object type. With entry caching Cython could build a database >>>>> of types ((extension) classes, functions, variables) used in the >>>>> modules and functions that are compiled (also def functions), and >>>>> infer the types used and specialize on those. Maybe a switch should be >>>>> added to cython to handle circular dependencies, or maybe with the >>>>> distutils preprocessing it can run all the type inference first and >>>>> keep track of unresolved entries, and try to fill those in after >>>>> building the database. For bonus points the user can be allowed to >>>>> write plugins to aid the process. >>>> >>>> That would be my favourite. We definitely need control flow driven type >>>> inference, local type specialisation, variable renaming, etc. Maybe even >>>> whole program (or at least module) analysis, like ShedSkin and PyPy do for >>>> their restricted Python dialects. Any serious step towards that goal would >>>> be a good outcome of a GSoC. >>> >>> I think we should be careful here and try to avoid making Cython code >>> more complicated. >> >> I agree that WPA is probably way out of scope. However, control flow driven >> type inference would allow us to infer the type of a variable in a given >> block, e.g. for code like this:
>>
>>  if isinstance(x, list):
>>      ...
>>  else:
>>      ...
>>
>> or handle cases like this:
>>
>>  def test(x):
>>      x = list(x)
>>      # ... do read-only stuff with x below this point ...
>>
>> Here, we currently infer that x is an unknown object that is being assigned >> to twice, even though it's obviously a list in all interesting parts of the >> function. >> > > What to do if an entry is of PyObject type in some block and of some > C-type in another? > > Should it be split into two different entries?
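A minimal sketch of the case being asked about — one variable that flow-sensitive inference would want to treat as two entries, an unboxed C double in one branch and a generic Python object in the other (an illustrative example, not code from the thread):

    def scale(x):
        if isinstance(x, float):
            # this branch could be given a C-typed entry, say x_1: double
            return x * 2.0
        else:
            # this branch keeps a generic object entry, say x_2
            return [item * 2 for item in x]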
Yes, that's what I meant with "variable renaming". I admit that I have no idea how complex that would be, though... Stefan From stefan_ml at behnel.de Tue Apr 3 13:59:56 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 03 Apr 2012 13:59:56 +0200 Subject: [Cython] class optimisations (Re: [cython-users] How to pass Cython flags from Distutils?) In-Reply-To: References: <4F683927.10408@mnw-scan.com> <4F798797.2070807@behnel.de> <4F7A6813.3080709@mnw-scan.com> <4F7A921D.1010700@behnel.de> Message-ID: <4F7AE63C.6090805@behnel.de> [moving this discussion from cython-users to cython-devel] Robert Bradshaw, 03.04.2012 09:43: > On Mon, Apr 2, 2012 at 11:01 PM, Stefan Behnel wrote: >> Robert Bradshaw, 03.04.2012 07:51: >>> auto_cpdef is expiremental >> >> Is that another word for "deprecated"? > > No, it's another word for "incomplete." Ah, just a typo then. > Can something be deprecated if > it was never even finished? It's probably something we should > eventually do by default as an optimization, at least for methods, as > well as letting compiled classes become cdef classes (minus the > semantic idiosyncrasies) whenever possible (can we always detect this? We can at least start with the "obviously safe" cases, assuming we find any. A "__slots__" field would be a good indicator, for example. And when we get extension types to have a __dict__, that should fix a lot of the differences already. > What about subclasses that want to multiply-inherit? You can inherit from multiple extension types in a Python type, and classes with more than one parent aren't candidates anyway. So this doesn't restrict us. > It may still be > an option worth finishing up, making it easy to automatically take a > (slightly-incompatible) step towards static binding. Yes, the option could be extended to include classes at some point, before we go for more automatic default optimisations. That would keep the changes explicit at the beginning. Stefan From wesmckinn at gmail.com Tue Apr 3 23:18:41 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 3 Apr 2012 17:18:41 -0400 Subject: [Cython] Bug report with 0.16 RC Message-ID: I don't have a Trac account yet, but wanted to report this bug with the 0.16 RC. This function worked fine under 0.15.1:

    @cython.wraparound(False)
    @cython.boundscheck(False)
    def is_lexsorted(list list_of_arrays):
        cdef:
            int i
            Py_ssize_t n, nlevels
            int32_t k, cur, pre
            ndarray arr

        nlevels = len(list_of_arrays)
        n = len(list_of_arrays[0])

        cdef int32_t **vecs = <int32_t**> malloc(nlevels * sizeof(int32_t*))
        for i from 0 <= i < nlevels:
            vecs[i] = <int32_t *> (<ndarray> list_of_arrays[i]).data

        # assume uniqueness??
        for i from 1 <= i < n:
            for k from 0 <= k < nlevels:
                cur = vecs[k][i]
                pre = vecs[k][i-1]
                if cur == pre:
                    continue
                elif cur > pre:
                    break
                else:
                    return False

        free(vecs)
        return True

gives this error: python setup.py build_ext --inplace running build_ext cythoning pandas/src/tseries.pyx to pandas/src/tseries.c Error compiling Cython file: ------------------------------------------------------------ ...
nlevels = len(list_of_arrays) n = len(list_of_arrays[0]) cdef int32_t **vecs = <int32_t**> malloc(nlevels * sizeof(int32_t*)) for i from 0 <= i < nlevels: vecs[i] = <int32_t *> (<ndarray> list_of_arrays[i]).data ^ ------------------------------------------------------------ pandas/src/groupby.pyx:120:59: Compiler crash in AnalyseExpressionsTransform ModuleNode.body = StatListNode(tseries.pyx:1:0) StatListNode.stats[52] = StatListNode(groupby.pyx:4:0) StatListNode.stats[6] = CompilerDirectivesNode(groupby.pyx:109:0) CompilerDirectivesNode.body = StatListNode(groupby.pyx:109:0) StatListNode.stats[0] = DefNode(groupby.pyx:109:0, modifiers = [...]/0, name = u'is_lexsorted', num_required_args = 1, py_wrapper_required = True, reqd_kw_flags_cname = '0', used = True) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(groupby.pyx:110:4, is_terminator = True) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(groupby.pyx:119:4) File 'Nodes.py', line 6054, in analyse_expressions: ForFromStatNode(groupby.pyx:119:4, relation1 = u'<=', relation2 = u'<') File 'Nodes.py', line 342, in analyse_expressions: StatListNode(groupby.pyx:120:18) File 'Nodes.py', line 4778, in analyse_expressions: SingleAssignmentNode(groupby.pyx:120:18) File 'Nodes.py', line 4883, in analyse_types: SingleAssignmentNode(groupby.pyx:120:18) File 'ExprNodes.py', line 7079, in analyse_types: TypecastNode(groupby.pyx:120:18, result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4274, in analyse_types: AttributeNode(groupby.pyx:120:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4360, in analyse_as_ordinary_attribute: AttributeNode(groupby.pyx:120:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4436, in analyse_attribute: AttributeNode(groupby.pyx:120:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) Compiler crash traceback from this point on: File "/home/wesm/code/repos/cython/Cython/Compiler/ExprNodes.py", line 4436, in analyse_attribute replacement_node = numpy_transform_attribute_node(self) File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py", line 18, in numpy_transform_attribute_node numpy_pxd_scope = node.obj.entry.type.scope.parent_scope AttributeError: 'TypecastNode' object has no attribute 'entry' From stefan_ml at behnel.de Mon Apr 9 09:16:10 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 09 Apr 2012 09:16:10 +0200 Subject: [Cython] Cython on PyPy is (mostly) functional Message-ID: <4F828CBA.7030805@behnel.de> Hi, Cython is now mostly functional on the latest PyPy nightly builds. https://sage.math.washington.edu:8091/hudson/job/cython-scoder-pypy-nightly/ There are still crashers and a number of tests are disabled for that reason, but the number of passing tests makes it fair to consider it usable (if it works, it works). Most of the failing tests are due to bugs in PyPy's cpyext (the C-API compatibility layer), and most of the crashers as well. Some doctests just fail due to different exception messages, PyPy has a famous history of that.
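For doctests, the standard way to stay portable across such message differences is to stop comparing the exception text altogether, e.g. (a generic doctest pattern, not a change proposed here):

    def head(items):
        """
        >>> head([])   # doctest: +IGNORE_EXCEPTION_DETAIL
        Traceback (most recent call last):
        IndexError: list index out of range
        """
        return items[0]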
Also, basically any test for __dealloc__() methods is bound to fail because PyPy's garbage collector has no way of making sure that they have been called at a given point. Still, it's worth taking another look through the test results, because Cython can sometimes work around problems in cpyext more easily than they could really be fixed on the PyPy side. One major source of problems is borrowed references, because PyPy cannot easily guarantee that they stay alive in C space when all owned references are in Python space. Their memory management can move objects around, for example, and cpyext can't block that because it can't know when a borrowed reference dies. That means that something as ubiquitous in Cython as PyTuple_GET_ITEM() may not always work well, and is also far from being as fast in cpyext as in CPython. The crashers can be seen in a forked complete test run in addition to the stripped test job above: https://sage.math.washington.edu:8091/hudson/job/cython-scoder-pypy-nightly-safe/lastBuild/consoleFull Interestingly, specifically the new features, i.e. memory views and fused functions, currently account for a number of crashes. Likely not that hard to fix on our side, but needs investigation. I put up a pull request with the current changes: https://github.com/cython/cython/pull/110 Nothing controversial, really, but worth looking through to see what kind of problems to expect when writing code for PyPy. Once these are merged, I'll copy over my PyPy test jobs in Jenkins to the cython-devel jobs. That means that you should start taking a look at those after pushing your changes and get used to either writing your code in a portable way or providing a fallback code path for PyPy. Use the CYTHON_COMPILING_IN_(PYPY|CPYTHON) macros for that. At some point, it'll become interesting to revisit these specialisations and to start benchmarking them in PyPy. However, given that PyPy's entire cpyext hasn't received any major optimisation yet, it'd be a waste of time to optimise for it on our side right now. The emphasis is clearly on making it safely work at all. Stefan From dalcinl at gmail.com Tue Apr 10 20:32:37 2012 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 10 Apr 2012 21:32:37 +0300 Subject: [Cython] never used numpy.pxd, but now my code is failing Message-ID: Is there any way to disable special-casing of numpy arrays? IMHO, if I'm not using Cython's numpy.pxd file, Cython should let me decide how to manage the beast. Error compiling Cython file: ------------------------------------------------------------ ... if ((nm != PyArray_DIM(aj, 0)) or (nm != PyArray_DIM(av, 0)) or (si*bs * sj*bs != sv)): raise ValueError( ("input arrays have incompatible shapes: " "rows.shape=%s, cols.shape=%s, vals.shape=%s") % (ai.shape, aj.shape, av.shape)) ^ ------------------------------------------------------------ PETSc/petscmat.pxi:683:11: Cannot convert 'npy_intp *' to Python object -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo 3000 Santa Fe, Argentina Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From d.s.seljebotn at astro.uio.no Tue Apr 10 21:52:39 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 10 Apr 2012 21:52:39 +0200 Subject: [Cython] never used numpy.pxd, but now my code is failing In-Reply-To: References: Message-ID: <4F848F87.1090608@astro.uio.no> On 04/10/2012 08:32 PM, Lisandro Dalcin wrote: > Is there any way to disable special-casing of numpy arrays?
IMHO, if > I'm not using Cython's numpy.pxd file, Cython should let me decide how > to manage the beast. > > > Error compiling Cython file: > ------------------------------------------------------------ > ... > if ((nm != PyArray_DIM(aj, 0)) or > (nm != PyArray_DIM(av, 0)) or > (si*bs * sj*bs != sv)): raise ValueError( > ("input arrays have incompatible shapes: " > "rows.shape=%s, cols.shape=%s, vals.shape=%s") % > (ai.shape, aj.shape, av.shape)) > ^ > ------------------------------------------------------------ > > PETSc/petscmat.pxi:683:11: Cannot convert 'npy_intp *' to Python object > Whoops, sorry about that. I patched on yet another hack here: https://github.com/dagss/cython/commit/6f2271d2b3390d869a53d15b2b70769df029b218 Even if there's been a lot of trouble with these hacks I hope it can still go in; it is important in order to keep a significant part of the Cython userbase happy. Dag From d.s.seljebotn at astro.uio.no Tue Apr 10 21:53:14 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 10 Apr 2012 21:53:14 +0200 Subject: [Cython] never used numpy.pxd, but now my code is failing In-Reply-To: <4F848F87.1090608@astro.uio.no> References: <4F848F87.1090608@astro.uio.no> Message-ID: <4F848FAA.6080205@astro.uio.no> On 04/10/2012 09:52 PM, Dag Sverre Seljebotn wrote: > On 04/10/2012 08:32 PM, Lisandro Dalcin wrote: >> Is there any way to disable special-casing of numpy arrays? IMHO, if >> I'm not using Cython's numpy.pxd file, Cython should let me decide how >> to manage the beast. >> >> >> Error compiling Cython file: >> ------------------------------------------------------------ >> ... >> if ((nm != PyArray_DIM(aj, 0)) or >> (nm != PyArray_DIM(av, 0)) or >> (si*bs * sj*bs != sv)): raise ValueError( >> ("input arrays have incompatible shapes: " >> "rows.shape=%s, cols.shape=%s, vals.shape=%s") % >> (ai.shape, aj.shape, av.shape)) >> ^ >> ------------------------------------------------------------ >> >> PETSc/petscmat.pxi:683:11: Cannot convert 'npy_intp *' to Python object >> > > Whoops, sorry about that. I patched on yet another hack here: > > https://github.com/dagss/cython/commit/6f2271d2b3390d869a53d15b2b70769df029b218 BTW, that's the _numpy branch. Dag > > > Even if there's been a lot of trouble with these hacks I hope it can > still go in; it is important in order to keep a significant part of the > Cython userbase happy. From dalcinl at gmail.com Tue Apr 10 23:41:46 2012 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 11 Apr 2012 00:41:46 +0300 Subject: [Cython] never used numpy.pxd, but now my code is failing In-Reply-To: <4F848FAA.6080205@astro.uio.no> References: <4F848F87.1090608@astro.uio.no> <4F848FAA.6080205@astro.uio.no> Message-ID: On 10 April 2012 22:53, Dag Sverre Seljebotn wrote: > On 04/10/2012 09:52 PM, Dag Sverre Seljebotn wrote: >> >> On 04/10/2012 08:32 PM, Lisandro Dalcin wrote: >>> >>> Is there any way to disable special-casing of numpy arrays? IMHO, if >>> I'm not using Cython's numpy.pxd file, Cython should let me decide how >>> to manage the beast. >>> >>> >>> Error compiling Cython file: >>> ------------------------------------------------------------ >>> ... 
>>> if ((nm != PyArray_DIM(aj, 0)) or >>> (nm != PyArray_DIM(av, 0)) or >>> (si*bs * sj*bs != sv)): raise ValueError( >>> ("input arrays have incompatible shapes: " >>> "rows.shape=%s, cols.shape=%s, vals.shape=%s") % >>> (ai.shape, aj.shape, av.shape)) >>> ^ >>> ------------------------------------------------------------ >>> >>> PETSc/petscmat.pxi:683:11: Cannot convert 'npy_intp *' to Python object >>> >> >> Whoops, sorry about that. I patched on yet another hack here: >> >> >> https://github.com/dagss/cython/commit/6f2271d2b3390d869a53d15b2b70769df029b218 > > > BTW, that's the _numpy branch. > The fix worked for me. -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo 3000 Santa Fe, Argentina Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From markflorisson88 at gmail.com Wed Apr 11 17:19:40 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 11 Apr 2012 16:19:40 +0100 Subject: [Cython] NumPy dependency in Jenkins builds In-Reply-To: <4F6CD389.6000808@behnel.de> References: <4F6C880D.6010208@behnel.de> <4F6CD389.6000808@behnel.de> Message-ID: On 23 March 2012 19:48, Stefan Behnel wrote: > Stefan Behnel, 23.03.2012 15:26: > > mark florisson, 23.03.2012 14:26: > >> This may be OT for this thread > > > > ... in which case it's quite common to start a new one ... > > > >> but was numpy removed at some point from Jenkins? I'm seeing this for > >> all python versions since February 25: > >> > >> Following tests excluded because of missing dependencies on your > >> system: > >> run.memoryviewattrs > >> run.numpy_ValueError_T172 > >> run.numpy_bufacc_T155 > >> run.numpy_cimport > >> run.numpy_memoryview > >> run.numpy_parallel > >> run.numpy_test > >> ALL DONE > > > > May be my fault. I think when I unified the build jobs, I might have > > disabled it because we didn't have a NumPy version for Py3 at the time. > > > > I'll look into it. > > I've re-enabled them for all CPython builds except for the latest py3k > branch. NumPy 1.6.1 doesn't compile there due to the new Unicode buffer > layout (PEP 383). > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > How do I enable numpy in my own build configuration? Should I re-clone the cython-devel ones? From stefan_ml at behnel.de Wed Apr 11 19:04:31 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 11 Apr 2012 19:04:31 +0200 Subject: [Cython] NumPy dependency in Jenkins builds In-Reply-To: References: <4F6C880D.6010208@behnel.de> <4F6CD389.6000808@behnel.de> Message-ID: <4F85B99F.6010107@behnel.de> mark florisson, 11.04.2012 17:19: > On 23 March 2012 19:48, Stefan Behnel wrote: >> Stefan Behnel, 23.03.2012 15:26: >>> mark florisson, 23.03.2012 14:26: >>>> Following tests excluded because of missing dependencies on your >>>> system: >>>> run.memoryviewattrs >>>> run.numpy_ValueError_T172 >>>> run.numpy_bufacc_T155 >>>> run.numpy_cimport >>>> run.numpy_memoryview >>>> run.numpy_parallel >>>> run.numpy_test >>>> ALL DONE >>> >>> May be my fault. I think when I unified the build jobs, I might have >>> disabled it because we didn't have a NumPy version for Py3 at the time. >>> >>> I'll look into it. >> >> I've re-enabled them for all CPython builds except for the latest py3k >> branch.
NumPy 1.6.1 doesn't compile there due to the new Unicode buffer >> layout (PEP 383). > > How do I enable numpy in my own build configuration? Should I re-clone the > cython-devel ones? You can do that, yes. Alternatively, just write "pyXY-ext" instead of "pyXY" in the PYVERSION axis. Doing that in the test jobs is enough, the build jobs don't need it. Stefan From markflorisson88 at gmail.com Thu Apr 12 14:08:00 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 12 Apr 2012 13:08:00 +0100 Subject: [Cython] NumPy dependency in Jenkins builds In-Reply-To: <4F85B99F.6010107@behnel.de> References: <4F6C880D.6010208@behnel.de> <4F6CD389.6000808@behnel.de> <4F85B99F.6010107@behnel.de> Message-ID: On 11 April 2012 18:04, Stefan Behnel wrote: > mark florisson, 11.04.2012 17:19: >> On 23 March 2012 19:48, Stefan Behnel wrote: >>> Stefan Behnel, 23.03.2012 15:26: >>>> mark florisson, 23.03.2012 14:26: >>>>> Following tests excluded because of missing dependencies on your >>>>> system: >>>>> run.memoryviewattrs >>>>> run.numpy_ValueError_T172 >>>>> run.numpy_bufacc_T155 >>>>> run.numpy_cimport >>>>> run.numpy_memoryview >>>>> run.numpy_parallel >>>>> run.numpy_test >>>>> ALL DONE >>>> >>>> May be my fault. I think when I unified the build jobs, I might have >>>> disabled it because we didn't have a NumPy version for Py3 at the time. >>>> >>>> I'll look into it. >>> >>> I've re-enabled them for all CPython builds except for the latest py3k >>> branch. NumPy 1.6.1 doesn't compile there due to the new Unicode buffer >>> layout (PEP 383). >> >> How do I enable numpy in my own build configuration? Should I re-clone the >> cython-devel ones? > > You can do that, yes. Alternatively, just write "pyXY-ext" instead of > "pyXY" in the PYVERSION axis. Doing that in the test jobs is enough, the > build jobs don't need it. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Ok, thanks Stefan. From markflorisson88 at gmail.com Thu Apr 12 16:38:37 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 12 Apr 2012 15:38:37 +0100 Subject: [Cython] Cython 0.16 RC 1 Message-ID: Yet another release candidate, this will hopefully be the last before the 0.16 release. You can grab it from here: http://wiki.cython.org/ReleaseNotes-0.16 There were several fixes for the numpy attribute rewrite, memoryviews and fused types. Accessing the 'base' attribute of a typed ndarray now goes through the object layer, which means direct assignment is no longer supported. If there are any problems, please let us know. From markflorisson88 at gmail.com Thu Apr 12 20:21:24 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 12 Apr 2012 19:21:24 +0100 Subject: [Cython] pyregr test suite Message-ID: Hey, Could we run the pyregr test suite manually instead of automatically? It takes a lot of resources to build, and a single simple push to the cython-devel branch results in the build slots being hogged for hours, making the continuous development a lot less 'continuous'. We could just decide to run the pyregr suite every so often, or whenever we make an addition or change that could actually affect Python code (if one updates a test then there is no use in running pyregr for instance).
Mark From robertwb at gmail.com Thu Apr 12 22:21:11 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 12 Apr 2012 13:21:11 -0700 Subject: [Cython] pyregr test suite In-Reply-To: References: Message-ID: On Thu, Apr 12, 2012 at 11:21 AM, mark florisson wrote: > Hey, > > Could we run the pyregr test suite manually instead of automatically? > It takes a lot of resources to build, and a single simple push to the > cython-devel branch results in the build slots being hogged for hours, > making the continuous development a lot less 'continuous'. We could > just decide to run the pyregr suite every so often, or whenever we > make an addition or change that could actually affect Python code (if > one updates a test then there is no use in running pyregr for > instance). +1 to manual + periodic for these tests. Alternatively we could make them depend on each other, so at most one core is consumed. - Robert From wesmckinn at gmail.com Thu Apr 12 23:00:29 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 12 Apr 2012 17:00:29 -0400 Subject: [Cython] Cython 0.16 RC 1 In-Reply-To: References: Message-ID: On Thu, Apr 12, 2012 at 10:38 AM, mark florisson wrote: > Yet another release candidate, this will hopefully be the last before > the 0.16 release. You can grab it from here: > http://wiki.cython.org/ReleaseNotes-0.16 > > There were several fixes for the numpy attribute rewrite, memoryviews > and fused types. Accessing the 'base' attribute of a typed ndarray now > goes through the object layer, which means direct assignment is no > longer supported. > > If there are any problems, please let us know. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I'm unable to build pandas using git master Cython. I just released pandas 0.7.3 today which has no issues at all with 0.15.1: http://pypi.python.org/pypi/pandas For example: 16:57 ~/code/pandas (master)$ python setup.py build_ext --inplace running build_ext cythoning pandas/src/tseries.pyx to pandas/src/tseries.c Error compiling Cython file: ------------------------------------------------------------ ... 
self.store = {} ptr = <int32_t**> malloc(self.depth * sizeof(int32_t*)) for i in range(self.depth): ptr[i] = <int32_t*> (<ndarray> label_arrays[i]).data ^ ------------------------------------------------------------ pandas/src/tseries.pyx:107:59: Compiler crash in AnalyseExpressionsTransform ModuleNode.body = StatListNode(tseries.pyx:1:0) StatListNode.stats[23] = StatListNode(tseries.pyx:86:5) StatListNode.stats[0] = CClassDefNode(tseries.pyx:86:5, as_name = u'MultiMap', class_name = u'MultiMap', doc = u'\n Need to come up with a better data structure for multi-level indexing\n ', module_name = u'', visibility = u'private') CClassDefNode.body = StatListNode(tseries.pyx:91:4) StatListNode.stats[1] = StatListNode(tseries.pyx:95:4) StatListNode.stats[0] = DefNode(tseries.pyx:95:4, modifiers = [...]/0, name = u'__init__', num_required_args = 2, py_wrapper_required = True, reqd_kw_flags_cname = '0', used = True) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(tseries.pyx:96:8) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(tseries.pyx:106:8) File 'Nodes.py', line 5903, in analyse_expressions: ForInStatNode(tseries.pyx:106:8) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(tseries.pyx:107:21) File 'Nodes.py', line 4767, in analyse_expressions: SingleAssignmentNode(tseries.pyx:107:21) File 'Nodes.py', line 4872, in analyse_types: SingleAssignmentNode(tseries.pyx:107:21) File 'ExprNodes.py', line 7082, in analyse_types: TypecastNode(tseries.pyx:107:21, result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4274, in analyse_types: AttributeNode(tseries.pyx:107:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4360, in analyse_as_ordinary_attribute: AttributeNode(tseries.pyx:107:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4436, in analyse_attribute: AttributeNode(tseries.pyx:107:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) Compiler crash traceback from this point on: File "/home/wesm/code/repos/cython/Cython/Compiler/ExprNodes.py", line 4436, in analyse_attribute replacement_node = numpy_transform_attribute_node(self) File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py", line 18, in numpy_transform_attribute_node numpy_pxd_scope = node.obj.entry.type.scope.parent_scope AttributeError: 'TypecastNode' object has no attribute 'entry' building 'pandas._tseries' extension creating build creating build/temp.linux-x86_64-2.7 creating build/temp.linux-x86_64-2.7/pandas creating build/temp.linux-x86_64-2.7/pandas/src gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -O2 -fPIC -I/home/wesm/epd/lib/python2.7/site-packages/numpy/core/include -I/home/wesm/epd/include/python2.7 -c pandas/src/tseries.c -o build/temp.linux-x86_64-2.7/pandas/src/tseries.o pandas/src/tseries.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation. error: command 'gcc' failed with exit status 1 ----- I kludged this particular line in the pandas/timeseries branch so it will build on git master Cython, but I was treated to dozens of failures, errors, and finally a segfault in the middle of the test suite.
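One workaround for this class of crash — assuming, as both tracebacks suggest, that it is the attribute lookup on a bare typecast that trips the new NumpySupport code — is to bind the cast result to a typed variable first, so that .data is looked up on a name rather than on a TypecastNode. A hypothetical reduction of the failing code (names invented for illustration):

    from numpy cimport ndarray, int32_t
    from libc.stdlib cimport malloc, free

    def grab_pointers(list label_arrays):
        cdef Py_ssize_t i, depth = len(label_arrays)
        cdef int32_t **ptr = <int32_t**> malloc(depth * sizeof(int32_t*))
        cdef ndarray arr
        for i in range(depth):
            arr = label_arrays[i]          # typed assignment instead of an inline cast
            ptr[i] = <int32_t*> arr.data   # attribute access now happens on a typed name
        # ... use ptr ...
        free(ptr)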
Suffice to say I'm not sure I would advise you to release the library in its current state until all of this is resolved. Happy to help however I can but I'm back to 0.15.1 for now. - Wes From d.s.seljebotn at astro.uio.no Thu Apr 12 23:32:11 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 12 Apr 2012 23:32:11 +0200 Subject: [Cython] Cython 0.16 RC 1 In-Reply-To: References: Message-ID: <4F8749DB.2080403@astro.uio.no> On 04/12/2012 11:00 PM, Wes McKinney wrote: > On Thu, Apr 12, 2012 at 10:38 AM, mark florisson > wrote: >> Yet another release candidate, this will hopefully be the last before >> the 0.16 release. You can grab it from here: >> http://wiki.cython.org/ReleaseNotes-0.16 >> >> There were several fixes for the numpy attribute rewrite, memoryviews >> and fused types. Accessing the 'base' attribute of a typed ndarray now >> goes through the object layer, which means direct assignment is no >> longer supported. >> >> If there are any problems, please let us know. >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > I'm unable to build pandas using git master Cython. I just released > pandas 0.7.3 today which has no issues at all with 0.15.1: It is no surprise that master doesn't work. Can you try again with the "release" branch? (We should obviously start to tell people which git branch to fetch in addition to the tarball. And perhaps create a "devel" branch and let master be betas and release candidates.) Dag From d.s.seljebotn at astro.uio.no Fri Apr 13 00:11:27 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 00:11:27 +0200 Subject: [Cython] CEP1000: Native dispatch through callables Message-ID: <4F87530F.7050000@astro.uio.no> Travis Oliphant recently raised the issue on the NumPy list of what mechanisms to use to box native functions produced by his Numba so that SciPy functions can call it, e.g. (I'm making the numba part up): @numba # Compiles function using LLVM def f(x): return 3 * x print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! Obviously, we want something standard, so that Cython functions can also be called in a fast way. This is very similar to CEP 523 (http://wiki.cython.org/enhancements/nativecall), but rather than Cython-to-Cython, we want something that both SciPy, NumPy, numba, Cython, f2py, fwrap can implement. Here's my proposal; Travis seems happy to implement something like it for numba and parts of SciPy: http://wiki.cython.org/enhancements/nativecall Obviously this is (in a modified form) PEP-material, but I think it is much better to just get it working with a nice range of tools first (makes the PEP application stronger as well). Feedback most welcome! Dag From d.s.seljebotn at astro.uio.no Fri Apr 13 00:34:15 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 00:34:15 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87530F.7050000@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> Message-ID: <4F875867.3070401@astro.uio.no> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: > Travis Oliphant recently raised the issue on the NumPy list of what > mechanisms to use to box native functions produced by his Numba so that > SciPy functions can call it, e.g. 
(I'm making the numba part up): > > @numba # Compiles function using LLVM > def f(x): > return 3 * x > > print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! > > Obviously, we want something standard, so that Cython functions can also > be called in a fast way. > > This is very similar to CEP 523 > (http://wiki.cython.org/enhancements/nativecall), but rather than > Cython-to-Cython, we want something that both SciPy, NumPy, numba, > Cython, f2py, fwrap can implement. > > Here's my proposal; Travis seems happy to implement something like it > for numba and parts of SciPy: > > http://wiki.cython.org/enhancements/nativecall I'm sorry. HERE is the CEP: http://wiki.cython.org/enhancements/cep1000 Since writing that yesterday, I've moved more in the direction of wanting a zero-terminated list of overloads instead of providing a count, and have the fast protocol jump over the header (since version is available elsewhere), and just demand that the structure is sizeof(void*)-aligned in the first place rather than the complicated padding. Dag From robertwb at gmail.com Fri Apr 13 01:38:33 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 12 Apr 2012 16:38:33 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F875867.3070401@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> Message-ID: On Thu, Apr 12, 2012 at 3:34 PM, Dag Sverre Seljebotn wrote: > On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >> >> Travis Oliphant recently raised the issue on the NumPy list of what >> mechanisms to use to box native functions produced by his Numba so that >> SciPy functions can call it, e.g. (I'm making the numba part up): >> >> @numba # Compiles function using LLVM >> def f(x): >> return 3 * x >> >> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >> >> Obviously, we want something standard, so that Cython functions can also >> be called in a fast way. >> >> This is very similar to CEP 523 >> (http://wiki.cython.org/enhancements/nativecall), but rather than >> Cython-to-Cython, we want something that both SciPy, NumPy, numba, >> Cython, f2py, fwrap can implement. >> >> Here's my proposal; Travis seems happy to implement something like it >> for numba and parts of SciPy: >> >> http://wiki.cython.org/enhancements/nativecall > > > I'm sorry. HERE is the CEP: > > http://wiki.cython.org/enhancements/cep1000 > > Since writing that yesterday, I've moved more in the direction of wanting a > zero-terminated list of overloads instead of providing a count, and have the > fast protocol jump over the header (since version is available elsewhere), > and just demand that the structure is sizeof(void*)-aligned in the first > place rather than the complicated padding. Great idea to coordinate with the many other projects here. Eventually this could maybe even be a PEP. Somewhat related, I'd like to add support for Go-style interfaces. These would essentially be vtables of pre-fetched function pointers, and could play very nicely with this interface. Have you given any thought as to what happens if __call__ is re-assigned for an object (or subclass of an object) supporting this interface? Or is this out of scope? Minor nit: I don't think should_dereference is worth branching on, if one wants to save the allocation one can still use a variable-sized type and point to oneself. Yes, that's an extra dereference, but the memory is already likely close and it greatly simplifies the logic. But I could be wrong here. 
Also, I'm not sure the type registration will scale, especially if every callable type wanted to get registered. (E.g. currently closures and generators are new types...) Where to draw the line? (Perhaps things could get registered lazily on the first __nativecall__ lookup, as they're likely to be looked up again?) - Robert From stefan_ml at behnel.de Fri Apr 13 07:11:23 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 07:11:23 +0200 Subject: [Cython] pyregr test suite In-Reply-To: References: Message-ID: <4F87B57B.6000807@behnel.de> Robert Bradshaw, 12.04.2012 22:21: > On Thu, Apr 12, 2012 at 11:21 AM, mark florisson wrote: >> Could we run the pyregr test suite manually instead of automatically? >> It takes a lot of resources to build, and a single simple push to the >> cython-devel branch results in the build slots being hogged for hours, >> making the continuous development a lot less 'continuous'. We could >> just decide to run the pyregr suite every so often, or whenever we >> make an addition or change that could actually affect Python code (if >> one updates a test then there is no use in running pyregr for >> instance). > > +1 to manual + periodic for these tests. Alternatively we could make > them depend on each other, so at most one core is consumed. Ok, I'll set it up. Stefan From stefan_ml at behnel.de Fri Apr 13 07:17:37 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 07:17:37 +0200 Subject: [Cython] pyregr test suite In-Reply-To: References: Message-ID: <4F87B6F1.8090804@behnel.de> mark florisson, 12.04.2012 20:21: > Could we run the pyregr test suite manually instead of automatically? > It takes a lot of resources to build, and a single simple push to the > cython-devel branch results in the build slots being hogged for hours, Careful here. It takes a lot of time, yes, but the reason it currently takes ages is that the Py3k tests have started to hang in the test_tempfile tests at some point and don't terminate any more. May be a bug in the tests or a problem on our side, don't know. Stefan From stefan_ml at behnel.de Fri Apr 13 07:22:52 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 07:22:52 +0200 Subject: [Cython] Cython 0.16 RC 1 In-Reply-To: <4F8749DB.2080403@astro.uio.no> References: <4F8749DB.2080403@astro.uio.no> Message-ID: <4F87B82C.3060405@behnel.de> Dag Sverre Seljebotn, 12.04.2012 23:32: > (We should obviously start to tell people which git branch to fetch in > addition to the tarball. +1 > And perhaps create a "devel" branch and let master > be betas and release candidates.) -1 I think we should just always merge release branches back into the master, especially when we make a release. Alternatively, we could start naming release branches "Cython-0.16" etc. and leave them open - but I find the way we currently do it ok. A tag is usually better than an open branch for the way we make our releases. 
Stefan From stefan_ml at behnel.de Fri Apr 13 07:24:38 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 07:24:38 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F875867.3070401@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> Message-ID: <4F87B896.5050000@behnel.de> Dag Sverre Seljebotn, 13.04.2012 00:34: > On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >> Travis Oliphant recently raised the issue on the NumPy list of what >> mechanisms to use to box native functions produced by his Numba so that >> SciPy functions can call it, e.g. (I'm making the numba part up): >> >> @numba # Compiles function using LLVM >> def f(x): >> return 3 * x >> >> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >> >> Obviously, we want something standard, so that Cython functions can also >> be called in a fast way. >> >> This is very similar to CEP 523 >> (http://wiki.cython.org/enhancements/nativecall), but rather than >> Cython-to-Cython, we want something that both SciPy, NumPy, numba, >> Cython, f2py, fwrap can implement. >> >> Here's my proposal; Travis seems happy to implement something like it >> for numba and parts of SciPy: >> >> http://wiki.cython.org/enhancements/nativecall > > I'm sorry. HERE is the CEP: > > http://wiki.cython.org/enhancements/cep1000 Some general remarks: I'm all for doing something in this direction and have been hinting at it on the PyPy mailing list for a while, without reaction so far. I'll trigger them again, with a pointer to this discussion and the CEP. PyPy should be totally interested in a generic way to do fast calls into wrapped C code in general and Cython implemented functions specifically. Their JIT would then look at the function at runtime and unwrap it. There's PEP 362 which proposes a Signature object. It seems to have attracted some interest lately and Guido seems to like it also. I think we should come up with a way to add a C level interface to that, instead of designing something entirely separate. http://www.python.org/dev/peps/pep-0362/ Stefan From d.s.seljebotn at astro.uio.no Fri Apr 13 10:52:07 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 10:52:07 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> Message-ID: <4F87E937.9050705@astro.uio.no> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: > On Thu, Apr 12, 2012 at 3:34 PM, Dag Sverre Seljebotn > wrote: >> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>> >>> Travis Oliphant recently raised the issue on the NumPy list of what >>> mechanisms to use to box native functions produced by his Numba so that >>> SciPy functions can call it, e.g. (I'm making the numba part up): >>> >>> @numba # Compiles function using LLVM >>> def f(x): >>> return 3 * x >>> >>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >>> >>> Obviously, we want something standard, so that Cython functions can also >>> be called in a fast way. >>> >>> This is very similar to CEP 523 >>> (http://wiki.cython.org/enhancements/nativecall), but rather than >>> Cython-to-Cython, we want something that both SciPy, NumPy, numba, >>> Cython, f2py, fwrap can implement. >>> >>> Here's my proposal; Travis seems happy to implement something like it >>> for numba and parts of SciPy: >>> >>> http://wiki.cython.org/enhancements/nativecall >> >> >> I'm sorry.
HERE is the CEP: >> >> http://wiki.cython.org/enhancements/cep1000 >> >> Since writing that yesterday, I've moved more in the direction of wanting a >> zero-terminated list of overloads instead of providing a count, and have the >> fast protocol jump over the header (since version is available elsewhere), >> and just demand that the structure is sizeof(void*)-aligned in the first >> place rather than the complicated padding. > > Great idea to coordinate with the many other projects here. Eventually > this could maybe even be a PEP. > > Somewhat related, I'd like to add support for Go-style interfaces. > These would essentially be vtables of pre-fetched function pointers, > and could play very nicely with this interface. Yep; but you agree that this can be done in isolation without considering vtables first? > Have you given any thought as to what happens if __call__ is > re-assigned for an object (or subclass of an object) supporting this > interface? Or is this out of scope? Out-of-scope, I'd say. Though you can always write an object that detects if you assign to __call__... > Minor nit: I don't think should_dereference is worth branching on, if > one wants to save the allocation one can still use a variable-sized > type and point to oneself. Yes, that's an extra dereference, but the > memory is already likely close and it greatly simplifies the logic. > But I could be wrong here. Those minor nits are exactly what I seek; since Travis will have the first implementation in numba<->SciPy, I just want to make sure that what he does will work efficiently with Cython. Can we perhaps just require that the information is embedded in the object? I must admit that when I wrote that I was mostly thinking of JIT-style code generation, where you only use should_dereference for code-generation. But yes, by converting the table to a C structure you can do without a JIT. > > Also, I'm not sure the type registration will scale, especially if > every callable type wanted to get registered. (E.g. currently closures > and generators are new types...) Where to draw the line? (Perhaps > things could get registered lazily on the first __nativecall__ lookup, > as they're likely to be looked up again?) Right... if we do some work to synchronize the types for Cython modules generated by the same version of Cython, we're left with 3-4 types for Cython, right? Then a couple for numba and one for f2py; so on the order of 10? An alternative is to do something funny in the type object to get across the offset-in-object information (abusing the docstring, or introduce our own flag which means that the type object has an additional non-standard field at the end). Dag From d.s.seljebotn at astro.uio.no Fri Apr 13 11:13:15 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 11:13:15 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87B896.5050000@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87B896.5050000@behnel.de> Message-ID: <4F87EE2B.20702@astro.uio.no> On 04/13/2012 07:24 AM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 13.04.2012 00:34: >> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>> Travis Oliphant recently raised the issue on the NumPy list of what >>> mechanisms to use to box native functions produced by his Numba so that >>> SciPy functions can call it, e.g.
(I'm making the numba part up): >>> >>> @numba # Compiles function using LLVM >>> def f(x): >>> return 3 * x >>> >>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >>> >>> Obviously, we want something standard, so that Cython functions can also >>> be called in a fast way. >>> >>> This is very similar to CEP 523 >>> (http://wiki.cython.org/enhancements/nativecall), but rather than >>> Cython-to-Cython, we want something that both SciPy, NumPy, numba, >>> Cython, f2py, fwrap can implement. >>> >>> Here's my proposal; Travis seems happy to implement something like it >>> for numba and parts of SciPy: >>> >>> http://wiki.cython.org/enhancements/nativecall >> >> I'm sorry. HERE is the CEP: >> >> http://wiki.cython.org/enhancements/cep1000 > > Some general remarks: > > I'm all for doing something in this direction and have been hinting at it > on the PyPy mailing list for a while, without reaction so far. I'll trigger > them again, with a pointer to this discussion and the CEP. PyPy should be > totally interested in a generic way to do fast calls into wrapped C code in > general and Cython implemented functions specifically. Their JIT would then > look at the function at runtime and unwrap it. > > There's PEP 362 which proposes a Signature object. It seems to have > attracted some interest lately and Guido seems to like it also. I think we > should come up with a way to add a C level interface to that, instead of > designing something entirely separate. > > http://www.python.org/dev/peps/pep-0362/ Well, provided that you still want an efficient representation that can be strcmp-ed in dispatch codes, this seems to boil down to using a Signature object rather than a capsule (with a C interface), and store it in __signature__ rather than __fastcall__, and perhaps provide a slot in the type object for a function returning it. I really think the right approach is to prove the concept outside of the standardization process first; a) by the time a PEP would be accepted it will have been years since Travis had time to work on this, b) as far as the slot in the type object goes, we're left with users on Python 2.4 today; a Python 3.4+ solution is not really a solution. Dag From stefan_ml at behnel.de Fri Apr 13 11:15:11 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 11:15:11 +0200 Subject: [Cython] PyPy sprint in Leipzig, June 22-27 (was: Re: CEP1000: Native dispatch through callables) In-Reply-To: <4F87B896.5050000@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87B896.5050000@behnel.de> Message-ID: <4F87EE9F.4070205@behnel.de> Stefan Behnel, 13.04.2012 07:24: > Dag Sverre Seljebotn, 13.04.2012 00:34: >> http://wiki.cython.org/enhancements/cep1000 > > I'm all for doing something in this direction and have been hinting at it > on the PyPy mailing list for a while, without reaction so far. I'll trigger > them again, with a pointer to this discussion and the CEP. PyPy should be > totally interested in a generic way to do fast calls into wrapped C code in > general and Cython implemented functions specifically. Their JIT would then > look at the function at runtime and unwrap it. BTW, there will be a PyPy sprint in Leipzig from June 22-27. If anyone's interested in coordinating with PyPy on this and other topics, that might be a good place to go for a day or two. 
http://permalink.gmane.org/gmane.comp.python.pypy/9896 Stefan From robertwb at gmail.com Fri Apr 13 12:17:09 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 03:17:09 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87E937.9050705@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> Message-ID: On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: > On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >> >> On Thu, Apr 12, 2012 at 3:34 PM, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>>> >>>> >>>> Travis Oliphant recently raised the issue on the NumPy list of what >>>> mechanisms to use to box native functions produced by his Numba so that >>>> SciPy functions can call it, e.g. (I'm making the numba part up): >>>> >>>> @numba # Compiles function using LLVM >>>> def f(x): >>>> return 3 * x >>>> >>>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >>>> >>>> Obviously, we want something standard, so that Cython functions can also >>>> be called in a fast way. >>>> >>>> This is very similar to CEP 523 >>>> (http://wiki.cython.org/enhancements/nativecall), but rather than >>>> Cython-to-Cython, we want something that both SciPy, NumPy, numba, >>>> Cython, f2py, fwrap can implement. >>>> >>>> Here's my proposal; Travis seems happy to implement something like it >>>> for numba and parts of SciPy: >>>> >>>> http://wiki.cython.org/enhancements/nativecall >>> >>> >>> >>> I'm sorry. HERE is the CEP: >>> >>> http://wiki.cython.org/enhancements/cep1000 >>> >>> Since writing that yesterday, I've moved more in the direction of wanting >>> a >>> zero-terminated list of overloads instead of providing a count, and have >>> the >>> fast protocol jump over the header (since version is available >>> elsewhere), >>> and just demand that the structure is sizeof(void*)-aligned in the first >>> place rather than the complicated padding. >> >> >> Great idea to coordinate with the many other projects here. Eventually >> this could maybe even be a PEP. >> >> Somewhat related, I'd like to add support for Go-style interfaces. >> These would essentially be vtables of pre-fetched function pointers, >> and could play very nicely with this interface. > > > Yep; but you agree that this can be done in isolation without considering > vtables first? Yes, for sure. >> Have you given any thought as to what happens if __call__ is >> re-assigned for an object (or subclass of an object) supporting this >> interface? Or is this out of scope? > > > Out-of-scope, I'd say. Though you can always write an object that detects if > you assign to __call__... > > >> Minor nit: I don't think should_dereference is worth branching on, if >> one wants to save the allocation one can still use a variable-sized >> type and point to oneself. Yes, that's an extra dereference, but the >> memory is already likely close and it greatly simplifies the logic. >> But I could be wrong here. > > > Those minor nits are exactly what I seek; since Travis will have the first > implementation in numba<->SciPy, I just want to make sure that what he does > will work efficiently work Cython. +1 I have to admit building/invoking these var-arg-sized __nativecall__ records seems painful. 
Here's another suggestion: struct { void* pointer; size_t signature; // compressed binary representation, 95% coverage char* long_signature; // used if signature is not representable in a size_t, as indicated by signature = 0 } record; These char* could optionally be allocated at the end of the record* for optimal locality. We could even dispense with the binary signature, but having that option allows us to avoid strcmp for stuff like d)d and ffi)f. > Can we perhaps just require that the information is embedded in the object? I think not, this would require variably-sized objects (and also use up the variable sized nature). Given that this is in a portion of the program that is iterating over a Python tuple, I think the extra deference here is non-consequential. > I must admit that when I wrote that I was mostly thinking of JIT-style code > generation, where you only use should_dereference for code-generation. But > yes, by converting the table to a C structure you can do without a JIT. > > >> >> Also, I'm not sure the type registration will scale, especially if >> every callable type wanted to get registered. (E.g. currently closures >> and generators are new types...) Where to draw the line? (Perhaps >> things could get registered lazily on the first __nativecall__ lookup, >> as they're likely to be looked up again?) > > > Right... if we do some work to synchronize the types for Cython modules > generated by the same version of Cython, we're left with 3-4 types for > Cython, right? Then a couple for numba and one for f2py; so on the order of > 10? No, I think each closure is its own type. > An alternative is do something funny in the type object to get across the > offset-in-object information (abusing the docstring, or introduce our own > flag which means that the type object has an additional non-standard field > at the end). It's a hack, but the flag + non-standard field idea might just work... Ah, don't you just love C :) - Robert From stefan_ml at behnel.de Fri Apr 13 11:35:18 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 11:35:18 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87EE2B.20702@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87B896.5050000@behnel.de> <4F87EE2B.20702@astro.uio.no> Message-ID: <4F87F356.9040607@behnel.de> Dag Sverre Seljebotn, 13.04.2012 11:13: > On 04/13/2012 07:24 AM, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 13.04.2012 00:34: >>> http://wiki.cython.org/enhancements/cep1000 >> >> There's PEP 362 which proposes a Signature object. It seems to have >> attracted some interest lately and Guido seems to like it also. I think we >> should come up with a way to add a C level interface to that, instead of >> designing something entirely separate. >> >> http://www.python.org/dev/peps/pep-0362/ > > Well, provided that you still want an efficient representation that can be > strcmp-ed in dispatch codes, this seems to boil down to using a Signature > object rather than a capsule (with a C interface), and store it in > __signature__ rather than __fastcall__, and perhaps provide a slot in the > type object for a function returning it. Basically, yes. I was just bringing it up because we should keep it in mind when designing a solution. Moving it into the Signature object would also allow C signature introspection from Python code, for example. It would obviously need a straight C level way to access it. I'm not sure it has to be a function, though. 
I would prefer a simple array of structs that map signature strings to function pointers. Like the PyMethodDef struct. > I really think the right approach is to prove the concept outside of the > standardization process first; a) by the time a PEP would be accepted it > will have been years since Travis had time to work on this, b) as far as > the slot in the type object goes, we're left with users on Python 2.4 > today; a Python 3.4+ solution is not really a solution. Sure. But nothing keeps us from backporting at least parts of it to older Pythons, like we did for so many other things. Stefan From stefan_ml at behnel.de Fri Apr 13 13:38:56 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 13:38:56 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> Message-ID: <4F881050.7000302@behnel.de> Robert Bradshaw, 13.04.2012 12:17: > On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>> Have you given any thought as to what happens if __call__ is >>> re-assigned for an object (or subclass of an object) supporting this >>> interface? Or is this out of scope? >> >> Out-of-scope, I'd say. Though you can always write an object that detects if >> you assign to __call__... +1 for out of scope. This is a pure C level feature. >>> Minor nit: I don't think should_dereference is worth branching on, if >>> one wants to save the allocation one can still use a variable-sized >>> type and point to oneself. Yes, that's an extra dereference, but the >>> memory is already likely close and it greatly simplifies the logic. >>> But I could be wrong here. >> >> >> Those minor nits are exactly what I seek; since Travis will have the first >> implementation in numba<->SciPy, I just want to make sure that what he does >> will work efficiently work Cython. > > +1 > > I have to admit building/invoking these var-arg-sized __nativecall__ > records seems painful. Here's another suggestion: > > struct { > void* pointer; > size_t signature; // compressed binary representation, 95% coverage > char* long_signature; // used if signature is not representable in > a size_t, as indicated by signature = 0 > } record; > > These char* could optionally be allocated at the end of the record* > for optimal locality. We could even dispense with the binary > signature, but having that option allows us to avoid strcmp for stuff > like d)d and ffi)f. Assuming we use literals and a const char* for the signature, the C compiler would cut down the number of signature strings automatically for us. And a pointer comparison is the same as a size_t comparison. That would only apply at a per-module level, though, so it would require an indirection for the signature IDs. But it would avoid a global registry. Another idea would be to set the signature ID field to 0 at the beginning and call a C-API function to let the current runtime assign an ID > 0, unique for the currently running application. Then every user would only have to parse the signature once to adapt to the respective ID and could otherwise branch based on it directly. For Cython, we could generate a static ID variable for each typed call that we found in the sources. When encountering a C signature on a callable, either a) the ID variable is still empty (initial case), then we parse the signature to see if it matches the expected signature. 
If it does, we assign the corresponding ID to the static ID variable and issue a direct call. If b) the ID field is already set (normal case), we compare the signature IDs directly and issue a C call it they match. If the IDs do not match, we issue a normal Python call. >> Right... if we do some work to synchronize the types for Cython modules >> generated by the same version of Cython, we're left with 3-4 types for >> Cython, right? Then a couple for numba and one for f2py; so on the order of >> 10? > > No, I think each closure is its own type. And that even applies to fused functions, right? They'd have one closure for each type combination. >> An alternative is do something funny in the type object to get across the >> offset-in-object information (abusing the docstring, or introduce our own >> flag which means that the type object has an additional non-standard field >> at the end). > > It's a hack, but the flag + non-standard field idea might just work... Plus, it wouldn't have to stay a non-standard field. If it's accepted into CPython 3.4, we could safely use it in all existing versions of CPython. Stefan From d.s.seljebotn at astro.uio.no Fri Apr 13 13:59:45 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 13:59:45 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881050.7000302@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> Message-ID: <4F881531.4090406@astro.uio.no> On 04/13/2012 01:38 PM, Stefan Behnel wrote: > Robert Bradshaw, 13.04.2012 12:17: >> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>> Have you given any thought as to what happens if __call__ is >>>> re-assigned for an object (or subclass of an object) supporting this >>>> interface? Or is this out of scope? >>> >>> Out-of-scope, I'd say. Though you can always write an object that detects if >>> you assign to __call__... > > +1 for out of scope. This is a pure C level feature. > > >>>> Minor nit: I don't think should_dereference is worth branching on, if >>>> one wants to save the allocation one can still use a variable-sized >>>> type and point to oneself. Yes, that's an extra dereference, but the >>>> memory is already likely close and it greatly simplifies the logic. >>>> But I could be wrong here. >>> >>> >>> Those minor nits are exactly what I seek; since Travis will have the first >>> implementation in numba<->SciPy, I just want to make sure that what he does >>> will work efficiently work Cython. >> >> +1 >> >> I have to admit building/invoking these var-arg-sized __nativecall__ >> records seems painful. Here's another suggestion: >> >> struct { >> void* pointer; >> size_t signature; // compressed binary representation, 95% coverage Once you start passing around functions that take memory view slices as arguments, that 95% estimate will be off I think. >> char* long_signature; // used if signature is not representable in >> a size_t, as indicated by signature = 0 >> } record; >> >> These char* could optionally be allocated at the end of the record* >> for optimal locality. We could even dispense with the binary >> signature, but having that option allows us to avoid strcmp for stuff >> like d)d and ffi)f. > > Assuming we use literals and a const char* for the signature, the C > compiler would cut down the number of signature strings automatically for > us. 
And a pointer comparison is the same as a size_t comparison.

I'll go one further: Intern Python bytes objects. It's just a PyObject*, but it's *required* (or just strongly encouraged) to have gone through

    sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig)

Obviously in a PEP you'd have a C-API function for such interning (completely standalone utility). Performance of the interning operation itself doesn't matter...

Unless CPython has interning features itself, like in Java? Was that present back in the day and then ripped out?

Requiring interning is somewhat less elegant in one way, but it makes a lot of other stuff much simpler.

That gives us

    struct {
        void *pointer;
        PyBytesObject *signature;
    } record;

and then you allocate a NULL-terminated array of these for all the overloads.

> > That would only apply at a per-module level, though, so it would require an > indirection for the signature IDs. But it would avoid a global registry. > > Another idea would be to set the signature ID field to 0 at the beginning > and call a C-API function to let the current runtime assign an ID > 0, > unique for the currently running application. Then every user would only > have to parse the signature once to adapt to the respective ID and could > otherwise branch based on it directly. > > For Cython, we could generate a static ID variable for each typed call that > we found in the sources. When encountering a C signature on a callable, > either a) the ID variable is still empty (initial case), then we parse the > signature to see if it matches the expected signature. If it does, we > assign the corresponding ID to the static ID variable and issue a direct > call. If b) the ID field is already set (normal case), we compare the > signature IDs directly and issue a C call if they match. If the IDs do not > match, we issue a normal Python call.

> >>> Right... if we do some work to synchronize the types for Cython modules >>> generated by the same version of Cython, we're left with 3-4 types for >>> Cython, right? Then a couple for numba and one for f2py; so on the order of >>> 10? >> >> No, I think each closure is its own type. > > And that even applies to fused functions, right? They'd have one closure > for each type combination. > >>> An alternative is to do something funny in the type object to get across the >>> offset-in-object information (abusing the docstring, or introduce our own >>> flag which means that the type object has an additional non-standard field >>> at the end). >> >> It's a hack, but the flag + non-standard field idea might just work... > > Plus, it wouldn't have to stay a non-standard field. If it's accepted into > CPython 3.4, we could safely use it in all existing versions of CPython.

Sounds good. Perhaps just find a single "extended" flag, then add a new flag field in our payload, in case we need to extend the type object yet again later and run out of unused flag bits (TBD: figure out how many unused flag bits there are).
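With interned signatures, a caller-side scan over such a NULL-terminated table reduces to pointer comparisons. A minimal sketch, assuming the table has already been obtained from the callable and that b"d)d" was interned through the C-API function described above; the names here are illustrative, not part of the CEP:

    #include <Python.h>

    typedef struct {
        void *pointer;        /* C function pointer for one overload */
        PyObject *signature;  /* interned bytes object, e.g. b"d)d" */
    } nativecall_record;

    static double
    call_as_d_d(PyObject *callable, nativecall_record *table,
                PyObject *interned_d_d, double x)
    {
        nativecall_record *rec;
        for (rec = table; rec->pointer != NULL; rec++) {
            /* interning makes this an identity check, no strcmp */
            if (rec->signature == interned_d_d) {
                double (*f)(double) = (double (*)(double)) rec->pointer;
                return f(x);
            }
        }
        /* no native match: fall back to a normal Python-level call
           (error checking elided for brevity) */
        PyObject *res = PyObject_CallFunction(callable, "d", x);
        double r = PyFloat_AsDouble(res);
        Py_XDECREF(res);
        return r;
    }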
Dag From njs at pobox.com Fri Apr 13 14:19:38 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 13 Apr 2012 13:19:38 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87E937.9050705@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> Message-ID: On Fri, Apr 13, 2012 at 9:52 AM, Dag Sverre Seljebotn wrote: > On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >> Also, I'm not sure the type registration will scale, especially if >> every callable type wanted to get registered. (E.g. currently closures >> and generators are new types...) Where to draw the line? (Perhaps >> things could get registered lazily on the first __nativecall__ lookup, >> as they're likely to be looked up again?) > > > Right... if we do some work to synchronize the types for Cython modules > generated by the same version of Cython, we're left with 3-4 types for > Cython, right? Then a couple for numba and one for f2py; so on the order of > 10? > > An alternative is do something funny in the type object to get across the > offset-in-object information (abusing the docstring, or introduce our own > flag which means that the type object has an additional non-standard field > at the end). In Python 2.7, it looks like there may be a few TP_FLAG bits free -- 15 and 16 are labeled "reserved for stackless python", and 2, 11, 22 don't have anything defined. There may also be an unused ssize_t field ob_size at the beginning of the type object -- for some reason PyTypeObject is declared as variable size (using PyObject_VAR_HEAD), but I don't see any variable-size fields in it, the docs claim that the ob_size field is a "historical artifact that is maintained for binary compatibility...Always set this field to zero", and Include/object.h has a definition for a PyHeapTypeObject which has a PyTypeObject as its first member, which would not work if PyTypeObject had variable size. Grep says that the only place where ob_type->ob_size is accessed is in Objects/typeobject.c:object_sizeof(), which at first glance appears to be a bug, and anyway I don't think anyone cares whether __sizeof__ on C-callable objects is exactly correct. One could use this for an offset, or even a pointer. One could also add a field easily by just subclassing PyTypeObject. The Signature thing seems like a distraction to me. Signature is intended as just a nice convenient format for looking up stuff that's otherwise stored in more obscure ways -- the API equivalent of pretty-printing. The important thing here is getting the C-level dispatch right. -- Nathaniel From njs at pobox.com Fri Apr 13 14:25:38 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 13 Apr 2012 13:25:38 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881531.4090406@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: On Fri, Apr 13, 2012 at 12:59 PM, Dag Sverre Seljebotn wrote: > I'll go one further: Intern Python bytes objects. It's just a PyObject*, but > it's *required* (or just strongly encouraged) to have gone through > > sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig) > > Obviously in a PEP you'd have a C-API function for such interning > (completely standalone utility). Performance of interning operation itself > doesn't matter... 
> > Unless CPython has interning features itself, like in Java? Was that present > back in the day and then ripped out? http://docs.python.org/library/functions.html#intern ? (C API: PyString_InternInPlace, moved from __builtin__.intern to sys.intern in Py3.) - N From stefan_ml at behnel.de Fri Apr 13 14:27:48 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 14:27:48 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881531.4090406@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: <4F881BC4.1070004@behnel.de> Dag Sverre Seljebotn, 13.04.2012 13:59: > On 04/13/2012 01:38 PM, Stefan Behnel wrote: >> Robert Bradshaw, 13.04.2012 12:17: >>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>>> Minor nit: I don't think should_dereference is worth branching on, if >>>>> one wants to save the allocation one can still use a variable-sized >>>>> type and point to oneself. Yes, that's an extra dereference, but the >>>>> memory is already likely close and it greatly simplifies the logic. >>>>> But I could be wrong here. >>>> >>>> >>>> Those minor nits are exactly what I seek; since Travis will have the first >>>> implementation in numba<->SciPy, I just want to make sure that what he >>>> does will work efficiently work Cython. >>> >>> I have to admit building/invoking these var-arg-sized __nativecall__ >>> records seems painful. Here's another suggestion: >>> >>> struct { >>> void* pointer; >>> size_t signature; // compressed binary representation, 95% coverage > > Once you start passing around functions that take memory view slices as > arguments, that 95% estimate will be off I think. Yes, I really think it makes sense to keeps IDs unique only over the runtime of the application. (Note that using ssize_t instead of size_t would allow setting the ID to -1 to disable signature matching, in case that's ever needed.) >>> char* long_signature; // used if signature is not representable in >>> a size_t, as indicated by signature = 0 >>> } record; >>> >>> These char* could optionally be allocated at the end of the record* >>> for optimal locality. We could even dispense with the binary >>> signature, but having that option allows us to avoid strcmp for stuff >>> like d)d and ffi)f. >> >> Assuming we use literals and a const char* for the signature, the C >> compiler would cut down the number of signature strings automatically for >> us. And a pointer comparison is the same as a size_t comparison. > > I'll go one further: Intern Python bytes objects. It's just a PyObject*, > but it's *required* (or just strongly encouraged) to have gone through > > sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig) > > Obviously in a PEP you'd have a C-API function for such interning > (completely standalone utility). Performance of interning operation itself > doesn't matter... > > Unless CPython has interning features itself, like in Java? Was that > present back in the day and then ripped out? AFAIR, it always had to be done explicitly and is only available for unicode objects in Py3 (and only for bytes objects in Py2). The CPython parser also does it for identifiers, but it's not done automatically for anything else. 
It's also not cheap to do - it would require a weakref dict to accommodate for the temporary allocation of large strings, and weak references have a certain overhead. In any case, this is an entirely different use case that should be handled differently from normal string interning. > Requiring interning is somewhat less elegant in one way, but it makes a lot > of other stuff much simpler. > > That gives us > > struct { > void *pointer; > PyBytesObject *signature; > } record; > > and then you allocate a NULL-terminated arrays of these for all the overloads. However, the problem is the setup. These references will have to be created at init time and discarded during runtime termination. Not a problem for Cython generated code, but some overhead for hand written code. Since the size of these structs is not a problem, I'd prefer keeping Python objects out of the game and using an ssize_t ID instead, inferred from a char* signature at module init time by calling a C-API function. That avoids the need for any cleanup. Stefan From markflorisson88 at gmail.com Fri Apr 13 14:46:27 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 13 Apr 2012 13:46:27 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881050.7000302@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> Message-ID: On 13 April 2012 12:38, Stefan Behnel wrote: > Robert Bradshaw, 13.04.2012 12:17: >> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>> Have you given any thought as to what happens if __call__ is >>>> re-assigned for an object (or subclass of an object) supporting this >>>> interface? Or is this out of scope? >>> >>> Out-of-scope, I'd say. Though you can always write an object that detects if >>> you assign to __call__... > > +1 for out of scope. This is a pure C level feature. > > >>>> Minor nit: I don't think should_dereference is worth branching on, if >>>> one wants to save the allocation one can still use a variable-sized >>>> type and point to oneself. Yes, that's an extra dereference, but the >>>> memory is already likely close and it greatly simplifies the logic. >>>> But I could be wrong here. >>> >>> >>> Those minor nits are exactly what I seek; since Travis will have the first >>> implementation in numba<->SciPy, I just want to make sure that what he does >>> will work efficiently work Cython. >> >> +1 >> >> I have to admit building/invoking these var-arg-sized __nativecall__ >> records seems painful. Here's another suggestion: >> >> struct { >> ? ? void* pointer; >> ? ? size_t signature; // compressed binary representation, 95% coverage >> ? ? char* long_signature; // used if signature is not representable in >> a size_t, as indicated by signature = 0 >> } record; >> >> These char* could optionally be allocated at the end of the record* >> for optimal locality. We could even dispense with the binary >> signature, but having that option allows us to avoid strcmp for stuff >> like d)d and ffi)f. > > Assuming we use literals and a const char* for the signature, the C > compiler would cut down the number of signature strings automatically for > us. And a pointer comparison is the same as a size_t comparison. > > That would only apply at a per-module level, though, so it would require an > indirection for the signature IDs. But it would avoid a global registry. 
> > Another idea would be to set the signature ID field to 0 at the beginning > and call a C-API function to let the current runtime assign an ID > 0, > unique for the currently running application. Then every user would only > have to parse the signature once to adapt to the respective ID and could > otherwise branch based on it directly. > > For Cython, we could generate a static ID variable for each typed call that > we found in the sources. When encountering a C signature on a callable, > either a) the ID variable is still empty (initial case), then we parse the > signature to see if it matches the expected signature. If it does, we > assign the corresponding ID to the static ID variable and issue a direct > call. If b) the ID field is already set (normal case), we compare the > signature IDs directly and issue a C call it they match. If the IDs do not > match, we issue a normal Python call. > > >>> Right... if we do some work to synchronize the types for Cython modules >>> generated by the same version of Cython, we're left with 3-4 types for >>> Cython, right? Then a couple for numba and one for f2py; so on the order of >>> 10? >> >> No, I think each closure is its own type. > > And that even applies to fused functions, right? They'd have one closure > for each type combination. > Hm, there is only one type for the function (CyFunction), but there is a different type for the closure scope for each closure. The same goes for FusedFunction, there is only one type, and each instance contains a dict of specializations (mapping signatures to PyCFunctions). (But each module still has different function types of course). >>> An alternative is do something funny in the type object to get across the >>> offset-in-object information (abusing the docstring, or introduce our own >>> flag which means that the type object has an additional non-standard field >>> at the end). >> >> It's a hack, but the flag + non-standard field idea might just work... > > Plus, it wouldn't have to stay a non-standard field. If it's accepted into > CPython 3.4, we could safely use it in all existing versions of CPython. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Fri Apr 13 14:48:54 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 13 Apr 2012 13:48:54 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881531.4090406@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: On 13 April 2012 12:59, Dag Sverre Seljebotn wrote: > On 04/13/2012 01:38 PM, Stefan Behnel wrote: >> >> Robert Bradshaw, 13.04.2012 12:17: >>> >>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>>> >>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>>> >>>>> Have you given any thought as to what happens if __call__ is >>>>> re-assigned for an object (or subclass of an object) supporting this >>>>> interface? Or is this out of scope? >>>> >>>> >>>> Out-of-scope, I'd say. Though you can always write an object that >>>> detects if >>>> you assign to __call__... >> >> >> +1 for out of scope. This is a pure C level feature. 
>> >> >>>>> Minor nit: I don't think should_dereference is worth branching on, if >>>>> one wants to save the allocation one can still use a variable-sized >>>>> type and point to oneself. Yes, that's an extra dereference, but the >>>>> memory is already likely close and it greatly simplifies the logic. >>>>> But I could be wrong here. >>>> >>>> >>>> >>>> Those minor nits are exactly what I seek; since Travis will have the >>>> first >>>> implementation in numba<->SciPy, I just want to make sure that what he >>>> does >>>> will work efficiently work Cython. >>> >>> >>> +1 >>> >>> I have to admit building/invoking these var-arg-sized __nativecall__ >>> records seems painful. Here's another suggestion: >>> >>> struct { >>> ? ? void* pointer; >>> ? ? size_t signature; // compressed binary representation, 95% coverage > > > Once you start passing around functions that take memory view slices as > arguments, that 95% estimate will be off I think. > It kind of depends on which arguments types and how many arguments you will allow, and whether or not collisions would be fine (which would imply ID comparison + strcmp()). >>> ? ? char* long_signature; // used if signature is not representable in >>> a size_t, as indicated by signature = 0 >>> } record; >>> >>> These char* could optionally be allocated at the end of the record* >>> for optimal locality. We could even dispense with the binary >>> signature, but having that option allows us to avoid strcmp for stuff >>> like d)d and ffi)f. >> >> >> Assuming we use literals and a const char* for the signature, the C >> compiler would cut down the number of signature strings automatically for >> us. And a pointer comparison is the same as a size_t comparison. > > > I'll go one further: Intern Python bytes objects. It's just a PyObject*, but > it's *required* (or just strongly encouraged) to have gone through > > sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig) > > Obviously in a PEP you'd have a C-API function for such interning > (completely standalone utility). Performance of interning operation itself > doesn't matter... > > Unless CPython has interning features itself, like in Java? Was that present > back in the day and then ripped out? > > Requiring interning is somewhat less elegant in one way, but it makes a lot > of other stuff much simpler. > > That gives us > > struct { > ? ?void *pointer; > ? ?PyBytesObject *signature; > } record; > > and then you allocate a NULL-terminated arrays of these for all the > overloads. > Interesting. What I like about size_t it that it could define a deterministic ordering, which means specializations could be stored in a binary search tree in array form. Cython would precompute the size_t for the specialization it needs (and maybe account for promotions as well). >> >> That would only apply at a per-module level, though, so it would require >> an >> indirection for the signature IDs. But it would avoid a global registry. >> >> Another idea would be to set the signature ID field to 0 at the beginning >> and call a C-API function to let the current runtime assign an ID> ?0, >> unique for the currently running application. Then every user would only >> have to parse the signature once to adapt to the respective ID and could >> otherwise branch based on it directly. >> >> For Cython, we could generate a static ID variable for each typed call >> that >> we found in the sources. 
When encountering a C signature on a callable, >> either a) the ID variable is still empty (initial case), then we parse the >> signature to see if it matches the expected signature. If it does, we >> assign the corresponding ID to the static ID variable and issue a direct >> call. If b) the ID field is already set (normal case), we compare the >> signature IDs directly and issue a C call it they match. If the IDs do not >> match, we issue a normal Python call. >> >> >>>> Right... if we do some work to synchronize the types for Cython modules >>>> generated by the same version of Cython, we're left with 3-4 types for >>>> Cython, right? Then a couple for numba and one for f2py; so on the order >>>> of >>>> 10? >>> >>> >>> No, I think each closure is its own type. >> >> >> And that even applies to fused functions, right? They'd have one closure >> for each type combination. >> >> >>>> An alternative is do something funny in the type object to get across >>>> the >>>> offset-in-object information (abusing the docstring, or introduce our >>>> own >>>> flag which means that the type object has an additional non-standard >>>> field >>>> at the end). >>> >>> >>> It's a hack, but the flag + non-standard field idea might just work... >> >> >> Plus, it wouldn't have to stay a non-standard field. If it's accepted into >> CPython 3.4, we could safely use it in all existing versions of CPython. > > > Sounds good. Perhaps just find a single "extended", then add a new flag > field in our payload, in case we need to extend the types object yet again > later and run out of unused flag bits (TBD: figure out how many unused flag > bits there are). > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Maybe it would be a good idea if there was a third project that defined this functionality in header files which projects could include (or in case of Cython directly inject into the generated C files). E.g. a function to check for the native interface, and a function that given an array of signature strings and function pointers builds the ABI information (and computes the ID), and one that given an ID and signature string finds the right specialization. The project should also expose a simple type system for the types we care about, and be able to generate signature strings and IDs for signatures. An optimization for the common case would be to only look at the first entry in the ABI information directly and compare that for the non-overloaded case, and otherwise do a logarithmic lookup, with a final fallback to calling through the Python layer. From stefan_ml at behnel.de Fri Apr 13 14:48:05 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 14:48:05 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881BC4.1070004@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <4F881BC4.1070004@behnel.de> Message-ID: <4F882085.6070304@behnel.de> Stefan Behnel, 13.04.2012 14:27: > Dag Sverre Seljebotn, 13.04.2012 13:59: >> Requiring interning is somewhat less elegant in one way, but it makes a lot >> of other stuff much simpler. >> >> That gives us >> >> struct { >> void *pointer; >> PyBytesObject *signature; >> } record; >> >> and then you allocate a NULL-terminated arrays of these for all the overloads. 
> > However, the problem is the setup. These references will have to be created > at init time and discarded during runtime termination. Not a problem for > Cython generated code, but some overhead for hand written code. > > Since the size of these structs is not a problem, I'd prefer keeping Python > objects out of the game and using an ssize_t ID instead, inferred from a > char* signature at module init time by calling a C-API function. That > avoids the need for any cleanup. Actually, we could even use interned char* values. Nothing keeps that C-API setup function from reassigning the "char* signature" field to the char* buffer of an internally allocated byte string. Except that we'd have to *require* users to use literals or otherwise statically allocated C strings in that field. Hmm, maybe not the best idea ever... Stefan From markflorisson88 at gmail.com Fri Apr 13 15:01:30 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 13 Apr 2012 14:01:30 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F882085.6070304@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <4F881BC4.1070004@behnel.de> <4F882085.6070304@behnel.de> Message-ID: On 13 April 2012 13:48, Stefan Behnel wrote: > Stefan Behnel, 13.04.2012 14:27: >> Dag Sverre Seljebotn, 13.04.2012 13:59: >>> Requiring interning is somewhat less elegant in one way, but it makes a lot >>> of other stuff much simpler. >>> >>> That gives us >>> >>> struct { >>> ? ? void *pointer; >>> ? ? PyBytesObject *signature; >>> } record; >>> >>> and then you allocate a NULL-terminated arrays of these for all the overloads. >> >> However, the problem is the setup. These references will have to be created >> at init time and discarded during runtime termination. Not a problem for >> Cython generated code, but some overhead for hand written code. >> >> Since the size of these structs is not a problem, I'd prefer keeping Python >> objects out of the game and using an ssize_t ID instead, inferred from a >> char* signature at module init time by calling a C-API function. That >> avoids the need for any cleanup. > > Actually, we could even use interned char* values. Nothing keeps that C-API > setup function from reassigning the "char* signature" field to the char* > buffer of an internally allocated byte string. Except that we'd have to > *require* users to use literals or otherwise statically allocated C strings > in that field. Hmm, maybe not the best idea ever... > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel You could create a module shared by all versions and projects, which exposes a function 'get_signature', which given a char *signature returns the pointer that should be used in the ABI signature type information. You can then always compare by identity. 
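A minimal sketch of what such a shared 'get_signature' helper could look like, assuming an interned_db dict created once at module init time (the init code and error paths of callers are elided; all names are illustrative):

    #include <Python.h>

    static PyObject *interned_db = NULL;  /* module-level dict, set up at init */

    /* Returns a borrowed reference to the canonical bytes object for sig,
       so callers can compare signatures by pointer identity. */
    static PyObject *
    get_signature(const char *sig)
    {
        PyObject *key = PyBytes_FromString(sig);
        if (key == NULL)
            return NULL;
        PyObject *canonical = PyDict_GetItem(interned_db, key);  /* borrowed */
        if (canonical == NULL) {
            if (PyDict_SetItem(interned_db, key, key) < 0) {
                Py_DECREF(key);
                return NULL;
            }
            canonical = key;  /* the dict now holds two references to it */
        }
        Py_DECREF(key);
        return canonical;
    }

The returned pointer stays valid as long as the dict holds the entry, so two modules that both intern "d)d" get the same PyObject* and can dispatch on identity.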
From d.s.seljebotn at astro.uio.no Fri Apr 13 15:27:34 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 15:27:34 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <4F881BC4.1070004@behnel.de> <4F882085.6070304@behnel.de> Message-ID: <4F8829C6.2050903@astro.uio.no> On 04/13/2012 03:01 PM, mark florisson wrote: > On 13 April 2012 13:48, Stefan Behnel wrote: >> Stefan Behnel, 13.04.2012 14:27: >>> Dag Sverre Seljebotn, 13.04.2012 13:59: >>>> Requiring interning is somewhat less elegant in one way, but it makes a lot >>>> of other stuff much simpler. >>>> >>>> That gives us >>>> >>>> struct { >>>> void *pointer; >>>> PyBytesObject *signature; >>>> } record; >>>> >>>> and then you allocate a NULL-terminated arrays of these for all the overloads. >>> >>> However, the problem is the setup. These references will have to be created >>> at init time and discarded during runtime termination. Not a problem for >>> Cython generated code, but some overhead for hand written code. >>> >>> Since the size of these structs is not a problem, I'd prefer keeping Python >>> objects out of the game and using an ssize_t ID instead, inferred from a >>> char* signature at module init time by calling a C-API function. That >>> avoids the need for any cleanup. >> >> Actually, we could even use interned char* values. Nothing keeps that C-API >> setup function from reassigning the "char* signature" field to the char* >> buffer of an internally allocated byte string. Except that we'd have to >> *require* users to use literals or otherwise statically allocated C strings >> in that field. Hmm, maybe not the best idea ever... >> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > You could create a module shared by all versions and projects, which > exposes a function 'get_signature', which given a char *signature > returns the pointer that should be used in the ABI signature type > information. You can then always compare by identity. I fail to see how this is different from what I proposed, with interning bytes objects (which I still prefer; although the binary-search features of direct comparison makes that attractive too). BTW, any proposal that requires an actual project/library that both Cython and NumPy depends on will fail in the real world. Dag From stefan_ml at behnel.de Fri Apr 13 15:52:44 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 15:52:44 +0200 Subject: [Cython] pyregr test suite In-Reply-To: <4F87B57B.6000807@behnel.de> References: <4F87B57B.6000807@behnel.de> Message-ID: <4F882FAC.90008@behnel.de> Stefan Behnel, 13.04.2012 07:11: > Robert Bradshaw, 12.04.2012 22:21: >> On Thu, Apr 12, 2012 at 11:21 AM, mark florisson wrote: >>> Could we run the pyregr test suite manually instead of automatically? >>> It takes a lot of resources to build, and a single simple push to the >>> cython-devel branch results in the build slots being hogged for hours, >>> making the continuous development a lot less 'continuous'. 
We could >>> just decide to run the pyregr suite every so often, or whenever we >>> make an addition or change that could actually affect Python code (if >>> one updates a test then there is no use in running pyregr for >>> instance). >> >> +1 to manual + periodic for these tests. Alternatively we could make >> them depend on each other, so at most one core is consumed. > > Ok, I'll set it up. They are now triggered by the (nightly) CPython builds and the four configurations run sequentially (there's an option for that), starting with the C tests. I would recommend configuring your own pyregr test jobs (if you have any) for manual runs by disabling all of their triggers. Stefan From markflorisson88 at gmail.com Fri Apr 13 16:18:01 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 13 Apr 2012 15:18:01 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8829C6.2050903@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <4F881BC4.1070004@behnel.de> <4F882085.6070304@behnel.de> <4F8829C6.2050903@astro.uio.no> Message-ID: On 13 April 2012 14:27, Dag Sverre Seljebotn wrote: > On 04/13/2012 03:01 PM, mark florisson wrote: >> >> On 13 April 2012 13:48, Stefan Behnel ?wrote: >>> >>> Stefan Behnel, 13.04.2012 14:27: >>>> >>>> Dag Sverre Seljebotn, 13.04.2012 13:59: >>>>> >>>>> Requiring interning is somewhat less elegant in one way, but it makes a >>>>> lot >>>>> of other stuff much simpler. >>>>> >>>>> That gives us >>>>> >>>>> struct { >>>>> ? ? void *pointer; >>>>> ? ? PyBytesObject *signature; >>>>> } record; >>>>> >>>>> and then you allocate a NULL-terminated arrays of these for all the >>>>> overloads. >>>> >>>> >>>> However, the problem is the setup. These references will have to be >>>> created >>>> at init time and discarded during runtime termination. Not a problem for >>>> Cython generated code, but some overhead for hand written code. >>>> >>>> Since the size of these structs is not a problem, I'd prefer keeping >>>> Python >>>> objects out of the game and using an ssize_t ID instead, inferred from a >>>> char* signature at module init time by calling a C-API function. That >>>> avoids the need for any cleanup. >>> >>> >>> Actually, we could even use interned char* values. Nothing keeps that >>> C-API >>> setup function from reassigning the "char* signature" field to the char* >>> buffer of an internally allocated byte string. Except that we'd have to >>> *require* users to use literals or otherwise statically allocated C >>> strings >>> in that field. Hmm, maybe not the best idea ever... >>> >>> Stefan >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> >> You could create a module shared by all versions and projects, which >> exposes a function 'get_signature', which given a char *signature >> returns the pointer that should be used in the ABI signature type >> information. You can then always compare by identity. > > > I fail to see how this is different from what I proposed, with interning > bytes objects (which I still prefer; although the binary-search features of > direct comparison makes that attractive too). It's not really different, more a response to Stefan's comment. 
> BTW, any proposal that requires an actual project/library that both Cython > and NumPy depends on will fail in the real world. That's fine as long as they use the same way to expose ABI information. As a courtesy though, we could do it anyway, which makes it easier for those respective projects to understand what's involved, how to implement it, and they can then decide whether they want to ship that project as part of their own project. > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From robertwb at gmail.com Fri Apr 13 19:26:17 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 10:26:17 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881531.4090406@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: On Fri, Apr 13, 2012 at 4:59 AM, Dag Sverre Seljebotn wrote: > On 04/13/2012 01:38 PM, Stefan Behnel wrote: >> >> Robert Bradshaw, 13.04.2012 12:17: >>> >>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>>> >>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>>> >>>>> Have you given any thought as to what happens if __call__ is >>>>> re-assigned for an object (or subclass of an object) supporting this >>>>> interface? Or is this out of scope? >>>> >>>> >>>> Out-of-scope, I'd say. Though you can always write an object that >>>> detects if >>>> you assign to __call__... >> >> >> +1 for out of scope. This is a pure C level feature. >> >> >>>>> Minor nit: I don't think should_dereference is worth branching on, if >>>>> one wants to save the allocation one can still use a variable-sized >>>>> type and point to oneself. Yes, that's an extra dereference, but the >>>>> memory is already likely close and it greatly simplifies the logic. >>>>> But I could be wrong here. >>>> >>>> >>>> >>>> Those minor nits are exactly what I seek; since Travis will have the >>>> first >>>> implementation in numba<->SciPy, I just want to make sure that what he >>>> does >>>> will work efficiently work Cython. >>> >>> >>> +1 >>> >>> I have to admit building/invoking these var-arg-sized __nativecall__ >>> records seems painful. Here's another suggestion: >>> >>> struct { >>> ? ? void* pointer; >>> ? ? size_t signature; // compressed binary representation, 95% coverage > > Once you start passing around functions that take memory view slices as > arguments, that 95% estimate will be off I think. We have (on the high-performance systems we care about) 64-bits here. If we limit ourselves to a 6-bit alphabet, that gives a trivial encoding for up to 10 chars. We could be more clever here (Huffman coding) but that might be overkill. More importantly though, the "complicated" signatures are likely to be so cheap that the strcmp overhead matters. >>> ? ? char* long_signature; // used if signature is not representable in >>> a size_t, as indicated by signature = 0 >>> } record; >>> >>> These char* could optionally be allocated at the end of the record* >>> for optimal locality. We could even dispense with the binary >>> signature, but having that option allows us to avoid strcmp for stuff >>> like d)d and ffi)f. >> >> >> Assuming we use literals and a const char* for the signature, the C >> compiler would cut down the number of signature strings automatically for >> us. 
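Robert's 6-bit-alphabet point just below can be made concrete: with 63 characters at six bits each, up to ten characters pack into one 64-bit word. This is only a sketch; the particular alphabet is an assumption, not anything agreed in the thread:

    #include <stdint.h>
    #include <string.h>

    /* 63 characters, so codes 1..63 fit in 6 bits */
    static const char SIGCHARS[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789)";

    /* Returns 0 if the signature is too long or uses an unknown
       character, i.e. the "fall back to the long signature" case. */
    static uint64_t
    pack_signature(const char *sig)
    {
        uint64_t code = 0;
        size_t i, n = strlen(sig);
        if (n == 0 || n > 10)
            return 0;
        for (i = 0; i < n; i++) {
            const char *p = strchr(SIGCHARS, sig[i]);
            if (p == NULL)
                return 0;
            code = (code << 6) | (uint64_t)(p - SIGCHARS + 1);  /* 1..63 */
        }
        return code;
    }

Under this scheme "d)d" and "ffi)f" get distinct nonzero codes and dispatch is a single integer comparison; anything longer or stranger maps to 0 and falls back to string comparison.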
And a pointer comparison is the same as a size_t comparison. > > > I'll go one further: Intern Python bytes objects. It's just a PyObject*, but > it's *required* (or just strongly encouraged) to have gone through > > sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig) > > Obviously in a PEP you'd have a C-API function for such interning > (completely standalone utility). Performance of interning operation itself > doesn't matter... > > Unless CPython has interning features itself, like in Java? Was that present > back in the day and then ripped out? > > Requiring interning is somewhat less elegant in one way, but it makes a lot > of other stuff much simpler. > > That gives us > > struct { > ? ?void *pointer; > ? ?PyBytesObject *signature; > } record; > > and then you allocate a NULL-terminated arrays of these for all the > overloads. Global interning is a nice idea. The one drawback I see is that it becomes much more expensive for dynamically calculated signatures. >> >> That would only apply at a per-module level, though, so it would require >> an >> indirection for the signature IDs. But it would avoid a global registry. >> >> Another idea would be to set the signature ID field to 0 at the beginning >> and call a C-API function to let the current runtime assign an ID> ?0, >> unique for the currently running application. Then every user would only >> have to parse the signature once to adapt to the respective ID and could >> otherwise branch based on it directly. >> >> For Cython, we could generate a static ID variable for each typed call >> that >> we found in the sources. When encountering a C signature on a callable, >> either a) the ID variable is still empty (initial case), then we parse the >> signature to see if it matches the expected signature. If it does, we >> assign the corresponding ID to the static ID variable and issue a direct >> call. If b) the ID field is already set (normal case), we compare the >> signature IDs directly and issue a C call it they match. If the IDs do not >> match, we issue a normal Python call. If I understand correctly, you're proposing struct { char* sig; long id; } sig_t; Where comparison would (sometimes?) compute id from sig by augmenting a global counter and dict? Might be expensive to bootstrap, but eventually all relevant ids would be filled in and it would be quick. Interesting. I wonder what the performance penalty would be over assuming id is statically computed lots of the time, and using that to compare against fixed values. And there's memory locality issues as well. >>>> Right... if we do some work to synchronize the types for Cython modules >>>> generated by the same version of Cython, we're left with 3-4 types for >>>> Cython, right? Then a couple for numba and one for f2py; so on the order >>>> of >>>> 10? >>> >>> >>> No, I think each closure is its own type. >> >> >> And that even applies to fused functions, right? They'd have one closure >> for each type combination. >> >> >>>> An alternative is do something funny in the type object to get across >>>> the >>>> offset-in-object information (abusing the docstring, or introduce our >>>> own >>>> flag which means that the type object has an additional non-standard >>>> field >>>> at the end). >>> >>> >>> It's a hack, but the flag + non-standard field idea might just work... >> >> >> Plus, it wouldn't have to stay a non-standard field. If it's accepted into >> CPython 3.4, we could safely use it in all existing versions of CPython. > > > Sounds good. 
Perhaps just find a single "extended", then add a new flag > field in our payload, in case we need to extend the types object yet again > later and run out of unused flag bits (TBD: figure out how many unused flag > bits there are). > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From robertwb at gmail.com Fri Apr 13 19:26:34 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 10:26:34 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: On Fri, Apr 13, 2012 at 5:48 AM, mark florisson wrote: > On 13 April 2012 12:59, Dag Sverre Seljebotn wrote: >> On 04/13/2012 01:38 PM, Stefan Behnel wrote: >>> >>> Robert Bradshaw, 13.04.2012 12:17: >>>> >>>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>>>> >>>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>>>> >>>>>> Have you given any thought as to what happens if __call__ is >>>>>> re-assigned for an object (or subclass of an object) supporting this >>>>>> interface? Or is this out of scope? >>>>> >>>>> >>>>> Out-of-scope, I'd say. Though you can always write an object that >>>>> detects if >>>>> you assign to __call__... >>> >>> >>> +1 for out of scope. This is a pure C level feature. >>> >>> >>>>>> Minor nit: I don't think should_dereference is worth branching on, if >>>>>> one wants to save the allocation one can still use a variable-sized >>>>>> type and point to oneself. Yes, that's an extra dereference, but the >>>>>> memory is already likely close and it greatly simplifies the logic. >>>>>> But I could be wrong here. >>>>> >>>>> >>>>> >>>>> Those minor nits are exactly what I seek; since Travis will have the >>>>> first >>>>> implementation in numba<->SciPy, I just want to make sure that what he >>>>> does >>>>> will work efficiently work Cython. >>>> >>>> >>>> +1 >>>> >>>> I have to admit building/invoking these var-arg-sized __nativecall__ >>>> records seems painful. Here's another suggestion: >>>> >>>> struct { >>>> ? ? void* pointer; >>>> ? ? size_t signature; // compressed binary representation, 95% coverage >> >> >> Once you start passing around functions that take memory view slices as >> arguments, that 95% estimate will be off I think. >> > > It kind of depends on which arguments types and how many arguments you > will allow, and whether or not collisions would be fine (which would > imply ID comparison + strcmp()). Interesting idea, though this has the drawback of doubling (at least) the overhead of the simple (important) case as well as memory requirements/locality issues. >>>> ? ? char* long_signature; // used if signature is not representable in >>>> a size_t, as indicated by signature = 0 >>>> } record; >>>> >>>> These char* could optionally be allocated at the end of the record* >>>> for optimal locality. We could even dispense with the binary >>>> signature, but having that option allows us to avoid strcmp for stuff >>>> like d)d and ffi)f. >>> >>> >>> Assuming we use literals and a const char* for the signature, the C >>> compiler would cut down the number of signature strings automatically for >>> us. And a pointer comparison is the same as a size_t comparison. >> >> >> I'll go one further: Intern Python bytes objects. 
It's just a PyObject*, but >> it's *required* (or just strongly encouraged) to have gone through >> >> sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig) >> >> Obviously in a PEP you'd have a C-API function for such interning >> (completely standalone utility). Performance of the interning operation itself >> doesn't matter... >> >> Unless CPython has interning features itself, like in Java? Was that present >> back in the day and then ripped out? >> >> Requiring interning is somewhat less elegant in one way, but it makes a lot >> of other stuff much simpler. >> >> That gives us
>>
>> struct {
>>     void *pointer;
>>     PyBytesObject *signature;
>> } record;
>>
>> and then you allocate a NULL-terminated array of these for all the >> overloads. >> > > Interesting. What I like about size_t is that it could define a > deterministic ordering, which means specializations could be stored in > a binary search tree in array form. I think the number of specializations would have to be quite large (>10, maybe 100) before a binary search wins out over a simple scan, but if we stored a count rather than did a null-terminated array, the lookup function could take this into account. (The header will already have plenty of room if we're storing a version number and want the records to be properly aligned.) Requiring them to be sorted would also allow us to abort on average half way through a scan. Of course prioritizing the "likely" signatures first may be more of a win. > Cython would precompute the size_t > for the specialization it needs (and maybe account for promotions as > well). Exactly. >>> That would only apply at a per-module level, though, so it would require >>> an >>> indirection for the signature IDs. But it would avoid a global registry. >>> >>> Another idea would be to set the signature ID field to 0 at the beginning >>> and call a C-API function to let the current runtime assign an ID > 0, >>> unique for the currently running application. Then every user would only >>> have to parse the signature once to adapt to the respective ID and could >>> otherwise branch based on it directly. >>> >>> For Cython, we could generate a static ID variable for each typed call >>> that >>> we found in the sources. When encountering a C signature on a callable, >>> either a) the ID variable is still empty (initial case), then we parse the >>> signature to see if it matches the expected signature. If it does, we >>> assign the corresponding ID to the static ID variable and issue a direct >>> call. If b) the ID field is already set (normal case), we compare the >>> signature IDs directly and issue a C call if they match. If the IDs do not >>> match, we issue a normal Python call. >>> >>> >>>>> Right... if we do some work to synchronize the types for Cython modules >>>>> generated by the same version of Cython, we're left with 3-4 types for >>>>> Cython, right? Then a couple for numba and one for f2py; so on the order >>>>> of >>>>> 10? >>>> >>>> >>>> No, I think each closure is its own type. >>> >>> >>> And that even applies to fused functions, right? They'd have one closure >>> for each type combination. >>> >>> >>>>> An alternative is to do something funny in the type object to get across >>>>> the >>>>> offset-in-object information (abusing the docstring, or introduce our >>>>> own >>>>> flag which means that the type object has an additional non-standard >>>>> field >>>>> at the end). >>>> >>>> >>>> It's a hack, but the flag + non-standard field idea might just work...
>>> >>> Plus, it wouldn't have to stay a non-standard field. If it's accepted into >>> CPython 3.4, we could safely use it in all existing versions of CPython. >> >> Sounds good. Perhaps just find a single "extended", then add a new flag >> field in our payload, in case we need to extend the types object yet again >> later and run out of unused flag bits (TBD: figure out how many unused flag >> bits there are). >> >> Dag >> >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Maybe it would be a good idea if there were a third project that > defined this functionality in header files which projects could > include (or in case of Cython directly inject into the generated C > files). E.g. a function to check for the native interface, and a > function that given an array of signature strings and function > pointers builds the ABI information (and computes the ID), and one > that given an ID and signature string finds the right specialization. > The project should also expose a simple type system for the types we > care about, and be able to generate signature strings and IDs for > signatures. > > An optimization for the common case would be to only look at the first > entry in the ABI information directly and compare that for the > non-overloaded case, and otherwise do a logarithmic lookup, with a > final fallback to calling through the Python layer. I think the ABI should be simple (and fully specified) enough to allow a trivial implementation, and we and others could ship our implementations as a tiny C library (or just a header file). - Robert From robertwb at gmail.com Fri Apr 13 20:21:28 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 11:21:28 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: On Fri, Apr 13, 2012 at 10:26 AM, Robert Bradshaw wrote: > On Fri, Apr 13, 2012 at 4:59 AM, Dag Sverre Seljebotn > wrote: >> On 04/13/2012 01:38 PM, Stefan Behnel wrote: >>> That would only apply at a per-module level, though, so it would require >>> an >>> indirection for the signature IDs. But it would avoid a global registry. >>> >>> Another idea would be to set the signature ID field to 0 at the beginning >>> and call a C-API function to let the current runtime assign an ID > 0, >>> unique for the currently running application. Then every user would only >>> have to parse the signature once to adapt to the respective ID and could >>> otherwise branch based on it directly. >>> >>> For Cython, we could generate a static ID variable for each typed call >>> that >>> we found in the sources. When encountering a C signature on a callable, >>> either a) the ID variable is still empty (initial case), then we parse the >>> signature to see if it matches the expected signature. If it does, we >>> assign the corresponding ID to the static ID variable and issue a direct >>> call. If b) the ID field is already set (normal case), we compare the >>> signature IDs directly and issue a C call if they match. If the IDs do not >>> match, we issue a normal Python call. > > If I understand correctly, you're proposing
>
> struct {
>     char* sig;
>     long id;
> } sig_t;
>
> Where comparison would (sometimes?)
> compute id from sig by augmenting > a global counter and dict? Might be expensive to bootstrap, but > eventually all relevant ids would be filled in and it would be quick. > Interesting. I wonder what the performance penalty would be over > assuming id is statically computed lots of the time, and using that to > compare against fixed values. And there are memory locality issues as > well. To clarify, I'd really like to have the following as fast as possible:

if (callable.sig.id == X) {
    // yep, that's what I thought
} else {
    // generic call
}

Alternatively, one can imagine wanting to do:

switch (callable.sig.id) {
    case X:
        // I can do this
    case Y:
        // this is common and fast as well
    ...
    default:
        // generic call
}

There is some question about how promotion should work (e.g. should this flexibility reside in the caller or the callee (or both, though that could result in a quadratic number of comparisons)?) - Robert From stefan_ml at behnel.de Fri Apr 13 21:15:22 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 21:15:22 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87B896.5050000@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87B896.5050000@behnel.de> Message-ID: <4F887B4A.6090504@behnel.de> Stefan Behnel, 13.04.2012 07:24: > Dag Sverre Seljebotn, 13.04.2012 00:34: >> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >> http://wiki.cython.org/enhancements/cep1000 > > I'm all for doing something in this direction and have been hinting at it > on the PyPy mailing list for a while, without reaction so far. I'll trigger > them again, with a pointer to this discussion and the CEP. PyPy should be > totally interested in a generic way to do fast calls into wrapped C code in > general and Cython implemented functions specifically. Their JIT would then > look at the function at runtime and unwrap it. I just learned that the support in PyPy would be rather straightforward. It already supports calling native code with a known signature through their "rlib/libffi.py" module, so all that remains to be done on their side is mapping the encoded signature to their own signature configuration. Stefan From robertwb at gmail.com Fri Apr 13 21:26:44 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 12:26:44 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F887B4A.6090504@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87B896.5050000@behnel.de> <4F887B4A.6090504@behnel.de> Message-ID: On Fri, Apr 13, 2012 at 12:15 PM, Stefan Behnel wrote: > Stefan Behnel, 13.04.2012 07:24: >> Dag Sverre Seljebotn, 13.04.2012 00:34: >>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>> http://wiki.cython.org/enhancements/cep1000 >> >> I'm all for doing something in this direction and have been hinting at it >> on the PyPy mailing list for a while, without reaction so far. I'll trigger >> them again, with a pointer to this discussion and the CEP. PyPy should be >> totally interested in a generic way to do fast calls into wrapped C code in >> general and Cython implemented functions specifically. Their JIT would then >> look at the function at runtime and unwrap it. > > I just learned that the support in PyPy would be rather straightforward. > It already supports calling native code with a known signature through > their "rlib/libffi.py" module, Cool.
> so all that remains to be done on their side > is mapping the encoded signature to their own signature configuration. Or looking into borrowing theirs? (We might want more extensibility, e.g. declaring buffer types and nogil/exception data. I assume ctypes has a signature declaration format as well, right?) - Robert From stefan_ml at behnel.de Fri Apr 13 21:50:15 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 21:50:15 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87B896.5050000@behnel.de> <4F887B4A.6090504@behnel.de> Message-ID: <4F888377.7020100@behnel.de> Robert Bradshaw, 13.04.2012 21:26: > On Fri, Apr 13, 2012 at 12:15 PM, Stefan Behnel wrote: >> Stefan Behnel, 13.04.2012 07:24: >>> Dag Sverre Seljebotn, 13.04.2012 00:34: >>>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>>> http://wiki.cython.org/enhancements/cep1000 >>> >>> I'm all for doing something in this direction and have been hinting at it >>> on the PyPy mailing list for a while, without reaction so far. I'll trigger >>> them again, with a pointer to this discussion and the CEP. PyPy should be >>> totally interested in a generic way to do fast calls into wrapped C code in >>> general and Cython implemented functions specifically. Their JIT would then >>> look at the function at runtime and unwrap it. >> >> I just learned that the support in PyPy would be rather straight forward. >> It already supports calling native code with a known signature through >> their "rlib/libffi.py" module, > > Cool. > >> so all that remains to be done on their side >> is mapping the encoded signature to their own signature configuration. > > Or looking into borrowing theirs? (We might want more extensibility, > e.g. declaring buffer types and nogil/exception data. I assume ctypes > has a signature declaration format as well, right?) PyPy's ctypes implementation is based on libffi. However, I think neither of the two has a declaration format (e.g. string based) other than the object based declaration notation. You basically pass them a sequence of type objects to declare the signature. That's not really easy to map to the C level - at least not efficiently... Stefan From stefan_ml at behnel.de Fri Apr 13 21:52:33 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 21:52:33 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: <4F888401.90309@behnel.de> Robert Bradshaw, 13.04.2012 20:21: > On Fri, Apr 13, 2012 at 10:26 AM, Robert Bradshaw wrote: >> On Fri, Apr 13, 2012 at 4:59 AM, Dag Sverre Seljebotn wrote: >>> On 04/13/2012 01:38 PM, Stefan Behnel wrote: >>>> That would only apply at a per-module level, though, so it would >>>> require an indirection for the signature IDs. But it would avoid a >>>> global registry. >>>> >>>> Another idea would be to set the signature ID field to 0 at the beginning >>>> and call a C-API function to let the current runtime assign an ID> 0, >>>> unique for the currently running application. Then every user would only >>>> have to parse the signature once to adapt to the respective ID and could >>>> otherwise branch based on it directly. 
>>>> >>>> For Cython, we could generate a static ID variable for each typed call >>>> that >>>> we found in the sources. When encountering a C signature on a callable, >>>> either a) the ID variable is still empty (initial case), then we parse the >>>> signature to see if it matches the expected signature. If it does, we >>>> assign the corresponding ID to the static ID variable and issue a direct >>>> call. If b) the ID field is already set (normal case), we compare the >>>> signature IDs directly and issue a C call if they match. If the IDs do not >>>> match, we issue a normal Python call. >> >> If I understand correctly, you're proposing
>>
>> struct {
>>     char* sig;
>>     long id;
>> } sig_t;
>>
>> Where comparison would (sometimes?) compute id from sig by augmenting >> a global counter and dict? Might be expensive to bootstrap, but >> eventually all relevant ids would be filled in and it would be quick. Yes. If a function is only called once, the overhead won't matter. And starting from the second call, it would either be fast if the function signature matches or slow anyway if it doesn't match. >> Interesting. I wonder what the performance penalty would be over >> assuming id is statically computed lots of the time, and using that to >> compare against fixed values. And there are memory locality issues as >> well. > > To clarify, I'd really like to have the following as fast as possible:
>
> if (callable.sig.id == X) {
>     // yep, that's what I thought
> } else {
>     // generic call
> }
>
> Alternatively, one can imagine wanting to do:
>
> switch (callable.sig.id) {
>     case X:
>         // I can do this
>     case Y:
>         // this is common and fast as well
>     ...
>     default:
>         // generic call
> }
Yes, that's the idea. > There is some question about how promotion should work (e.g. should > this flexibility reside in the caller or the callee (or both, though > that could result in a quadratic number of comparisons)?) Callees could expose multiple signatures (which would result in a direct call for each, without further comparisons), then the caller would have to choose between those. However, if none matches exactly, the caller might want to promote its arguments and try more signatures. In any case, it's the caller that does the work, never the callee. We could generate code like this:

    /* cdef int x = ...
     * cdef long y = ...
     * cdef int z       # interesting: what if z is not typed?
     * z = func(x, y)
     */

    if (func.sig.id == id("[int,long] -> int")) {
        z = ((cast)func.cfunc) (x,y);
    } else if (sizeof(long) > sizeof(int) &&
               (func.sig.id == id("[long,long] -> int"))) {
        z = ((cast)func.cfunc) ((long)x, y);
    } etc. ... else {
        /* pack and call as Python function */
    }

Meaning, the C compiler could reduce the amount of optimistic call code at compile time. Stefan From d.s.seljebotn at astro.uio.no Fri Apr 13 22:27:29 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 22:27:29 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> Ah, I didn't think about 6-bit or huffman. Certainly helps.
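For concreteness, packing a short signature into a single 64-bit key with a 6-bit alphabet could look like the following sketch. The alphabet and its ordering are invented for illustration (the CEP does not fix them), and code 0 is reserved so that shorter signatures cannot collide with longer ones:

#include <stdint.h>
#include <string.h>

/* Hypothetical 6-bit alphabet; a char's code is its index + 1,
 * so the value 0 is never a valid signature character. */
static const char SIG_ALPHABET[] = ")idlfObhcqsT";

/* Pack up to 10 signature chars (60 bits) into one 64-bit key;
 * returns 0 if the signature needs the string fallback. */
static uint64_t sig_to_key(const char *sig)
{
    uint64_t key = 0;
    size_t i, n = strlen(sig);
    if (n > 10)
        return 0;                      /* too long for a single word */
    for (i = 0; i < n; i++) {
        const char *p = strchr(SIG_ALPHABET, sig[i]);
        if (p == NULL)
            return 0;                  /* char outside the alphabet */
        key = (key << 6) | (uint64_t)(p - SIG_ALPHABET + 1);
    }
    return key;
}

A compiler that knows the signature at compile time would of course emit the resulting constant directly instead of calling sig_to_key() at runtime.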
I'm almost +1 on your proposal now, but a couple more ideas: 1) Let the key (the size_t) spill over to the next specialization entry if it is too large; and prepend that key with a continuation code (two size_ts could together say "iii)-d\0\0" on 32 bit systems with 8-bit encoding, using - as continuation). The key-based caller will expect a continuation if it knows about the specialization, and the prepended char will prevent spurious matches against the overspilled slot. We could even use the pointers for part of the continuation... 2) Separate the char* format strings from the keys, i.e. this memory layout: Version,nslots,nspecs,funcptr,key,funcptr,key,...,sigcharptr,sigcharptr... Where nslots is larger than nspecs if there are continuations. OK, this is getting close to my original proposal, but the difference is the continuation char, so that if you expect a short signature, you can safely scan every slot, with no branching and no null-checking necessary. Dag -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Robert Bradshaw wrote: On Fri, Apr 13, 2012 at 4:59 AM, Dag Sverre Seljebotn wrote: > On 04/13/2012 01:38 PM, Stefan Behnel wrote: >> >> Robert Bradshaw, 13.04.2012 12:17: >>> >>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>>> >>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>>> >>>>> Have you given any thought as to what happens if __call__ is >>>>> re-assigned for an object (or subclass of an object) supporting this >>>>> interface? Or is this out of scope? >>>> >>>> >>>> Out-of-scope, I'd say. Though you can always write an object that >>>> detects if >>>> you assign to __call__... >> >> >> +1 for out of scope. This is a pure C level feature. >> >> >>>>> Minor nit: I don't think should_dereference is worth branching on, if >>>>> one wants to save the allocation one can still use a variable-sized >>>>> type and point to oneself. Yes, that's an extra dereference, but the >>>>> memory is already likely close and it greatly simplifies the logic. >>>>> But I could be wrong here. >>>> >>>> >>>> >>>> Those minor nits are exactly what I seek; since Travis will have the >>>> first >>>> implementation in numba<->SciPy, I just want to make sure that what he >>>> does >>>> will work efficiently with Cython. >>> >>> >>> +1 >>> >>> I have to admit building/invoking these var-arg-sized __nativecall__ >>> records seems painful. Here's another suggestion:
>>>
>>> struct {
>>>     void* pointer;
>>>     size_t signature; // compressed binary representation, 95% coverage
> > Once you start passing around functions that take memory view slices as > arguments, that 95% estimate will be off I think. We have (on the high-performance systems we care about) 64 bits here. If we limit ourselves to a 6-bit alphabet, that gives a trivial encoding for up to 10 chars. We could be more clever here (Huffman coding), but that might be overkill. More importantly though, the calls behind "complicated" signatures are likely to be expensive enough that the strcmp overhead doesn't matter.
>>>     char* long_signature; // used if signature is not representable in
>>>     a size_t, as indicated by signature = 0
>>> } record;
>>>
>>> These char* could optionally be allocated at the end of the record* >>> for optimal locality. We could even dispense with the binary >>> signature, but having that option allows us to avoid strcmp for stuff >>> like d)d and ffi)f. >> >> >> Assuming we use literals and a const char* for the signature, the C >> compiler would cut down the number of signature strings automatically for >> us.
And a pointer comparison is the same as a size_t comparison. > > > I'll go one further: Intern Python bytes objects. It's just a PyObject*, but > it's *required* (or just strongly encouraged) to have gone through > > sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig) > > Obviously in a PEP you'd have a C-API function for such interning > (completely standalone utility). Performance of interning operation itself > doesn't matter... > > Unless CPython has interning features itself, like in Java? Was that present > back in the day and then ripped out? > > Requiring interning is somewhat less elegant in one way, but it makes a lot > of other stuff much simpler. > > That gives us > > struct { > void *pointer; > PyBytesObject *signature; > } record; > > and then you allocate a NULL-terminated arrays of these for all the > overloads. Global interning is a nice idea. The one drawback I see is that it becomes much more expensive for dynamically calculated signatures. >> >> That would only apply at a per-module level, though, so it would require >> an >> indirection for the signature IDs. But it would avoid a global registry. >> >> Another idea would be to set the signature ID field to 0 at the beginning >> and call a C-API function to let the current runtime assign an ID> 0, >> unique for the currently running application. Then every user would only >> have to parse the signature once to adapt to the respective ID and could >> otherwise branch based on it directly. >> >> For Cython, we could generate a static ID variable for each typed call >> that >> we found in the sources. When encountering a C signature on a callable, >> either a) the ID variable is still empty (initial case), then we parse the >> signature to see if it matches the expected signature. If it does, we >> assign the corresponding ID to the static ID variable and issue a direct >> call. If b) the ID field is already set (normal case), we compare the >> signature IDs directly and issue a C call it they match. If the IDs do not >> match, we issue a normal Python call. If I understand correctly, you're proposing struct { char* sig; long id; } sig_t; Where comparison would (sometimes?) compute id from sig by augmenting a global counter and dict? Might be expensive to bootstrap, but eventually all relevant ids would be filled in and it would be quick. Interesting. I wonder what the performance penalty would be over assuming id is statically computed lots of the time, and using that to compare against fixed values. And there's memory locality issues as well. >>>> Right... if we do some work to synchronize the types for Cython modules >>>> generated by the same version of Cython, we're left with 3-4 types for >>>> Cython, right? Then a couple for numba and one for f2py; so on the order >>>> of >>>> 10? >>> >>> >>> No, I think each closure is its own type. >> >> >> And that even applies to fused functions, right? They'd have one closure >> for each type combination. >> >> >>>> An alternative is do something funny in the type object to get across >>>> the >>>> offset-in-object information (abusing the docstring, or introduce our >>>> own >>>> flag which means that the type object has an additional non-standard >>>> field >>>> at the end). >>> >>> >>> It's a hack, but the flag + non-standard field idea might just work... >> >> >> Plus, it wouldn't have to stay a non-standard field. If it's accepted into >> CPython 3.4, we could safely use it in all existing versions of CPython. > > > Sounds good. 
Perhaps just find a single "extended", then add a new flag > field in our payload, in case we need to extend the types object yet again > later and run out of unused flag bits (TBD: figure out how many unused flag > bits there are). > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel _______________________________________________ cython-devel mailing list cython-devel at python.org http://mail.python.org/mailman/listinfo/cython-devel From robertwb at gmail.com Fri Apr 13 22:54:12 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 13:54:12 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F888401.90309@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <4F888401.90309@behnel.de> Message-ID: On Fri, Apr 13, 2012 at 12:52 PM, Stefan Behnel wrote: > Robert Bradshaw, 13.04.2012 20:21: >> On Fri, Apr 13, 2012 at 10:26 AM, Robert Bradshaw wrote: >>> On Fri, Apr 13, 2012 at 4:59 AM, Dag Sverre Seljebotn wrote: >>>> On 04/13/2012 01:38 PM, Stefan Behnel wrote: >>>>> That would only apply at a per-module level, though, so it would >>>>> require an indirection for the signature IDs. But it would avoid a >>>>> global registry. >>>>> >>>>> Another idea would be to set the signature ID field to 0 at the beginning >>>>> and call a C-API function to let the current runtime assign an ID > 0, >>>>> unique for the currently running application. Then every user would only >>>>> have to parse the signature once to adapt to the respective ID and could >>>>> otherwise branch based on it directly. >>>>> >>>>> For Cython, we could generate a static ID variable for each typed call >>>>> that >>>>> we found in the sources. When encountering a C signature on a callable, >>>>> either a) the ID variable is still empty (initial case), then we parse the >>>>> signature to see if it matches the expected signature. If it does, we >>>>> assign the corresponding ID to the static ID variable and issue a direct >>>>> call. If b) the ID field is already set (normal case), we compare the >>>>> signature IDs directly and issue a C call if they match. If the IDs do not >>>>> match, we issue a normal Python call. >>> >>> If I understand correctly, you're proposing
>>>
>>> struct {
>>>     char* sig;
>>>     long id;
>>> } sig_t;
>>>
>>> Where comparison would (sometimes?) compute id from sig by augmenting >>> a global counter and dict? Might be expensive to bootstrap, but >>> eventually all relevant ids would be filled in and it would be quick. Yes. If a function is only called once, the overhead won't matter. And starting from the second call, it would either be fast if the function signature matches or slow anyway if it doesn't match. There are still data locality issues, including the cached id for the caller as well as the callee. >>> Interesting. I wonder what the performance penalty would be over >>> assuming id is statically computed lots of the time, and using that to >>> compare against fixed values. And there are memory locality issues as >>> well. >> >> To clarify, I'd really like to have the following as fast as possible:
>>
>> if (callable.sig.id == X) {
>>     // yep, that's what I thought
>> } else {
>>     // generic call
>> }
>>
>> Alternatively, one can imagine wanting to do:
>>
>> switch (callable.sig.id) {
>>     case X:
>>         // I can do this
>>     case Y:
>>         // this is common and fast as well
>>     ...
>>     default:
>>         // generic call
>> }
> > Yes, that's the idea. > > >> There is some question about how promotion should work (e.g. should >> this flexibility reside in the caller or the callee (or both, though >> that could result in a quadratic number of comparisons)?) > > Callees could expose multiple signatures (which would result in a direct > call for each, without further comparisons), then the caller would have to > choose between those. However, if none matches exactly, the caller might > want to promote its arguments and try more signatures. In any case, it's > the caller that does the work, never the callee. > > We could generate code like this:
>
>     /* cdef int x = ...
>      * cdef long y = ...
>      * cdef int z       # interesting: what if z is not typed?
>      * z = func(x, y)
>      */
>
>     if (func.sig.id == id("[int,long] -> int")) {
>         z = ((cast)func.cfunc) (x,y);
>     } else if (sizeof(long) > sizeof(int) &&
>                (func.sig.id == id("[long,long] -> int"))) {
>         z = ((cast)func.cfunc) ((long)x, y);
>     } etc. ... else {
>         /* pack and call as Python function */
>     }
>
> Meaning, the C compiler could reduce the amount of optimistic call code at > compile time. Interesting idea. Alternatively, I wonder if the signature could reflect exactly-sized types rather than int/long/etc. Perhaps that would make the code more complicated on both ends... I'm assuming your id(...) is computed at compile time in this example, right? Otherwise it would get a bit messier. - Robert From robertwb at gmail.com Fri Apr 13 23:18:30 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 14:18:30 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> Message-ID: On Fri, Apr 13, 2012 at 1:27 PM, Dag Sverre Seljebotn wrote: > Ah, I didn't think about 6-bit or huffman. Certainly helps. Yeah, we don't want to complicate the ABI too much, but I think something like 8 4-bit common chars and 32 6-bit other chars (or 128 8-bit other chars) wouldn't be outrageous. The fact that we only have to encode into a single word makes the algorithm very simple (though the majority of the time we'd spit out pre-encoded literals). We have a version number to play with this as well. > I'm almost +1 on your proposal now, but a couple more ideas: > > 1) Let the key (the size_t) spill over to the next specialization entry if > it is too large; and prepend that key with a continuation code (two size_ts > could together say "iii)-d\0\0" on 32 bit systems with 8-bit encoding, using > - as continuation). The key-based caller will expect a continuation if it > knows about the specialization, and the prepended char will prevent spurious > matches against the overspilled slot. > > We could even use the pointers for part of the continuation... > > 2) Separate the char* format strings from the keys, i.e. this memory layout: > > Version,nslots,nspecs,funcptr,key,funcptr,key,...,sigcharptr,sigcharptr...
> > Where nslots is larger than nspecs if there are continuations. > > OK, this is getting close to my original proposal, but the difference is the > continuation char, so that if you expect a short signature, you can safely > scan every slot, with no branching and no null-checking necessary. I don't think we need nslots (though it might be interesting). My thought is that once you start futzing with variable-length keys, you might as well just compare char*s. If one is concerned about memory, one could force the sigcharptr to be aligned, and then the "keys" could be either sigcharptr or key depending on whether the least significant bit was set. One could easily scan for/switch on a key and scanning for a char* would be almost as easy (just don't dereference if the lsb is set). I don't see us being memory constrained, so (version,nspecs,futureuse),(key,sigcharptr,funcptr)*,optionalsigchardata* seems fine to me even if only one of key/sigcharptr is ever used per spec. Null-terminating the specs would work fine as well (one less thing to keep track of during iteration). - Robert From njs at pobox.com Fri Apr 13 23:24:44 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 13 Apr 2012 22:24:44 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> Message-ID: On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn wrote: > Ah, I didn't think about 6-bit or huffman. Certainly helps. > > I'm almost +1 on your proposal now, but a couple more ideas: > > 1) Let the key (the size_t) spill over to the next specialization entry if > it is too large; and prepend that key with a continuation code (two size_ts > could together say "iii)-d\0\0" on 32 bit systems with 8-bit encoding, using > - as continuation). The key-based caller will expect a continuation if it > knows about the specialization, and the prepended char will prevent spurious > matches against the overspilled slot. > > We could even use the pointers for part of the continuation... I am really lost here. Why is any of this complicated encoding stuff better than interning? Interning takes one line of code, is incredibly cheap (one dict lookup per call site and function definition), and it lets you check any possible signature (even complicated ones involving memoryviews) by doing a single-word comparison. And best of all, you don't have to think hard to make sure you got the encoding right. ;-) On a 32-bit system, pointers are smaller than a size_t, but more expressive! You can still do binary search if you want, etc. Is the problem just that interning requires a runtime calculation? Because I feel like C users (like numpy) will want to compute these compressed codes at module-init anyway, and those of us with a fancy compiler capable of computing them ahead of time (like Cython) can instruct that fancy compiler to compute them at module-init time just as easily?
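For reference, a rough C-level sketch of the interning scheme described above; the "_nativecall"/"interned_db" names follow the earlier post and are illustrative only (a real PEP would expose a standalone C-API function for this):

#include <Python.h>

/* Canonical (interned) signature object, filled in once at module init;
 * every later signature check is a single pointer comparison. */
static PyObject *interned_sig_dd_d = NULL;

static int intern_my_sigs(void)
{
    PyObject *mod = NULL, *db = NULL, *sig = NULL, *canonical = NULL;
    int result = -1;

    mod = PyImport_ImportModule("_nativecall");
    if (mod == NULL) goto done;
    db = PyObject_GetAttrString(mod, "interned_db");
    if (db == NULL) goto done;
    sig = PyBytes_FromString("dd)d");      /* (double, double) -> double */
    if (sig == NULL) goto done;

    /* setdefault(sig, sig): reuse the canonical object if one exists */
    canonical = PyDict_GetItem(db, sig);   /* borrowed reference */
    if (canonical == NULL) {
        if (PyDict_SetItem(db, sig, sig) < 0) goto done;
        canonical = sig;
    }
    Py_INCREF(canonical);
    interned_sig_dd_d = canonical;
    result = 0;
done:
    Py_XDECREF(mod);
    Py_XDECREF(db);
    Py_XDECREF(sig);
    return result;
}

After this runs once at module init, a call-site check reduces to record->signature == (PyBytesObject *)interned_sig_dd_d.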
-- Nathaniel From robertwb at gmail.com Fri Apr 13 23:50:05 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 14:50:05 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> Message-ID: On Fri, Apr 13, 2012 at 2:24 PM, Nathaniel Smith wrote: > On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn > wrote: >> Ah, I didn't think about 6-bit or huffman. Certainly helps. >> >> I'm almost +1 on your proposal now, but a couple more ideas: >> >> 1) Let the key (the size_t) spill over to the next specialization entry if >> it is too large; and prepend that key with a continuation code (two size_ts >> could together say "iii)-d\0\0" on 32 bit systems with 8-bit encoding, using >> - as continuation). The key-based caller will expect a continuation if it >> knows about the specialization, and the prepended char will prevent spurious >> matches against the overspilled slot. >> >> We could even use the pointers for part of the continuation... > > I am really lost here. Why is any of this complicated encoding stuff > better than interning? Interning takes one line of code, is incredibly > cheap (one dict lookup per call site and function definition), and it > lets you check any possible signature (even complicated ones involving > memoryviews) by doing a single-word comparison. And best of all, you > don't have to think hard to make sure you got the encoding right. ;-) > > On a 32-bit system, pointers are smaller than a size_t, but more > expressive! You can still do binary search if you want, etc. Is the > problem just that interning requires a runtime calculation? Because I > feel like C users (like numpy) will want to compute these compressed > codes at module-init anyway, and those of us with a fancy compiler > capable of computing them ahead of time (like Cython) can instruct > that fancy compiler to compute them at module-init time just as > easily? Good question. The primary disadvantage of interning that I see is memory locality. I suppose if all the C-level caches of interned values were co-located, this may not be as big of an issue. Not being able to compare against compile-time constants may thwart some optimization opportunities, but that's less clear. It also requires coordination around a common repository, but I suppose one would just stick a set in some standard module (or leverage Python's interning). - Robert From d.s.seljebotn at astro.uio.no Sat Apr 14 00:06:47 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 14 Apr 2012 00:06:47 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> Message-ID: Robert Bradshaw wrote: >On Fri, Apr 13, 2012 at 1:27 PM, Dag Sverre Seljebotn > wrote: >> Ah, I didn't think about 6-bit or huffman. Certainly helps. > >Yeah, we don't want to complicate the ABI too much, but I think >something like 8 4-bit common chars and 32 6-bit other chars (or 128 >8-bit other chars) wouldn't be outrageous. The fact that we only have
The fact that we only have >to encode into a single word makes the algorithm very simple (though >the majority of the time we'd spit out pre-encoded literals). We have >a version number to play with this as well. > >> I'm almost +1 on your proposal now, but a couple of more ideas: >> >> 1) Let the key (the size_t) spill over to the next specialization >entry if >> it is too large; and prepend that key with a continuation code (two >size-ts >> could together say "iii)-d\0\0" on 32 bit systems with 8bit encoding, >using >> - as continuation). The key-based caller will expect a continuation >if it >> knows about the specialization, and the prepended char will prevent >spurios >> matches against the overspilled slot. >> >> We could even use the pointers for part of the continuation... >> >> 2) Separate the char* format strings from the keys, ie this memory >layout: >> >> >Version,nslots,nspecs,funcptr,key,funcptr,key,...,sigcharptr,sigcharptr... >> >> Where nslots is larger than nspecs if there are continuations. >> >> OK, this is getting close to my original proposal, but the difference >is the >> contiunation char, so that if you expect a short signature, you can >safely >> scan every slot and branching and no null-checking necesarry. > >I don't think we need nslots (though it might be interesting). My >thought is that once you start futzing with variable-length keys, you >might as well just compare char*s. This is where we disagree. If you are the caller you know at compile-time how much you want to match; I think comparing 2 or 3 size-t with no looping is a lot better (a fully-unrolled, 64-bit per instruction strcmp with one of the operands known to the compiler...). > >If one is concerned about memory, one could force the sigcharptr to be >aligned, and then the "keys" could be either sigcharptr or key >depending on whether the least significant bit was set. One could >easily scan for/switch on a key and scanning for a char* would be >almost as easy (just don't dereference if the lsb is set). > >I don't see us being memory constrained, so > >(version,nspecs,futureuse),(key,sigcharptr,funcptr)*,optionalsigchardata* > >seems fine to me even if only one of key/sigchrptr is ever used per >spec. Null-terminating the specs would work fine as well (one less >thing to keep track of during iteration). Well, can't one always use more L1 cache, or is that not a concern? If you have 5-6 different routines calling each other using this mechanism, each with multiple specializations, those unused slots translate to many cache lines wasted. I don't think it is that important, I just think that how pretty the C struct declaration ends up looking should not be a concern at all, when the whole point of this is speed anyway. You can always just use a throwaway struct declaration and a cast to get whatever layout you need. If the 'padding' leads to less branching then fine, but I don't see that it helps in any way. To refine my proposal a bit, we have a list of variable size entries, (keydata, keydata, ..., funcptr) where each keydata and the ptr is 64 bits on all platforms (see below); each entry must have a total length multiple of 128 bits (so that one can safely scan for a signature in 128 bit increments in the data *without* parsing or branching, you'll never hit a pointer), and each key but the first starts with a 'dash'. Signature strings are either kept separate, or even parsed/decoded from the keys. 
We really only care about speed when you have compiled or JITed code for the case, decoding should be fine otherwise. BTW, won't the Cython-generated C code be a horrible mess if we use size_t rather than insist on int64_t? (OK, those need some ifdefs for various compilers, but still seem cleaner than operating with 32bit and 64bit keys, and stdint.h is winning ground). Dag > >- Robert >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From d.s.seljebotn at astro.uio.no Sat Apr 14 00:22:15 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 14 Apr 2012 00:22:15 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> Message-ID: <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> Robert Bradshaw wrote: >On Fri, Apr 13, 2012 at 2:24 PM, Nathaniel Smith wrote: >> On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn >> wrote: >>> Ah, I didn't think about 6-bit or huffman. Certainly helps. >>> >>> I'm almost +1 on your proposal now, but a couple more ideas: >>> >>> 1) Let the key (the size_t) spill over to the next specialization >entry if >>> it is too large; and prepend that key with a continuation code (two >size_ts >>> could together say "iii)-d\0\0" on 32 bit systems with 8-bit >encoding, using >>> - as continuation). The key-based caller will expect a continuation >if it >>> knows about the specialization, and the prepended char will prevent >spurious >>> matches against the overspilled slot. >>> >>> We could even use the pointers for part of the continuation... >> >> I am really lost here. Why is any of this complicated encoding stuff >> better than interning? Interning takes one line of code, is >incredibly >> cheap (one dict lookup per call site and function definition), and it >> lets you check any possible signature (even complicated ones >involving >> memoryviews) by doing a single-word comparison. And best of all, you >> don't have to think hard to make sure you got the encoding right. ;-) >> >> On a 32-bit system, pointers are smaller than a size_t, but more >> expressive! You can still do binary search if you want, etc. Is the >> problem just that interning requires a runtime calculation? Because I >> feel like C users (like numpy) will want to compute these compressed >> codes at module-init anyway, and those of us with a fancy compiler >> capable of computing them ahead of time (like Cython) can instruct >> that fancy compiler to compute them at module-init time just as >> easily? > >Good question. > >The primary disadvantage of interning that I see is memory locality. I >suppose if all the C-level caches of interned values were co-located, >this may not be as big of an issue. Not being able to compare against >compile-time constants may thwart some optimization opportunities, but >that's less clear. > >It also requires coordination around a common repository, but I suppose one >would just stick a set in some standard module (or leverage Python's >interning). More problems: 1) It doesn't work well with multiple interpreter states.
Ok, nothing works with that at the moment, but it is on the roadmap for Python and we should not make it worse. You basically *need* a thread safe store separate from any python interpreter; though pythread.h does not rely on the interpreter state; which helps. 2) you end up with the known comparison values in read-write memory segments rather than readonly segments, which is probably worse on multicore systems? I really think that anything that we can do to make this near-c-speed should be done; none of the proposals are *that* complicated. Using keys, NumPy can in the C code choose to be slower but more readable; but using interned string forces cython to be slower, cython gets no way of choosing to go faster. (to the degree that it has an effect; none of these claims were checked) Dag > >- Robert >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From d.s.seljebotn at astro.uio.no Sat Apr 14 00:31:58 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 14 Apr 2012 00:31:58 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> Message-ID: <8effd9fb-f145-48a8-a4cb-a75abac57a89@email.android.com> Dag Sverre Seljebotn wrote: > > >Robert Bradshaw wrote: > >>On Fri, Apr 13, 2012 at 2:24 PM, Nathaniel Smith >wrote: >>> On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn >>> wrote: >>>> Ah, I didn't think about 6-bit or huffman. Certainly helps. >>>> >>>> I'm almost +1 on your proposal now, but a couple of more ideas: >>>> >>>> 1) Let the key (the size_t) spill over to the next specialization >>entry if >>>> it is too large; and prepend that key with a continuation code (two >>size-ts >>>> could together say "iii)-d\0\0" on 32 bit systems with 8bit >>encoding, using >>>> - as continuation). The key-based caller will expect a continuation >>if it >>>> knows about the specialization, and the prepended char will prevent >>spurios >>>> matches against the overspilled slot. >>>> >>>> We could even use the pointers for part of the continuation... >>> >>> I am really lost here. Why is any of this complicated encoding stuff >>> better than interning? Interning takes one line of code, is >>incredibly >>> cheap (one dict lookup per call site and function definition), and >it >>> lets you check any possible signature (even complicated ones >>involving >>> memoryviews) by doing a single-word comparison. And best of all, you >>> don't have to think hard to make sure you got the encoding right. >;-) >>> >>> On a 32-bit system, pointers are smaller than a size_t, but more >>> expressive! You can still do binary search if you want, etc. Is the >>> problem just that interning requires a runtime calculation? Because >I >>> feel like C users (like numpy) will want to compute these compressed >>> codes at module-init anyway, and those of us with a fancy compiler >>> capable of computing them ahead of time (like Cython) can instruct >>> that fancy compiler to compute them at module-init time just as >>> easily? >> >>Good question. 
>> >>The primary disadvantage of interning that I see is memory locality. I >>suppose if all the C-level caches of interned values were co-located, >>this may not be as big of an issue. Not being able to compare against >>compile-time constants may thwart some optimization opportunities, but >>that's less clear. >> >>It also requires coordination around a common repository, but I suppose one >>would just stick a set in some standard module (or leverage Python's >>interning). > >More problems: > >1) It doesn't work well with multiple interpreter states. Ok, nothing >works with that at the moment, but it is on the roadmap for Python and >we should not make it worse. > >You basically *need* a thread safe store separate from any python >interpreter; though pythread.h does not rely on the interpreter state; >which helps. No, it doesn't, unless we want to ship a single(!) .so-file that can be depended upon by all relevant projects. There's just no way for loaded modules to communicate and synchronize that they know about this CEP except through an interpreter... That's almost impossible to work around in any clean way? (I can think of several very ugly ones...) Unless the multiple interpreter state idea is entirely dead in CPython, interning must be done separately for each interpreter and the values stored in the module object. Ugh. Dag > >2) you end up with the known comparison values in read-write memory >segments rather than readonly segments, which is probably worse on >multicore systems? > >I really think that anything that we can do to make this near-c-speed >should be done; none of the proposals are *that* complicated. > >Using keys, NumPy can in the C code choose to be slower but more >readable; but using interned strings forces Cython to be slower, Cython >gets no way of choosing to go faster. (to the degree that it has an >effect; none of these claims were checked) > >Dag > > >> >>- Robert >>_______________________________________________ >>cython-devel mailing list >>cython-devel at python.org >>http://mail.python.org/mailman/listinfo/cython-devel > >-- >Sent from my Android phone with K-9 Mail. Please excuse my brevity. >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From robertwb at gmail.com Sat Apr 14 01:46:39 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 16:46:39 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> Message-ID: On Fri, Apr 13, 2012 at 3:06 PM, Dag Sverre Seljebotn wrote: > > > Robert Bradshaw wrote: > >>On Fri, Apr 13, 2012 at 1:27 PM, Dag Sverre Seljebotn >> wrote: >>> Ah, I didn't think about 6-bit or huffman. Certainly helps. >> >>Yeah, we don't want to complicate the ABI too much, but I think >>something like 8 4-bit common chars and 32 6-bit other chars (or 128 >>8-bit other chars) wouldn't be outrageous. The fact that we only have >>to encode into a single word makes the algorithm very simple (though >>the majority of the time we'd spit out pre-encoded literals).
We have >>a version number to play with this as well. >> >>> I'm almost +1 on your proposal now, but a couple of more ideas: >>> >>> 1) Let the key (the size_t) spill over to the next specialization >>entry if >>> it is too large; and prepend that key with a continuation code (two >>size-ts >>> could together say "iii)-d\0\0" on 32 bit systems with 8bit encoding, >>using >>> - as continuation). The key-based caller will expect a continuation >>if it >>> knows about the specialization, and the prepended char will prevent >>spurios >>> matches against the overspilled slot. >>> >>> We could even use the pointers for part of the continuation... >>> >>> 2) Separate the char* format strings from the keys, ie this memory >>layout: >>> >>> >>Version,nslots,nspecs,funcptr,key,funcptr,key,...,sigcharptr,sigcharptr... >>> >>> Where nslots is larger than nspecs if there are continuations. >>> >>> OK, this is getting close to my original proposal, but the difference >>is the >>> contiunation char, so that if you expect a short signature, you can >>safely >>> scan every slot and branching and no null-checking necesarry. >> >>I don't think we need nslots (though it might be interesting). My >>thought is that once you start futzing with variable-length keys, you >>might as well just compare char*s. > > This is where we disagree. If you are the caller you know at compile-time how much you want to match; I think comparing 2 or 3 size-t with no looping is a lot better (a fully-unrolled, 64-bit per instruction strcmp with one of the operands known to the compiler...). Doesn't the compiler unroll strcmp much like this for a known operand? >>If one is concerned about memory, one could force the sigcharptr to be >>aligned, and then the "keys" could be either sigcharptr or key >>depending on whether the least significant bit was set. One could >>easily scan for/switch on a key and scanning for a char* would be >>almost as easy (just don't dereference if the lsb is set). >> >>I don't see us being memory constrained, so >> >>(version,nspecs,futureuse),(key,sigcharptr,funcptr)*,optionalsigchardata* >> >>seems fine to me even if only one of key/sigchrptr is ever used per >>spec. Null-terminating the specs would work fine as well (one less >>thing to keep track of during iteration). > > Well, can't one always use more L1 cache, or is that not a concern? If you have 5-6 different routines calling each other using this mechanism, each with multiple specializations, those unused slots translate to many cache lines wasted. > > I don't think it is that important, I just think that how pretty the C struct declaration ends up looking should not be a concern at all, when the whole point of this is speed anyway. You can always just use a throwaway struct declaration and a cast to get whatever layout you need. If the 'padding' leads to less branching then fine, but I don't see that it helps in any way. I was more concerned about guaranteeing each char* was aligned. > To refine my proposal a bit, we have a list of variable size entries, > > (keydata, keydata, ..., funcptr) > > where each keydata and the ptr is 64 bits on all platforms (see below); each entry must have a total length multiple of 128 bits (so that one can safely scan for a signature in 128 bit increments in the data *without* parsing or branching, you'll never hit a pointer), and each key but the first starts with a 'dash'. Ah, OK, similar to UTF-8. Yes, I like this idea. > Signature strings are either kept separate, or even parsed/decoded from the keys. 
We really only care about speed when you have compiled or JITed code for the case, decoding should be fine otherwise.

True.

> BTW, won't the Cython-generated C code be a horrible mess if we use size_t rather than insist on int64_t? (ok, those need some ifdefs for various compilers, but still seem cleaner than operating with 32bit and 64bit keys, and stdint.h is winning ground).

Sure, we could require 64-bit keys (and pointer slots).

On Fri, Apr 13, 2012 at 3:22 PM, Dag Sverre Seljebotn wrote:
>>> I am really lost here. Why is any of this complicated encoding stuff
>>> better than interning? Interning takes one line of code, is incredibly
>>> cheap (one dict lookup per call site and function definition), and it
>>> lets you check any possible signature (even complicated ones involving
>>> memoryviews) by doing a single-word comparison. And best of all, you
>>> don't have to think hard to make sure you got the encoding right. ;-)
>>>
>>> On a 32-bit system, pointers are smaller than a size_t, but more
>>> expressive! You can still do binary search if you want, etc. Is the
>>> problem just that interning requires a runtime calculation? Because I
>>> feel like C users (like numpy) will want to compute these compressed
>>> codes at module-init anyway, and those of us with a fancy compiler
>>> capable of computing them ahead of time (like Cython) can instruct
>>> that fancy compiler to compute them at module-init time just as
>>> easily?
>>
>>Good question.
>>
>>The primary disadvantage of interning that I see is memory locality. I
>>suppose if all the C-level caches of interned values were co-located,
>>this may not be as big of an issue. Not being able to compare against
>>compile-time constants may thwart some optimization opportunities, but
>>that's less clear.
>>
>>It also requires coordination on a common repository, but I suppose one
>>would just stick a set in some standard module (or leverage Python's
>>interning).
>
> More problems:
>
> 1) It doesn't work well with multiple interpreter states. Ok, nothing works with that at the moment, but it is on the roadmap for Python and we should not make it worse.
>
> You basically *need* a thread safe store separate from any python interpreter; though pythread.h does not rely on the interpreter state; which helps.

I didn't know about the push for multiple interpreter states, but yeah, that makes things much more painful.

> 2) you end up with the known comparison values in read-write memory segments rather than readonly segments, which is probably worse on multicore systems?

Yeah, this is the kind of stuff I was vaguely worried about when I wrote "Not being able to compare against compile-time constants may thwart some optimization opportunities." I don't know what the impact is, but it's worth trying to measure and take into account.

> I really think that anything that we can do to make this near-c-speed should be done; none of the proposals are *that* complicated.
>
> Using keys, NumPy can in the C code choose to be slower but more readable; but using interned string forces cython to be slower, cython gets no way of choosing to go faster. (to the degree that it has an effect; none of these claims were checked)

Yep, agreed.
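To pin down what's being converged on, here is a minimal C sketch of the table-plus-keys approach. Everything here is illustrative assumption, not CEP text: the struct name, field layout, the key constant, and the signature it encodes are all made up, and continuation entries (keys spilling into the next slot) are omitted for brevity.

/* One table per callable: all slots are 64 bits, and each
 * (key, funcptr) entry is padded to a multiple of 128 bits, so a
 * caller can scan two words at a time and never misread a function
 * pointer as a key. */
#include <stdint.h>

typedef struct {
    uint16_t version;    /* ABI version of the table format */
    uint16_t nspecs;     /* number of specializations */
    uint32_t reserved;   /* future use */
    uint64_t data[];     /* nspecs (key, funcptr) pairs */
} spec_table_t;

/* Hypothetical pre-encoded key for a double (double, double)
 * specialization; the real 4-bit/6-bit encoding is still open. */
#define KEY_DD_TO_D UINT64_C(0x0000006464233e64)

typedef double (*dd_to_d_t)(double, double);

static dd_to_d_t find_dd_to_d(const spec_table_t *tab)
{
    for (uint16_t i = 0; i < tab->nspecs; i++) {
        if (tab->data[2 * i] == KEY_DD_TO_D)  /* one word compare */
            return (dd_to_d_t)(uintptr_t)tab->data[2 * i + 1];
    }
    return NULL;  /* no match: fall back to the boxed Python call */
}

A caller that knows its signature at compile time then pays a handful of word compares against an immediate constant plus one indirect call, which is the near-C-speed path being argued for above.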
- Robert

From njs at pobox.com  Sat Apr 14 02:19:41 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 14 Apr 2012 01:19:41 +0100
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com>
References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com>
Message-ID: 

On Fri, Apr 13, 2012 at 11:22 PM, Dag Sverre Seljebotn wrote:
>
>
> Robert Bradshaw wrote:
>
>>On Fri, Apr 13, 2012 at 2:24 PM, Nathaniel Smith wrote:
>>> On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn
>>> wrote:
>>>> Ah, I didn't think about 6-bit or huffman. Certainly helps.
>>>>
>>>> I'm almost +1 on your proposal now, but a couple of more ideas:
>>>>
>>>> 1) Let the key (the size_t) spill over to the next specialization
>>>> entry if it is too large; and prepend that key with a continuation
>>>> code (two size_ts could together say "iii)-d\0\0" on 32 bit systems
>>>> with 8bit encoding, using - as continuation). The key-based caller
>>>> will expect a continuation if it knows about the specialization, and
>>>> the prepended char will prevent spurious matches against the
>>>> overspilled slot.
>>>>
>>>> We could even use the pointers for part of the continuation...
>>>
>>> I am really lost here. Why is any of this complicated encoding stuff
>>> better than interning? Interning takes one line of code, is incredibly
>>> cheap (one dict lookup per call site and function definition), and it
>>> lets you check any possible signature (even complicated ones involving
>>> memoryviews) by doing a single-word comparison. And best of all, you
>>> don't have to think hard to make sure you got the encoding right. ;-)
>>>
>>> On a 32-bit system, pointers are smaller than a size_t, but more
>>> expressive! You can still do binary search if you want, etc. Is the
>>> problem just that interning requires a runtime calculation? Because I
>>> feel like C users (like numpy) will want to compute these compressed
>>> codes at module-init anyway, and those of us with a fancy compiler
>>> capable of computing them ahead of time (like Cython) can instruct
>>> that fancy compiler to compute them at module-init time just as
>>> easily?
>>
>>Good question.
>>
>>The primary disadvantage of interning that I see is memory locality. I
>>suppose if all the C-level caches of interned values were co-located,
>>this may not be as big of an issue. Not being able to compare against
>>compile-time constants may thwart some optimization opportunities, but
>>that's less clear.

I would like to see some demonstration of this. E.g., you can run this:

echo -e '#include <string.h>\nint main(int argc, char ** argv) { return strcmp(argv[0], "a"); }' | gcc -S -x c - -o - -O2 | less

Looks to me like for a short, known-at-compile-time string, with optimization on, gcc implements it by basically sticking the string in a global variable and then using a pointer... (If I do argv[0] == (char *)0x1234, then it places the constant value directly into the instruction stream. Strangely enough, it does *not* inline the constant value even if I do memcmp(&argv[0], "\1\2\3\4", 4), which should be exactly equivalent...!)
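A slightly expanded, self-contained version of that experiment puts the three candidate checks side by side; the interned pointer, the "dd->d" signature spelling, and the 64-bit key value are of course made up for the purpose of reading the assembly from `gcc -S -O2 compare.c`:

/* compare.c -- what code does gcc generate for each dispatch check?
 * All values here are placeholders, not a real encoding. */
#include <stdint.h>
#include <string.h>

static void *interned_dd_to_d;             /* filled in at module init */

int by_strcmp(const char *sig) {
    return strcmp(sig, "dd->d") == 0;      /* compare against a literal */
}

int by_interned_ptr(const void *sig) {
    return sig == interned_dd_to_d;        /* load a global, then compare */
}

int by_key(uint64_t key) {
    return key == UINT64_C(0x6464233e64);  /* immediate in the code stream */
}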
I think gcc is just as likely to stick a bunch of static void * interned_dd_to_d; static void * interned_ll_to_l; next to each other in the memory image as it is to stick a bunch of equivalent manifest constants. If you're worried, make it static void * interned_signatures[NUM_SIGNATURES] -- then they'll definitely be next to each other. >>It also requires coordination common repository, but I suppose one >>would just stick a set in some standard module (or leverage Python's >>interning). > > More problems: > > 1) It doesn't work well with multiple interpreter states. Ok, nothing works with that at the moment, but it is on the roadmap for Python and we should not make it worse. This isn't a criticism, but I'd like to see a reference to the work in this direction! My impression was that it's been on the roadmap for maybe a decade, in a really desultory fashion: http://docs.python.org/faq/library.html#can-t-we-get-rid-of-the-global-interpreter-lock So if it's actually happening that's quite interesting. > You basically *need* a thread safe store separate from any python interpreter; though pythread.h does not rely on the interpreter state; which helps. Anyway, yes, if you can't rely on the interpreter than you'd need some place to store the intern table, but I'm not sure why this would be a problem (in Python 3.6 or whenever it becomes relevant). > 2) you end up with the known comparison values in read-write memory segments rather than readonly segments, which is probably worse on multicore systems? Is it? Can you elaborate? Cache ping-ponging is certainly bad, but that's when multiple cores are writing to the same cache line, I can't see how the TLB flags would matter. I guess the problem would be if you also have some other data in the global variable space that you write to constantly, and then it turned out they were placed next to these read-only comparison values in the same cache line? > I really think that anything that we can do to make this near-c-speed should be done; none of the proposals are *that* complicated. I agree, but I object to codifying the waving on dead chickens. :-) > Using keys, NumPy can in the C code choose to be slower but more readable; but using interned string forces cython to be slower, cython gets no way of choosing to go faster. (to the degree that it has an effect; none of these claims were checked) I think the only slowdown we know of is a few dict lookups at module load time. - N From greg.ewing at canterbury.ac.nz Sat Apr 14 02:24:58 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 14 Apr 2012 12:24:58 +1200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> Message-ID: <4F88C3DA.6000009@canterbury.ac.nz> Dag Sverre Seljebotn wrote: > 1) It doesn't work well with multiple interpreter states. Ok, nothing works > with that at the moment, but it is on the roadmap for Python Is it really? I got the impression that it's not considered feasible, since it would require massive changes to the entire implementation and totally break the existing C API. Has someone thought of a way around those problems? 
-- Greg From d.s.seljebotn at astro.uio.no Sat Apr 14 10:36:55 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 14 Apr 2012 10:36:55 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> Message-ID: <7b6da5f0-8248-42f5-8263-b4cf9ebedf95@email.android.com> Nathaniel Smith wrote: >On Fri, Apr 13, 2012 at 11:22 PM, Dag Sverre Seljebotn > wrote: >> >> >> Robert Bradshaw wrote: >> >>>On Fri, Apr 13, 2012 at 2:24 PM, Nathaniel Smith >wrote: >>>> On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn >>>> wrote: >>>>> Ah, I didn't think about 6-bit or huffman. Certainly helps. >>>>> >>>>> I'm almost +1 on your proposal now, but a couple of more ideas: >>>>> >>>>> 1) Let the key (the size_t) spill over to the next specialization >>>entry if >>>>> it is too large; and prepend that key with a continuation code >(two >>>size-ts >>>>> could together say "iii)-d\0\0" on 32 bit systems with 8bit >>>encoding, using >>>>> - as continuation). The key-based caller will expect a >continuation >>>if it >>>>> knows about the specialization, and the prepended char will >prevent >>>spurios >>>>> matches against the overspilled slot. >>>>> >>>>> We could even use the pointers for part of the continuation... >>>> >>>> I am really lost here. Why is any of this complicated encoding >stuff >>>> better than interning? Interning takes one line of code, is >>>incredibly >>>> cheap (one dict lookup per call site and function definition), and >it >>>> lets you check any possible signature (even complicated ones >>>involving >>>> memoryviews) by doing a single-word comparison. And best of all, >you >>>> don't have to think hard to make sure you got the encoding right. >;-) >>>> >>>> On a 32-bit system, pointers are smaller than a size_t, but more >>>> expressive! You can still do binary search if you want, etc. Is the >>>> problem just that interning requires a runtime calculation? Because >I >>>> feel like C users (like numpy) will want to compute these >compressed >>>> codes at module-init anyway, and those of us with a fancy compiler >>>> capable of computing them ahead of time (like Cython) can instruct >>>> that fancy compiler to compute them at module-init time just as >>>> easily? >>> >>>Good question. >>> >>>The primary disadvantage of interning that I see is memory locality. >I >>>suppose if all the C-level caches of interned values were co-located, >>>this may not be as big of an issue. Not being able to compare against >>>compile-time constants may thwart some optimization opportunities, >but >>>that's less clear. > >I would like to see some demonstration of this. E.g., you can run this: > >echo -e '#include \nint main(int argc, char ** argv) { >return strcmp(argv[0], "a"); }' | gcc -S -x c - -o - -O2 | less > >Looks to me like for a short, known-at-compile-time string, with >optimization on, gcc implements it by basically sticking the string in >a global variable and then using a pointer... (If I do argv[0] == >(char *)0x1234, then it places the constant value directly into the >instruction stream. Strangely enough, it does *not* inline the >constant value even if I do memcmp(&argv[0], "\1\2\3\4", 4), which >should be exactly equivalent...!) Right. 
So: - With keys you have the *option* of hardcoding them, and then they will be in the instruction stream (rather than the instruction stream containing, essentially, a pointer to the key). - With interned, you always have a pointer you must dereference in the instruction stream. > >I think gcc is just as likely to stick a bunch of > static void * interned_dd_to_d; > static void * interned_ll_to_l; >next to each other in the memory image as it is to stick a bunch of >equivalent manifest constants. If you're worried, make it static void >* interned_signatures[NUM_SIGNATURES] -- then they'll definitely be >next to each other. > >>>It also requires coordination common repository, but I suppose one >>>would just stick a set in some standard module (or leverage Python's >>>interning). >> >> More problems: >> >> 1) It doesn't work well with multiple interpreter states. Ok, nothing >works with that at the moment, but it is on the roadmap for Python and >we should not make it worse. > >This isn't a criticism, but I'd like to see a reference to the work in >this direction! My impression was that it's been on the roadmap for >maybe a decade, in a really desultory fashion: >http://docs.python.org/faq/library.html#can-t-we-get-rid-of-the-global-interpreter-lock >So if it's actually happening that's quite interesting. I wasn't referring to the GIL, but multiple interpreters (where objects from one cannot be used in another). PEP3121 mentions it as one of the things it prepares for. Perhaps that didn't go anywhere, I don't really know. > >> You basically *need* a thread safe store separate from any python >interpreter; though pythread.h does not rely on the interpreter state; >which helps. > >Anyway, yes, if you can't rely on the interpreter than you'd need some >place to store the intern table, but I'm not sure why this would be a >problem (in Python 3.6 or whenever it becomes relevant). > >> 2) you end up with the known comparison values in read-write memory >segments rather than readonly segments, which is probably worse on >multicore systems? > >Is it? Can you elaborate? Cache ping-ponging is certainly bad, but >that's when multiple cores are writing to the same cache line, I can't >see how the TLB flags would matter. > >I guess the problem would be if you also have some other data in the >global variable space that you write to constantly, and then it turned >out they were placed next to these read-only comparison values in the >same cache line? You may be right, my understanding of this is actually too vague. Anyway, if the constant ends up in the instruction stream it is at least one less register load from data cache with the key approach? Dag > >> I really think that anything that we can do to make this near-c-speed >should be done; none of the proposals are *that* complicated. > >I agree, but I object to codifying the waving on dead chickens. :-) > >> Using keys, NumPy can in the C code choose to be slower but more >readable; but using interned string forces cython to be slower, cython >gets no way of choosing to go faster. (to the degree that it has an >effect; none of these claims were checked) > >I think the only slowdown we know of is a few dict lookups at module >load time. > >- N >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. 
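For completeness, the module-load-time cost being weighed here is roughly the following sketch, using CPython's stock interning machinery (Python 3 spelling; the signature literal is again made up):

/* Interning-based variant: each extension module canonicalizes its
 * signature strings once at import time; every later dispatch check
 * is a single pointer comparison against this writable global. */
#include <Python.h>

static PyObject *sig_dd_to_d;   /* canonical signature object */

static int intern_signatures(void)
{
    sig_dd_to_d = PyUnicode_InternFromString("dd->d");
    return (sig_dd_to_d != NULL) ? 0 : -1;   /* -1: exception is set */
}

static int signature_matches(PyObject *sig)
{
    return sig == sig_dd_to_d;   /* one load from data, one compare */
}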
From d.s.seljebotn at astro.uio.no  Sat Apr 14 10:41:28 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sat, 14 Apr 2012 10:41:28 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F88C3DA.6000009@canterbury.ac.nz>
References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz>
Message-ID: <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com>

Greg Ewing wrote:
>Dag Sverre Seljebotn wrote:
>
>> 1) It doesn't work well with multiple interpreter states. Ok, nothing
>> works with that at the moment, but it is on the roadmap for Python
>
>Is it really? I got the impression that it's not considered feasible,
>since it would require massive changes to the entire implementation
>and totally break the existing C API. Has someone thought of a way
>around those problems?

I was just referring to the offhand comments in PEP3121, but I guess that PEP had multiple reasons, and perhaps this particular argument had no significance... You know this a lot better than me.

Dag

>
>--
>Greg
>_______________________________________________
>cython-devel mailing list
>cython-devel at python.org
>http://mail.python.org/mailman/listinfo/cython-devel

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

From stefan_ml at behnel.de  Sat Apr 14 10:56:44 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Apr 2012 10:56:44 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com>
References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz> <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com>
Message-ID: <4F893BCC.7040706@behnel.de>

Dag Sverre Seljebotn, 14.04.2012 10:41:
> Greg Ewing wrote:
>> Dag Sverre Seljebotn wrote:
>>
>>> 1) It doesn't work well with multiple interpreter states. Ok, nothing
>>> works with that at the moment, but it is on the roadmap for Python
>>
>> Is it really? I got the impression that it's not considered feasible,
>> since it would require massive changes to the entire implementation
>> and totally break the existing C API. Has someone thought of a way
>> around those problems?
>
> I was just referring to the offhand comments in PEP3121, but I guess that PEP had multiple reasons, and perhaps this particular argument had no significance...

IIRC, the last status was that even after this PEP, Py3 still has serious issues with keeping extension modules in separate interpreters. And this probably isn't worth doing anything about because it won't work without a major effort in all sorts of places. And I never heard that any extension module even tried to support this.

I don't think we should invest too much thought into this direction.
Stefan

From arfrever.fta at gmail.com  Sat Apr 14 12:16:04 2012
From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis)
Date: Sat, 14 Apr 2012 12:16:04 +0200
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: 
References: 
Message-ID: <201204141216.05543.Arfrever.FTA@gmail.com>

2012-04-12 16:38:37 mark florisson wrote:
> Yet another release candidate, this will hopefully be the last before
> the 0.16 release. You can grab it from here:
> http://wiki.cython.org/ReleaseNotes-0.16
>
> There were several fixes for the numpy attribute rewrite, memoryviews
> and fused types. Accessing the 'base' attribute of a typed ndarray now
> goes through the object layer, which means direct assignment is no
> longer supported.
>
> If there are any problems, please let us know.

4 tests still fail with Python 3.2 (currently 3.2.3).
All tests pass with Python 2.6.8, 2.7.3 and 3.1.5.

Failures with Python 3.2:

======================================================================
FAIL: NestedWith (withstat)
Doctest: withstat.NestedWith
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for withstat.NestedWith
  File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat.cpython-32.so", line unknown line number, in NestedWith

----------------------------------------------------------------------
File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat.cpython-32.so", line ?, in withstat.NestedWith
Failed example:
    NestedWith().runTest()
Exception raised:
    Traceback (most recent call last):
      File "/usr/lib64/python3.2/doctest.py", line 1288, in __run
        compileflags, 1), test.globs)
      File "", line 1, in
        NestedWith().runTest()
      File "withstat.pyx", line 183, in withstat.NestedWith.runTest (withstat.c:5574)
      File "withstat.pyx", line 222, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:8101)
      File "withstat.pyx", line 223, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:7989)
      File "withstat.pyx", line 224, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:7838)
      File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func
        DeprecationWarning, 2)
      File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning
        file.write(formatwarning(message, category, filename, lineno, line))
      File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning
        line = linecache.getline(filename, lineno) if line is None else line
      File "/usr/lib64/python3.2/linecache.py", line 15, in getline
        lines = getlines(filename, module_globals)
      File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines
        return self.save_linecache_getlines(filename, module_globals)
      File "/usr/lib64/python3.2/linecache.py", line 41, in getlines
        return updatecache(filename, module_globals)
      File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache
        lines = fp.readlines()
      File "/usr/lib64/python3.2/codecs.py", line 300, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 40: invalid start byte

======================================================================
FAIL: NestedWith (withstat)
Doctest: withstat.NestedWith
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/cpp/withstat.cpython-32.so", line unknown line number, in NestedWith ---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/cpp/withstat.cpython-32.so", line ?, in withstat.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat.pyx", line 183, in withstat.NestedWith.runTest (withstat.cpp:5574) File "withstat.pyx", line 222, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:8101) File "withstat.pyx", line 223, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:7989) File "withstat.pyx", line 224, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:7838) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 24: invalid continuation byte ====================================================================== FAIL: NestedWith (withstat_py) Doctest: withstat_py.NestedWith ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat_py.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat_py.cpython-32.so", line unknown line number, in NestedWith ---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat_py.cpython-32.so", line ?, in withstat_py.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat_py.py", line 250, in withstat_py.NestedWith.runTest (withstat_py.c:7262) File "withstat_py.py", line 289, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.c:9789) File "withstat_py.py", line 290, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.c:9677) File "withstat_py.py", line 291, in 
withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.c:9526) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte ====================================================================== FAIL: NestedWith (withstat_py) Doctest: withstat_py.NestedWith ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat_py.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/cpp/withstat_py.cpython-32.so", line unknown line number, in NestedWith ---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/cpp/withstat_py.cpython-32.so", line ?, in withstat_py.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat_py.py", line 250, in withstat_py.NestedWith.runTest (withstat_py.cpp:7262) File "withstat_py.py", line 289, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.cpp:9789) File "withstat_py.py", line 290, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.cpp:9677) File "withstat_py.py", line 291, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.cpp:9526) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 24: 
invalid start byte

----------------------------------------------------------------------
Ran 6485 tests in 2413.255s

FAILED (failures=4)
ALL DONE

-- 
Arfrever Frehtes Taifersar Arahesis

From markflorisson88 at gmail.com  Sat Apr 14 12:46:23 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 11:46:23 +0100
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: 
References: 
Message-ID: 

On 12 April 2012 22:00, Wes McKinney wrote:
> On Thu, Apr 12, 2012 at 10:38 AM, mark florisson
> wrote:
>> Yet another release candidate, this will hopefully be the last before
>> the 0.16 release. You can grab it from here:
>> http://wiki.cython.org/ReleaseNotes-0.16
>>
>> There were several fixes for the numpy attribute rewrite, memoryviews
>> and fused types. Accessing the 'base' attribute of a typed ndarray now
>> goes through the object layer, which means direct assignment is no
>> longer supported.
>>
>> If there are any problems, please let us know.
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>
> I'm unable to build pandas using git master Cython. I just released
> pandas 0.7.3 today which has no issues at all with 0.15.1:
>
> http://pypi.python.org/pypi/pandas
>
> For example:
>
> 16:57 ~/code/pandas (master)$ python setup.py build_ext --inplace
> running build_ext
> cythoning pandas/src/tseries.pyx to pandas/src/tseries.c
>
> Error compiling Cython file:
> ------------------------------------------------------------
> ...
>        self.store = {}
>
>        ptr = <int32_t**> malloc(self.depth * sizeof(int32_t*))
>
>        for i in range(self.depth):
>            ptr[i] = <int32_t*> (<ndarray> label_arrays[i]).data
>                                                           ^
> ------------------------------------------------------------
>
> pandas/src/tseries.pyx:107:59: Compiler crash in AnalyseExpressionsTransform
>
> ModuleNode.body = StatListNode(tseries.pyx:1:0)
> StatListNode.stats[23] = StatListNode(tseries.pyx:86:5)
> StatListNode.stats[0] = CClassDefNode(tseries.pyx:86:5,
>    as_name = u'MultiMap',
>    class_name = u'MultiMap',
>    doc = u'\n    Need to come up with a better data structure for
> multi-level indexing\n    ',
>    module_name = u'',
>    visibility = u'private')
> CClassDefNode.body = StatListNode(tseries.pyx:91:4)
> StatListNode.stats[1] = StatListNode(tseries.pyx:95:4)
> StatListNode.stats[0] = DefNode(tseries.pyx:95:4,
>    modifiers = [...]/0,
>    name = u'__init__',
>    num_required_args = 2,
>    py_wrapper_required = True,
>    reqd_kw_flags_cname = '0',
>    used = True)
> File 'Nodes.py', line 342, in analyse_expressions:
> StatListNode(tseries.pyx:96:8)
> File 'Nodes.py', line 342, in analyse_expressions:
> StatListNode(tseries.pyx:106:8)
> File 'Nodes.py', line 5903, in analyse_expressions:
> ForInStatNode(tseries.pyx:106:8)
> File 'Nodes.py', line 342, in analyse_expressions:
> StatListNode(tseries.pyx:107:21)
> File 'Nodes.py', line 4767, in analyse_expressions:
> SingleAssignmentNode(tseries.pyx:107:21)
> File 'Nodes.py', line 4872, in analyse_types:
> SingleAssignmentNode(tseries.pyx:107:21)
> File 'ExprNodes.py', line 7082, in analyse_types:
> TypecastNode(tseries.pyx:107:21,
>    result_is_used = True,
>    use_managed_ref = True)
> File 'ExprNodes.py', line 4274, in analyse_types:
> AttributeNode(tseries.pyx:107:59,
>    attribute = u'data',
>    initialized_check = True,
>    is_attribute = 1,
>    member = u'data',
>    needs_none_check = True,
>    op = '->',
>    result_is_used = True,
>    use_managed_ref = True)
> File 'ExprNodes.py', line 4360, in analyse_as_ordinary_attribute:
> AttributeNode(tseries.pyx:107:59,
>    attribute = u'data',
>    initialized_check = True,
>    is_attribute = 1,
>    member = u'data',
>    needs_none_check = True,
>    op = '->',
>    result_is_used = True,
>    use_managed_ref = True)
> File 'ExprNodes.py', line 4436, in analyse_attribute:
> AttributeNode(tseries.pyx:107:59,
>    attribute = u'data',
>    initialized_check = True,
>    is_attribute = 1,
>    member = u'data',
>    needs_none_check = True,
>    op = '->',
>    result_is_used = True,
>    use_managed_ref = True)
>
> Compiler crash traceback from this point on:
>  File "/home/wesm/code/repos/cython/Cython/Compiler/ExprNodes.py",
> line 4436, in analyse_attribute
>    replacement_node = numpy_transform_attribute_node(self)
>  File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py",
> line 18, in numpy_transform_attribute_node
>    numpy_pxd_scope = node.obj.entry.type.scope.parent_scope
> AttributeError: 'TypecastNode' object has no attribute 'entry'
> building 'pandas._tseries' extension
> creating build
> creating build/temp.linux-x86_64-2.7
> creating build/temp.linux-x86_64-2.7/pandas
> creating build/temp.linux-x86_64-2.7/pandas/src
> gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -O2 -fPIC
> -I/home/wesm/epd/lib/python2.7/site-packages/numpy/core/include
> -I/home/wesm/epd/include/python2.7 -c pandas/src/tseries.c -o
> build/temp.linux-x86_64-2.7/pandas/src/tseries.o
> pandas/src/tseries.c:1:2: error: #error Do not use this file, it is
> the result of a failed Cython compilation.
> error: command 'gcc' failed with exit status 1
>
>
> -----
>
> I kludged this particular line in the pandas/timeseries branch so it
> will build on git master Cython, but I was treated to dozens of
> failures, errors, and finally a segfault in the middle of the test
> suite. Suffice to say I'm not sure I would advise you to release the
> library in its current state until all of this is resolved. Happy to
> help however I can but I'm back to 0.15.1 for now.
>
> - Wes
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

It seems that the numpy stopgap solution broke something in Pandas, I'm not sure what or how, but it leads to segfaults where code is trying to retrieve objects from a numpy array that are NULL. I tried disabling the numpy rewrites which unbreaks this with the cython release branch, so I think we should do another RC either with the attribute rewrite disabled or fixed.

Dag, do you know what could have been broken by this fix that could lead to these results?

From stefan_ml at behnel.de  Sat Apr 14 13:00:17 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Apr 2012 13:00:17 +0200
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: <201204141216.05543.Arfrever.FTA@gmail.com>
References: <201204141216.05543.Arfrever.FTA@gmail.com>
Message-ID: <4F8958C1.40404@behnel.de>

Arfrever Frehtes Taifersar Arahesis, 14.04.2012 12:16:
> 4 tests still fail with Python 3.2 (currently 3.2.3).
> All tests pass with Python 2.6.8, 2.7.3 and 3.1.5.

Thanks for the report.
> Failures with Python 3.2:
>
> ======================================================================
> FAIL: NestedWith (withstat)
> Doctest: withstat.NestedWith
> ----------------------------------------------------------------------
> [...]
>     UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 40: invalid start byte

This looks like it's trying to print a DeprecationWarning because of some unittest related problem and fails to format the message for it. Doesn't look Cython related, but I'll see if I can find out something about this.

Stefan

From markflorisson88 at gmail.com  Sat Apr 14 13:02:01 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 12:02:01 +0100
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: <4F8958C1.40404@behnel.de>
References: <201204141216.05543.Arfrever.FTA@gmail.com> <4F8958C1.40404@behnel.de>
Message-ID: 

On 14 April 2012 12:00, Stefan Behnel wrote:
> Arfrever Frehtes Taifersar Arahesis, 14.04.2012 12:16:
>> 4 tests still fail with Python 3.2 (currently 3.2.3).
>> All tests pass with Python 2.6.8, 2.7.3 and 3.1.5.
>
> Thanks for the report.
>

Indeed, I just pushed a fix here: https://github.com/markflorisson88/cython/tree/release

Arfrever, could you retry running these tests, i.e.

python runtests.py -vv 'run\.withstat'

Thanks for the help!

>> Failures with Python 3.2:
>> [...]
>
> This looks like it's trying to print a DeprecationWarning because of some
> unittest related problem and fails to format the message for it. Doesn't
> look Cython related, but I'll see if I can find out something about this.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From stefan_ml at behnel.de  Sat Apr 14 15:06:37 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Apr 2012 15:06:37 +0200
Subject: [Cython] Fwd: sage.math cluster OFF until about 9:30am.
In-Reply-To: 
References: 
Message-ID: <4F89765D.8070603@behnel.de>

-------- Original Message --------
From: William Stein (wstein at gmail.com)

Hi,

As previously announced a few times, the sage.math cluster is OFF, due to electrical work that is being done in the building that houses the server room. I expect the machines to be off until about 9:30am.

Obviously, anything that runs on those machines -- including http://sagenb.org, http://sagemath.org, etc. -- is off.

I don't expect major havoc in getting things back up, since I had a chance to properly shut down all the machines.
In-Reply-To: References: Message-ID: <4F89765D.8070603@behnel.de> -------- Original-Message -------- From: William Stein (wstein a gmail.com) Hi, As previously announced a few times, the sage.math cluster is OFF, due to a electrical work that is being done in the building that houses the server room. I expect the machines to be off until about 9:30am. Obviously, anything that runs on those machines -- including http://sagenb.org, http://sagemath.org, etc. -- is off. I don't expect major havoc in getting things back up, since I had a chance to properly shut down all the machines. -- William From d.s.seljebotn at astro.uio.no Sat Apr 14 15:57:28 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 14 Apr 2012 15:57:28 +0200 Subject: [Cython] Cython 0.16 RC 1 In-Reply-To: References: Message-ID: <4F898248.5030105@astro.uio.no> On 04/14/2012 12:46 PM, mark florisson wrote: > On 12 April 2012 22:00, Wes McKinney wrote: >> On Thu, Apr 12, 2012 at 10:38 AM, mark florisson >> wrote: >>> Yet another release candidate, this will hopefully be the last before >>> the 0.16 release. You can grab it from here: >>> http://wiki.cython.org/ReleaseNotes-0.16 >>> >>> There were several fixes for the numpy attribute rewrite, memoryviews >>> and fused types. Accessing the 'base' attribute of a typed ndarray now >>> goes through the object layer, which means direct assignment is no >>> longer supported. >>> >>> If there are any problems, please let us know. >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> I'm unable to build pandas using git master Cython. I just released >> pandas 0.7.3 today which has no issues at all with 0.15.1: >> >> http://pypi.python.org/pypi/pandas >> >> For example: >> >> 16:57 ~/code/pandas (master)$ python setup.py build_ext --inplace >> running build_ext >> cythoning pandas/src/tseries.pyx to pandas/src/tseries.c >> >> Error compiling Cython file: >> ------------------------------------------------------------ >> ... 
>>        self.store = {}
>>
>>        ptr = <int32_t**> malloc(self.depth * sizeof(int32_t*))
>>
>>        for i in range(self.depth):
>>            ptr[i] = <int32_t*> (<ndarray> label_arrays[i]).data
>>                                                           ^
>> ------------------------------------------------------------
>>
>> pandas/src/tseries.pyx:107:59: Compiler crash in AnalyseExpressionsTransform
>>
>> [... same node dump and crash traceback as quoted above ...]
>>
>> AttributeError: 'TypecastNode' object has no attribute 'entry'
>> building 'pandas._tseries' extension
>> gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -O2 -fPIC
>> -I/home/wesm/epd/lib/python2.7/site-packages/numpy/core/include
>> -I/home/wesm/epd/include/python2.7 -c pandas/src/tseries.c -o
>> build/temp.linux-x86_64-2.7/pandas/src/tseries.o
>> pandas/src/tseries.c:1:2: error: #error Do not use this file, it is
>> the result of a failed Cython compilation.
>> error: command 'gcc' failed with exit status 1
>>
>> -----
>>
>> I kludged this particular line in the pandas/timeseries branch so it
>> will build on git master Cython, but I was treated to dozens of
>> failures, errors, and finally a segfault in the middle of the test
>> suite. Suffice it to say, I would not advise you to release the
>> library in its current state until all of this is resolved. Happy to
>> help however I can, but I'm back to 0.15.1 for now.
>>
>> - Wes
>
> It seems that the numpy stopgap solution broke something in pandas.
> I'm not sure what or how, but it leads to segfaults where code that
> tries to retrieve objects from a numpy array gets back NULL. I tried
> disabling the numpy rewrites, which unbreaks this with the cython
> release branch, so I think we should do another RC, either with the
> attribute rewrite disabled or fixed.
>
> Dag, do you know what could have been broken by this fix that could
> lead to these results?

I can't imagine what causes a change like you say... one thing that
could cause a segfault is that technically we should now call
import_array in every module using numpy.pxd, while we don't do that.
If a NumPy version is used where PyArray_DATA or similar is not a
macro, you would segfault... that should be fixed...

Dag

From stefan_ml at behnel.de Sat Apr 14 16:18:18 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Apr 2012 16:18:18 +0200
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: 
References: <201204141216.05543.Arfrever.FTA@gmail.com> <4F8958C1.40404@behnel.de>
Message-ID: <4F89872A.9040200@behnel.de>

mark florisson, 14.04.2012 13:02:
> I just pushed a fix here:
> https://github.com/markflorisson88/cython/tree/release

Note that I had already pushed a couple of other fixes into the release
branch of the main repo.

Stefan

From markflorisson88 at gmail.com Sat Apr 14 17:32:31 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 16:32:31 +0100
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: <4F898248.5030105@astro.uio.no>
References: <4F898248.5030105@astro.uio.no>
Message-ID: 

On 14 April 2012 14:57, Dag Sverre Seljebotn wrote:
> On 04/14/2012 12:46 PM, mark florisson wrote:
>> On 12 April 2012 22:00, Wes McKinney wrote:
>>> On Thu, Apr 12, 2012 at 10:38 AM, mark florisson wrote:
>>>> Yet another release candidate, this will hopefully be the last before
>>>> the 0.16 release. You can grab it from here:
>>>> http://wiki.cython.org/ReleaseNotes-0.16
>>>> [...]
>>>
>>> I'm unable to build pandas using git master Cython. I just released
>>> pandas 0.7.3 today which has no issues at all with 0.15.1:
>>>
>>> http://pypi.python.org/pypi/pandas
>>>
>>> [... full build log and compiler crash traceback quoted earlier in
>>> the thread, ending in:]
>>>
>>> pandas/src/tseries.pyx:107:59: Compiler crash in
>>> AnalyseExpressionsTransform
>>> [...]
>>>  File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py",
>>> line 18, in numpy_transform_attribute_node
>>>    numpy_pxd_scope = node.obj.entry.type.scope.parent_scope
>>> AttributeError: 'TypecastNode' object has no attribute 'entry'
>>> [...]
>>>
>>> I kludged this particular line in the pandas/timeseries branch so it
>>> will build on git master Cython, but I was treated to dozens of
>>> failures, errors, and finally a segfault in the middle of the test
>>> suite. [...] I'm back to 0.15.1 for now.
>>>
>>> - Wes
>>
>> It seems that the numpy stopgap solution broke something in pandas,
>> [... as quoted above ...]
>
> I can't imagine what causes a change like you say... one thing that
> could cause a segfault is that technically we should now call
> import_array in every module using numpy.pxd, while we don't do that.
> If a NumPy version is used where PyArray_DATA or similar is not a
> macro, you would segfault... that should be fixed...
>
> Dag

Yeah, that makes sense, but the thing is that pandas is already calling
import_array everywhere, and the function calls themselves work; it's
the result that's NULL. Now this could be a bug in pandas, but seeing
that pandas works fine without the stopgap solution (that is, it
doesn't pass all the tests, but at least it doesn't segfault), I think
it's something funky on our side.

So I suppose I'll disable the fix for 0.16, and we can try to fix it
for the next release.
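For concreteness, this is what "calling import_array in every module"
means at the C level. A minimal sketch in Python 2 style (matching this
thread's era); the module name and the empty method table are made up
here, but import_array() is the real NumPy C-API initialization macro:

{{{
#include <Python.h>
#include <numpy/arrayobject.h>

static PyMethodDef methods[] = {
    {NULL, NULL, 0, NULL}   /* no module-level functions in this sketch */
};

/* Toy module init. import_array() fills in the per-module PyArray_API
   table; API macros like PyArray_DATA go through that table, so a
   module that skips this call can crash in exactly the way discussed
   above. On failure the macro sets ImportError and returns from the
   init function. */
PyMODINIT_FUNC
initexample(void)
{
    PyObject *m = Py_InitModule("example", methods);
    if (m == NULL)
        return;
    import_array();
}
}}}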
From robertwb at gmail.com Sat Apr 14 20:10:08 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sat, 14 Apr 2012 11:10:08 -0700
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F893BCC.7040706@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz> <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com> <4F893BCC.7040706@behnel.de>
Message-ID: 

On Sat, Apr 14, 2012 at 1:56 AM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 14.04.2012 10:41:
>> Greg Ewing wrote:
>>> Dag Sverre Seljebotn wrote:
>>>
>>>> 1) It doesn't work well with multiple interpreter states. Ok, nothing
>>>> works with that at the moment, but it is on the roadmap for Python
>>>
>>> Is it really? I got the impression that it's not considered feasible,
>>> since it would require massive changes to the entire implementation
>>> and totally break the existing C API. Has someone thought of a way
>>> around those problems?
>>
>> I was just referring to the offhand comments in PEP3121, but I guess
>> that PEP had multiple reasons, and perhaps this particular argument
>> had no significance...
>
> IIRC, the last status was that even after this PEP, Py3 still has serious
> issues with keeping extension modules in separate interpreters. And this
> probably isn't worth doing anything about because it won't work without a
> major effort in all sorts of places. And I never heard that any extension
> module even tried to support this.
>
> I don't think we should invest too much thought into this direction.

I had never even heard of this PEP before this thread, but this
certainly seems reasonable to me. Aside from this, there is some value
in the inlined signature in that a pure C library can easily support
the ABI as well.

Has anyone done any experiments/timings to see if having constants vs.
globals even matters?

- Robert

From d.s.seljebotn at astro.uio.no Sat Apr 14 21:04:06 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sat, 14 Apr 2012 21:04:06 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: 
References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz> <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com> <4F893BCC.7040706@behnel.de>
Message-ID: <4F89CA26.2080906@astro.uio.no>

On 04/14/2012 08:10 PM, Robert Bradshaw wrote:
> On Sat, Apr 14, 2012 at 1:56 AM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 14.04.2012 10:41:
>>> Greg Ewing wrote:
>>>> Dag Sverre Seljebotn wrote:
>>>>
>>>>> 1) It doesn't work well with multiple interpreter states. Ok, nothing
>>>>> works with that at the moment, but it is on the roadmap for Python
>>>>
>>>> Is it really? I got the impression that it's not considered feasible,
>>>> since it would require massive changes to the entire implementation
>>>> and totally break the existing C API. Has someone thought of a way
>>>> around those problems?
>>>
>>> I was just referring to the offhand comments in PEP3121, but I guess
>>> that PEP had multiple reasons, and perhaps this particular argument
>>> had no significance...
>>
>> IIRC, the last status was that even after this PEP, Py3 still has serious
>> issues with keeping extension modules in separate interpreters. And this
>> probably isn't worth doing anything about because it won't work without a
>> major effort in all sorts of places. And I never heard that any extension
>> module even tried to support this.
>>
>> I don't think we should invest too much thought into this direction.

A shame; short of getting rid of the GIL, multiple interpreter states
would be my favourite shared-memory parallel computation approach, as
they could share NumPy buffers (and other C-level structures) without
worrying about allocating in process-shared memory (where most data
structures, like std::map, won't work portably and reliably anyway).
Multiple separate interpreter states would be a very nice way of
getting the benefits of multi-threading without the disadvantages.

> I had never even heard of this PEP before this thread, but this
> certainly seems reasonable to me. Aside from this, there is some value
> in the inlined signature in that a pure C library can easily support
> the ABI as well.

Yes -- I think both "sides" of this discussion prefer their approach
out of aesthetics more than performance :-) I'll post a revamped CEP
in a minute to at least try to sum them up.

> Has anyone done any experiments/timings to see if having constants vs.
> globals even matters?

It'd be interesting to see; won't have time myself until Monday
earliest.

Dag

From d.s.seljebotn at astro.uio.no Sat Apr 14 21:08:13 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sat, 14 Apr 2012 21:08:13 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F87530F.7050000@astro.uio.no>
References: <4F87530F.7050000@astro.uio.no>
Message-ID: <4F89CB1D.6000109@astro.uio.no>

On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote:
> Travis Oliphant recently raised the issue on the NumPy list of what
> mechanisms to use to box native functions produced by his Numba so that
> SciPy functions can call it, e.g. (I'm making the numba part up):

This thread is turning into one of those big ones...

But I think it is really worth it in the end; I'm getting excited about
the possibility down the road of importing functions using normal
Python mechanisms and still having fast calls.

Anyway, to organize discussion, I've tried to revamp the CEP and
describe both the intern way and the strcmp way.

The wiki appears to be down, so I'll post it below...

Dag

= CEP 1000: Convention for native dispatches through Python callables =

Many callable objects are simply wrappers around native code. This
holds for any Cython function, f2py functions, manually written
CPython extensions, Numba, etc.

Obviously, when native code calls other native code, it would be nice
to skip the significant cost of boxing and unboxing all the arguments.
Early binding at compile time is only possible between different
Cython modules, not between all the tools listed above.

[[enhancements/nativecall|CEP 523]] deals with Cython-specific aspects
(and is out of date w.r.t. this CEP); this CEP is intended to be about
a cross-project convention only. If successful, this CEP may be
proposed as a PEP in a modified form.
Motivating example (looking a year or two into the future):

{{{
@numba
def f(x): return 2 * x

@cython.inline
def g(x : cython.double): return 3 * x

from fortranmod import h

print f(3)
print g(3)
print h(3)
print scipy.integrate.quad(f, 0.2, 3) # fast callback!
print scipy.integrate.quad(g, 0.2, 3) # fast callback!
print scipy.integrate.quad(h, 0.2, 3) # fast callback!
}}}

== The native-call slot ==

We need ''fast'' access to probing whether a callable object supports
this CEP. Other mechanisms, such as an attribute in a dict, are too
slow for many purposes (quoting robertwb: "We're trying to get a 300ns
dispatch down to 10ns; you do not want a 50ns dict lookup").
(Obviously, if you call a callable in a loop you can fetch the pointer
outside of the loop. But in particular if this becomes a language
feature in Cython it will be used in all sorts of places.)

So we hack another type slot into existing and future CPython
implementations in the following way: This CEP provides a C header
that for all Python versions defines a macro
{{{Py_TPFLAGS_UNOFFICIAL_EXTRAS}}} for a free bit in {{{tp_flags}}}
in the {{{PyTypeObject}}}.

If present, then we extend {{{PyTypeObject}}} as follows:
{{{
typedef struct {
    PyTypeObject tp_main;
    size_t tp_unofficial_flags;
    size_t tp_nativecall_offset;
} PyUnofficialTypeObject;
}}}

{{{tp_unofficial_flags}}} is unused and should be all 0 for the time
being, but can be used later to indicate features beyond this CEP.

If {{{tp_nativecall_offset != 0}}}, this CEP is supported, and the
information for doing a native dispatch on a callable {{{obj}}} is
located at
{{{
(char*)obj + ((PyUnofficialTypeObject*)obj->ob_type)->tp_nativecall_offset;
}}}

=== GIL-less access ===

It is OK to access the native-call table without holding the GIL. This
should of course only be used to call functions that state in their
signature that they don't need the GIL.

This is important for JITted callables that would like to rewrite
their table as more specializations get added; if one needs to
reallocate the table, the old table must linger long enough that all
threads that are currently accessing it are done with it.

== Native dispatch descriptor ==

The final format for the descriptor is not agreed upon yet; this sums
up the major alternatives.

The descriptor should be a list of specializations/overloads, each
described by a function pointer and a signature specification string,
such as "id)i" for {{{int f(int, double)}}}.

The way it is stored must cater for two cases; first, when the caller
expects one or more hard-coded signatures:
{{{
if (obj has signature "id)i") {
    call;
} else if (obj has signature "if)i") {
    call with promoted second argument;
} else {
    box all arguments;
    PyObject_Call;
}
}}}

The second case is when a call stack is built dynamically while
parsing the string. Since this has higher overhead anyway, optimizing
for the first case makes sense.

=== Approach 1: Interning/run-time allocated IDs ===

1A: Let each overload have a struct
{{{
struct {
    size_t signature_id;
    char *signature;
    void *func_ptr;
};
}}}
Within each process run, there is a 1:1 mapping between
{{{signature}}} and {{{signature_id}}}. {{{signature_id}}} is
allocated by some central registry.

1B: Intern the string instead:
{{{
struct {
    char *signature; /* pointer must come from the central registry */
    void *func_ptr;
};
}}}
However, this is '''not''' trivial, since signature strings can be
allocated on the heap (e.g., a JIT would do this), so interned strings
must be memory managed and reference counted.
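To make 1B concrete, a minimal sketch of what the central registry
could look like. All names here are invented, the table is a dumb
linear scan, and the locking and the reference counting discussed next
are left out:

{{{
#include <stdlib.h>
#include <string.h>

#define NC_MAX_INTERNED 256

/* Toy interning registry: maps a signature string to one canonical
   pointer, so that signature equality becomes pointer equality.
   A real registry would need locking plus the refcounting below. */
static char *interned[NC_MAX_INTERNED];
static int n_interned = 0;

const char *nc_intern_signature(const char *sig)
{
    int i;
    for (i = 0; i < n_interned; i++)
        if (strcmp(interned[i], sig) == 0)
            return interned[i];          /* the canonical pointer */
    if (n_interned == NC_MAX_INTERNED)
        return NULL;                     /* table full; toy limitation */
    interned[n_interned] = strdup(sig);  /* lives forever in this sketch */
    return interned[n_interned++];
}
}}}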
That memory management could be done by each object passing in the
signature '''both''' when incref-ing and decref-ing the signature
string in the interning machinery. Using Python {{{bytes}}} objects is
another option.

==== Discussion ====

'''The cost of comparing a signature''': Comparing a global variable
(needle) to a value that is guaranteed to already be in cache
(candidate match).

'''Pros:'''

 * Conceptually simple struct format.

'''Cons:'''

 * Requires a registry for interning strings. This must be
   "handshaked" between the implementors of this CEP (probably by
   "first to get at {{{sys.modules["_nativecall"]}}} sticks it there"),
   as we can't ship a common dependency library for this CEP.

=== Approach 2: Efficient strcmp of verbatim signatures ===

The idea is to store the full signatures and the function pointers
together in the same memory area, but still have some structure to
allow for quick scanning through the list. (A sketch of such a scan
follows at the end of this CEP.)

Each entry has the structure {{{[signature_string, funcptr]}}} where:

 * The signature string has variable length, but the length is
   divisible by 8 bytes on all platforms. The {{{funcptr}}} is always
   8 bytes (it is padded on 32-bit systems).

 * The total size of the entry should be divisible by 16 bytes (= the
   signature data should be 8 bytes, or 24 bytes, or ...).

 * All but the first chunk of signature data should start with a
   continuation character "-", i.e. a really long signature string
   could be {{{"iiiidddd-iiidddd-iiidddd-)d"}}}. That is, a "-" is
   inserted at all positions in the string divisible by 8, except the
   first.

The point is that if you know a signature, you can quickly scan
through the binary blob for the signature in 128-bit increments,
without worrying about the variable-size nature of each entry. The
rules above protect against spurious matches.

==== Optional: Encoding ====

The strcmp approach can be made efficient for larger signatures by
using a more efficient encoding than ASCII. E.g., an encoding could
use 4 bits for the 12 most common symbols and 8 bits for 64 symbols
(for a total of 76 symbols), of which some could be letter
combinations ("Zd", "T{"). This should be reasonably simple to encode
and decode.

The CEP should provide C routines in a header file to work with the
signatures. Callers that wish to parse the format string and build a
call stack on the fly should probably work with the encoded
representation.

==== Discussion ====

'''The cost of comparing a signature''': For the vast majority of
functions, the cost is comparing a 64-bit number stored in the CPU
instruction stream (needle) to a value that is guaranteed to already
be in cache (candidate match).

'''Pros:'''

 * Readability-wise, one can use the C switch statement to dispatch.

 * "Stateless data"; for compiled code it does not require any
   run-time initialization like interning does.

 * One less pointer dereference in the common case of a short
   signature.

'''Cons:'''

 * Long signatures will require more than 8 bytes to store and could
   thus be more expensive than interned strings.

 * The format looks uglier in the form of literals in C source code.

== Signature strings ==

Example: The function
{{{
int f(double x, float y);
}}}
would have the signature string {{{"df)i"}}} (or, to save space,
{{{"idf"}}}).

Fields would follow the PEP 3118 extensions of the struct-module
format string, but with some modifications:

 * The format should be canonical and fit for {{{strcmp}}}-like
   comparison: no whitespace, no field names (TBD: what else?)

 * TBD: Information about GIL requirements (nogil, with gil?), how
   exceptions are reported.

 * TBD: Support for Cython-specific constructs like memoryview slices
   (so that arrays with strides and shape can be passed faster than
   passing an {{{"O"}}}).
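As promised above, a sketch of the Approach-2 scan. This is
illustrative only: the zero-chunk terminator and the padding of the
needle out to whole 8-byte chunks are assumptions made here, not part
of the CEP text.

{{{
#include <stdint.h>
#include <stddef.h>

/* Scan an Approach-2 table in 64-bit chunks.  'entry' points at the
   packed [signature, funcptr] entries; each entry starts on a 16-byte
   boundary.  'needle' is the wanted signature, padded to 'nchunks'
   8-byte chunks.  Because continuation chunks start with '-' and the
   funcptr always sits at an odd chunk index, a 16-byte step can never
   land on a position whose first chunk looks like a signature start,
   which is what protects against spurious matches. */
static void *
nc_find(const uint64_t *entry, const uint64_t *needle, size_t nchunks)
{
    while (*entry != 0) {                 /* assumed zero-chunk terminator */
        size_t i = 0;
        while (i < nchunks && entry[i] == needle[i])
            i++;
        if (i == nchunks)                 /* funcptr lives in the 8-byte
                                             slot right after the signature */
            return *(void *const *)(entry + nchunks);
        entry += 2;                       /* one 128-bit step */
    }
    return NULL;  /* no match: box the arguments and use PyObject_Call */
}
}}}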
From markflorisson88 at gmail.com Sat Apr 14 23:00:26 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 22:00:26 +0100
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F89CB1D.6000109@astro.uio.no>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
Message-ID: 

On 14 April 2012 20:08, Dag Sverre Seljebotn wrote:
> [... CEP 1000 quoted in full, as posted above ...]
>
> The way it is stored must cater for two cases; first, when the caller
> expects one or more hard-coded signatures:
> {{{
> if (obj has signature "id)i") {
>     call;
> } else if (obj has signature "if)i") {
>     call with promoted second argument;
> } else {
>     box all arguments;
>     PyObject_Call;
> }
> }}}

There may be a lot of promotion/demotion (you likely only want the
former) combinations, especially for multiple arguments, so perhaps it
makes sense to limit ourselves a bit. For instance, for numeric scalar
argument types we could limit to long (and the unsigned counterparts),
double and double complex. So char, short and int scalars will be
promoted to long, float to double, and float complex to double
complex. Anything bigger, like long long etc., will be matched
specifically. Promotions, and associated demotions if necessary in the
callee, should be fairly cheap compared to checking all combinations
or going through the Python layer.
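To make the promotion idea concrete, a rough caller-side sketch. The
probing helper {{{nc_find_func}}} is a placeholder, not anything
agreed on in this thread; the signature literals follow the "id)i"
notation above. The caller tries the exact signature of its arguments,
then one promoted combination, and finally falls back to boxing:

{{{
#include <Python.h>

/* Hypothetical: probe obj's native-call table for 'sig' and return
   the matching function pointer, or NULL if it isn't exported. */
void *nc_find_func(PyObject *obj, const char *sig);

static double
call_with_int_float(PyObject *obj, int a, float b)
{
    void *fp;
    if ((fp = nc_find_func(obj, "if)d")) != NULL)
        return ((double (*)(int, float))fp)(a, b);          /* exact match */
    if ((fp = nc_find_func(obj, "id)d")) != NULL)
        return ((double (*)(int, double))fp)(a, (double)b); /* promote float */
    {
        /* No native match: box everything and go through the Python
           layer.  Error handling is elided in this sketch. */
        PyObject *r = PyObject_CallFunction(obj, "(id)", a, (double)b);
        double result = r ? PyFloat_AsDouble(r) : -1.0;
        Py_XDECREF(r);
        return result;
    }
}
}}}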
> [...]
>
>  * The format should be canonical and fit for {{{strcmp}}}-like
>    comparison: no whitespace, no field names (TBD: what else?)

I think alignment is also a troublemaker. Maybe we should allow '@'
(which cannot appear in the character string but will be the default,
that is, native size, alignment and byte order) and '^', unaligned
native size and byte order (to be used for packed structs).

>  * TBD: Information about GIL requirements (nogil, with gil?), how
>    exceptions are reported.

Maybe that could be a separate list, to be consulted mostly for
explicit casts (I think PyErr_Occurred() would be the default for
non-object return types).

>  * TBD: Support for Cython-specific constructs like memoryview slices
>    (so that arrays with strides and shape can be passed faster than
>    passing an {{{"O"}}}).

Definitely, maybe something simple like M{1f}, for a 1D memoryview
slice of floats.

From stefan_ml at behnel.de Sat Apr 14 23:02:05 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Apr 2012 23:02:05 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F89CB1D.6000109@astro.uio.no>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
Message-ID: <4F89E5CD.5060301@behnel.de>

Hi,

thanks for writing this up. Comments inline as I read through it.

Dag Sverre Seljebotn, 14.04.2012 21:08:
> === GIL-less access ===
>
> [...] This is important for JITted callables that would like to
> rewrite their table as more specializations get added; if one needs
> to reallocate the table, the old table must linger long enough that
> all threads that are currently accessing it are done with it.

The problem here is that changing the table in the face of threaded
access is very likely to introduce race conditions, and the average
library out there won't know when all threads are done with it. I
don't think later modification is a good idea.

> == Native dispatch descriptor ==
>
> [...] The descriptor should be a list of specializations/overloads,

While overloaded signatures are great for the callee, they make things
much more complicated for the caller. It's no longer just one
signature that either matches or not. Especially when we allow more
than one expected signature, then each of them has to be compared
against all exported signatures.

We'll have to see what the runtime impact and the impact on the code
complexity is, I guess.

> each described by a function pointer and a signature specification
> string, such as "id)i" for {{{int f(int, double)}}}.

How do we deal with object argument types? Do we care on the caller
side? Functions might have alternative signatures that differ in the
type of their object parameters. Or should we handle this inside of
the caller and expect that it's something like a fused function with
internal dispatch in that case?

Personally, I think there is not enough to gain from object parameters
that we should handle them on the caller side. The callee can dispatch
those if necessary.

What about signatures that require an object when we have a C typed
value?

What about signatures that require a C typed argument when we have an
arbitrary object value in our call parameters?

We should also strip the "self" argument from the parameter list of
methods. That's handled by the attribute lookup before even getting at
the callable.
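For reference, and purely illustrative (these literals are not from
the CEP; 'O' for an object pointer follows the struct-module codes the
CEP builds on), a few signatures touching the object-parameter and
"self" questions above:

{{{
/* Illustrative signature literals in the notation used in this thread. */
static const char *sig_scale   = "dd)d"; /* double f(double, double)        */
static const char *sig_method  = "Oi)O"; /* PyObject *f(PyObject *, int),
                                            with "self" already stripped    */
static const char *sig_nullary = ")i";   /* int f(void), assuming ')' still
                                            separates arguments from return */
}}}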
> === Approach 1: Interning/run-time allocated IDs ===
>
> [...] However, this is '''not''' trivial, since signature strings can
> be allocated on the heap (e.g., a JIT would do this), so interned
> strings must be memory managed and reference counted.

Not necessarily, they are really short strings that could just live
forever, stored efficiently by the registry in a series of larger
memory blocks. It would take a while to fill up enough memory with
those to become problematic. Finding an efficient lookup scheme for
them might become interesting at some point, but that would also take
a while.

I don't expect real-world systems to have to deal with thousands of
different runtime(!) discovered signatures during one interpreter
lifetime.

> '''Cons:'''
>
>  * Requires a registry for interning strings. This must be
>    "handshaked" between the implementors of this CEP [...],
>    as we can't ship a common dependency library for this CEP.

... which would eventually end up in the stdlib, but could equally
well come from PyPI for now. I don't see a problem with that.

Using sys.modules (or another global store) instead of an explicit
import allows for dependency injection, that's good.

> === Approach 2: Efficient strcmp of verbatim signatures ===
>
> [...] The point is that if you know a signature, you can quickly scan
> through the binary blob for the signature in 128-bit increments,
> without worrying about the variable-size nature of each entry. The
> rules above protect against spurious matches.

Sounds pretty fast to me. Absolutely worth trying. And if we store the
signature we compare against in the same format, we won't have to
parse the signature string as such, we can really just compare the
numeric values. Assuming that's really fast, that would allow the
callee to optimistically export additional signatures, e.g. with
compatible subtypes or easily coercible types, ordered by the expected
overhead of processing the arguments (and the expected probability of
being called), so that the caller would automatically hit the fastest
call path first when traversing the list from start to end. The number
of possible signatures would obviously explode at some point...

Note that JITs could still be smart enough to avoid the traversal
after a few loop iterations.

One problem: if any of the call parameters is a plain object type,
identity matches may not work anymore because we won't know what
signature to expect.

> ==== Optional: Encoding ====
>
> [...]

Huffman codes can be processed bitwise from start to end, that would
work.

However, this would quickly die when we start adding arbitrary object
types. That would require a global registry for user types again. A
reason not to care about object types at the caller.

Also, how do we encode struct/union argument types?

> '''Cons:'''
>
>  * Long signatures will require more than 8 bytes to store and could
>    thus be more expensive than interned strings.

We could also ignore trailing arguments and only dispatch based on a
fixed number of first arguments. Callees with more arguments would
then simply not export native signatures.

>  * The format looks uglier in the form of literals in C source code.

They are not meant for reading, and we can always generate a comment
with a spelled-out readable signature next to it.

> == Signature strings ==
>
> [...]
>
>  * TBD: Information about GIL requirements (nogil, with gil?), how
>    exceptions are reported.

What about C++, including C++ exceptions?

>  * TBD: Support for Cython-specific constructs like memoryview slices
>    (so that arrays with strides and shape can be passed faster than
>    passing an {{{"O"}}}).

Is this really Cython specific or would a generic Py_buffer struct
work?

Stefan

From stefan_ml at behnel.de Sat Apr 14 23:06:13 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Apr 2012 23:06:13 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: 
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
Message-ID: <4F89E6C5.5070007@behnel.de>

mark florisson, 14.04.2012 23:00:
> On 14 April 2012 20:08, Dag Sverre Seljebotn wrote:
>>  * TBD: Information about GIL requirements (nogil, with gil?), how
>>    exceptions are reported
>
> Maybe that could be a separate list, to be consulted mostly for
> explicit casts (I think PyErr_Occurred() would be the default for
> non-object return types).

Good idea. We could have an additional "flags" field for each
signature (or maybe just each callable?) that would contain orthogonal
information about exception handling and GIL requirements.

Stefan

From wesmckinn at gmail.com Sat Apr 14 23:13:45 2012
From: wesmckinn at gmail.com (Wes McKinney)
Date: Sat, 14 Apr 2012 17:13:45 -0400
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: 
References: <4F898248.5030105@astro.uio.no>
Message-ID: 

On Sat, Apr 14, 2012 at 11:32 AM, mark florisson wrote:
> [... the full thread quoted again: Wes's pandas build failure and
> compiler crash, mark's and Dag's replies, as above ...]
>
>> So I suppose I'll disable the fix for 0.16, and we can try to fix it
>> for the next release.

Where is the bug in pandas / bad memory access? Maybe something I can
work around?

From markflorisson88 at gmail.com Sat Apr 14 23:15:22 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 22:15:22 +0100
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F89E5CD.5060301@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de>
Message-ID: 

On 14 April 2012 22:02, Stefan Behnel wrote:
> Hi,
>
> thanks for writing this up. Comments inline as I read through it.
>
> Dag Sverre Seljebotn, 14.04.2012 21:08:
>> === GIL-less access ===
>>
>> [...] if one needs to reallocate the table, the old table must
>> linger long enough that all threads that are currently accessing it
>> are done with it.
>
> The problem here is that changing the table in the face of threaded
> access is very likely to introduce race conditions, and the average
> library out there won't know when all threads are done with it. I
> don't think later modification is a good idea.
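The lingering-table idea under discussion can be sketched as follows.
Purely illustrative: the entry type, the fixed-size retire list and
the volatile qualifier are stand-ins, and a real version would need
proper atomics plus some reclamation scheme instead of leaking:

{{{
#include <stdlib.h>
#include <string.h>

typedef struct { char data[32]; } nc_entry;  /* opaque for this sketch */

static nc_entry *volatile current_table;  /* what GIL-less readers load */
static nc_entry *retired_tables[64];      /* never freed in this sketch */
static int n_retired;

/* Grow by copy, publish the new table with a single pointer store,
   and keep the old table alive so that threads that already loaded
   the old pointer can finish scanning it. */
void nc_add_specialization(const nc_entry *e, size_t old_count)
{
    nc_entry *bigger = malloc((old_count + 1) * sizeof(nc_entry));
    if (bigger == NULL)
        return;  /* error handling elided */
    memcpy(bigger, current_table, old_count * sizeof(nc_entry));
    bigger[old_count] = *e;
    retired_tables[n_retired++] = current_table;  /* linger, don't free */
    current_table = bigger;                       /* single-store publish */
}
}}}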
> [... rest of Stefan's review quoted, as above ...]
>
>>  * TBD: Support for Cython-specific constructs like memoryview slices
>>    (so that arrays with strides and shape can be passed faster than
>>    passing an {{{"O"}}}).
>
> Is this really Cython specific or would a generic Py_buffer struct
> work?

That could work through simple unboxing wrapper functions, but it
would add some overhead, specifically because it would have to check
the buffer's object, and if it didn't exist or was not a memoryview
object, it would have to create one (checking whether something is a
memoryview object would also be a pain, as each module has a different
memoryview type). That could still be feasible for interaction with
Cython functions from non-Cython code.

From markflorisson88 at gmail.com Sat Apr 14 23:21:28 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 22:21:28 +0100
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: 
References: <4F898248.5030105@astro.uio.no>
Message-ID: 

On 14 April 2012 22:13, Wes McKinney wrote:
>>>>>> _______________________________________________
>>>>>> cython-devel mailing list
>>>>>> cython-devel at python.org
>>>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>>>
>>>>>
>>>>> I'm unable to build pandas using git master Cython. I just released
>>>>> pandas 0.7.3 today which has no issues at all with 0.15.1:
>>>>>
>>>>> http://pypi.python.org/pypi/pandas
>>>>>
>>>>> For example:
>>>>>
>>>>> 16:57 ~/code/pandas (master)$ python setup.py build_ext --inplace
>>>>> running build_ext
>>>>> cythoning pandas/src/tseries.pyx to pandas/src/tseries.c
>>>>>
>>>>> Error compiling Cython file:
>>>>> ------------------------------------------------------------
>>>>> ...
>>>>>         self.store = {}
>>>>>
>>>>>         ptr = <int32_t**> malloc(self.depth * sizeof(int32_t*))
>>>>>
>>>>>         for i in range(self.depth):
>>>>>             ptr[i] = <int32_t*> (<ndarray> label_arrays[i]).data
>>>>>                                                          ^
>>>>> ------------------------------------------------------------
>>>>>
>>>>> pandas/src/tseries.pyx:107:59: Compiler crash in
>>>>> AnalyseExpressionsTransform
>>>>>
>>>>> ModuleNode.body = StatListNode(tseries.pyx:1:0)
>>>>> StatListNode.stats[23] = StatListNode(tseries.pyx:86:5)
>>>>> StatListNode.stats[0] = CClassDefNode(tseries.pyx:86:5,
>>>>>     as_name = u'MultiMap',
>>>>>     class_name = u'MultiMap',
>>>>>     doc = u'\n    Need to come up with a better data structure for
>>>>> multi-level indexing\n    ',
>>>>>     module_name = u'',
>>>>>     visibility = u'private')
>>>>> CClassDefNode.body = StatListNode(tseries.pyx:91:4)
>>>>> StatListNode.stats[1] = StatListNode(tseries.pyx:95:4)
>>>>> StatListNode.stats[0] = DefNode(tseries.pyx:95:4,
>>>>>     modifiers = [...]/0,
>>>>>     name = u'__init__',
>>>>>     num_required_args = 2,
>>>>>     py_wrapper_required = True,
>>>>>     reqd_kw_flags_cname = '0',
>>>>>     used = True)
>>>>> File 'Nodes.py', line 342, in analyse_expressions:
>>>>> StatListNode(tseries.pyx:96:8)
>>>>> File 'Nodes.py', line 342, in analyse_expressions:
>>>>> StatListNode(tseries.pyx:106:8)
>>>>> File 'Nodes.py', line 5903, in analyse_expressions:
>>>>> ForInStatNode(tseries.pyx:106:8)
>>>>> File 'Nodes.py', line 342, in analyse_expressions:
>>>>> StatListNode(tseries.pyx:107:21)
>>>>> File 'Nodes.py', line 4767, in analyse_expressions:
>>>>> SingleAssignmentNode(tseries.pyx:107:21)
>>>>> File 'Nodes.py', line 4872, in analyse_types:
>>>>> SingleAssignmentNode(tseries.pyx:107:21)
>>>>> File 'ExprNodes.py', line 7082, in analyse_types:
>>>>> TypecastNode(tseries.pyx:107:21,
>>>>>     result_is_used = True,
>>>>>     use_managed_ref = True)
>>>>> File 'ExprNodes.py', line 4274, in analyse_types:
>>>>> AttributeNode(tseries.pyx:107:59,
>>>>>     attribute = u'data',
>>>>>     initialized_check = True,
>>>>>     is_attribute = 1,
>>>>>     member = u'data',
>>>>>     needs_none_check = True,
>>>>>     op = '->',
>>>>>     result_is_used = True,
>>>>>     use_managed_ref = True)
>>>>> File 'ExprNodes.py', line 4360, in analyse_as_ordinary_attribute:
>>>>> AttributeNode(tseries.pyx:107:59,
>>>>>     attribute = u'data',
>>>>>     initialized_check = True,
>>>>>     is_attribute = 1,
>>>>>     member = u'data',
>>>>>     needs_none_check = True,
>>>>>     op = '->',
>>>>>     result_is_used = True,
>>>>>     use_managed_ref = True)
>>>>> File 'ExprNodes.py', line 4436, in analyse_attribute:
>>>>> AttributeNode(tseries.pyx:107:59,
>>>>>     attribute = u'data',
>>>>>     initialized_check = True,
>>>>>     is_attribute = 1,
>>>>>     member = u'data',
>>>>>     needs_none_check = True,
>>>>>     op = '->',
>>>>>     result_is_used = True,
>>>>>     use_managed_ref = True)
>>>>>
>>>>> Compiler crash traceback from this point on:
>>>>>  File "/home/wesm/code/repos/cython/Cython/Compiler/ExprNodes.py",
>>>>> line 4436, in analyse_attribute
>>>>>     replacement_node = numpy_transform_attribute_node(self)
>>>>>  File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py",
>>>>> line 18, in numpy_transform_attribute_node
>>>>>     numpy_pxd_scope = node.obj.entry.type.scope.parent_scope
>>>>> AttributeError: 'TypecastNode' object has no attribute 'entry'
>>>>> building 'pandas._tseries' extension
>>>>> creating build
>>>>> creating build/temp.linux-x86_64-2.7
>>>>> creating build/temp.linux-x86_64-2.7/pandas
>>>>> creating build/temp.linux-x86_64-2.7/pandas/src
>>>>> gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -O2 -fPIC
>>>>> -I/home/wesm/epd/lib/python2.7/site-packages/numpy/core/include
>>>>> -I/home/wesm/epd/include/python2.7 -c pandas/src/tseries.c -o
>>>>> build/temp.linux-x86_64-2.7/pandas/src/tseries.o
>>>>> pandas/src/tseries.c:1:2: error: #error Do not use this file, it is
>>>>> the result of a failed Cython compilation.
>>>>> error: command 'gcc' failed with exit status 1
>>>>>
>>>>>
>>>>> -----
>>>>>
>>>>> I kludged this particular line in the pandas/timeseries branch so it
>>>>> will build on git master Cython, but I was treated to dozens of
>>>>> failures, errors, and finally a segfault in the middle of the test
>>>>> suite. Suffice it to say I'm not sure I would advise you to release the
>>>>> library in its current state until all of this is resolved. Happy to
>>>>> help however I can, but I'm back to 0.15.1 for now.
>>>>>
>>>>> - Wes
>>>>> _______________________________________________
>>>>> cython-devel mailing list
>>>>> cython-devel at python.org
>>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>>
>>>>
>>>> It seems that the numpy stopgap solution broke something in pandas.
>>>> I'm not sure what or how, but it leads to segfaults where code is
>>>> trying to retrieve objects from a numpy array that are NULL. I tried
>>>> disabling the numpy rewrites, which unbreaks this with the cython
>>>> release branch, so I think we should do another RC, either with the
>>>> attribute rewrite disabled or fixed.
>>>>
>>>> Dag, do you know what could have been broken by this fix that could
>>>> lead to these results?
>>>
>>>
>>> I can't imagine what causes a change like you say... one thing that could
>>> cause a segfault is that technically we should now call import_array in
>>> every module using numpy.pxd, while we don't do that. If a NumPy version is
>>> used where PyArray_DATA or similar is not a macro, you would
>>> segfault... that should be fixed...
>>>
>>> Dag
>>>
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>> Yeah, that makes sense, but the thing is that pandas is already calling
>> import_array everywhere, and the function calls themselves work; it's
>> the result that's NULL. Now this could be a bug in pandas, but seeing
>> that pandas works fine without the stopgap solution (that is, it
>> doesn't pass all the tests, but at least it doesn't segfault), I think
>> it's something funky on our side.
>>
>> So I suppose I'll disable the fix for 0.16, and we can try to fix it
>> for the next release.
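(As an aside: the import_array() requirement Dag refers to above boils
down to something like this minimal, illustrative extension module, not
pandas code. The PyArray_* calls dispatch through a per-module
PyArray_API pointer table that import_array() fills in; without the call
the table stays NULL and the first call through it crashes:)

#include <Python.h>
#include <numpy/arrayobject.h>

static PyObject *
data_ptr(PyObject *self, PyObject *arg)
{
    /* PyArray_Check (and, in newer NumPy, PyArray_DATA) dispatches
       through the PyArray_API table; without import_array() below,
       this is where the segfault would happen. */
    if (!PyArray_Check(arg)) {
        PyErr_SetString(PyExc_TypeError, "expected an ndarray");
        return NULL;
    }
    return PyLong_FromVoidPtr(PyArray_DATA((PyArrayObject *)arg));
}

static PyMethodDef methods[] = {
    {"data_ptr", data_ptr, METH_O, "address of the array's data buffer"},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC
initexample(void)          /* Python 2 style module init, as in 2012 */
{
    Py_InitModule("example", methods);
    import_array();        /* fills in the PyArray_API table */
}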
>> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Where is the bug in pandas / bad memory access? Maybe something I can > work around? > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel It may have something to do with the Sliders, I'm not sure, but without looking carefully at them they look somewhat dangerous. Anyway, here is a traceback from the Cython debugger: #7 0x00000000080dd760 in () at /home/mark/apps/bin/nosetests:8 8 load_entry_point('nose==1.1.2', 'console_scripts', 'nosetests')() #18 0x00000000080dd760 in __init__() at /home/mark/apps/lib/python2.7/site-packages/nose/core.py:118 118 **extra_args) #25 0x00000000080dd760 in __init__() at /home/mark/apps/lib/python2.7/unittest/main.py:95 95 self.runTests() #28 0x00000000080dd760 in runTests() at /home/mark/apps/lib/python2.7/site-packages/nose/core.py:197 197 result = self.testRunner.run(self.test) #31 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/core.py:61 61 test(result) #41 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:176 176 return self.run(*arg, **kw) #46 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:223 223 test(orig) #56 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/unittest/suite.py:65 65 return self.run(*args, **kwds) #61 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:74 74 test(result) #71 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:176 176 return self.run(*arg, **kw) #76 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:223 223 test(orig) #86 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:176 176 return self.run(*arg, **kw) #91 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:223 223 test(orig) #101 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:176 176 return self.run(*arg, **kw) #106 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:223 223 test(orig) #116 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:176 176 return self.run(*arg, **kw) #121 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:223 223 test(orig) #131 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/site-packages/nose/case.py:45 45 return self.run(*arg, **kwarg) #136 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/case.py:133 133 self.runTest(result) #139 0x00000000080dd760 in runTest() at /home/mark/apps/lib/python2.7/site-packages/nose/case.py:151 151 test(result) #149 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/unittest/case.py:376 376 return self.run(*args, **kwds) #154 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/unittest/case.py:318 318 testMethod() #157 0x00000000080dd760 in test_as_index_series_return_frame() at /home/mark/code/pandas/pandas/tests/test_groupby.py:710 710 expected = grouped.agg(np.sum).ix[:, ['A', 'C']] #161 0x00000000080dd760 in agg() at /home/mark/code/pandas/pandas/core/groupby.py:282 
282 return self.aggregate(func, *args, **kwargs)
#166 0x00000000080dd760 in aggregate() at
/home/mark/code/pandas/pandas/core/groupby.py:1050
      1050 result = self._aggregate_generic(arg, *args, **kwargs)
#171 0x00000000080dd760 in _aggregate_generic() at
/home/mark/code/pandas/pandas/core/groupby.py:1103
      1103 return self._aggregate_item_by_item(func, *args, **kwargs)
#176 0x00000000080dd760 in _aggregate_item_by_item() at
/home/mark/code/pandas/pandas/core/groupby.py:1137
      1137 result[item] = colg.agg(func, *args, **kwargs)
#181 0x00000000080dd760 in agg() at
/home/mark/code/pandas/pandas/core/groupby.py:282
      282 return self.aggregate(func, *args, **kwargs)
#186 0x00000000080dd760 in aggregate() at
/home/mark/code/pandas/pandas/core/groupby.py:795
      795 return self._python_agg_general(func_or_funcs, *args, **kwargs)
#191 0x00000000080dd760 in _python_agg_general() at
/home/mark/code/pandas/pandas/core/groupby.py:370
      370 comp_ids, max_group)
#194 0x00000000080dd760 in _aggregate_series() at
/home/mark/code/pandas/pandas/core/groupby.py:421
      421 return self._aggregate_series_fast(obj, func, group_index, ngroups)
#197 0x00000000080dd760 in _aggregate_series_fast() at
/home/mark/code/pandas/pandas/core/groupby.py:437
      437 result, counts = grouper.get_result()
#199 0x000000000091880e in get_result() at
/home/mark/code/pandas/pandas/src/tseries.pyx:127
      127 else:
#204 0x00000000080dd760 in () at
/home/mark/code/pandas/pandas/core/groupby.py:361
      361 agg_func = lambda x: func(x, *args, **kwargs)
#209 0x00000000080dd760 in sum() at
/home/mark/apps/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1455
      1455 return sum(axis, dtype, out)
#213 0x00000000080dd760 in sum() at
/home/mark/code/pandas/pandas/core/series.py:862
      862 return nanops.nansum(self.values, skipna=skipna)
#217 0x00000000080dd760 in f() at
/home/mark/code/pandas/pandas/core/nanops.py:28
      28 result = alt(values, axis=axis, skipna=skipna, **kwargs)
#222 0x00000000080dd760 in _nansum() at
/home/mark/code/pandas/pandas/core/nanops.py:48
      48 mask = isnull(values)
#225 0x00000000080dd760 in isnull() at
/home/mark/code/pandas/pandas/core/common.py:60
      60 vec = lib.isnullobj(obj.ravel())
#227 0x000000000088efe0 in isnullobj() at
/home/mark/code/pandas/pandas/src/tseries.pyx:224
      224 cpdef checknull(object val):

Actually that last line is wrong, as the debugger is confused by
Cython's 'include' statement (that has to be fixed as well at some
point :). The error occurs on line 240 in isnullobj, on the statement
'val = arr[i]': arr[i] is a NULL PyObject *, so the incref fails.

If you have any idea why the stopgap solution results in different
behaviour, please let us know.

From markflorisson88 at gmail.com  Sat Apr 14 23:22:50 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 22:22:50 +0100
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: References: <4F898248.5030105@astro.uio.no>
Message-ID: 

On 14 April 2012 22:21, mark florisson wrote:
> [... full verbatim quote of the previous message snipped ...]
>
> Actually that last line is wrong, as the debugger is confused by
> Cython's 'include' statement (that has to be fixed as well at some
> point :). The error occurs on line 240 in isnullobj on the statement
> 'val = arr[i]', because arr[i] is a NULL PyObject *, so the incref
> fails.
>
> If you have any idea why the stopgap solution results in different
> behaviour, please let us know.

(The get_result() is actually from reduce.pyx, not from tseries.pyx, but
again the debugger is confused by the include of reduce.pyx).

From d.s.seljebotn at astro.uio.no  Sat Apr 14 23:30:04 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sat, 14 Apr 2012 23:30:04 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F89E6C5.5070007@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E6C5.5070007@behnel.de>
Message-ID: <911f5414-8247-44a7-bc14-68c1362139ed@email.android.com>

Stefan Behnel wrote:
>mark florisson, 14.04.2012 23:00:
>> On 14 April 2012 20:08, Dag Sverre Seljebotn wrote:
>>> * TBD: Information about GIL requirements (nogil, with gil?), how
>>> exceptions are reported
>>
>> Maybe that could be a separate list, to be consulted mostly for
>> explicit casts (I think PyErr_Occurred() would be the default for
>> non-object return types).
>
>Good idea. We could have an additional "flags" field for each signature
>(or maybe just each callable?) that would contain orthogonal information
>about exception handling and GIL requirements.

I don't think gil/nogil is orthogonal at all; I think you could export
both versions as two different overloads (so that one can jump past
gil-acquisition in with-gil-functions, etc.)

Dag

>
>Stefan
>_______________________________________________
>cython-devel mailing list
>cython-devel at python.org
>http://mail.python.org/mailman/listinfo/cython-devel

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

From greg.ewing at canterbury.ac.nz  Sun Apr 15 02:28:59 2012
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 15 Apr 2012 12:28:59 +1200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz> <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com> <4F893BCC.7040706@behnel.de>
Message-ID: <4F8A164B.1050507@canterbury.ac.nz>

Robert Bradshaw wrote:

> Has anyone done any experiments/timings to see if having constants vs.
> globals even matters?

My gut feeling is that one extra memory read is going to be
insignificant compared to the time taken by the call itself
and whatever it does.
But of course gut feelings are always
better when backed up (or refuted!) by measurements.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Sun Apr 15 03:07:43 2012
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 15 Apr 2012 13:07:43 +1200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F89CB1D.6000109@astro.uio.no>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
Message-ID: <4F8A1F5F.3030602@canterbury.ac.nz>

Dag Sverre Seljebotn wrote:

> if (obj has signature "id)i") {

This is an aside, but is it really necessary to define the
signature syntax in a way that involves unmatched parens?
Some editors (such as the one I like to use) get confused
by this, even when they're inside quotes.

The answer "get a better editor" would be entirely
appropriate if there were some advantage to this syntax
over a non-unbalanced one, but I can't see any.

-- 
Greg

From robertwb at gmail.com  Sun Apr 15 07:59:01 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sat, 14 Apr 2012 22:59:01 -0700
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
Message-ID: 

On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote:
> On 14 April 2012 20:08, Dag Sverre Seljebotn wrote:
>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote:
>>>
>>> Travis Oliphant recently raised the issue on the NumPy list of what
>>> mechanisms to use to box native functions produced by his Numba so that
>>> SciPy functions can call it, e.g. (I'm making the numba part up):
>>
>>
>>
>> This thread is turning into one of those big ones...
>>
>> But I think it is really worth it in the end; I'm getting excited about the
>> possibility down the road of importing functions using normal Python
>> mechanisms and still having fast calls.
>>
>> Anyway, to organize discussion I've tried to revamp the CEP and describe
>> both the intern-way and the strcmp-way.
>>
>> The wiki appears to be down, so I'll post it below...
>>
>> Dag
>>
>> = CEP 1000: Convention for native dispatches through Python callables =
>>
>> Many callable objects are simply wrappers around native code. This
>> holds for any Cython function, f2py functions, manually
>> written CPython extensions, Numba, etc.
>>
>> Obviously, when native code calls other native code, it would be
>> nice to skip the significant cost of boxing and unboxing all the arguments.
>> Early binding at compile-time is only possible
>> between different Cython modules, not between all the tools
>> listed above.
>>
>> [[enhancements/nativecall|CEP 523]] deals with Cython-specific aspects
>> (and is out-of-date w.r.t. this CEP); this CEP is intended to be about
>> a cross-project convention only. If successful, this CEP may be
>> proposed as a PEP in a modified form.
>>
>> Motivating example (looking a year or two into the future):
>>
>> {{{
>> @numba
>> def f(x): return 2 * x
>>
>> @cython.inline
>> def g(x : cython.double): return 3 * x
>>
>> from fortranmod import h
>>
>> print f(3)
>> print g(3)
>> print h(3)
>> print scipy.integrate.quad(f, 0.2, 3) # fast callback!
>> print scipy.integrate.quad(g, 0.2, 3) # fast callback!
>> print scipy.integrate.quad(h, 0.2, 3) # fast callback!
>>
>> }}}
>>
>> == The native-call slot ==
>>
>> We need ''fast'' access to probing whether a callable object supports
>> this CEP.
Other mechanisms, such as an attribute in a dict, are too
>> slow for many purposes (quoting robertwb: "We're trying to get a 300ns
>> dispatch down to 10ns; you do not want a 50ns dict lookup"). (Obviously,
>> if you call a callable in a loop you can fetch the pointer outside
>> of the loop. But in particular if this becomes a language feature
>> in Cython it will be used in all sorts of places.)
>>
>> So we hack another type slot into existing and future CPython
>> implementations in the following way: This CEP provides a C header
>> that for all Python versions defines a macro
>> {{{Py_TPFLAGS_UNOFFICIAL_EXTRAS}}} for a free bit in
>> {{{tp_flags}}} in the {{{PyTypeObject}}}.
>>
>> If present, then we extend {{{PyTypeObject}}}
>> as follows:
>> {{{
>> typedef struct {
>>     PyTypeObject tp_main;
>>     size_t tp_unofficial_flags;
>>     size_t tp_nativecall_offset;
>> } PyUnofficialTypeObject;
>> }}}
>>
>> {{{tp_unofficial_flags}}} is unused and should be all 0 for the time
>> being, but can be used later to indicate features beyond this CEP.
>>
>> If {{{tp_nativecall_offset != 0}}}, this CEP is supported, and
>> the information for doing a native dispatch on a callable {{{obj}}}
>> is located at
>> {{{
>> (char*)obj + ((PyUnofficialTypeObject*)obj->ob_type)->tp_nativecall_offset;
>> }}}
>>
>> === GIL-less access ===
>>
>> It is OK to access the native-call table without holding the GIL. This
>> should of course only be used to call functions that state in their
>> signature that they don't need the GIL.
>>
>> This is important for JITted callables who would like to rewrite their
>> table as more specializations get added; if one needs to reallocate
>> the table, the old table must linger along long enough that all
>> threads that are currently accessing it are done with it.
>>
>> == Native dispatch descriptor ==
>>
>> The final format for the descriptor is not agreed upon yet; this sums
>> up the major alternatives.
>>
>> The descriptor should be a list of specializations/overloads, each
>> described by a function pointer and a signature specification
>> string, such as "id)i" for {{{int f(int, double)}}}.
>>
>> The way it is stored must cater for two cases; first, when the caller
>> expects one or more hard-coded signatures:
>> {{{
>> if (obj has signature "id)i") {
>>     call;
>> } else if (obj has signature "if)i") {
>>     call with promoted second argument;
>> } else {
>>     box all arguments;
>>     PyObject_Call;
>> }
>> }}}
>
> There may be a lot of promotion/demotion (you likely only want the
> former) combinations, especially for multiple arguments, so perhaps it
> makes sense to limit ourselves a bit. For instance, for numeric scalar
> argument types we could limit to long (and the unsigned counterparts),
> double and double complex.
>
> So char, short and int scalars will be
> promoted to long, float to double and float complex to double complex.
> Anything bigger, like long long etc., will be matched specifically.
> Promotions and associated demotions, if necessary in the callee, should
> be fairly cheap compared to checking all combinations or going through
> the python layer.

True, though this could be a convention rather than a requirement of
the spec. Long vs. < long seems natural, but are there any systems
where (scalar) float still has an advantage over double?

Of course pointers like float* vs double* can't be promoted, so we
would still need this kind of type declaration.
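As a hypothetical sketch of what the promotion convention buys the
caller -- the "ld)d" literal and the find_signature_on() helper are
placeholders for whatever probing mechanism gets agreed on, not existing
API:

#include <Python.h>

/* Placeholder for the CEP's probing mechanism -- assumed, not real API. */
void *find_signature_on(PyObject *callable, const char *sig);

typedef double (*ld_d_fn)(long, double);

/* With the promotion convention, a caller holding (int, float) probes
   only for the promoted signature "ld)d" instead of every narrow
   variant; C's usual conversions handle int -> long and float -> double. */
static double
call_f(PyObject *callable, int x, float y)
{
    ld_d_fn fn = (ld_d_fn)find_signature_on(callable, "ld)d");
    if (fn != NULL)
        return fn(x, y);

    /* Slow path: box the arguments and go through the normal Python
       call (error handling elided in this sketch). */
    {
        PyObject *res = PyObject_CallFunction(callable, "id", x, (double)y);
        double result = (res != NULL) ? PyFloat_AsDouble(res) : -1.0;
        Py_XDECREF(res);
        return result;
    }
}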
Since this has higher overhead anyway, optimizing for the first >> case makes sense. >> >> === Approach 1: Interning/run-time allocated IDs === >> >> >> 1A: Let each overload have a struct >> {{{ >> struct { >> ? ?size_t signature_id; >> ? ?char *signature; >> ? ?void *func_ptr; >> }; >> }}} >> Within each process run, there is a 1:1 between {{{signature}}} and >> {{{signature_id}}}. {{{signature_id}}} is allocated by some central >> registry. >> >> 1B: Intern the string instead: >> {{{ >> struct { >> ? ?char *signature; /* pointer must come from the central registry */ >> ? ?void *func_ptr; >> }; >> }}} >> However this is '''not'' trivial, since signature strings can >> be allocated on the heap (e.g., a JIT would do this), so interned strings >> must be memory managed and reference counted. This could be done by >> each object passing in the signature '''both''' when incref-ing and >> decref-ing the signature string in the interning machinery. >> Using Python {{{bytes}}} objects is another option. >> >> ==== Discussion ==== >> >> '''The cost of comparing a signature''': Comparing a global variable >> (needle) >> to a value that is guaranteed to already be in cache (candidate match) >> >> '''Pros:''' >> >> ?* Conceptually simple struct format. >> >> '''Cons:''' >> >> ?* Requires a registry for interning strings. This must be >> ? "handshaked" between the implementors of this CEP (probably by >> ? "first to get at {{{sys.modules["_nativecall"}}} sticks it there), >> ? as we can't ship a common dependency library for this CEP. >> >> === Approach 2: Efficient strcmp of verbatim signatures === >> >> The idea is to store the full signatures and the function pointers together >> in the same memory area, but still have some structure to allow for quick >> scanning through the list. >> >> Each entry has the structure {{{[signature_string, funcptr]}}} >> where: >> >> ?* The signature string has variable length, but the length is >> ? divisible by 8 bytes on all platforms. The {{{funcptr}}} is always >> ? 8 bytes (it is padded on 32-bit systems). >> >> ?* The total size of the entry should be divisible by 16 bytes (= the >> ? signature data should be 8 bytes, or 24 bytes, or...) >> >> ?* All but the first chunk of signature data should start with a >> ? continuation character "-", i.e. a really long signature string >> ? could be {{{"iiiidddd-iiidddd-iiidddd-)d"}}}. That is, a "-" is >> ? inserted on all positions in the string divisible by 8, except the >> ? first. >> >> The point is that if you know a signature, you can quickly scan >> through the binary blob for the signature in 128 bit increments, >> without worrying about the variable size nature of each entry. ?The >> rules above protects against spurious matches. Note that these two approaches need not be mutually exclusive; a cutoff could be established giving the best (and worst) of both. >> ==== Optional: Encoding ==== >> >> The strcmp approach can be made efficient for larger signatures by >> using a more efficient encoding than ASCII. E.g., an encoding could >> use 4 bits for the 12 most common symbols and 8 bits >> for 64 symbols (for a total of 78 symbols), of which some could be >> letter combinations ("Zd", "T{"). This should be reasonably simple >> to encode and decode. >> >> The CEP should provide C routines in a header file to work with the >> signatures. Callers that wish to parse the format string and build a >> call stack on the fly should probably work with the encoded >> representation. 
>>
>> ==== Discussion ====
>>
>> '''The cost of comparing a signature''': For the vast majority of
>> functions, the cost is comparing a 64-bit number stored in the CPU
>> instruction stream (needle) to a value that is guaranteed to already
>> be in cache (candidate match).
>>
>> '''Pros:'''
>>
>>  * Readability-wise, one can use the C switch statement to dispatch
>>
>>  * "Stateless data", for compiled code it does not require any
>>    run-time initialization like interning does
>>
>>  * One less pointer-dereference in the common case of a short
>>    signature
>>
>> '''Cons:'''
>>
>>  * Long signatures will require more than 8 bytes to store and could
>>    thus be more expensive than interned strings
>>
>>  * Format looks uglier in the form of literals in C source code
>>
>>
>> == Signature strings ==
>>
>> Example: The function
>> {{{
>> int f(double x, float y);
>> }}}
>> would have the signature string {{{"df)i"}}} (or, to save space,
>> {{{"idf"}}}).
>>
>> Fields would follow the PEP 3118 extensions of the struct-module format
>> string, but with some modifications:
>>
>>  * The format should be canonical and fit for {{{strcmp}}}-like
>>    comparison: No whitespace, no field names (TBD: what else?)
>
> I think alignment is also a troublemaker. Maybe we should allow '@'
> (which cannot appear in the character string but will be the default,
> that is native size, alignment and byteorder) and '^', unaligned
> native size and byteorder (to be used for packed structs).
>
>>  * TBD: Information about GIL requirements (nogil, with gil?), how
>>    exceptions are reported
>
> Maybe that could be a separate list, to be consulted mostly for
> explicit casts (I think PyErr_Occurred() would be the default for
> non-object return types).
>
>>  * TBD: Support for Cython-specific constructs like memoryview slices
>>    (so that arrays with strides and shape can be passed faster than
>>    passing an {{{"O"}}}).
>
> Definitely, maybe something simple like M{1f}, for a 1D memoryview
> slice of floats.

It would certainly be useful to have special syntax for memory views
(after nailing down a well-defined ABI for them) and builtin types.
Being able to declare something as taking a "sage.rings.integer.Integer"
could also prove useful, but could result in long (and prefix-sharing)
signatures, favoring the runtime-allocated ids.

From robertwb at gmail.com  Sun Apr 15 08:07:13 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sat, 14 Apr 2012 23:07:13 -0700
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F8A164B.1050507@canterbury.ac.nz>
References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz> <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com> <4F893BCC.7040706@behnel.de> <4F8A164B.1050507@canterbury.ac.nz>
Message-ID: 

On Sat, Apr 14, 2012 at 5:28 PM, Greg Ewing wrote:
> Robert Bradshaw wrote:
>
>> Has anyone done any experiments/timings to see if having constants vs.
>> globals even matters?
>
> My gut feeling is that one extra memory read is going to be
> insignificant compared to the time taken by the call itself
> and whatever it does.

This is most valuable for really fast calls (e.g.
a user-defined double -> double), and compilers (and processors) have evolved to a point that they're often surprising and difficult to reason about. > But of course gut feelings are always > better when backed up (or refuted!) by measurements. I agree with your gut feeling (where insignificant to me is <3%) but can't rule it out, and data trumps consensus :). > This is an aside, but is it really necessary to define the > signature syntax in a way that involves unmatched parens? > Some editors (such as the one I like to use) get confused > by this, even when they're inside quotes. > > The answer "get a better editor" would be entirely > appropriate if there were some advantage to this syntax, > over a non-unbalanced one, but I can't see any. Brevity, especially if the signature is inlined. (Encoding could take care of this by, e.g. ignoring the redundant opening, or we could just write di=d.) - Robert From stefan_ml at behnel.de Sun Apr 15 08:16:47 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Apr 2012 08:16:47 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> Message-ID: <4F8A67CF.4010503@behnel.de> Robert Bradshaw, 15.04.2012 07:59: > On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote: >> There may be a lot of promotion/demotion (you likely only want the >> former) combinations, especially for multiple arguments, so perhaps it >> makes sense to limit ourselves a bit. For instance for numeric scalar >> argument types we could limit to long (and the unsigned counterparts), >> double and double complex. >> >> So char, short and int scalars will be >> promoted to long, float to double and float complex to double complex. >> Anything bigger, like long long etc will be matched specifically. >> Promotions and associated demotions if necessary in the callee should >> be fairly cheap compared to checking all combinations or going through >> the python layer. > > True, though this could be a convention rather than a requirement of > the spec. Long vs. < long seems natural, but are there any systems > where (scalar) float still has an advantage over double? > > Of course pointers like float* vs double* can't be promoted, so we > would still need this kind of type declaration. Yes, passing data sets as C arrays requires proper knowledge about their memory layout on both sides. OTOH, we are talking about functions that would otherwise be called through Python, so this could only apply for buffers anyway. So why not require a Py_buffer* as argument for them? Stefan From stefan_ml at behnel.de Sun Apr 15 08:26:12 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Apr 2012 08:26:12 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> Message-ID: <4F8A6A04.4000402@behnel.de> mark florisson, 14.04.2012 23:15: > On 14 April 2012 22:02, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 14.04.2012 21:08: >>> * TBD: Support for Cython-specific constructs like memoryview slices >>> (so that arrays with strides and shape can be passed faster than >>> passing an {{{"O"}}}). >> >> Is this really Cython specific or would a generic Py_buffer struct work? 
> That could work through simple unboxing wrapper functions, but it
> would add some overhead, specifically because it would have to check
> the buffer's object, and if it didn't exist or was not a memoryview
> object, it would have to create one (checking whether something is a
> memoryview object would also be a pain, as each module has a different
> memoryview type). That could still be feasible for interaction with
> Cython functions from non-Cython code.

Hmm, I don't get it. Isn't the overhead always there when a memory view
is requested in the signature? You'd have to create one for each call,
and that seriously hurts the efficiency.

Is that a common use case? Why would you want to do more than passing
unboxed buffers?

Stefan

From robertwb at gmail.com  Sun Apr 15 08:27:11 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sat, 14 Apr 2012 23:27:11 -0700
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F89E5CD.5060301@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de>
Message-ID: 

On Sat, Apr 14, 2012 at 2:02 PM, Stefan Behnel wrote:
> Hi,
>
> thanks for writing this up. Comments inline as I read through it.
>
> Dag Sverre Seljebotn, 14.04.2012 21:08:
>> === GIL-less access ===
>>
>> It is OK to access the native-call table without holding the GIL. This
>> should of course only be used to call functions that state in their
>> signature that they don't need the GIL.
>>
>> This is important for JITted callables who would like to rewrite their
>> table as more specializations get added; if one needs to reallocate
>> the table, the old table must linger along long enough that all
>> threads that are currently accessing it are done with it.
>
> The problem here is that changing the table in the face of threaded access
> is very likely to introduce race conditions, and the average library out
> there won't know when all threads are done with it. I don't think later
> modification is a good idea.

I agree; a JIT that wants to do this can over-allocate.

>> == Native dispatch descriptor ==
>>
>> The final format for the descriptor is not agreed upon yet; this sums
>> up the major alternatives.
>>
>> The descriptor should be a list of specializations/overloads
>
> While overloaded signatures are great for the callee, they make things much
> more complicated for the caller. It's no longer just one signature that
> either matches or not. Especially when we allow more than one expected
> signature, then each of them has to be compared against all exported
> signatures.
>
> We'll have to see what the runtime impact and the impact on the code
> complexity is, I guess.

The caller could choose to only check the first signature to avoid
complexity. I think, however, that overloaded signatures are important,
and even checking a dozen is cheaper than going through the Python call.
Fused types naturally lead to overloads as well.

>> each described by a function pointer and a signature specification
>> string, such as "id)i" for {{{int f(int, double)}}}.
>
> How do we deal with object argument types? Do we care on the caller side?
> Functions might have alternative signatures that differ in the type of
> their object parameters. Or should we handle this inside of the caller and
> expect that it's something like a fused function with internal dispatch in
> that case?
>
> Personally, I think there is not enough to gain from object parameters that
> we should handle it on the caller side.
The callee can dispatch those if
> necessary.

I don't think we should prohibit the signature from being able to
declare arbitrary Cython types. Whether it proves useful is dependent
on the library, and is the library writer's choice.

> What about signatures that require an object when we have a C typed value?
>
> What about signatures that require a C typed argument when we have an
> arbitrary object value in our call parameters?

When considering conversion, one gets into the sticky question of
finding the "best" overload. I'd be inclined to not do any conversion
in the caller. One can (should) export the object version of the
signature as well to avoid the slow Python call.

> We should also strip the "self" argument from the parameter list of
> methods. That's handled by the attribute lookup before even getting at the
> callable.
>
>
>> === Approach 1: Interning/run-time allocated IDs ===
>>
>>
>> 1A: Let each overload have a struct
>> {{{
>> struct {
>>     size_t signature_id;
>>     char *signature;
>>     void *func_ptr;
>> };
>> }}}
>> Within each process run, there is a 1:1
>
> mapping/relation
>
>> between {{{signature}}} and
>> {{{signature_id}}}. {{{signature_id}}} is allocated by some central
>> registry.
>>
>> 1B: Intern the string instead:
>> {{{
>> struct {
>>     char *signature; /* pointer must come from the central registry */
>>     void *func_ptr;
>> };
>> }}}
>> However this is '''not''' trivial, since signature strings can
>> be allocated on the heap (e.g., a JIT would do this), so interned strings
>> must be memory managed and reference counted.
>
> Not necessarily, they are really short strings that could just live
> forever, stored efficiently by the registry in a series of larger memory
> blocks. It would take a while to fill up enough memory with those to become
> problematic. Finding an efficient lookup scheme for them might become
> interesting at some point, but that would also take a while.
>
> I don't expect real-world systems to have to deal with thousands of
> different runtime(!) discovered signatures during one interpreter lifetime.
>
>
>> ==== Discussion ====
>>
>> '''The cost of comparing a signature''': Comparing a global variable (needle)
>> to a value that is guaranteed to already be in cache (candidate match)
>>
>> '''Pros:'''
>>
>>  * Conceptually simple struct format.
>>
>> '''Cons:'''
>>
>>  * Requires a registry for interning strings. This must be
>>    "handshaked" between the implementors of this CEP (probably by
>>    "first to get at {{{sys.modules["_nativecall"]}}} sticks it there"),
>>    as we can't ship a common dependency library for this CEP.
>
> ... which would eventually end up in the stdlib, but could equally well
> come from PyPI for now. I don't see a problem with that.
>
> Using sys.modules (or another global store) instead of an explicit import
> allows for dependency injection, that's good.

It excludes non-Python libraries, or at least makes it difficult for
them to participate.
>>
>>  * The total size of the entry should be divisible by 16 bytes (= the
>>    signature data should be 8 bytes, or 24 bytes, or...)
>>
>>  * All but the first chunk of signature data should start with a
>>    continuation character "-", i.e. a really long signature string
>>    could be {{{"iiiidddd-iiidddd-iiidddd-)d"}}}. That is, a "-" is
>>    inserted at all positions in the string divisible by 8, except the
>>    first.
>>
>> The point is that if you know a signature, you can quickly scan
>> through the binary blob for the signature in 128-bit increments,
>> without worrying about the variable size of each entry. The
>> rules above protect against spurious matches.
>
> Sounds pretty fast to me. Absolutely worth trying. And if we store the
> signature we compare against in the same format, we won't have to parse the
> signature string as such, we can really just compare the numeric values.
> Assuming that's really fast, that would allow the callee to optimistically
> export additional signatures, e.g. with compatible subtypes or easily
> coercible types, ordered by the expected overhead of processing the
> arguments (and the expected probability of being called), so that the
> caller would automatically hit the fastest call path first when traversing
> the list from start to end. The number of possible signatures would
> obviously explode at some point...
>
> Note that JITs could still be smart enough to avoid the traversal after a
> few loop iterations.
>
> One problem: if any of the call parameters is a plain object type, identity
> matches may not work anymore because we won't know what signature to expect.
>
>
>> ==== Optional: Encoding ====
>>
>> The strcmp approach can be made efficient for larger signatures by
>> using a more efficient encoding than ASCII. E.g., an encoding could
>> use 4 bits for the 12 most common symbols and 8 bits
>> for 64 symbols (for a total of 76 symbols), of which some could be
>> letter combinations ("Zd", "T{"). This should be reasonably simple
>> to encode and decode.
>>
>> The CEP should provide C routines in a header file to work with the
>> signatures. Callers that wish to parse the format string and build a
>> call stack on the fly should probably work with the encoded
>> representation.
>
> Huffman codes can be processed bitwise from start to end, that would work.
>
> However, this would quickly die when we start adding arbitrary object
> types. That would require a global registry for user types again. One more
> reason not to care about object types at the caller.
>
> Also, how do we encode struct/union argument types?
>
>
>> ==== Discussion ====
>>
>> '''The cost of comparing a signature''': For the vast majority of
>> functions, the cost is comparing a 64-bit number stored in the CPU
>> instruction stream (needle) to a value that is guaranteed to already
>> be in cache (candidate match).
>>
>> '''Pros:'''
>>
>>  * Readability-wise, one can use the C switch statement to dispatch
>>
>>  * "Stateless data", for compiled code it does not require any
>>    run-time initialization like interning does
>>
>>  * One less pointer-dereference in the common case of a short
>>    signature
>>
>> '''Cons:'''
>>
>>  * Long signatures will require more than 8 bytes to store and could
>>    thus be more expensive than interned strings
>
> We could also ignore trailing arguments and only dispatch based on a fixed
> number of first arguments. Callees with more arguments would then simply
> not export native signatures.
>
>
>>  * Format looks uglier in the form of literals in C source code
>
> They are not meant for reading, and we can always generate a comment with a
> spelled-out readable signature next to it.
>
>
>> == Signature strings ==
>>
>> Example: The function
>> {{{
>> int f(double x, float y);
>> }}}
>> would have the signature string {{{"df)i"}}} (or, to save space, {{{"idf"}}}).
>>
>> Fields would follow the PEP3118 extensions of the struct-module format
>> string, but with some modifications:
>>
>>  * The format should be canonical and fit for {{{strcmp}}}-like
>>    comparison: No whitespace, no field names (TBD: what else?)
>>
>>  * TBD: Information about GIL requirements (nogil, with gil?), how
>>    exceptions are reported
>
> What about C++, including C++ exceptions?
>
>
>>  * TBD: Support for Cython-specific constructs like memoryview slices
>>    (so that arrays with strides and shape can be passed faster than
>>    passing an {{{"O"}}}).
>
> Is this really Cython specific or would a generic Py_buffer struct work?
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From stefan_ml at behnel.de Sun Apr 15 08:28:42 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 15 Apr 2012 08:28:42 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F8A1F5F.3030602@canterbury.ac.nz>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
	<4F8A1F5F.3030602@canterbury.ac.nz>
Message-ID: <4F8A6A9A.9000400@behnel.de>

Greg Ewing, 15.04.2012 03:07:
> Dag Sverre Seljebotn wrote:
>
>> if (obj has signature "id)i") {
>
> This is an aside, but is it really necessary to define the
> signature syntax in a way that involves unmatched parens?
> Some editors (such as the one I like to use) get confused
> by this, even when they're inside quotes.
>
> The answer "get a better editor" would be entirely
> appropriate if there were some advantage to this syntax,
> over a non-unbalanced one, but I can't see any.

It wasn't really a proposed syntax, I guess, more of a way to write down an
example. It should be easy to do without any special separator by moving
the return type first, for example. Also, it's not clear yet if we will
actually use such a character syntax at all.

Stefan

From robertwb at gmail.com Sun Apr 15 08:32:10 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sat, 14 Apr 2012 23:32:10 -0700
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F8A67CF.4010503@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
	<4F8A67CF.4010503@behnel.de>
Message-ID: 

On Sat, Apr 14, 2012 at 11:16 PM, Stefan Behnel wrote:
> Robert Bradshaw, 15.04.2012 07:59:
>> On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote:
>>> There may be a lot of promotion/demotion (you likely only want the
>>> former) combinations, especially for multiple arguments, so perhaps it
>>> makes sense to limit ourselves a bit. For instance for numeric scalar
>>> argument types we could limit to long (and the unsigned counterparts),
>>> double and double complex.
>>>
>>> So char, short and int scalars will be
>>> promoted to long, float to double and float complex to double complex.
>>> Anything bigger, like long long etc will be matched specifically.
>>> Promotions and associated demotions if necessary in the callee should
>>> be fairly cheap compared to checking all combinations or going through
>>> the python layer.
>>
>> True, though this could be a convention rather than a requirement of
>> the spec. Long vs. < long seems natural, but are there any systems
>> where (scalar) float still has an advantage over double?
>>
>> Of course pointers like float* vs double* can't be promoted, so we
>> would still need this kind of type declaration.
>
> Yes, passing data sets as C arrays requires proper knowledge about their
> memory layout on both sides.
>
> OTOH, we are talking about functions that would otherwise be called through
> Python, so this could only apply for buffers anyway. So why not require a
> Py_buffer* as argument for them?

That's certainly our (initial?) usecase, but there's no need to limit
the protocol to this.

- Robert

From d.s.seljebotn at astro.uio.no Sun Apr 15 08:33:24 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sun, 15 Apr 2012 08:33:24 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F8A67CF.4010503@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
	<4F8A67CF.4010503@behnel.de>
Message-ID: <4F8A6BB4.2070003@astro.uio.no>

On 04/15/2012 08:16 AM, Stefan Behnel wrote:
> Robert Bradshaw, 15.04.2012 07:59:
>> On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote:
>>> There may be a lot of promotion/demotion (you likely only want the
>>> former) combinations, especially for multiple arguments, so perhaps it
>>> makes sense to limit ourselves a bit. For instance for numeric scalar
>>> argument types we could limit to long (and the unsigned counterparts),
>>> double and double complex.
>>>
>>> So char, short and int scalars will be
>>> promoted to long, float to double and float complex to double complex.
>>> Anything bigger, like long long etc will be matched specifically.
>>> Promotions and associated demotions if necessary in the callee should
>>> be fairly cheap compared to checking all combinations or going through
>>> the python layer.
>>
>> True, though this could be a convention rather than a requirement of
>> the spec. Long vs. < long seems natural, but are there any systems
>> where (scalar) float still has an advantage over double?
>>
>> Of course pointers like float* vs double* can't be promoted, so we
>> would still need this kind of type declaration.
>
> Yes, passing data sets as C arrays requires proper knowledge about their
> memory layout on both sides.
>
> OTOH, we are talking about functions that would otherwise be called through
> Python, so this could only apply for buffers anyway. So why not require a
> Py_buffer* as argument for them?

Is the proposal to limit the range of types valid for arguments? I'm a
bit wary of throwing this into the mix. We know very little about the
callee, they could decide:

 a) To only export a C function and have an exception-raising __call__

 b) To accept ctypes pointers in their __call__, and C pointers in their
    native-call

 c) They can invent their own use for this!

I think agreeing on a CEP gets a lot simpler, and the result cleaner, if
we focus on "how to describe C functions for the purposes of calling
them" (for various usecases), and leave "conventions for recommended
signatures" for CEP 1001.

In Cython, we could always export a fully-promoted-scalar function first
in the list, and always try to call this first, which would work well
with Cython<->Cython.
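To make that concrete, the caller side could look roughly like this (a
sketch; {{{get_native_func}}} is a stand-in for whatever lookup primitive
the CEP ends up specifying, and "ll)l" is an invented encoding for
{{{long (*)(long, long)}}}):

{{{
#include <Python.h>

typedef long (*ll_l_func)(long, long);

/* Hypothetical lookup call, not a real API: returns the function
 * pointer exported for `signature`, or NULL if there is none. */
void *get_native_func(PyObject *callable, const char *signature);

static PyObject *call_add(PyObject *callable, long a, long b)
{
    ll_l_func f = (ll_l_func)get_native_func(callable, "ll)l");
    if (f != NULL)
        return PyLong_FromLong(f(a, b));     /* fast path: no boxing */
    /* generic fallback keeps the call correct in every case */
    return PyObject_CallFunction(callable, "ll", a, b);
}
}}}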
BTW, when Travis originally wanted a proposal on the NumPy list he just
wanted it for "a C function"; his idea was something like

    mycapsule = numbaize(f)
    scipy.integrate(mycapsule)

just saying that the fast-callable aspect isn't everything, passing the
function pointer around was how this started.

Dag

From stefan_ml at behnel.de Sun Apr 15 08:37:10 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 15 Apr 2012 08:37:10 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: 
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
	<4F89E5CD.5060301@behnel.de>
Message-ID: <4F8A6C96.5010106@behnel.de>

Robert Bradshaw, 15.04.2012 08:27:
> On Sat, Apr 14, 2012 at 2:02 PM, Stefan Behnel wrote:
>> While overloaded signatures are great for the callee, they make things much
>> more complicated for the caller. It's no longer just one signature that
>> either matches or not. Especially when we allow more than one expected
>> signature, then each of them has to be compared against all exported
>> signatures.
>>
>> We'll have to see what the runtime impact and the impact on the code
>> complexity is, I guess.
>
> The caller could choose to only check the first signature to avoid
> complexity. I think, however, that overloaded signatures are
> important, and even checking a dozen is cheaper than going through the
> Python call. Fused types naturally lead to overloads as well.

Hmm, maybe it wouldn't even be all that inefficient. If both sides sorted
their signatures by highest efficiency at compile time, the first hit would
cut down the number of signatures that need further comparison to the ones
before the match.

>>> each described by a function pointer and a signature specification
>>> string, such as "id)i" for {{{int f(int, double)}}}.
>>
>> How do we deal with object argument types? Do we care on the caller side?
>> Functions might have alternative signatures that differ in the type of
>> their object parameters. Or should we handle this inside of the caller and
>> expect that it's something like a fused function with internal dispatch in
>> that case?
>>
>> Personally, I think there is not enough to gain from object parameters that
>> we should handle it on the caller side. The callee can dispatch those if
>> necessary.
>
> I don't think we should prohibit the signature from being able to
> declare arbitrary Cython types. Whether it proves useful is dependent
> on the library, and is the library writer's choice.

It leads to very different requirements for the signature encoding/syntax,
though. I think we should only go that route when we think we have to.

>>> * Requires a registry for interning strings. This must be
>>> "handshaked" between the implementors of this CEP (probably by
>>> "first to get at {{{sys.modules["_nativecall"]}}} sticks it there"),
>>> as we can't ship a common dependency library for this CEP.
>>
>> ... which would eventually end up in the stdlib, but could equally well
>> come from PyPI for now. I don't see a problem with that.
>>
>> Using sys.modules (or another global store) instead of an explicit import
>> allows for dependency injection, that's good.
>
> It excludes non-Python libraries from participating (or at least makes
> it difficult for them).

True, but that can be helped by providing a library (or header file) that
provides simple C calls for the required setup.
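Something like the following would already be enough -- a hypothetical
sketch of such a helper, nothing that exists today. Equal signature
strings map to the same canonical pointer, so the fast-path comparison
degenerates to a pointer compare:

{{{
#include <stdlib.h>
#include <string.h>

/* Deliberately naive: linear search, no locking, process-lifetime
 * storage (as discussed earlier in this thread, the strings are short
 * and can just live forever). */
const char *nativecall_intern(const char *signature)
{
    static const char *table[1024];
    static int n = 0;
    int i;
    char *copy;

    for (i = 0; i < n; i++)
        if (strcmp(table[i], signature) == 0)
            return table[i];
    if (n >= 1024)
        return NULL;                 /* a real registry would grow */
    copy = malloc(strlen(signature) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, signature);
    table[n] = copy;
    return table[n++];
}
}}}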
Stefan From stefan_ml at behnel.de Sun Apr 15 08:39:46 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Apr 2012 08:39:46 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F8A67CF.4010503@behnel.de> Message-ID: <4F8A6D32.8030306@behnel.de> Robert Bradshaw, 15.04.2012 08:32: > On Sat, Apr 14, 2012 at 11:16 PM, Stefan Behnel wrote: >> Robert Bradshaw, 15.04.2012 07:59: >>> On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote: >>>> There may be a lot of promotion/demotion (you likely only want the >>>> former) combinations, especially for multiple arguments, so perhaps it >>>> makes sense to limit ourselves a bit. For instance for numeric scalar >>>> argument types we could limit to long (and the unsigned counterparts), >>>> double and double complex. >>>> >>>> So char, short and int scalars will be >>>> promoted to long, float to double and float complex to double complex. >>>> Anything bigger, like long long etc will be matched specifically. >>>> Promotions and associated demotions if necessary in the callee should >>>> be fairly cheap compared to checking all combinations or going through >>>> the python layer. >>> >>> True, though this could be a convention rather than a requirement of >>> the spec. Long vs. < long seems natural, but are there any systems >>> where (scalar) float still has an advantage over double? >>> >>> Of course pointers like float* vs double* can't be promoted, so we >>> would still need this kind of type declaration. >> >> Yes, passing data sets as C arrays requires proper knowledge about their >> memory layout on both sides. >> >> OTOH, we are talking about functions that would otherwise be called through >> Python, so this could only apply for buffers anyway. So why not require a >> Py_buffer* as argument for them? > > That's certainly our (initial?) usecase, but there's no need to limit > the protocol to this. I think the question here is: is this supposed to be a best effort protocol for bypassing Python calls, or would it be an error in some situations if no matching signature can be found? Stefan From d.s.seljebotn at astro.uio.no Sun Apr 15 08:58:03 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 15 Apr 2012 08:58:03 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F89E5CD.5060301@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> Message-ID: <4F8A717B.6000107@astro.uio.no> Ah, Cython objects. Didn't think of that. More below. On 04/14/2012 11:02 PM, Stefan Behnel wrote: > Hi, > > thanks for writing this up. Comments inline as I read through it. > > Dag Sverre Seljebotn, 14.04.2012 21:08: >> each described by a function pointer and a signature specification >> string, such as "id)i" for {{{int f(int, double)}}}. > > How do we deal with object argument types? Do we care on the caller side? > Functions might have alternative signatures that differ in the type of > their object parameters. Or should we handle this inside of the caller and > expect that it's something like a fused function with internal dispatch in > that case? > > Personally, I think there is not enough to gain from object parameters that > we should handle it on the caller side. The callee can dispatch those if > necessary. > > What about signatures that require an object when we have a C typed value? 
> > What about signatures that require a C typed argument when we have an > arbitrary object value in our call parameters? > > We should also strip the "self" argument from the parameter list of > methods. That's handled by the attribute lookup before even getting at the > callable. On 04/15/2012 07:59 AM, Robert Bradshaw wrote: > It would certainly be useful to have special syntax for memory views > (after nailing down a well-defined ABI for them) and builtin types. > Being able to declare something as taking a > "sage.rings.integer.Integer" could also prove useful, but could result > in long (and prefix-sharing) signatures, favoring the > runtime-allocated ids. I do think describing Cython objects in this cross-tool CEP would work nicely, this is for standardized ABIs only (we can't do memoryviews either until their ABI is standard). I think I prefer to a) exclude it now, and b) down the line we need another cross-tool ABI to communicate vtables, and then we could put that into this CEP now. I strongly believe we should go with the Go "duck-typing" approach for interfaces, i.e. it is not the declared name that should be compared but the method names and signatures. The only question that needs answering for CEP1000 is: Would this blow up the signature string enough that interning is the only viable option? Some strcmp solutions: a) Hash each vtable descriptor to 160-bits, and assume the hash is unique. Still, a couple of interfaces would blow up the signature string a lot. b) Modify approach B in CEP 1000 to this: If it is longer than 160 bits, take a full cryptographic hash, and just assume there won't be hash collisions (like git does). This still saves for short signature strings, and avoids interning at the cost of doing 160-bit comparisons. Both of these require other ways at getting at the actual string data. But I still like b) above better than interning. Dag From robertwb at gmail.com Sun Apr 15 09:00:34 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Sun, 15 Apr 2012 00:00:34 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A6D32.8030306@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F8A67CF.4010503@behnel.de> <4F8A6D32.8030306@behnel.de> Message-ID: On Sat, Apr 14, 2012 at 11:39 PM, Stefan Behnel wrote: > Robert Bradshaw, 15.04.2012 08:32: >> On Sat, Apr 14, 2012 at 11:16 PM, Stefan Behnel wrote: >>> Robert Bradshaw, 15.04.2012 07:59: >>>> On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote: >>>>> There may be a lot of promotion/demotion (you likely only want the >>>>> former) combinations, especially for multiple arguments, so perhaps it >>>>> makes sense to limit ourselves a bit. For instance for numeric scalar >>>>> argument types we could limit to long (and the unsigned counterparts), >>>>> double and double complex. >>>>> >>>>> So char, short and int scalars will be >>>>> promoted to long, float to double and float complex to double complex. >>>>> Anything bigger, like long long etc will be matched specifically. >>>>> Promotions and associated demotions if necessary in the callee should >>>>> be fairly cheap compared to checking all combinations or going through >>>>> the python layer. >>>> >>>> True, though this could be a convention rather than a requirement of >>>> the spec. Long vs. < long seems natural, but are there any systems >>>> where (scalar) float still has an advantage over double? 
>>>> >>>> Of course pointers like float* vs double* can't be promoted, so we >>>> would still need this kind of type declaration. >>> >>> Yes, passing data sets as C arrays requires proper knowledge about their >>> memory layout on both sides. >>> >>> OTOH, we are talking about functions that would otherwise be called through >>> Python, so this could only apply for buffers anyway. So why not require a >>> Py_buffer* as argument for them? >> >> That's certainly our (initial?) usecase, but there's no need to limit >> the protocol to this. > > I think the question here is: is this supposed to be a best effort protocol > for bypassing Python calls, or would it be an error in some situations if > no matching signature can be found? It may be an error in some cases. This isn't just about avoiding Python calls; Dag just barely summed this up quite nicely. - Robert From stefan_ml at behnel.de Sun Apr 15 09:30:08 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Apr 2012 09:30:08 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A717B.6000107@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> Message-ID: <4F8A7900.2060200@behnel.de> Dag Sverre Seljebotn, 15.04.2012 08:58: > Ah, Cython objects. Didn't think of that. More below. > > On 04/14/2012 11:02 PM, Stefan Behnel wrote: >> thanks for writing this up. Comments inline as I read through it. >> >> Dag Sverre Seljebotn, 14.04.2012 21:08: >>> each described by a function pointer and a signature specification >>> string, such as "id)i" for {{{int f(int, double)}}}. >> >> How do we deal with object argument types? Do we care on the caller side? >> Functions might have alternative signatures that differ in the type of >> their object parameters. Or should we handle this inside of the caller and >> expect that it's something like a fused function with internal dispatch in >> that case? >> >> Personally, I think there is not enough to gain from object parameters that >> we should handle it on the caller side. The callee can dispatch those if >> necessary. >> >> What about signatures that require an object when we have a C typed value? >> >> What about signatures that require a C typed argument when we have an >> arbitrary object value in our call parameters? >> >> We should also strip the "self" argument from the parameter list of >> methods. That's handled by the attribute lookup before even getting at the >> callable. > > On 04/15/2012 07:59 AM, Robert Bradshaw wrote: >> It would certainly be useful to have special syntax for memory views >> (after nailing down a well-defined ABI for them) and builtin types. >> Being able to declare something as taking a >> "sage.rings.integer.Integer" could also prove useful, but could result >> in long (and prefix-sharing) signatures, favoring the >> runtime-allocated ids. > > I do think describing Cython objects in this cross-tool CEP would work > nicely, this is for standardized ABIs only (we can't do memoryviews either > until their ABI is standard). It just occurred to me that an object's type can safely be represented at runtime as a pointer, i.e. an integer. Even if the type is heap allocated and replaced by another one later, a signature that uses that pointer value in its encoding would only ever match if both sides talk about the same type at call time (because at least one of them would hold a life reference to the type in order to actually use it). 
That would mean that IDs for signatures with object arguments would have to be generated at setup time, e.g. during module init, after importing the respective type. But I think that's acceptable. > I think I prefer to a) exclude it now, and b) down the line we need another > cross-tool ABI to communicate vtables, and then we could put that into this > CEP now. > > I strongly believe we should go with the Go "duck-typing" approach for > interfaces, i.e. it is not the declared name that should be compared but > the method names and signatures. > > The only question that needs answering for CEP1000 is: Would this blow up > the signature string enough that interning is the only viable option? That sounds excessive to me. Why would you want to test interfaces of arguments as part of the signature matching? Isn't that something that the callee should do when it actually needs a specific interface internally? Is there an important use case for passing objects with different interfaces as the same argument into the same callable? At least, it doesn't sound like such a use case would be performance critical in terms of the call overhead. Stefan From robertwb at gmail.com Sun Apr 15 09:39:02 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Sun, 15 Apr 2012 00:39:02 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A717B.6000107@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> Message-ID: On Sat, Apr 14, 2012 at 11:58 PM, Dag Sverre Seljebotn wrote: > Ah, Cython objects. Didn't think of that. More below. > > > On 04/14/2012 11:02 PM, Stefan Behnel wrote: >> >> Hi, >> >> thanks for writing this up. Comments inline as I read through it. >> >> Dag Sverre Seljebotn, 14.04.2012 21:08: >>> >>> each described by a function pointer and a signature specification >>> >>> string, such as "id)i" for {{{int f(int, double)}}}. >> >> >> How do we deal with object argument types? Do we care on the caller side? >> Functions might have alternative signatures that differ in the type of >> their object parameters. Or should we handle this inside of the caller and >> expect that it's something like a fused function with internal dispatch in >> that case? > >> >> Personally, I think there is not enough to gain from object parameters >> that >> we should handle it on the caller side. The callee can dispatch those if >> necessary. >> >> What about signatures that require an object when we have a C typed value? >> >> What about signatures that require a C typed argument when we have an >> arbitrary object value in our call parameters? >> >> We should also strip the "self" argument from the parameter list of >> methods. That's handled by the attribute lookup before even getting at the >> callable. > > On 04/15/2012 07:59 AM, Robert Bradshaw wrote: >> It would certainly be useful to have special syntax for memory views >> (after nailing down a well-defined ABI for them) and builtin types. >> Being able to declare something as taking a >> "sage.rings.integer.Integer" could also prove useful, but could result >> in long (and prefix-sharing) signatures, favoring the >> runtime-allocated ids. > > > I do think describing Cython objects in this cross-tool CEP would work > nicely, this is for standardized ABIs only (we can't do memoryviews either > until their ABI is standard). 
>
> I think I prefer to a) exclude it now, and b) down the line we need another
> cross-tool ABI to communicate vtables, and then we could put that into this
> CEP now.
>
> I strongly believe we should go with the Go "duck-typing" approach for
> interfaces, i.e. it is not the declared name that should be compared but the
> method names and signatures.
>
> The only question that needs answering for CEP1000 is: Would this blow up
> the signature string enough that interning is the only viable option?

Exactly.

> Some strcmp solutions:
>
>  a) Hash each vtable descriptor to 160-bits, and assume the hash is unique.
> Still, a couple of interfaces would blow up the signature string a lot.
>
>  b) Modify approach B in CEP 1000 to this: If it is longer than 160 bits,
> take a full cryptographic hash, and just assume there won't be hash
> collisions (like git does). This still saves for short signature strings,
> and avoids interning at the cost of doing 160-bit comparisons.
>
> Both of these require other ways at getting at the actual string data. But I
> still like b) above better than interning.

Requiring an implementation of (or at least access to) a cryptographic
hash greatly complicates the spec. (On another note, even a simple
hash as a prefix might be useful to prevent a lot of false partial
matches, e.g. "sage.rings...") 160 * n bits starts to get large too
(and we'd have to twiddle them to insert/avoid a "dash" every 16
bytes).

Here's a crazy thought: we could assume signatures like this are
"application specific." We can partition up portions of the signature
space to individual projects to compute however they want. Cython can
do this via interning for those signatures containing Cython types
(which is not an undue burden for anyone attempting to interoperate
with Cython types). For (some superset of) the basic C types we agree
on a common encoding and inline it.

- Robert

From robertwb at gmail.com Sun Apr 15 09:43:44 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sun, 15 Apr 2012 00:43:44 -0700
Subject: [Cython] [cython-users] Cython 0.16 RC 1
In-Reply-To: 
References: 
Message-ID: 

On Sat, Apr 14, 2012 at 9:14 PM, Al Danial wrote:
> On Thu, Apr 12, 2012 at 7:38 AM, mark florisson
> wrote:
>>
>> Yet another release candidate, this will hopefully be the last before
>> the 0.16 release. You can grab it from here:
>> http://wiki.cython.org/ReleaseNotes-0.16
>
>> If there are any problems, please let us know.
>
> I'm having the same problem ("Cannot convert 'PyObject *' to Python object",
> ref my posts at
> http://groups.google.com/group/cython-users/browse_thread/thread/d1a727e9d61f93b6#)
> on my code as with the release candidate 0. The code builds and runs
> cleanly with 0.15.1. To duplicate:
>
>     svn co http://pynastran.googlecode.com/svn/trunk/pyNastran/op4
>     cd op4
>     make clean ; make

Including the problematic line would have been helpful.

    ndarray.base = array_wrapper_RS

This is due to the Numpy 1.7 fix.
I think we need to pull these commits out for now: https://github.com/cython/cython/pull/112 - Robert From d.s.seljebotn at astro.uio.no Sun Apr 15 10:07:02 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 15 Apr 2012 10:07:02 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A7900.2060200@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A7900.2060200@behnel.de> Message-ID: <4F8A81A6.8090007@astro.uio.no> On 04/15/2012 09:30 AM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 15.04.2012 08:58: >> Ah, Cython objects. Didn't think of that. More below. >> >> On 04/14/2012 11:02 PM, Stefan Behnel wrote: >>> thanks for writing this up. Comments inline as I read through it. >>> >>> Dag Sverre Seljebotn, 14.04.2012 21:08: >>>> each described by a function pointer and a signature specification >>>> string, such as "id)i" for {{{int f(int, double)}}}. >>> >>> How do we deal with object argument types? Do we care on the caller side? >>> Functions might have alternative signatures that differ in the type of >>> their object parameters. Or should we handle this inside of the caller and >>> expect that it's something like a fused function with internal dispatch in >>> that case? >>> >>> Personally, I think there is not enough to gain from object parameters that >>> we should handle it on the caller side. The callee can dispatch those if >>> necessary. >>> >>> What about signatures that require an object when we have a C typed value? >>> >>> What about signatures that require a C typed argument when we have an >>> arbitrary object value in our call parameters? >>> >>> We should also strip the "self" argument from the parameter list of >>> methods. That's handled by the attribute lookup before even getting at the >>> callable. >> >> On 04/15/2012 07:59 AM, Robert Bradshaw wrote: >>> It would certainly be useful to have special syntax for memory views >>> (after nailing down a well-defined ABI for them) and builtin types. >>> Being able to declare something as taking a >>> "sage.rings.integer.Integer" could also prove useful, but could result >>> in long (and prefix-sharing) signatures, favoring the >>> runtime-allocated ids. >> >> I do think describing Cython objects in this cross-tool CEP would work >> nicely, this is for standardized ABIs only (we can't do memoryviews either >> until their ABI is standard). > > It just occurred to me that an object's type can safely be represented at > runtime as a pointer, i.e. an integer. Even if the type is heap allocated > and replaced by another one later, a signature that uses that pointer value > in its encoding would only ever match if both sides talk about the same > type at call time (because at least one of them would hold a life reference > to the type in order to actually use it). The missing piece here is that both me and Robert are huge fans of Go-style polymorphism. If you haven't read up on that I highly recommend it, basic idea is if you agree on method names and their signatures, you don't have to have access to the same interface declaration (you don't have to call the interface the same thing). Guess we should let this rest for a few days and get back to it with some benchmarks; since all we need to solve in CEP1000 is interned vs. strcmp. I'll try to do that. 
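A rough harness for that could be as simple as timing the two
per-candidate comparison operations against each other (a sketch only; a
real benchmark would have to walk an actual signature table and control
for cache effects):

{{{
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
    const char *interned = "id)i";
    const char *needle_ptr = interned;   /* interned: same pointer     */
    char entry[8] = "id)i";              /* strcmp: inline 8-byte data */
    char needle[8] = "id)i";
    volatile long hits = 0;              /* keeps the loops alive */
    long i;
    clock_t t0;

    t0 = clock();
    for (i = 0; i < 100000000L; i++)
        hits += (needle_ptr == interned);
    printf("pointer compare: %.2fs\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    for (i = 0; i < 100000000L; i++)
        hits += (memcmp(needle, entry, 8) == 0);
    printf("8-byte memcmp:   %.2fs\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    return hits != 200000000L;           /* sanity check */
}
}}}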
Dag From d.s.seljebotn at astro.uio.no Sun Apr 15 10:15:53 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 15 Apr 2012 10:15:53 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> Message-ID: <4F8A83B9.904@astro.uio.no> On 04/15/2012 09:39 AM, Robert Bradshaw wrote: > On Sat, Apr 14, 2012 at 11:58 PM, Dag Sverre Seljebotn > wrote: >> Ah, Cython objects. Didn't think of that. More below. >> >> >> On 04/14/2012 11:02 PM, Stefan Behnel wrote: >>> >>> Hi, >>> >>> thanks for writing this up. Comments inline as I read through it. >>> >>> Dag Sverre Seljebotn, 14.04.2012 21:08: >>>> >>>> each described by a function pointer and a signature specification >>>> >>>> string, such as "id)i" for {{{int f(int, double)}}}. >>> >>> >>> How do we deal with object argument types? Do we care on the caller side? >>> Functions might have alternative signatures that differ in the type of >>> their object parameters. Or should we handle this inside of the caller and >>> expect that it's something like a fused function with internal dispatch in >>> that case? >> >>> >>> Personally, I think there is not enough to gain from object parameters >>> that >>> we should handle it on the caller side. The callee can dispatch those if >>> necessary. >>> >>> What about signatures that require an object when we have a C typed value? >>> >>> What about signatures that require a C typed argument when we have an >>> arbitrary object value in our call parameters? >>> >>> We should also strip the "self" argument from the parameter list of >>> methods. That's handled by the attribute lookup before even getting at the >>> callable. >> >> On 04/15/2012 07:59 AM, Robert Bradshaw wrote: >>> It would certainly be useful to have special syntax for memory views >>> (after nailing down a well-defined ABI for them) and builtin types. >>> Being able to declare something as taking a >>> "sage.rings.integer.Integer" could also prove useful, but could result >>> in long (and prefix-sharing) signatures, favoring the >>> runtime-allocated ids. >> >> >> I do think describing Cython objects in this cross-tool CEP would work >> nicely, this is for standardized ABIs only (we can't do memoryviews either >> until their ABI is standard). >> >> I think I prefer to a) exclude it now, and b) down the line we need another >> cross-tool ABI to communicate vtables, and then we could put that into this >> CEP now. >> >> I strongly believe we should go with the Go "duck-typing" approach for >> interfaces, i.e. it is not the declared name that should be compared but the >> method names and signatures. >> >> The only question that needs answering for CEP1000 is: Would this blow up >> the signature string enough that interning is the only viable option? > > Exactly. > >> Some strcmp solutions: >> >> a) Hash each vtable descriptor to 160-bits, and assume the hash is unique. >> Still, a couple of interfaces would blow up the signature string a lot. >> >> b) Modify approach B in CEP 1000 to this: If it is longer than 160 bits, >> take a full cryptographic hash, and just assume there won't be hash >> collisions (like git does). This still saves for short signature strings, >> and avoids interning at the cost of doing 160-bit comparisons. >> >> Both of these require other ways at getting at the actual string data. But I >> still like b) above better than interning. 
>
> Requiring an implementation of (or at least access to) a cryptographic
> hash greatly complicates the spec. (On another note, even a simple
> hash as a prefix might be useful to prevent a lot of false partial
> matches, e.g. "sage.rings...") 160 * n bits starts to get large too
> (and we'd have to twiddle them to insert/avoid a "dash" every 16
> bytes).

Do you really think it complicates the spec? SHA-1 is pretty standard,
and Python ships with hashlib (the hashing part isn't performance
critical).

I prefer hashing to string-interning as it can still be done
compile-time etc. 160 bits isn't worse than the second-to-best strcmp
case of a 256-bit function entry.

Shortening the hash to 120 bits (truncation) we could have a spec like this:

 - Short signature: [64 bit encoded signature. 64 bit funcptr]
 - Long signature: [64 bit hash, 64 bit pointer to full signature,
                    8 bit guard byte, 56 bits remaining hash,
                    64 bit funcptr]

Anyway: Looks like it's about time to do some benchmarks. I'll try to
get around to it next week.

Dag

From d.s.seljebotn at astro.uio.no Sun Apr 15 10:17:51 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sun, 15 Apr 2012 10:17:51 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F8A81A6.8090007@astro.uio.no>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
	<4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no>
	<4F8A7900.2060200@behnel.de> <4F8A81A6.8090007@astro.uio.no>
Message-ID: <4F8A842F.5040104@astro.uio.no>

On 04/15/2012 10:07 AM, Dag Sverre Seljebotn wrote:
> On 04/15/2012 09:30 AM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 15.04.2012 08:58:
>>> Ah, Cython objects. Didn't think of that. More below.
>>>
>>> On 04/14/2012 11:02 PM, Stefan Behnel wrote:
>>>> thanks for writing this up. Comments inline as I read through it.
>>>>
>>>> Dag Sverre Seljebotn, 14.04.2012 21:08:
>>>>> each described by a function pointer and a signature specification
>>>>> string, such as "id)i" for {{{int f(int, double)}}}.
>>>>
>>>> How do we deal with object argument types? Do we care on the caller
>>>> side?
>>>> Functions might have alternative signatures that differ in the type of
>>>> their object parameters. Or should we handle this inside of the
>>>> caller and
>>>> expect that it's something like a fused function with internal
>>>> dispatch in
>>>> that case?
>>>>
>>>> Personally, I think there is not enough to gain from object
>>>> parameters that
>>>> we should handle it on the caller side. The callee can dispatch
>>>> those if
>>>> necessary.
>>>
>>> On 04/15/2012 07:59 AM, Robert Bradshaw wrote:
>>>> It would certainly be useful to have special syntax for memory views
>>>> (after nailing down a well-defined ABI for them) and builtin types.
>>>> Being able to declare something as taking a
>>>> "sage.rings.integer.Integer" could also prove useful, but could result
>>>> in long (and prefix-sharing) signatures, favoring the
>>>> runtime-allocated ids.
>>> >>> I do think describing Cython objects in this cross-tool CEP would work >>> nicely, this is for standardized ABIs only (we can't do memoryviews >>> either >>> until their ABI is standard). >> >> It just occurred to me that an object's type can safely be represented at >> runtime as a pointer, i.e. an integer. Even if the type is heap allocated >> and replaced by another one later, a signature that uses that pointer >> value >> in its encoding would only ever match if both sides talk about the same >> type at call time (because at least one of them would hold a life >> reference >> to the type in order to actually use it). > > The missing piece here is that both me and Robert are huge fans of > Go-style polymorphism. If you haven't read up on that I highly recommend > it, basic idea is if you agree on method names and their signatures, you > don't have to have access to the same interface declaration (you don't > have to call the interface the same thing). > > Guess we should let this rest for a few days and get back to it with > some benchmarks; since all we need to solve in CEP1000 is interned vs. > strcmp. I'll try to do that. Actually, Stefan's idea above is valid for Go-style interfaces too, just replace pointer with an interned string. Which is what Robert proposed too. Dag From njs at pobox.com Sun Apr 15 10:48:37 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 15 Apr 2012 09:48:37 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A83B9.904@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> Message-ID: On Sun, Apr 15, 2012 at 9:15 AM, Dag Sverre Seljebotn wrote: > Do you really think it complicates the spec? SHA-1 is pretty standard, and > Python ships with hashlib (the hashing part isn't performance critical). > > I prefer hashing to string-interning as it can still be done compile-time > etc. 160 bits isn't worse than the second-to-best strcmp case of a 256-bit > function entry. If you're *so* set on compile-time calculation, one could also accommodate these within the intern framework pretty easily. Any PyString/PyBytes * will be aligned, which means the low bit will not be set, which means there are at least 2**31 bit-patterns that will never be used by a run-time interned string. So we could write down a lookup table in the spec that assigns arbitrary, well-known numbers to every common signature. "dd->d" is 1, "ii->i" is 2, etc. If you have 15 standard types, then you can assign such an id to every 0, 1, 2, 3, 4, 5, and 6 argument function with space left over. And this could all be abstracted away inside the intern() function. The only thing is that if you wanted to look at the characters in the interned string, you'd have to call a disintern() function instead of just following the pointer. I still think all this stuff would be complexity for its own sake, though. > Shortening the hash to 120 bits (truncation) we could have a spec like this: > > ?- Short signature: [64 bit encoded signature. 64 bit funcptr] > ?- Long signature: [64 bit hash, 64 bit pointer to full signature, > ? ? ? ? ? ? ? ? ? ?8 bit guard byte, 56 bits remaining hash, > ? ? ? ? ? ? ? ? ? ?64 bit funcptr] This is a fixed length encoding, so why does it need a guard byte? 
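Still, for concreteness, the intern()/disintern() pair could look
roughly like this (a sketch -- the code table and the runtime_intern()
registry call are invented for illustration; the real table would have
to be written down in the spec):

{{{
#include <stdint.h>
#include <string.h>

/* Interned signatures are aligned pointers (low bit clear), so odd
 * values are free to encode well-known signatures directly. */
static const char *well_known[] = { NULL, "dd->d", "ii->i", "d->d" };
#define N_WELL_KNOWN (sizeof(well_known) / sizeof(well_known[0]))

const char *runtime_intern(const char *sig);  /* hypothetical registry */

static uintptr_t intern_sig(const char *sig)
{
    uintptr_t i;
    for (i = 1; i < N_WELL_KNOWN; i++)
        if (strcmp(well_known[i], sig) == 0)
            return (i << 1) | 1;            /* odd: a well-known code */
    return (uintptr_t)runtime_intern(sig);  /* even: aligned pointer  */
}

static const char *disintern_sig(uintptr_t id)
{
    if (id & 1)
        return well_known[id >> 1];  /* decode via the spec's table */
    return (const char *)id;         /* already points at the bytes */
}
}}}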
BTW, the guard byte design in the last version of the CEP looks buggy to me -- there's no guarantee that a valid pointer might not contain the guard byte by accident. A solution would be to move the to-be-continued byte (or bit) to the first word. This would also mean that if you're looking for a one-word signature via switch(), you won't hit signatures which have your signature as a prefix. In the variable-length encoding with the lookup rule you suggested you'd also want a second bit to mark the actual beginning of each structure, so you don't get hits on the middle of structures. > Anyway: Looks like it's about time to do some benchmarks. I'll try to get > around to it next week. Agreed :-). - N From njs at pobox.com Sun Apr 15 11:02:23 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 15 Apr 2012 10:02:23 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A81A6.8090007@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A7900.2060200@behnel.de> <4F8A81A6.8090007@astro.uio.no> Message-ID: On Sun, Apr 15, 2012 at 9:07 AM, Dag Sverre Seljebotn wrote: > On 04/15/2012 09:30 AM, Stefan Behnel wrote: >> >> Dag Sverre Seljebotn, 15.04.2012 08:58: >>> >>> Ah, Cython objects. Didn't think of that. More below. >>> >>> On 04/14/2012 11:02 PM, Stefan Behnel wrote: >>>> >>>> thanks for writing this up. Comments inline as I read through it. >>>> >>>> Dag Sverre Seljebotn, 14.04.2012 21:08: >>>>> >>>>> each described by a function pointer and a signature specification >>>>> string, such as "id)i" for {{{int f(int, double)}}}. >>>> >>>> >>>> How do we deal with object argument types? Do we care on the caller >>>> side? >>>> Functions might have alternative signatures that differ in the type of >>>> their object parameters. Or should we handle this inside of the caller >>>> and >>>> expect that it's something like a fused function with internal dispatch >>>> in >>>> that case? >>>> >>>> Personally, I think there is not enough to gain from object parameters >>>> that >>>> we should handle it on the caller side. The callee can dispatch those if >>>> necessary. >>>> >>>> What about signatures that require an object when we have a C typed >>>> value? >>>> >>>> What about signatures that require a C typed argument when we have an >>>> arbitrary object value in our call parameters? >>>> >>>> We should also strip the "self" argument from the parameter list of >>>> methods. That's handled by the attribute lookup before even getting at >>>> the >>>> callable. >>> >>> >>> On 04/15/2012 07:59 AM, Robert Bradshaw wrote: >>>> >>>> It would certainly be useful to have special syntax for memory views >>>> (after nailing down a well-defined ABI for them) and builtin types. >>>> Being able to declare something as taking a >>>> "sage.rings.integer.Integer" could also prove useful, but could result >>>> in long (and prefix-sharing) signatures, favoring the >>>> runtime-allocated ids. >>> >>> >>> I do think describing Cython objects in this cross-tool CEP would work >>> nicely, this is for standardized ABIs only (we can't do memoryviews >>> either >>> until their ABI is standard). >> >> >> It just occurred to me that an object's type can safely be represented at >> runtime as a pointer, i.e. an integer. 
Even if the type is heap allocated >> and replaced by another one later, a signature that uses that pointer >> value >> in its encoding would only ever match if both sides talk about the same >> type at call time (because at least one of them would hold a life >> reference >> to the type in order to actually use it). > > > The missing piece here is that both me and Robert are huge fans of Go-style > polymorphism. If you haven't read up on that I highly recommend it, basic > idea is if you agree on method names and their signatures, you don't have to > have access to the same interface declaration (you don't have to call the > interface the same thing). Go style polymorphism is certainly a neat idea, but two points: - You can't do this kind of matching via signature comparison. If I have a type with methods "foo", "bar" and "baz", then that should match the interface {"foo", "bar", "baz"}, but also {"foo", "bar"}, {"foo", "baz"}, {"bar"}, {}, etc. To find the right function for such a type, you need to decode each function signature and check them in some structured way. Unless your plan is to precompute the hash of all 2**n interfaces that each object fulfills. - Adding a whole new type system with polymorphic dispatch is a heck of a thing to do in a spec for boxing and unboxing pointers. Honestly at this level I'm even leery of describing Python objects via their type, as opposed to just "PyObject *". Just let the callee do the type checking if they need to, and if it later turns out that there are actually enough cases where Cython knows the exact type at compile time and is dispatching through a boxed pointer and the callee type checking is significant overhead, then extend the spec then. -- Nathaniel From d.s.seljebotn at astro.uio.no Sun Apr 15 11:08:43 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 15 Apr 2012 11:08:43 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> Message-ID: <2397ece4-b31a-4bdb-b946-0d7d9f2a32ae@email.android.com> Nathaniel Smith wrote: >On Sun, Apr 15, 2012 at 9:15 AM, Dag Sverre Seljebotn > wrote: >> Do you really think it complicates the spec? SHA-1 is pretty >standard, and >> Python ships with hashlib (the hashing part isn't performance >critical). >> >> I prefer hashing to string-interning as it can still be done >compile-time >> etc. 160 bits isn't worse than the second-to-best strcmp case of a >256-bit >> function entry. > >If you're *so* set on compile-time calculation, one could also >accommodate these within the intern framework pretty easily. Any >PyString/PyBytes * will be aligned, which means the low bit will not >be set, which means there are at least 2**31 bit-patterns that will >never be used by a run-time interned string. So we could write down a >lookup table in the spec that assigns arbitrary, well-known numbers to >every common signature. "dd->d" is 1, "ii->i" is 2, etc. If you have >15 standard types, then you can assign such an id to every 0, 1, 2, 3, >4, 5, and 6 argument function with space left over. > >And this could all be abstracted away inside the intern() function. >The only thing is that if you wanted to look at the characters in the >interned string, you'd have to call a disintern() function instead of >just following the pointer. > >I still think all this stuff would be complexity for its own sake, >though. 
> >> Shortening the hash to 120 bits (truncation) we could have a spec >like this: >> >> ?- Short signature: [64 bit encoded signature. 64 bit funcptr] >> ?- Long signature: [64 bit hash, 64 bit pointer to full signature, >> ? ? ? ? ? ? ? ? ? ?8 bit guard byte, 56 bits remaining hash, >> ? ? ? ? ? ? ? ? ? ?64 bit funcptr] > >This is a fixed length encoding, so why does it need a guard byte? No, there is two cases, one 128 bit and one 256 bit. > >BTW, the guard byte design in the last version of the CEP looks buggy >to me -- there's no guarantee that a valid pointer might not contain >the guard byte by accident. A solution would be to move the In the CEP text some posts ago? I am pretty sure I made sure that pointers would never be looked at -- you are supposed to scan in 128 bit jumps and will never look at the beginning of a pointer. Read it again and see if you can make a counterexample... That is the reason the above works, and why I split the hash in two segments. >to-be-continued byte (or bit) to the first word. This would also mean >that if you're looking for a one-word signature via switch(), you >won't hit signatures which have your signature as a prefix. In the You need 0-termination to be part of the signature (and if the 0 spills over, you spill over). I should have said that, good catch. Dag >variable-length encoding with the lookup rule you suggested you'd also >want a second bit to mark the actual beginning of each structure, so >you don't get hits on the middle of structures. > >> Anyway: Looks like it's about time to do some benchmarks. I'll try to >get >> around to it next week. > > Agreed :-). > >- N >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From d.s.seljebotn at astro.uio.no Sun Apr 15 11:13:51 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 15 Apr 2012 11:13:51 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A7900.2060200@behnel.de> <4F8A81A6.8090007@astro.uio.no> Message-ID: <06372908-d4aa-4ccc-9f2b-cee04793baf2@email.android.com> Nathaniel Smith wrote: >On Sun, Apr 15, 2012 at 9:07 AM, Dag Sverre Seljebotn > wrote: >> On 04/15/2012 09:30 AM, Stefan Behnel wrote: >>> >>> Dag Sverre Seljebotn, 15.04.2012 08:58: >>>> >>>> Ah, Cython objects. Didn't think of that. More below. >>>> >>>> On 04/14/2012 11:02 PM, Stefan Behnel wrote: >>>>> >>>>> thanks for writing this up. Comments inline as I read through it. >>>>> >>>>> Dag Sverre Seljebotn, 14.04.2012 21:08: >>>>>> >>>>>> each described by a function pointer and a signature >specification >>>>>> string, such as "id)i" for {{{int f(int, double)}}}. >>>>> >>>>> >>>>> How do we deal with object argument types? Do we care on the >caller >>>>> side? >>>>> Functions might have alternative signatures that differ in the >type of >>>>> their object parameters. Or should we handle this inside of the >caller >>>>> and >>>>> expect that it's something like a fused function with internal >dispatch >>>>> in >>>>> that case? >>>>> >>>>> Personally, I think there is not enough to gain from object >parameters >>>>> that >>>>> we should handle it on the caller side. The callee can dispatch >those if >>>>> necessary. 
>>>>> >>>>> What about signatures that require an object when we have a C >typed >>>>> value? >>>>> >>>>> What about signatures that require a C typed argument when we have >an >>>>> arbitrary object value in our call parameters? >>>>> >>>>> We should also strip the "self" argument from the parameter list >of >>>>> methods. That's handled by the attribute lookup before even >getting at >>>>> the >>>>> callable. >>>> >>>> >>>> On 04/15/2012 07:59 AM, Robert Bradshaw wrote: >>>>> >>>>> It would certainly be useful to have special syntax for memory >views >>>>> (after nailing down a well-defined ABI for them) and builtin >types. >>>>> Being able to declare something as taking a >>>>> "sage.rings.integer.Integer" could also prove useful, but could >result >>>>> in long (and prefix-sharing) signatures, favoring the >>>>> runtime-allocated ids. >>>> >>>> >>>> I do think describing Cython objects in this cross-tool CEP would >work >>>> nicely, this is for standardized ABIs only (we can't do memoryviews >>>> either >>>> until their ABI is standard). >>> >>> >>> It just occurred to me that an object's type can safely be >represented at >>> runtime as a pointer, i.e. an integer. Even if the type is heap >allocated >>> and replaced by another one later, a signature that uses that >pointer >>> value >>> in its encoding would only ever match if both sides talk about the >same >>> type at call time (because at least one of them would hold a life >>> reference >>> to the type in order to actually use it). >> >> >> The missing piece here is that both me and Robert are huge fans of >Go-style >> polymorphism. If you haven't read up on that I highly recommend it, >basic >> idea is if you agree on method names and their signatures, you don't >have to >> have access to the same interface declaration (you don't have to call >the >> interface the same thing). > >Go style polymorphism is certainly a neat idea, but two points: > >- You can't do this kind of matching via signature comparison. If I >have a type with methods "foo", "bar" and "baz", then that should >match the interface {"foo", "bar", "baz"}, but also {"foo", "bar"}, >{"foo", "baz"}, {"bar"}, {}, etc. To find the right function for such >a type, you need to decode each function signature and check them in >some structured way. Unless your plan is to precompute the hash of all >2**n interfaces that each object fulfills. You are of course right this needs a lot more thought. > >- Adding a whole new type system with polymorphic dispatch is a heck >of a thing to do in a spec for boxing and unboxing pointers. Honestly >at this level I'm even leery of describing Python objects via their >type, as opposed to just "PyObject *". Just let the callee do the type >checking if they need to, and if it later turns out that there are >actually enough cases where Cython knows the exact type at compile >time and is dispatching through a boxed pointer and the callee type >checking is significant overhead, then extend the spec then. We are not insane, it's been said several times this goes in a later spec. We're just trying to guess whether future developments would seriously impact intern vs. strcmp -- ie what a likely signature length is in the future. We make CEP1000 a simple spec, but spend some time to try to guess how it could be extended. Dag > >-- Nathaniel >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. 
Please excuse my brevity.

From markflorisson88 at gmail.com Sun Apr 15 13:30:27 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 15 Apr 2012 12:30:27 +0100
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F8A6A04.4000402@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
	<4F89E5CD.5060301@behnel.de> <4F8A6A04.4000402@behnel.de>
Message-ID: 

On 15 April 2012 07:26, Stefan Behnel wrote:
> mark florisson, 14.04.2012 23:15:
>> On 14 April 2012 22:02, Stefan Behnel wrote:
>>> Dag Sverre Seljebotn, 14.04.2012 21:08:
>>>>  * TBD: Support for Cython-specific constructs like memoryview slices
>>>>    (so that arrays with strides and shape can be passed faster than
>>>>    passing an {{{"O"}}}).
>>>
>>> Is this really Cython specific or would a generic Py_buffer struct work?
>>
>> That could work through simple unboxing wrapper functions, but it
>> would add some overhead, specifically because it would have to check
>> the buffer's object, and if it didn't exist or was not a memoryview
>> object, it would have to create one (checking whether something is a
>> memoryview object would also be a pain, as each module has a different
>> memoryview type). That could still be feasible for interaction with
>> Cython functions from non-Cython code.
>
> Hmm, I don't get it. Isn't the overhead always there when a memory view is
> requested in the signature? You'd have to create one for each call and that
> seriously hurts the efficiency. Is that a common use case? Why would you
> want to do more than passing unboxed buffers?
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

So, if you're going to accept Py_buffer *buf (which is useful in
itself), then to use memoryviews you have to copy over some
shape/strides/suboffsets and the data pointer, which is not a big deal.
But you also want a memoryview object associated with the memoryview
slice, that keeps things around like the format string, function
pointers to convert the dtype to and from Python objects and a
reference (acquisition) count or a lock in case atomics are not
supported by the compiler (or Cython doesn't know about the compiler).
So if buf->obj is not a memoryview object, it will have to create one
in the callee, and the caller will have to convert a slice to a new
Py_buffer struct. Arguably, the memoryview implementation is not
optimal, it should have a memoryview struct with that data, making it
somewhat less expensive.

Finally, what are the semantics for Py_buffer? Will the callee own the
buffer, or will it borrow it? If they will borrow, then the compiler
will have to figure out whether it will need to own it (or be slower
and always own it), and acquire the buffer through buf->obj. At least
it won't have to validate the buffer, which is the most expensive
part.

I think in many cases you want to borrow though, but if you want to
always own, the caller could do something more efficient if
releasebuffer is not implemented, like simply incref buf->obj and pass
in a pointer to a copy of the Py_buffer. I think borrowing is probably
the easiest and most sane way though.
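In code, the borrowing convention would look roughly like this (a sketch;
{{{consume}}} is a made-up example callee, and the buffer is assumed to
hold doubles):

{{{
#include <Python.h>

/* The callee receives a Py_buffer it does not own and may use it
 * freely for the duration of the call.  Only if it wants to keep the
 * data beyond the call does it re-request the buffer from buf->obj --
 * that second buffer it does own and must release. */
static double consume(Py_buffer *borrowed)
{
    Py_buffer owned;
    double first = ((double *)borrowed->buf)[0];  /* fine while borrowed */

    if (borrowed->obj != NULL &&
        PyObject_GetBuffer(borrowed->obj, &owned, PyBUF_FULL_RO) == 0) {
        /* ... stash `owned` somewhere for use after the call ... */
        PyBuffer_Release(&owned);  /* whoever stashed it releases it */
    }
    return first;
}
}}}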
From stefan_ml at behnel.de Sun Apr 15 13:40:31 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Apr 2012 13:40:31 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A6A04.4000402@behnel.de> Message-ID: <4F8AB3AF.3020807@behnel.de>

mark florisson, 15.04.2012 13:30:
> Finally, what are the semantics for Py_buffer? Will the callee own the buffer, or will it borrow it? If they will borrow, then the compiler will have to figure out whether it will need to own it (or be slower and always own it), and acquire the buffer through buf->obj. At least it won't have to validate the buffer, which is the most expensive part.
> I think in many cases you want to borrow though, but if you want to always own, the caller could do something more efficient if releasebuffer is not implemented, like simply incref buf->obj and pass in a pointer to a copy of the Py_buffer. I think borrowing is probably the easiest and most sane way though.

I think that's easy. If you request and unpack a buffer yourself, you own it. If you receive an unpacked buffer from someone else as a call argument, you borrow it, and you know that your caller (or the caller of your caller, etc.) owns it and keeps it alive until you return. If you receive it as the return value of a function call, it's less clear, but my intuition tells me that you'd normally either receive an owned Python object or a borrowed unpacked buffer.

In the case at hand, you'd always receive a borrowed buffer from the caller as argument.

Stefan

From markflorisson88 at gmail.com Sun Apr 15 13:49:31 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 15 Apr 2012 12:49:31 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8AB3AF.3020807@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A6A04.4000402@behnel.de> <4F8AB3AF.3020807@behnel.de> Message-ID:

On 15 April 2012 12:40, Stefan Behnel wrote:
> [...]
> In the case at hand, you'd always receive a borrowed buffer from the caller as argument.
That makes sense, but it means a lot of overhead for memoryview slices, which I think justifies syntax for custom types in general.

From markflorisson88 at gmail.com Sun Apr 15 20:59:29 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 15 Apr 2012 19:59:29 +0100 Subject: [Cython] Cython 0.16 RC 2 Message-ID:

What is hopefully the final release candidate for the 0.16 release can be found here: http://wiki.cython.org/ReleaseNotes-0.16 . This corresponds to the 'release' branch of the cython repository on github.

From greg.ewing at canterbury.ac.nz Mon Apr 16 00:01:14 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 16 Apr 2012 10:01:14 +1200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A6A9A.9000400@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F8A1F5F.3030602@canterbury.ac.nz> <4F8A6A9A.9000400@behnel.de> Message-ID: <4F8B452A.6050908@canterbury.ac.nz>

Stefan Behnel wrote:
> It wasn't really a proposed syntax, I guess, more of a way to write down an example.

That's okay, although you might want to mention in the PEP that the actual syntax is yet to be determined. Being a PEP, anything it says tends to come across as being a specification otherwise.

-- Greg

From greg.ewing at canterbury.ac.nz Sun Apr 15 23:56:38 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 16 Apr 2012 09:56:38 +1200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz> <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com> <4F893BCC.7040706@behnel.de> <4F8A164B.1050507@canterbury.ac.nz> Message-ID: <4F8B4416.7020008@canterbury.ac.nz>

Robert Bradshaw wrote:
> Brevity, especially if the signature is inlined. (Encoding could take care of this by, e.g. ignoring the redundant opening, or we could just write di=d.)

Yes, I was thinking in terms of replacing the paren with some other character, rather than inserting more parens.

-- Greg

From stefan_ml at behnel.de Mon Apr 16 10:05:37 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 16 Apr 2012 10:05:37 +0200 Subject: [Cython] Py3k builds broken due to CPython changes Message-ID: <4F8BD2D1.9020104@behnel.de>

Hi, just a quick heads-up that the Py3k builds have been completely broken since this weekend because the internal CPython import mechanism changed. The most visible effect for us is that Py2-style imports no longer work in Py3.3, which, I guess, impacts the majority of existing Py2-style Cython code. Only absolute and relative imports continue to work; import level "-1" raises a ValueError. I asked on python-dev to see what they think about this regression. http://thread.gmane.org/gmane.comp.python.devel/131858/focus=131909 If they end up considering it the intended behaviour, we may have to duplicate the original behaviour in one way or another to keep this working.
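For illustration, duplicating the old level "-1" semantics at the C-API level could look roughly like this (a hedged sketch of the idea only, not Cython's actual generated code): try the import as an implicitly relative one first, then fall back to an absolute one. It assumes the importing module's globals carry the usual package context.

    #include <Python.h>

    /* Emulate Py2-style import semantics (the removed level -1) on Py3.3+:
     * try a relative import first, then fall back to an absolute one. */
    static PyObject *import_py2_style(const char *name, PyObject *globals,
                                      PyObject *fromlist)
    {
        PyObject *module = PyImport_ImportModuleLevel(
            (char *)name, globals, NULL, fromlist, 1 /* relative */);
        if (module == NULL && PyErr_ExceptionMatches(PyExc_ImportError)) {
            PyErr_Clear();
            module = PyImport_ImportModuleLevel(
                (char *)name, globals, NULL, fromlist, 0 /* absolute */);
        }
        return module;
    }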
Stefan From stefan_ml at behnel.de Mon Apr 16 11:28:51 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 16 Apr 2012 11:28:51 +0200 Subject: [Cython] Py3k builds broken due to CPython changes In-Reply-To: <4F8BD2D1.9020104@behnel.de> References: <4F8BD2D1.9020104@behnel.de> Message-ID: <4F8BE653.2070809@behnel.de> Stefan Behnel, 16.04.2012 10:05: > just a quick heads-up that the Py3k builds are completely broken since this > week-end because the internal CPython import mechanism changed. The most > visible effect for us is that Py2-style imports no longer work in Py3.3, > which, I guess, impacts the majority of existing Py2-style Cython code. > Only absolute and relative imports continue to work, import level "-1" > raises a ValueError. > > I asked on python-dev to see what they think about this regression. > > http://thread.gmane.org/gmane.comp.python.devel/131858/focus=131909 > > If they end up considering it the intended behaviour, we may have to > duplicate the original behaviour in one way or another to keep this working. Apparently, the easiest work-around is to execute the whole import twice in Py3.3, first for a relative import, then for an absolute one. Likely somewhat slower than before, but at least it keeps working. https://github.com/cython/cython/commit/cb40a3e6264b794681492a01c223cc23d40d8350 Stefan From markflorisson88 at gmail.com Mon Apr 16 19:56:08 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 16 Apr 2012 18:56:08 +0100 Subject: [Cython] [cython-users] buffer access to ndarrays as cdef class attributes In-Reply-To: <6988002.374.1334590847872.JavaMail.geo-discussion-forums@vbuc18> References: <16283966.1925.1334136907099.JavaMail.geo-discussion-forums@ynbj3> <4F856946.4000002@behnel.de> <31698140.99.1334563801065.JavaMail.geo-discussion-forums@vbdn7> <6988002.374.1334590847872.JavaMail.geo-discussion-forums@vbuc18> Message-ID: On 16 April 2012 16:40, becker.nils wrote: > >> > 1. memoryview assignments inside inner loops are not a good idea. >> > although >> > no data is being copied, making a new slice involves quite some error >> > checking overhead >> >> Do you mean assignment to the data or to the slice itself? > > i meant something like > > cdef float[:] new_view = existing_numpy_array > > inside a loop. i guess the obvious solution is to do this outside of the > loop. Definitely, that is pretty slow :) >> >> > 2. memoryviews are more general than the ndarray buffer access but they >> > are >> > not a drop-in replacement, because one cannot return a memoryview object >> > as >> > a full ndarray without explicit conversion with np.asarray. so to have >> > both >> > fast access from the C side and a handle on the array on the python >> > side, >> > requires two local variables: one an ndarray and one a memoryview into >> > it. >> > (previously the ndarray with buffer access did both of these things) >> >> Yes, that's correct. That's because you can now slice the memoryviews, >> which does not invoke anything on the original buffer object, so when >> converting to an object it may be out of sync with the original, which >> means you'd have to convert it explicitly. > > that makes sense. > >> >> We could allow the user to register a conversion function to do this >> automatically - only invoked if the slice was re-sliced - (and cache >> the results), but it would mean that conversion back from the object >> to a memoryview slice would have to validate the buffer again, which >> would be more expensive. 
>> Maybe that could be mitigated by special-casing numpy arrays and some other tricks.
>
> so for the time being, it seems that the most efficient way of handling this is that cdef functions or any fast C-side manipulation uses only memoryviews, and allocation and communication with python then uses the underlying ndarrays.

Yes, it is best to minimize conversion to and from numpy (which is quite expensive either way).

>> > 3. one slow-down that i was not able to avoid is this:
>> >
>> > 143:         for i in range(x.shape[0]):
>> > 144:             self.out[i] *= dt * self.M[i]
>> >
>> > where all of x, self.out and self.M are memoryviews. in the for-loop, cython checks for un-initialized memoryviews like so (output from cython -a):
>> >
>> >     if (unlikely(!__pyx_v_self->M.memview)) {PyErr_SetString(PyExc_AttributeError,"Memoryview is not initialized");{__pyx_filename = __pyx_f[0]; __pyx_lineno = 144; __pyx_clineno = __LINE__; goto __pyx_L1_error;}}
>> >     __pyx_t_4 = __pyx_v_i;
>> >
>> > is there a way to tell cython that these views are in fact initialized (that's done in __init__ of the class)?
>>
>> You can avoid this by annotating your function with @cython.initializedcheck(False), or by using a module-global directive at the top of your file '#cython: initializedcheck=False' (the same goes for boundscheck and wraparound).
>
> ah! helpful! i did not see this on the annotation wiki page.
> (there is no official documentation on annotations it seems)

Indeed, this should be documented. Documentation for other directives can be found here: http://docs.cython.org/src/reference/compilation.html?highlight=boundscheck#compiler-directives . I'll add documentation for this directive too; currently it only works for memoryviews, but maybe it should also work for objects?

From markflorisson88 at gmail.com Mon Apr 16 20:04:24 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 16 Apr 2012 19:04:24 +0100 Subject: [Cython] [cython-users] buffer access to ndarrays as cdef class attributes In-Reply-To: References: <16283966.1925.1334136907099.JavaMail.geo-discussion-forums@ynbj3> <4F856946.4000002@behnel.de> <31698140.99.1334563801065.JavaMail.geo-discussion-forums@vbdn7> <6988002.374.1334590847872.JavaMail.geo-discussion-forums@vbuc18> Message-ID:

On 16 April 2012 18:56, mark florisson wrote:
> On 16 April 2012 16:40, becker.nils wrote:
> [...]
>
>> or a memoryview context manager which makes the memoryview non-rebindable?
>> "with old_array as cdef float_t[:] new_view:
>>     loop ...
>> "
>> (just fantasizing, probably nonsense). anyway, thanks!
>> nils
>
> We could support final fields and variables, but it would be kind of a pain to declare that everywhere.

Maybe final would not be too bad, as you'd only need it for globals (who uses them anyway) and attributes, but not for local variables.

From d.s.seljebotn at astro.uio.no Tue Apr 17 13:54:32 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 13:54:32 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> Message-ID: <4F8D59F8.4080809@astro.uio.no>

On 04/15/2012 10:48 AM, Nathaniel Smith wrote:
> On Sun, Apr 15, 2012 at 9:15 AM, Dag Sverre Seljebotn wrote:
>> Do you really think it complicates the spec? SHA-1 is pretty standard, and Python ships with hashlib (the hashing part isn't performance critical).
>>
>> I prefer hashing to string-interning as it can still be done compile-time etc. 160 bits isn't worse than the second-to-best strcmp case of a 256-bit function entry.
>
> If you're *so* set on compile-time calculation, one could also accommodate these within the intern framework pretty easily. Any PyString/PyBytes * will be aligned, which means the low bit will not be set, which means there are at least 2**31 bit-patterns that will never be used by a run-time interned string. So we could write down a lookup table in the spec that assigns arbitrary, well-known numbers to every common signature. "dd->d" is 1, "ii->i" is 2, etc. If you have 15 standard types, then you can assign such an id to every 0, 1, 2, 3, 4, 5, and 6 argument function with space left over.
>
> And this could all be abstracted away inside the intern() function. The only thing is that if you wanted to look at the characters in the interned string, you'd have to call a disintern() function instead of just following the pointer.

I should note that to me this is the worst of all worlds. The main point of avoiding interning was to avoid interning! There's just an intrinsic beauty to a stateless data-driven spec. IMO, anything requiring run-time state has to be explicitly justified.

I don't believe doing interning right without a common dependency .so is all that easy.
I'd love to see a concrete spec for it (e.g., if you use Python bytes in a dict in sys.modules['_nativecall'], the bytes objects could be deallocated before the callables containing the interned string -- unless you Py_INCREF once too many, but then valgrind complains -- and so on). And as Robert says, if you don't have an interning machinery, the spec becomes mostly programming language/platform neutral (you could use it in Perl or Ruby or Java too, if you drop "O" and so on.)

However, the prospect of rather long signatures for OO code is driving me to perhaps prefer the interned string approach -- i.e., the justification would be that signatures can be long (and that cryptographic hashes are too complex).

Benchmarks soon to follow...

Dag

> I still think all this stuff would be complexity for its own sake, though.
>
>> Shortening the hash to 120 bits (truncation) we could have a spec like this:
>>
>> - Short signature: [64 bit encoded signature, 64 bit funcptr]
>> - Long signature: [64 bit hash, 64 bit pointer to full signature, 8 bit guard byte, 56 bits remaining hash, 64 bit funcptr]
>
> This is a fixed-length encoding, so why does it need a guard byte?
>
> BTW, the guard byte design in the last version of the CEP looks buggy to me -- there's no guarantee that a valid pointer won't contain the guard byte by accident. A solution would be to move the to-be-continued byte (or bit) to the first word. This would also mean that if you're looking for a one-word signature via switch(), you won't hit signatures which have your signature as a prefix. In the variable-length encoding with the lookup rule you suggested you'd also want a second bit to mark the actual beginning of each structure, so you don't get hits on the middle of structures.
>
>> Anyway: Looks like it's about time to do some benchmarks. I'll try to get around to it next week.
>
> Agreed :-).
>
> - N
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From d.s.seljebotn at astro.uio.no Tue Apr 17 14:24:50 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 14:24:50 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87530F.7050000@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> Message-ID: <4F8D6112.1000906@astro.uio.no>

On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote:
> Travis Oliphant recently raised the issue on the NumPy list of what mechanisms to use to box native functions produced by his Numba so that SciPy functions can call it, e.g. (I'm making the numba part up):
>
> @numba # Compiles function using LLVM
> def f(x): return 3 * x
>
> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively!
>
> Obviously, we want something standard, so that Cython functions can also be called in a fast way.

OK, here's the benchmark code I've written:

https://github.com/dagss/cep1000

Assumptions etc.:

- (Very) warm cache case is tested

- I compile and link libmycallable.so, libmycaller.so and ./bench; with -fPIC, to emulate the Python environment

- I use mostly pure C but use PyTypeObject in order to get the offsets to tp_flags etc. right (I emulate the checking that would happen on a PyObject* according to CEP1000).

- The test function is "double f(double x) { return x * x; }"

- The benchmark is run in a loop J=1000000 times (and time divided by J).
This is repeated K=10000 times and the minimum walltime of the K runs is used. This gave very stable readings on my system.

Fixing loop iterations:

In the initial results I just scanned the overload list until NULL-termination. It seemed to me that the code generated for this scanning was the most important factor. Therefore I fixed the number of overloads as a known compile-time macro N *in the caller*. This is somewhat optimistic; however, I didn't want to play with figuring out loop unrolling etc. at the same time, and hardcoding the length of the overload list sort of took that part out of the equation.

Table explanation:

- N: Number of overloads in the list. For N=10, there are 9 non-matching overloads in the list before the matching 10th (but the caller doesn't know this). For N=1, the caller knows this and optimizes for a hit in the first entry.

- MISMATCHES: If set, the caller tries 4 non-matching signatures before hitting the final one. If not set, only the correct signature is tried.

- LIKELY: If set, a GCC likely() macro is used to expect that the signature matches.

RESULTS:

A direct call to (and execution of!) the function in the benchmark loop took 4.8 ns. An indirect dispatch through a function pointer of known type took 5.4 ns.

Notation below is (intern, key), in ns:

N=1:
  MISMATCHES=False:
    LIKELY=True:    6.44   6.44
    LIKELY=False:   7.52   8.06
  MISMATCHES=True:  8.59   8.59
N=10:
  MISMATCHES=False: 17.19  19.20
  MISMATCHES=True:  36.52  37.59

To be clear, "intern" is an interned "char*" (comparison with a 64-bit global variable), while "key" is comparison of a size_t (comparison of a 64-bit immediate in the instruction stream).

PRELIMINARY BENCHMARK CONCLUSION:

Intern appears to be as fast or faster than strcmp. I don't know why (is the pointer offset to the global variable stored in less than 64 bits in the x86-64 instruction stream? What gdb (or other) commands would I use to figure that out?)

What happens in the assembly is:

movq (%rdi,%rax), %rax
movq interned_dd(%rip), %rdx
cmpq %rdx, (%rax)
jne .L3

vs.

movabsq $20017697242043, %rdx
movq (%rdi,%rax), %rax
cmpq %rdx, (%rax)
jne .L6

TODO:

The caller tried, for each entry in the overload list, to match all the signatures. Changing the order of these loops should also be tried.

Dag

From d.s.seljebotn at astro.uio.no Tue Apr 17 14:36:45 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 14:36:45 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D6112.1000906@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> Message-ID: <4F8D63DD.9020600@astro.uio.no>

On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote:
> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote:
>> Travis Oliphant recently raised the issue on the NumPy list of what mechanisms to use to box native functions produced by his Numba so that SciPy functions can call it, e.g. (I'm making the numba part up):
>>
>> @numba # Compiles function using LLVM
>> def f(x): return 3 * x
>>
>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively!
>>
>> Obviously, we want something standard, so that Cython functions can also be called in a fast way.
> OK, here's the benchmark code I've written:
>
> https://github.com/dagss/cep1000
>
> [...]
>
> RESULTS:
>
> A direct call to (and execution of!) the function in the benchmark loop took 4.8 ns. An indirect dispatch through a function pointer of known type took 5.4 ns.
>
> Notation below is (intern, key), in ns:
>
> N=1:
>   MISMATCHES=False:
>     LIKELY=True:    6.44   6.44
>     LIKELY=False:   7.52   8.06
>   MISMATCHES=True:  8.59   8.59
> N=10:
>   MISMATCHES=False: 17.19  19.20
>   MISMATCHES=True:  36.52  37.59
>
> [...]

One more data point: When comparing a 96-bit key directly, the fastest benchmark for keys (N=1, MISMATCHES=False, LIKELY=True) grows from 6.44 to 6.98 ns. (It should perform relatively better when N>1, unless prefixes match.)

A 448-bit key is 8.59 ns.
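The two candidate checks being timed, written out as a C sketch (my own illustration of the benchmark's description, not code from the cep1000 repo; the names and the key constant -- taken from the movabsq in the listing above -- are assumptions):

    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        const char *interned_sig;  /* interned signature: compare pointers */
        uint64_t    key;           /* encoded signature: compare immediates */
        void       *funcptr;
    } entry_t;

    /* Intern variant: one load of a global plus one pointer compare. */
    extern const char *interned_dd;   /* set up by the intern machinery */
    static void *match_intern(const entry_t *e)
    {
        return e->interned_sig == interned_dd ? e->funcptr : NULL;
    }

    /* Key variant: the 64-bit immediate lives in the instruction stream. */
    #define KEY_DD UINT64_C(20017697242043)
    static void *match_key(const entry_t *e)
    {
        return e->key == KEY_DD ? e->funcptr : NULL;
    }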
Dag

From njs at pobox.com Tue Apr 17 14:40:35 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Apr 2012 13:40:35 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D59F8.4080809@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> Message-ID:

On Tue, Apr 17, 2012 at 12:54 PM, Dag Sverre Seljebotn wrote:
> I don't believe doing interning right without a common dependency .so is all that easy. I'd love to see a concrete spec for it (e.g., if you use Python bytes in a dict in sys.modules['_nativecall'], the bytes objects could be deallocated before callables containing the interned string -- unless you Py_INCREF once too many, but then valgrind complains -- and so on).

I don't understand. A C-callable object would hold a reference to each interned string it contains, just like any other Python data structure does. When the C-callable object is deallocated, then it drops this reference, and if there are no other live C-callable objects that use the same signature, and there are no callers that are statically caching the signature, then it will drop out of the intern table. Otherwise it doesn't. Basically you do the memory management like you would for any other bytes object, and everything just works.

-- Nathaniel

From d.s.seljebotn at astro.uio.no Tue Apr 17 14:40:39 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 14:40:39 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D63DD.9020600@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8D63DD.9020600@astro.uio.no> Message-ID: <4F8D64C7.6050606@astro.uio.no>

On 04/17/2012 02:36 PM, Dag Sverre Seljebotn wrote:
> On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote:
>> [...]
>> What happens in the assembly is:
>>
>> movq (%rdi,%rax), %rax
>> movq interned_dd(%rip), %rdx
>> cmpq %rdx, (%rax)
>> jne .L3
>>
>> vs.
>>
>> movabsq $20017697242043, %rdx
>> movq (%rdi,%rax), %rax
>> cmpq %rdx, (%rax)
>> jne .L6

OK, silly to quote the assembly for the case where they perform the same (LIKELY=True). Changing to LIKELY=False, the jne changes to je in both places, and strcmp performance drops relative to intern.

Dag

>> TODO:
>>
>> The caller tried, for each entry in the overload list, to match all the signatures. Changing the order of these loops should also be tried.
>
> One more data point: When comparing a 96-bit key directly, the fastest benchmark for keys (N=1, MISMATCHES=False, LIKELY=True) grows from 6.44 to 6.98 ns. (It should perform relatively better when N>1, unless prefixes match.)
>
> A 448-bit key is 8.59 ns.
>
> Dag
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From d.s.seljebotn at astro.uio.no Tue Apr 17 14:53:06 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 14:53:06 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> Message-ID: <4F8D67B2.4060706@astro.uio.no>

On 04/17/2012 02:40 PM, Nathaniel Smith wrote:
> On Tue, Apr 17, 2012 at 12:54 PM, Dag Sverre Seljebotn wrote:
>> I don't believe doing interning right without a common dependency .so is all
>> that easy. I'd love to see a concrete spec for it (e.g., if you use Python bytes in a dict in sys.modules['_nativecall'], the bytes objects could be deallocated before callables containing the interned string -- unless you Py_INCREF once too many, but then valgrind complains -- and so on).
>
> I don't understand. A C-callable object would hold a reference to each interned string it contains, just like any other Python data structure does. When the C-callable object is deallocated, then it drops this reference, and if there are no other live C-callable objects that use the same signature, and there are no callers that are statically caching the signature, then it will drop out of the intern table. Otherwise it doesn't.

Thanks! I'm just being dense.

In fact I was the one who first proposed just storing interned PyBytesObject* way back at the start of the thread, but it met opposition in favour of an interned char* or an allocated "string id"; perhaps that's why I shut the possibility out of my mind.

Would we store just the PyBytesObject*, or a char* in addition? Is bytes a vararg object or does it wrap a char*? If the former, I think we should just store the PyBytesObject*.

DS

From stefan_ml at behnel.de Tue Apr 17 15:07:12 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 17 Apr 2012 15:07:12 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D67B2.4060706@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> <4F8D67B2.4060706@astro.uio.no> Message-ID: <4F8D6B00.7040208@behnel.de>

Dag Sverre Seljebotn, 17.04.2012 14:53:
> [...]
> Would we store just the PyBytesObject*, or a char* in addition? Is bytes a vararg object or does it wrap a char*? If the former, I think we should just store the PyBytesObject*.

I had originally thought that callables would include C-implemented functions, which would make this difficult for non-Cython code, because they don't normally have destructors in CPython. If we rule that out and restrict ourselves to callable extension types, binding the lifetime of yet another object to their lifetime would not hurt.
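Restricting the spec to callable extension types makes the lifetime question mechanical; a minimal C sketch (a hypothetical type of my own, not anything from the CEP):

    #include <Python.h>

    /* Hypothetical C-callable extension type that owns a reference to
     * its interned signature bytes for as long as it lives. */
    typedef struct {
        PyObject_HEAD
        PyObject *signature;   /* interned bytes object, owned reference */
        void     *funcptr;     /* the native entry point it describes */
    } NativeCallableObject;

    static void
    NativeCallable_dealloc(NativeCallableObject *self)
    {
        /* Dropping this reference is what lets the signature fall out
         * of the intern table once nobody else holds it. */
        Py_CLEAR(self->signature);
        Py_TYPE(self)->tp_free((PyObject *)self);
    }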
Stefan

From njs at pobox.com Tue Apr 17 15:10:23 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Apr 2012 14:10:23 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D67B2.4060706@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> <4F8D67B2.4060706@astro.uio.no> Message-ID:

On Tue, Apr 17, 2012 at 1:53 PM, Dag Sverre Seljebotn wrote:
> [...]
> Would we store just the PyBytesObject*, or a char* in addition? Is bytes a vararg object or does it wrap a char*? If the former, I think we should just store the PyBytesObject*.

In 2.7, PyBytesObject is #defined to PyStringObject, and stringobject.h says that PyStringObject is variable-size. So that's fine.

From stefan_ml at behnel.de Tue Apr 17 15:16:09 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 17 Apr 2012 15:16:09 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D67B2.4060706@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> <4F8D67B2.4060706@astro.uio.no> Message-ID: <4F8D6D19.3020204@behnel.de>

Dag Sverre Seljebotn, 17.04.2012 14:53:
> Is bytes a vararg object or does it wrap a char*?

The data is stored internally in all CPython versions. Note that access to it may not be efficient in other Python implementations, but at least PyPy would also have a problem with providing a non-reference counted char* value (unless they let it live forever, that is...).
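Sketching the interning idea floated earlier in the thread (a dict reachable through the hypothetical sys.modules['_nativecall']) in C terms -- my own illustration, not a proposed spec: map each signature bytes object to itself and always hand out the canonical instance, so callers can compare signatures by pointer.

    #include <Python.h>

    static PyObject *
    intern_signature(PyObject *table, const char *sig)
    {
        PyObject *key, *canonical;

        key = PyBytes_FromString(sig);
        if (key == NULL)
            return NULL;
        canonical = PyDict_GetItem(table, key);    /* borrowed reference */
        if (canonical != NULL) {
            Py_INCREF(canonical);
            Py_DECREF(key);
            return canonical;                      /* existing interned copy */
        }
        if (PyDict_SetItem(table, key, key) < 0) {
            Py_DECREF(key);
            return NULL;
        }
        return key;  /* 'key' is now the canonical, interned copy */
    }

Note that the dict holds its own references, so entries never drop out on their own -- exactly the lifetime subtlety raised above; a real spec would have to pin down who evicts dead entries.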
Stefan

From d.s.seljebotn at astro.uio.no Tue Apr 17 15:20:09 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 15:20:09 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D6D19.3020204@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> <4F8D67B2.4060706@astro.uio.no> <4F8D6D19.3020204@behnel.de> Message-ID: <4F8D6E09.9050107@astro.uio.no>

On 04/17/2012 03:16 PM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 17.04.2012 14:53:
>> Is bytes a vararg object or does it wrap a char*?
>
> The data is stored internally in all CPython versions. Note that access to it may not be efficient in other Python implementations, but at least PyPy would also have a problem with providing a non-reference counted char* value (unless they let it live forever, that is...).

Are there any implications for PyPy as the *caller* w.r.t. a bytes object or a char*? E.g. if it wants to parse the format string and JIT a dispatch.

Dag

From njs at pobox.com Tue Apr 17 16:20:14 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Apr 2012 15:20:14 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D6112.1000906@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> Message-ID:

On Tue, Apr 17, 2012 at 1:24 PM, Dag Sverre Seljebotn wrote:
> OK, here's the benchmark code I've written:
>
> https://github.com/dagss/cep1000

This is great!

> Assumptions etc.:
>
> [...]
>
> Therefore I fixed the number of overloads as a known compile-time macro N *in the caller*. This is somewhat optimistic; however I didn't want to play with figuring out loop unrolling etc. at the same time, and hardcoding the length of the overload list sort of took that part out of the equation.

Since you've set this up... I have a suggestion for something that may be worth trying, though I've hesitated to propose it seriously. And that is, an API where instead of scanning a table, the C-callable exposes a pointer-to-function, something like

    int get_funcptr(PyObject * self, PyBytesObject * signature, struct c_function_info * out)

The rationale is, if we want to support JITed functions where new function pointers may be generated on the fly, the array approach has a serious problem. You have to decide how many array slots to allocate ahead of time, and if you run out, then... too bad. I guess you get to throw away one of the existing pointers (i.e., leak memory) to make room.
Adding an indirection here for the actual lookup means that a JIT could use a more complex lookup structure if justified, while a simple C function pointer could just hardcode this to a signature-check + return, with no lookup step at all. It also would give us a lot of flexibility for future optimizations (e.g., is it worth sorting the lookup table in LRU order?). And it would allow for a JIT to generate a C function pointer on first use, rather than requiring the first use to go via the Python-level __call__ fallback. (Which is pretty important when the first use is to fetch the function pointer before entering an inner loop!)

OTOH the extra indirection will obviously have some overhead, so it'd be nice to know if it's actually a problem.

> Table explanation:
>
> - N: Number of overloads in the list. For N=10, there are 9 non-matching overloads in the list before the matching 10th (but the caller doesn't know this). For N=1, the caller knows this and optimizes for a hit in the first entry.
>
> - MISMATCHES: If set, the caller tries 4 non-matching signatures before hitting the final one. If not set, only the correct signature is tried.
>
> - LIKELY: If set, a GCC likely() macro is used to expect that the signature matches.
>
> RESULTS:
>
> A direct call to (and execution of!) the function in the benchmark loop took 4.8 ns. An indirect dispatch through a function pointer of known type took 5.4 ns.
>
> Notation below is (intern, key), in ns:
>
> N=1:
>   MISMATCHES=False:
>     LIKELY=True:    6.44   6.44
>     LIKELY=False:   7.52   8.06
>   MISMATCHES=True:  8.59   8.59
> N=10:
>   MISMATCHES=False: 17.19  19.20
>   MISMATCHES=True:  36.52  37.59
>
> To be clear, "intern" is an interned "char*" (comparison with a 64-bit global variable), while "key" is comparison of a size_t (comparison of a 64-bit immediate in the instruction stream).
>
> PRELIMINARY BENCHMARK CONCLUSION:
>
> Intern appears to be as fast or faster than strcmp.
>
> I don't know why (is the pointer offset to the global variable stored in less than 64 bits in the x86-64 instruction stream? What gdb (or other) commands would I use to figure that out?)

I don't know why. It's entirely possible that this is just an accident of alignment or something. You're probably using, what, a 2 GHz CPU or so? So we're talking about a difference on the order of 2-4 cycles. (Actually, I'm surprised that LIKELY made any difference. The CPU knows which branch you're going to take regardless; all the compiler can do is try to improve memory locality for the "expected" path. But your entire benchmark probably fits in L1, so why would memory locality matter? Unless you got unlucky with cache associativity or something...)

Generally, I think the conclusion I draw from these numbers is that in the hot-cache case, the lookup overhead is negligible regardless of how we do it. On my laptop (i7 L640 @ 2.13 GHz), a single L3 cache miss costs 150 ns, and even an L2 miss is 30 ns.

-- Nathaniel

From d.s.seljebotn at astro.uio.no Tue Apr 17 16:34:48 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 16:34:48 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> Message-ID: <4F8D7F88.7040509@astro.uio.no>

On 04/17/2012 04:20 PM, Nathaniel Smith wrote:
> On Tue, Apr 17, 2012 at 1:24 PM, Dag Sverre Seljebotn wrote:
>> OK, here's the benchmark code I've written:
>>
>> https://github.com/dagss/cep1000
>
> This is great!
>> Assumptions etc.:
>>
>> [...]
>>
>> Therefore I fixed the number of overloads as a known compile-time macro N *in the caller*. This is somewhat optimistic; however I didn't want to play with figuring out loop unrolling etc. at the same time, and hardcoding the length of the overload list sort of took that part out of the equation.
>
> Since you've set this up... I have a suggestion for something that may be worth trying, though I've hesitated to propose it seriously. And that is, an API where instead of scanning a table, the C-callable exposes a pointer-to-function, something like
>
>     int get_funcptr(PyObject * self, PyBytesObject * signature, struct c_function_info * out)

Hmm. There are many ways to implement that function though. It shifts the scanning logic from the caller to the callee; you would need to call it multiple times for different signatures... But if the overhead can be shown to be minuscule then it does perhaps make the API nicer, even if it feels like paying for nothing at the moment. But see below.

Will definitely not get around to this today; anyone else feel free...

> The rationale is, if we want to support JITed functions where new function pointers may be generated on the fly, the array approach has a serious problem. You have to decide how many array slots to allocate ahead of time, and if you run out, then... too bad. I guess you get to

Note that the table is jumped to by a pointer in the PyObject, i.e. the PyObject I've tested with is

    [object data, &table, table]

So a JIT could have the table in a separate location on the heap; then it can allocate a new table, copy over the contents, and when everything is ready, do an atomic pointer update (using the assembly instructions/gcc intrinsics, not pthreads or locking). The old table would need to linger for a bit, but could at latest be deallocated when the PyObject is deallocated.

> throw away one of the existing pointers (i.e., leak memory) to make room. Adding an indirection here for the actual lookup means that a JIT could use a more complex lookup structure if justified, while a simple C function pointer could just hardcode this to a signature-check + return, with no lookup step at all. It also would give us a lot of flexibility for future optimizations (e.g., is it worth sorting the lookup table in LRU order?). And it would allow for a JIT to generate a C function pointer on first use, rather than requiring the first use to go via the Python-level __call__ fallback. (Which is pretty important when the first use is to fetch the function pointer before entering an inner loop!)
What I was thinking about along these lines is to add another function pointer in the PyUnofficialTypeObject. A caller would then:

    if found in table:
        do dispatch
    else if object supports get_funcptr:
        call get_funcptr
    else:
        python dispatch

> OTOH the extra indirection will obviously have some overhead, so it'd be nice to know if it's actually a problem.
>
>> Table explanation:
>>
>> [...]
>>
>> PRELIMINARY BENCHMARK CONCLUSION:
>>
>> Intern appears to be as fast or faster than strcmp.
>>
>> I don't know why (is the pointer offset to the global variable stored in less than 64 bits in the x86-64 instruction stream? What gdb (or other) commands would I use to figure that out?)
>
> I don't know why. It's entirely possible that this is just an accident of alignment or something. You're probably using, what, a 2 GHz CPU or so? So we're talking about a difference on the order of 2-4 cycles. (Actually, I'm surprised that LIKELY made any difference. The CPU knows which branch you're going to take regardless; all the compiler can do is try to improve memory locality for the "expected" path. But your entire benchmark probably fits in L1, so why would memory locality matter? Unless you got unlucky with cache associativity or something...)

Oh, sorry:

    ~ $ cat /proc/cpuinfo
    processor   : 0
    vendor_id   : GenuineIntel
    cpu family  : 6
    model       : 30
    model name  : Intel(R) Core(TM) i7 CPU Q 840 @ 1.87GHz
    stepping    : 5
    cpu MHz     : 1866.000
    cache size  : 8192 KB
    ...

Dag

From njs at pobox.com Tue Apr 17 17:16:33 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Apr 2012 16:16:33 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D7F88.7040509@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8D7F88.7040509@astro.uio.no> Message-ID:

On Tue, Apr 17, 2012 at 3:34 PM, Dag Sverre Seljebotn wrote:
> On 04/17/2012 04:20 PM, Nathaniel Smith wrote:
>> Since you've set this up... I have a suggestion for something that may be worth trying, though I've hesitated to propose it seriously. And that is, an API where instead of scanning a table, the C-callable exposes a pointer-to-function, something like
>>
>>     int get_funcptr(PyObject * self, PyBytesObject * signature, struct c_function_info * out)
>
> Hmm. There are many ways to implement that function though. It shifts the scanning logic from the caller to the callee;

Yes, that's part of the point :-).
Or, well, I guess the point is more that it shifts the scanning logic from the ABI docs to the callee.

> you would need to call it
> multiple times for different signatures...

Yes, I'm not sure what I think about this -- there are arguments either way for who should handle promotion. E.g., imagine the following situation:

We have a JITable function. We have already JITed the int64 version of this function. Now we want to call it with an int32. Question: should we promote to int64, or should we JIT?

Later you write:

> if found in table:
>     do dispatch
> else if object supports get_funcptr:
>     call get_funcptr
> else:
>     python dispatch

If we do promotion during the table scanning, then we'll never call get_funcptr and we'll never JIT an int32 version. OTOH, if we call get_funcptr before doing promotion, then we'll end up calling get_funcptr multiple times for different signatures regardless.

OTOOH, there are a *lot* of possible coercions for, say, a 3-argument function with return, so just enumerating them is not necessarily a good strategy. Possibly if get_funcptr can't handle the initial signature, it should return a table of signatures that it *is* willing to handle... assuming that most callees will either be able to handle a fixed set of types (cython variants) or else handle pretty much anything (JIT), and only the former will reach this code path. Or we could write down the allowed promotions (stealing from the C99 spec), and require the callee to pick the best promotion if it can't handle the initial request. Or we could put this part off until version 2, once we see how eager callers are to actually implement a real promotion engine.

> But if the overhead can be shown to be minuscule then it does perhaps
> make the API nicer, even if it feels like paying for nothing at the
> moment. But see below.
>
> Will definitely not get around to this today; anyone else feel free...
>
>> The rationale is, if we want to support JITed functions where new
>> function pointers may be generated on the fly, the array approach has
>> a serious problem. You have to decide how many array slots to allocate
>> ahead of time, and if you run out, then... too bad. I guess you get to
>
> Note that the table is jumped to by a pointer in the PyObject, i.e. the
> PyObject I've tested with is
>
>     [object data, &table, table]

Oh, I see! I thought you were embedding it in the object, to avoid an extra indirection (and potential cache miss). That's probably necessary, for the reasons you say, but also makes the get_funcptr approach potentially more competitive.

> So a JIT could have the table in a separate location on the heap, then it
> can allocate a new table, copy over the contents, and when everything is
> ready, then do an atomic pointer update (using the assembly instructions/gcc
> intrinsics, not pthreads or locking).
>
> The old table would need to linger for a bit, but could at the latest be
> deallocated when the PyObject is deallocated.

IMHO we should just hold the GIL through lookups, which would simplify this, but that's mostly based on the naive intuition that we shouldn't be passing around Python boxes in no-GIL code. Maybe there are good reasons to.
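A caller-side sketch of that cascade in C; the helper functions and the info struct are placeholders for machinery the thread leaves unspecified:

    #include <Python.h>

    struct c_function_info { void *funcptr; const char *signature; };

    /* Hypothetical helpers: the overload-table scan and the callee hook. */
    extern void *find_in_table(PyObject *obj, const char *interned_sig);
    extern int   supports_get_funcptr(PyObject *obj);
    extern int   call_get_funcptr(PyObject *obj, const char *sig,
                                  struct c_function_info *out);

    /* Try native dispatch for a double -> double signature; fall back to
     * a boxed Python call when no native entry point is found. */
    static PyObject *
    dispatch_d_d(PyObject *obj, const char *interned_sig, double x)
    {
        void *fp = find_in_table(obj, interned_sig);
        if (fp == NULL && supports_get_funcptr(obj)) {
            struct c_function_info info;
            if (call_get_funcptr(obj, interned_sig, &info) == 0)
                fp = info.funcptr;
        }
        if (fp != NULL)                               /* native dispatch */
            return PyFloat_FromDouble(((double (*)(double))fp)(x));
        return PyObject_CallFunction(obj, "d", x);    /* python dispatch */
    }

Note how the promotion question above is really about where find_in_table and call_get_funcptr sit relative to any widening of the argument types.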
- N

From stefan_ml at behnel.de Tue Apr 17 17:55:44 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 17 Apr 2012 17:55:44 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D6E09.9050107@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> <4F8D67B2.4060706@astro.uio.no> <4F8D6D19.3020204@behnel.de> <4F8D6E09.9050107@astro.uio.no> Message-ID: <4F8D9280.8020303@behnel.de>

Dag Sverre Seljebotn, 17.04.2012 15:20:
> On 04/17/2012 03:16 PM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 17.04.2012 14:53:
>>> Is bytes a vararg object or does it wrap a char*?
>>
>> The data is stored internally in all CPython versions. Note that access to
>> it may not be efficient in other Python implementations, but at least PyPy
>> would also have a problem with providing a non-reference-counted char*
>> value (unless they let it live forever, that is...).
>
> Are there any implications for PyPy as the *caller* w.r.t. a bytes object
> or a char*? E.g. if it wants to parse the format string and JIT a dispatch.

I don't think so.

Stefan

From d.s.seljebotn at astro.uio.no Tue Apr 17 21:07:38 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 21:07:38 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8D7F88.7040509@astro.uio.no> Message-ID: <88e09774-5be9-45c7-91eb-98f92490ddd2@email.android.com>

Nathaniel Smith wrote:
>On Tue, Apr 17, 2012 at 3:34 PM, Dag Sverre Seljebotn wrote:
>> On 04/17/2012 04:20 PM, Nathaniel Smith wrote:
>>> Since you've set this up... I have a suggestion for something that may
>>> be worth trying, though I've hesitated to propose it seriously. And
>>> that is, an API where instead of scanning a table, the C-callable
>>> exposes a pointer-to-function, something like
>>>
>>>     int get_funcptr(PyObject * self, PyBytesObject * signature, struct
>>>     c_function_info * out)
>>
>> Hmm. There are many ways to implement that function though. It shifts
>> the scanning logic from the caller to the callee;
>
>Yes, that's part of the point :-). Or, well, I guess the point is more
>that it shifts the scanning logic from the ABI docs to the callee.

Well, really it shifts the logic to the getfuncptr argument specification -- is the signature argument an interned string, an encoded string, a sha1 hash, ...

Part of the table storage format is shifted from the CEP, but that is so unimportant it has not even been discussed.

>
>> you would need to call it
>> multiple times for different signatures...
>
>Yes, I'm not sure what I think about this -- there are arguments
>either way for who should handle promotion. E.g., imagine the
>following situation:
>
>We have a JITable function
>We have already JITed the int64 version of this function
>Now we want to call it with an int32
>Question: should we promote to int64, or should we JIT?

I think we got close to a good solution to this dilemma earlier in this thread:

- Callers promote scalars to 64 bit if no exact match is found (and JITs only use 64-bit scalars)

- Arrays and pointers are the real issue. In this case the caller requests another signature (and the JIT kicks in)

The utility of re-JITing for scalars is very limited; it is vital for arrays and pointers.
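In caller code, one reading of that rule looks like the following sketch; the signature strings and the find_overload() helper are invented here, not CEP1000 syntax:

    #include <Python.h>

    extern void *find_overload(PyObject *obj, const char *interned_sig);

    /* Promote-on-miss lookup for a 32-bit integer argument. */
    static void *
    lookup_with_promotion(PyObject *obj, int *promoted)
    {
        void *fp = find_overload(obj, "d(i32)");   /* exact match first */
        *promoted = 0;
        if (fp == NULL) {
            fp = find_overload(obj, "d(i64)");     /* widen scalars to 64 bit */
            *promoted = (fp != NULL);
        }
        return fp;  /* NULL: request another signature, or Python dispatch */
    }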
>
>Later you write:
>
>> if found in table:
>>     do dispatch
>> else if object supports get_funcptr:
>>     call get_funcptr
>> else:
>>     python dispatch
>
>If we do promotion during the table scanning, then we'll never call
>get_funcptr and we'll never JIT an int32 version. OTOH, if we call
>get_funcptr before doing promotion, then we'll end up calling
>get_funcptr multiple times for different signatures regardless.
>
>OTOOH, there are a *lot* of possible coercions for, say, a 3-argument
>function with return, so just enumerating them is not necessarily a
>good strategy. Possibly if get_funcptr can't handle the initial
>signature, it should return a table of signatures that it *is* willing
>to handle... assuming that most callees will either be able to handle
>a fixed set of types (cython variants) or else handle pretty much
>anything (JIT), and only the former will reach this code path. Or we
>could write down the allowed promotions (stealing from the C99 spec),
>and require the callee to pick the best promotion if it can't handle
>the initial request. Or we could put this part off until version 2,
>once we see how eager callers are to actually implement a real
>promotion engine.

I wanted to leave getfuncptr for another CEP. There's all kinds of stuff -- how does the JIT determine that the argument arrays are large enough to justify JITing? Etc.

>
>> But if the overhead can be shown to be minuscule then it does perhaps
>> make the API nicer, even if it feels like paying for nothing at the
>> moment. But see below.
>>
>> Will definitely not get around to this today; anyone else feel free...
>>
>>> The rationale is, if we want to support JITed functions where new
>>> function pointers may be generated on the fly, the array approach
>>> has a serious problem. You have to decide how many array slots to
>>> allocate ahead of time, and if you run out, then... too bad. I guess
>>> you get to
>>
>> Note that the table is jumped to by a pointer in the PyObject, i.e.
>> the PyObject I've tested with is
>>
>>     [object data, &table, table]
>
>Oh, I see! I thought you were embedding it in the object, to avoid an
>extra indirection (and potential cache miss). That's probably

Note that in my benchmark the data was right next to the pointer; I think the cost was minor.

>necessary, for the reasons you say, but also makes the get_funcptr
>approach potentially more competitive.
>
>> So a JIT could have the table in a separate location on the heap, then
>> it can allocate a new table, copy over the contents, and when everything
>> is ready, then do an atomic pointer update (using the assembly
>> instructions/gcc intrinsics, not pthreads or locking).
>>
>> The old table would need to linger for a bit, but could at the latest
>> be deallocated when the PyObject is deallocated.
>
>IMHO we should just hold the GIL through lookups, which would simplify
>this, but that's mostly based on the naive intuition that we shouldn't
>be passing around Python boxes in no-GIL code. Maybe there are good
>reasons to.

Your intuition about the GIL is wrong as far as Cython is concerned: you are allowed to call cdef 'nogil' methods on refcounted Cython objects without the GIL.

Dag

>
>- N
>_______________________________________________
>cython-devel mailing list
>cython-devel at python.org
>http://mail.python.org/mailman/listinfo/cython-devel

-- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
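A sketch of the table replacement Dag describes, using the GCC __sync primitives current at the time; the entry layout and ownership comments are illustrative, not part of the CEP:

    #include <stdlib.h>
    #include <string.h>

    typedef struct { const char *sig; void *fp; } entry_t;

    /* Publish an extended overload table. Readers follow *table_slot
     * exactly once per lookup, so they see either the old or the new
     * table in full, never a mix. The old table must linger; at the
     * latest it can be freed when the owning PyObject is deallocated. */
    static entry_t *
    publish_new_table(entry_t * volatile *table_slot, entry_t *old,
                      size_t n_old, entry_t extra)
    {
        entry_t *fresh = malloc((n_old + 2) * sizeof(entry_t));
        if (fresh == NULL)
            return NULL;
        memcpy(fresh, old, n_old * sizeof(entry_t));
        fresh[n_old] = extra;
        fresh[n_old + 1].sig = NULL;   /* keep the NULL terminator */
        fresh[n_old + 1].fp = NULL;
        __sync_synchronize();          /* table contents visible before ptr */
        *table_slot = fresh;           /* aligned pointer store; atomic on
                                          the platforms discussed here */
        return fresh;
    }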
From njs at pobox.com Tue Apr 17 21:38:41 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Apr 2012 20:38:41 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <88e09774-5be9-45c7-91eb-98f92490ddd2@email.android.com> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8D7F88.7040509@astro.uio.no> <88e09774-5be9-45c7-91eb-98f92490ddd2@email.android.com> Message-ID: On Tue, Apr 17, 2012 at 8:07 PM, Dag Sverre Seljebotn wrote: > > > Nathaniel Smith wrote: > >>On Tue, Apr 17, 2012 at 3:34 PM, Dag Sverre Seljebotn >> wrote: >>> On 04/17/2012 04:20 PM, Nathaniel Smith wrote: >>>> Since you've set this up... I have a suggestion for something that >>may >>>> be worth trying, though I've hesitated to propose it seriously. And >>>> that is, an API where instead of scanning a table, the C-callable >>>> exposes a pointer-to-function, something like >>>> ? int get_funcptr(PyObject * self, PyBytesObject * signature, struct >>>> c_function_info * out) >>> >>> >>> Hmm. There's many ways to implement that function though. It shifts >>the >>> scanning logic from the caller to the callee; >> >>Yes, that's part of the point :-). Or, well, I guess the point is more >>that it shifts the scanning logic from the ABI docs to the callee. > > Well, really it shifts the logic to the getfuncptr argument specification -- is the signature argument an interned string, encoded string, sha1 hash,... > > Part of the table storage format is shifted from the CEP but that is so unimportant it has not even been discussed. > >> >>> you would need to call it >>> multiple times for different signatures... >> >>Yes, I'm not sure what I think about this -- there are arguments >>either way for who should handle promotion. E.g., imagine the >>following situation: >> >>We have a JITable function >>We have already JITed the int64 version of this function >>Now we want to call it with an int32 >>Question: should we promote to int64, or should we JIT? > > I think we got close to a good solution to this dilemma earlier in this thread: > > ?- Callers promote scalars to 64 bit if no exact match is found (and JITs only use 64 bit scalars) > > ?- Arrays and pointers are the real issue. In this case the caller request another signature (and the JIT kicks in) > > The utility of re-jiting for scalars is very limited; it is vital for arrays and pointers. Nonetheless, I think these rules will run into some trouble (starting with, JITs only use long double?), and esp. if you want to convince python-dev of them. But again, I don't think it's so terrible if the caller just picks some different signatures that it's willing to deal with, for now. > >> >>Later you write: >>> if found in table: >>> ? do dispatch >>> else if object supports get_funcptr: >>> ? call get_funcptr >>> else: >>> ? python dispatch >> >>If we do promotion during the table scanning, then we'll never call >>get_funcptr and we'll never JIT an int32 version. OTOH, if we call >>get_funcptr before doing promotion, then we'll end up calling >>get_funcptr multiple times for different signatures regardless. >> >>OTOOH, there are a *lot* of possible coercions for, say, a 3-argument >>function with return, so just enumerating them is not necessarily a >>good strategy. Possibly if get_functpr can't handle the initial >>signature, it should return a table of signatures that it *is* willing >>to handle... 
assuming that most callees will either be able to handle >>a fixed set of types (cython variants) or else handle pretty much >>anything (JIT), and only the former will reach this code path. Or we >>could write down the allowed promotions (stealing from the C99 spec), >>and require the callee to pick the best promotion if it can't handle >>the initial request. Or we could put this part off until version 2, >>once we see how eager callers are to actually implement a real >>promotion engine. > > I wanted to leave getfuncptr for another CEP. > > There's all kind of stuff -- how does the JIT determine that the argument arrays are large enough to justify JITing? Etc. I'm sort of inclined to follow KISS here, and say that this isn't PyPy, we aren't trying to get optimal performance on large, arbitrary programs. If someone took the trouble to write a function in a special JIT-able Python subset/dialect and then passed it to a C code, it's because they know that JITing is worth it we and should just do it unconditionally. Maybe that'll have to be revised later, but it seems like a plausible way to get started... Anyway, getfuncptr alone is actually simpler spec-wise than the array lookup approach, and the flexibility is an added bonus; it's just a question of whether it will work. >> >>> But if the overhead can be shown to be miniscule then it does perhaps >>make >>> the API nicer, even if it feels like paying for nothing at the >>moment. But >>> see below. >>> >>> Will definitely not get around to this today; anyone else feel >>free... >>> >>> >>>> >>>> The rationale is, if we want to support JITed functions where new >>>> function pointers may be generated on the fly, the array approach >>has >>>> a serious problem. You have to decide how many array slots to >>allocate >>>> ahead of time, and if you run out, then... too bad. I guess you get >>to >>> >>> >>> Note that the table is jumped to by a pointer in the PyObject, i.e. >>the >>> PyObject I've tested with is >>> >>> [object data, &table, table] >> >>Oh, I see! I thought you were embedding it in the object, to avoid an >>extra indirection (and potential cache miss). > That's probably > > Note that in my benchmark the data was right next to the pointer, I think the cost was minor. Yeah, I'm not worried about your benchmark; the only case that seems to really matter is when the cache is cold. Two cache misses are worse than one. >>necessary, for the reasons you say, but also makes the get_funcptr >>approach potentially more competitive. >> >>> So a JIT could have the table in a separate location on the heap, >>then it >>> can allocate a new table, copy over the contents, and when everything >>is >>> ready, then do an atomic pointer update (using the assembly >>instructions/gcc >>> intrinsics, not pthreads or locking). >>> >>> The old table would need to linger for a bit, but could at latest be >>> deallocated when the PyObject is deallocated. >> >>IMHO we should just hold the GIL through lookups, which would simplify >>tihs, but that's mostly based on the naive intuition that we shouldn't >>be passing around Python boxes in no-GIL code. Maybe there are good >>reasons to. > > Your intuition about the GIL is wrong as far as Cython is concerned, you are allowed to call cdef 'nogil' methods on refcounted Cython objects without the GIL. But at least the docs claim that you can't pass a boxed C-callable to such a method: "If you are implementing such a function in Cython, it cannot have any Python arguments, ...". 
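On the GIL point: once the pointer has been fetched (with the GIL held), the call through it is pure C and can run with the GIL released. A sketch, reusing the hypothetical find_overload() helper from earlier:

    #include <Python.h>

    extern void *find_overload(PyObject *obj, const char *interned_sig);

    /* Fetch with the GIL held, call with it released. */
    static double
    call_nogil(PyObject *obj, const char *sig, double x)
    {
        double r = 0.0;
        double (*fp)(double) = (double (*)(double))find_overload(obj, sig);
        if (fp != NULL) {
            Py_BEGIN_ALLOW_THREADS
            r = fp(x);                /* pure C call; no GIL needed */
            Py_END_ALLOW_THREADS
        }
        return r;
    }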
-- Nathaniel From greg.ewing at canterbury.ac.nz Wed Apr 18 00:55:23 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 18 Apr 2012 10:55:23 +1200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D59F8.4080809@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> Message-ID: <4F8DF4DB.7080506@canterbury.ac.nz> Dag Sverre Seljebotn wrote: > if you use > Python bytes in a dict in sys.modules['_nativecall'], the bytes objects > could be deallocated before callables containing the interned string -- > unless you Py_INCREF once too many, I don't understand that. Is there some reason that refcounting of the interned strings can't be done correctly? -- Greg From d.s.seljebotn at astro.uio.no Wed Apr 18 23:35:42 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 18 Apr 2012 23:35:42 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D6112.1000906@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> Message-ID: <4F8F33AE.50401@astro.uio.no> On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote: > On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >> Travis Oliphant recently raised the issue on the NumPy list of what >> mechanisms to use to box native functions produced by his Numba so >> that SciPy functions can call it, e.g. (I'm making the numba part >> up): >> >> @numba # Compiles function using LLVM def f(x): return 3 * x >> >> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >> >> Obviously, we want something standard, so that Cython functions can >> also be called in a fast way. > > OK, here's the benchmark code I've written: > > https://github.com/dagss/cep1000 > > Assumptions etc.: > > - (Very) warm cache case is tested > > - I compile and link libmycallable.so, libmycaller.so and ./bench; with > -fPIC, to emulate the Python environment > > - I use mostly pure C but use PyTypeObject in order to get the offsets > to tp_flags etc right (I emulate the checking that would happen on a > PyObject* according to CEP1000). > > - The test function is "double f(double x) { return x * x; } > > - The benchmark is run in a loop J=1000000 times (and time divided by > J). This is repeated K=10000 times and the minimum walltime of the K run > is used. This gave very stable readings on my system. > > Fixing loop iterations: > > In the initial results I just scanned the overload list until > NULL-termination. It seemed to me that the code generated for this > scanning was the most important factor. > > Therefore I fixed the number of overloads as a known compile-time macro > N *in the caller*. This is somewhat optimistic; however I didn't want to > play with figuring out loop unrolling etc. at the same time, and > hardcoding the length of the overload list sort of took that part out of > the equation. > > > Table explanation: > > - N: Number of overloads in list. For N=10, there's 9 non-matching > overloads in the list before the matching 10 (but caller doesn't know > this). For N=1, the caller knows this and optimize for a hit in the > first entry. > > - MISMATCHES: If set, the caller tries 4 non-matching signatures before > hitting the final one. If not set, only the correct signature is tried. > > - LIKELY: If set, a GCC likely() macro is used to expect that the > signature matches. 
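(The likely() macro referred to here is the usual GCC idiom; for reference, this is how it sits in a caller-side scan over a NULL-terminated table of interned signatures -- the table layout is illustrative:)

    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    typedef struct { const char *sig; void *fp; } entry_t;

    /* With interned signatures the match is a pointer compare, not strcmp. */
    static void *
    find_overload(entry_t *table, const char *interned_sig)
    {
        for (; table->sig != NULL; table++) {
            if (likely(table->sig == interned_sig))
                return table->fp;
        }
        return NULL;
    }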
>
> RESULTS:
>
> Direct call (and execution of!) the function in the benchmark loop took 4.8 ns.
>
> An indirect dispatch through a function pointer of known type took 5.4 ns.
>
> Notation below is (intern key), in ns:
>
> N=1:
>   MISMATCHES=False:
>     LIKELY=True:  6.44 6.44
>     LIKELY=False: 7.52 8.06
>   MISMATCHES=True: 8.59 8.59
> N=10:
>   MISMATCHES=False: 17.19 19.20
>   MISMATCHES=True:  36.52 37.59
>
> To be clear, "intern" is an interned "char*" (comparison with a 64-bit
> global variable), while "key" is comparison of a size_t (comparison of a
> 64-bit immediate in the instruction stream).

First: My benchmarks today are a little inconsistent with earlier results. I think I have converged now in terms of the number of iterations (higher than last time), but that doesn't explain why indirect dispatch through a function pointer is now *higher*:

    Direct took 4.83 ns
    Dispatch took 5.91 ns

Anyway, even if crude, hopefully this will tell us something. The order of the benchmark numbers is:

    intern  key  get_func_intern  get_func_key

where the get_func_XX versions retrieve a function pointer taking either a single interned signature or a single key as argument (just see mycallable.c).

In the MISMATCHES case, the get_func_XX is called 4 times with a miss and then with the match.

    N=1
    - MISMATCHES=False:
    --- LIKELY=True:  5.91  6.44  8.59  9.13
    --- LIKELY=False: 7.52  7.52  9.13  9.13
    - MISMATCHES=True: 11.28 11.28 22.56 22.56

    N=10
    - MISMATCHES=False: 17.18 18.80 29.75  10.74(*)
    - MISMATCHES=True:  36.06 38.13 105.00 36.52

Benchmark comments:

The one marked (*) is implemented as a switch statement with keys known at compile time. I tried shifting around the case label values a bit but the result persists; it could just be that the compiler does a very good job of the switch as well.

Overall comments:

Picking a winner doesn't get easier. I'll try to (make myself) get some perspective. Thinking of extreme cases where performance matters, here's one (sqrt is apparently 0.5 ns on my machine; sin would be 40 ns):

    from numpy import sqrt, sin

    cdef double f(double x):
        return sqrt(x * x)  # or sin(x * x)

Of course, here one could get the pointer in the module at import time. However, here:

    from numpy import sqrt

    cdef double f(double x):
        return np.sqrt(x * x)  # or np.sin(x * x)

the __getattr__ on np sure is larger than any effect we discuss.

From the numbers above, I think I'm ready to accept the penalty of the "getfuncptr" approach (1.6 ns for a direct hit, larger when the caller accepts more signatures) as acceptable, given the added flexibility. When you care about the 1.6 ns, you're always going to want to do early binding anyway.

However, just as I'm convinced about interning, there appear to be two new arguments for keys:

- For a large number of overloads with getfuncptr, it can be much faster than interning. A 20 ns difference starts to get interesting.

- PSF GSoC proposals are not public yet, but I think I can say as much as that there's a PEP 3121 (multiple interpreter states) proposal, and that Martin von Lowis is favourable about it. If that goes anywhere it doesn't make interning impossible, but it requires a shared C component and changing the spec from PyBytesObject to char*. Perhaps that can be done in a PEP-ification revision though.

Just mentioning it. I'm not sure at this point myself; I think +1 on getfuncptr, but not sure about keys vs. interning.
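A C sketch of that import-time early binding; find_overload_on() and the "d(d)" signature string are stand-ins for whatever CEP1000 mechanism ends up chosen:

    #include <Python.h>

    extern void *find_overload_on(PyObject *callable, const char *interned_sig);

    static PyObject *sqrt_obj = NULL;        /* boxed fallback, owned ref */
    static double (*sqrt_d)(double) = NULL;  /* bound once at import time */

    static int
    bind_sqrt_at_import(PyObject *numpy_module)
    {
        sqrt_obj = PyObject_GetAttrString(numpy_module, "sqrt");
        if (sqrt_obj == NULL)
            return -1;
        /* stays NULL if no double -> double overload is exposed */
        sqrt_d = (double (*)(double))find_overload_on(sqrt_obj, "d(d)");
        return 0;
    }

    static double
    f(double x)
    {
        if (sqrt_d != NULL)          /* cheap NULL check; predicts well */
            return sqrt_d(x * x);
        /* boxed fallback; error handling elided for brevity */
        PyObject *r = PyObject_CallFunction(sqrt_obj, "d", x * x);
        double v = r ? PyFloat_AsDouble(r) : -1.0;
        Py_XDECREF(r);
        return v;
    }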
Dag From d.s.seljebotn at astro.uio.no Wed Apr 18 23:58:17 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 18 Apr 2012 23:58:17 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8F33AE.50401@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> Message-ID: <4F8F38F9.7020008@astro.uio.no> On 04/18/2012 11:35 PM, Dag Sverre Seljebotn wrote: > On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote: >> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>> Travis Oliphant recently raised the issue on the NumPy list of what >>> mechanisms to use to box native functions produced by his Numba so >>> that SciPy functions can call it, e.g. (I'm making the numba part >>> up): >>> >>> @numba # Compiles function using LLVM def f(x): return 3 * x >>> >>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >>> >>> Obviously, we want something standard, so that Cython functions can >>> also be called in a fast way. >> >> OK, here's the benchmark code I've written: >> >> https://github.com/dagss/cep1000 >> >> Assumptions etc.: >> >> - (Very) warm cache case is tested >> >> - I compile and link libmycallable.so, libmycaller.so and ./bench; with >> -fPIC, to emulate the Python environment >> >> - I use mostly pure C but use PyTypeObject in order to get the offsets >> to tp_flags etc right (I emulate the checking that would happen on a >> PyObject* according to CEP1000). >> >> - The test function is "double f(double x) { return x * x; } >> >> - The benchmark is run in a loop J=1000000 times (and time divided by >> J). This is repeated K=10000 times and the minimum walltime of the K run >> is used. This gave very stable readings on my system. >> >> Fixing loop iterations: >> >> In the initial results I just scanned the overload list until >> NULL-termination. It seemed to me that the code generated for this >> scanning was the most important factor. >> >> Therefore I fixed the number of overloads as a known compile-time macro >> N *in the caller*. This is somewhat optimistic; however I didn't want to >> play with figuring out loop unrolling etc. at the same time, and >> hardcoding the length of the overload list sort of took that part out of >> the equation. >> >> >> Table explanation: >> >> - N: Number of overloads in list. For N=10, there's 9 non-matching >> overloads in the list before the matching 10 (but caller doesn't know >> this). For N=1, the caller knows this and optimize for a hit in the >> first entry. >> >> - MISMATCHES: If set, the caller tries 4 non-matching signatures before >> hitting the final one. If not set, only the correct signature is tried. >> >> - LIKELY: If set, a GCC likely() macro is used to expect that the >> signature matches. >> >> >> RESULTS: >> >> Direct call (and execution of!) the function in benchmark loop took >> 4.8 ns. >> >> An indirect dispatch through a function pointer of known type took 5.4 ns >> >> Notation below is (intern key), in ns >> >> N=1: >> MISMATCHES=False: >> LIKELY=True: 6.44 6.44 >> LIKELY=False: 7.52 8.06 >> MISMATCHES=True: 8.59 8.59 >> N=10: >> MISMATCHES=False: 17.19 19.20 >> MISMATCHES=True: 36.52 37.59 >> >> To be clear, "intern" is an interned "char*" (comparison with a 64 bits >> global variable), while key is comparison of a size_t (comparison of a >> 64-bit immediate in the instruction stream). > > First: My benchmarks today are a little inconsistent with earlier > results. 
I think I have converged now in terms of the number of iterations
> (higher than last time), but that doesn't explain why indirect dispatch
> through a function pointer is now *higher*:
>
>     Direct took 4.83 ns
>     Dispatch took 5.91 ns
>
> Anyway, even if crude, hopefully this will tell us something. The order of
> the benchmark numbers is:
>
>     intern  key  get_func_intern  get_func_key
>
> where the get_func_XX versions retrieve a function pointer taking either
> a single interned signature or a single key as argument (just see
> mycallable.c).
>
> In the MISMATCHES case, the get_func_XX is called 4 times with a miss
> and then with the match.
>
> N=1
> - MISMATCHES=False:
> --- LIKELY=True:  5.91  6.44  8.59  9.13
> --- LIKELY=False: 7.52  7.52  9.13  9.13
> - MISMATCHES=True: 11.28 11.28 22.56 22.56
>
> N=10
> - MISMATCHES=False: 17.18 18.80 29.75  10.74(*)
> - MISMATCHES=True:  36.06 38.13 105.00 36.52
>
> Benchmark comments:
>
> The one marked (*) is implemented as a switch statement with keys known at
> compile time. I tried shifting around the case label values a bit but
> the result persists; it could just be that the compiler does a very good
> job of the switch as well.

I should make this clearer: The issue is that the compiler may have reordered the labels so that the hit came close to first; in the intern case the code is written so that the hit is always after 9 mismatches.

So I redid the (*) test using 10 cases with very different numeric values, and then tried each of the 10 as the matching case. Timings were stable for each choice of label (so this is not noise), with values:

    13.4 11.8 11.8 12.3 10.7 11.2 12.3

Guess this is the binary decision tree Mark talked about...

Dag

>
> Overall comments:
>
> Picking a winner doesn't get easier. I'll try to (make myself) get some
> perspective. Thinking of extreme cases where performance matters, here's
> one (sqrt is apparently 0.5 ns on my machine; sin would be 40 ns):
>
>     from numpy import sqrt, sin
>
>     cdef double f(double x):
>         return sqrt(x * x)  # or sin(x * x)
>
> Of course, here one could get the pointer in the module at import time.
> However, here:
>
>     from numpy import sqrt
>
>     cdef double f(double x):
>         return np.sqrt(x * x)  # or np.sin(x * x)
>
> the __getattr__ on np sure is larger than any effect we discuss.
>
> From the numbers above, I think I'm ready to accept the penalty of the
> "getfuncptr" approach (1.6 ns for a direct hit, larger when the caller
> accepts more signatures) as acceptable, given the added flexibility.
> When you care about the 1.6 ns, you're always going to want to do early
> binding anyway.
>
> However, just as I'm convinced about interning, there appear to be two
> new arguments for keys:
>
> - For a large number of overloads with getfuncptr, it can be much faster
> than interning. A 20 ns difference starts to get interesting.
>
> - PSF GSoC proposals are not public yet, but I think I can say as much
> as that there's a PEP 3121 (multiple interpreter states) proposal, and
> that Martin von Lowis is favourable about it. If that goes anywhere it
> doesn't make interning impossible, but it requires a shared C component
> and changing the spec from PyBytesObject to char*. Perhaps that can be
> done in a PEP-ification revision though.
>
> Just mentioning it. I'm not sure at this point myself; I think +1 on
> getfuncptr, but not sure about keys vs. interning.
> > Dag From stefan_ml at behnel.de Thu Apr 19 08:41:40 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 19 Apr 2012 08:41:40 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8F33AE.50401@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> Message-ID: <4F8FB3A4.7080700@behnel.de> Dag Sverre Seljebotn, 18.04.2012 23:35: > from numpy import sqrt, sin > > cdef double f(double x): > return sqrt(x * x) # or sin(x * x) > > Of course, here one could get the pointer in the module at import time. That optimisation would actually be very worthwhile all by itself. I mean, we know what signatures we need for globally imported functions throughout the module, so we can reduce the call to a single jump through a function pointer (although likely with a preceding NULL check, which the branch prediction would be happy to give us for free). At least as long as sqrt is not being reassigned, but that should hit the 99% case. > However, here: > > from numpy import sqrt > > cdef double f(double x): > return np.sqrt(x * x) # or np.sin(x * x) > > the __getattr__ on np sure is larger than any effect we discuss. Yes, that would have to stay a .pxd case, I guess. > From the numbers above, I think I'm ready to accept the "getfuncptr" > approach penalty (1.6 ns for a direct hit, larger when the caller accepts > more signatures) as acceptable, given the added flexibility. When you care > about the 1.6 ns, you're always going to want to do early binding anyway. > > However, just as I'm convinced about interning, there appears to be two new > arguments for keys: > > - For a large number of overloads with getfuncptr, it can be much faster > than interning. A 20ns difference starts to get interesting. I don't think any of the numbers you presented marks any of the solutions as "expensive" or "wrong". The advantage of a callback function for this is that it is the most flexible solution that will most easily hit all use cases. The only problem I see with getfuncptr() is that it shifts not only the runtime work to the callee but also the development work, debugging, optimisation, etc. We should provide a default implementation for non-JITs in that case, preferably one that fits into a header file rather than requiring a library. It could still become a set of C-API functions when (if?) CPython starts to adopt this (and exposes it also for its builtins). > - PSF GSoC proposals are not public yet, but I think I can say as much as > that there's a PEP 3121 (multiple interpreter states) proposal, and that > Martin von Lowis is favourable about it. If that goes anywhere it doesn't > make interning impossible but it requires a shared C component and changing > the spec from PyBytesObject to char*. Perhaps that can be done in a > PEP-ification revision though. I asked him what he thinks about the status of that PEP and he seems to be unhappy about the current massive lack of evaluation data regarding the general applicability and completeness of the infrastructure. One of the outcomes of the GSoC would be that we learn what problems actually exist and what needs to be done (if anything) to make this work for more code out there. IMHO, that would be a very valuable result, also for us. Note that the focus for the GSoC project is on the stdlib C modules. Without those, general support in Cython wouldn't be very helpful for any real-world code. We should move any PEP3121 related discussion to a separate (mammoth?) 
thread and a new CEP, though (the tracker tickets are already there). This is a large topic that is only loosely related to your CEP. Note that module global C variables would no longer exist with PEP3121 either. They would move into a module struct (basically a module closure). So we'd pay with an indirection already, for everything. Stefan From d.s.seljebotn at astro.uio.no Thu Apr 19 09:17:09 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 09:17:09 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F8FB3A4.7080700@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> Message-ID: <4F8FBBF5.4090709@astro.uio.no> On 04/19/2012 08:41 AM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 18.04.2012 23:35: >> from numpy import sqrt, sin >> >> cdef double f(double x): >> return sqrt(x * x) # or sin(x * x) >> >> Of course, here one could get the pointer in the module at import time. > > That optimisation would actually be very worthwhile all by itself. I mean, > we know what signatures we need for globally imported functions throughout > the module, so we can reduce the call to a single jump through a function > pointer (although likely with a preceding NULL check, which the branch > prediction would be happy to give us for free). At least as long as sqrt is > not being reassigned, but that should hit the 99% case. > > >> However, here: >> >> from numpy import sqrt Correction: "import numpy as np" >> >> cdef double f(double x): >> return np.sqrt(x * x) # or np.sin(x * x) >> >> the __getattr__ on np sure is larger than any effect we discuss. > > Yes, that would have to stay a .pxd case, I guess. How about this mini-CEP: Modules are allowed to specify __nomonkey__ (or __const__, or __notreassigned__), a list of strings naming module-level variables where "we don't hold you responsible if you assume no monkey-patching of these". When doing "import numpy as np", then (assuming "np" is never reassigned in the module), at import time we check all names looked up from it in __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'", i.e. the "np." is just a namespace mechanism. Needs a bit more work, it ignores the possibility that others could monkey-patch "np" in the Cython module. Problem with .pxd is that currently you need to pick one overload (np.sqrt works for n-dimensional arrays too, or takes a list and returns an array). And even after adding 3-4 language features to Cython to make this work, you're stuck with having to reimplement parts of NumPy in the pxd files just so that you can early bind from Cython. Dag From vitja.makarov at gmail.com Thu Apr 19 10:35:55 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Thu, 19 Apr 2012 12:35:55 +0400 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F8FBBF5.4090709@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> Message-ID: 2012/4/19 Dag Sverre Seljebotn : > On 04/19/2012 08:41 AM, Stefan Behnel wrote: >> >> Dag Sverre Seljebotn, 18.04.2012 23:35: >>> >>> from numpy import sqrt, sin >>> >>> cdef double f(double x): >>> ? ? return sqrt(x * x) # or sin(x * x) >>> >>> Of course, here one could get the pointer in the module at import time. >> >> >> That optimisation would actually be very worthwhile all by itself. 
I mean,
>> we know what signatures we need for globally imported functions throughout
>> the module, so we can reduce the call to a single jump through a function
>> pointer (although likely with a preceding NULL check, which the branch
>> prediction would be happy to give us for free). At least as long as sqrt is
>> not being reassigned, but that should hit the 99% case.
>>
>>> However, here:
>>>
>>> from numpy import sqrt
>
> Correction: "import numpy as np"
>
>>> cdef double f(double x):
>>>     return np.sqrt(x * x)  # or np.sin(x * x)
>>>
>>> the __getattr__ on np sure is larger than any effect we discuss.
>>
>> Yes, that would have to stay a .pxd case, I guess.
>
> How about this mini-CEP:
>
> Modules are allowed to specify __nomonkey__ (or __const__, or
> __notreassigned__), a list of strings naming module-level variables where
> "we don't hold you responsible if you assume no monkey-patching of these".
>
> When doing "import numpy as np", then (assuming "np" is never reassigned in
> the module), at import time we check all names looked up from it in
> __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'",
> i.e. the "np." is just a namespace mechanism.
>
> Needs a bit more work; it ignores the possibility that others could
> monkey-patch "np" in the Cython module.
>
> Problem with .pxd is that currently you need to pick one overload (np.sqrt
> works for n-dimensional arrays too, or takes a list and returns an array).
> And even after adding 3-4 language features to Cython to make this work,
> you're stuck with having to reimplement parts of NumPy in the pxd files just
> so that you can early bind from Cython.

Sorry, I'm a bit late.

When should __nomonkey__ be checked: at compile time or at import time? It seems to me that the compiler must guess the function signature at compile time and then check it at runtime.

What if an integer signature is guessed for sqrt() based on the argument type, as in sqrt(16)? Should this call fall back to PyObject_Call(), or cast the integer to a double at some point?

I've tried to implement a trivial approach for CyFunction. Trivial means that the function accepts PyObjects as arguments and returns a PyObject, so the trivial signature is only one integer: 1 + len(args). If the signature matches, the C function is called directly, and PyObject_Call() is used otherwise. I didn't succeed because of the argument cloning problems we discussed before.

About dict lookups: it's possible to speed up a dict lookup by a constant key if we have access to the dict's internal implementation. I've implemented it for module-level lookups here:

https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9

And it gave a 4x speedup for a dummy test:

    def foo():
        cdef int i, r = 0
        o = foo
        for i in range(10000000):
            if o is foo:
                r += 1

    %timeit foo()
    1 loops, best of 3: 229 ms per loop

    %timeit foo_optimized()
    10 loops, best of 3: 54.1 ms per loop

--
vitja.
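A C sketch of that trivial-signature fast path; the struct layout and names here are hypothetical, not what Vitja's branch actually does:

    #include <Python.h>

    /* Hypothetical boxed-function layout. */
    typedef struct {
        PyObject_HEAD
        size_t trivial_key;   /* 1 + number of PyObject* arguments */
        void  *entry;         /* C entry point taking only PyObject* args */
    } TrivialFuncObject;

    typedef PyObject *(*entry1_t)(PyObject *self, PyObject *arg);

    /* One-argument call: direct C call when the trivial signature
     * matches, generic PyObject_Call otherwise. */
    static PyObject *
    call_one_arg(PyObject *callable, PyObject *arg)
    {
        TrivialFuncObject *f = (TrivialFuncObject *)callable;
        if (/* type check elided */ f->trivial_key == 2)
            return ((entry1_t)f->entry)((PyObject *)f, arg);
        return PyObject_CallFunctionObjArgs(callable, arg, NULL);
    }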
From njs at pobox.com Thu Apr 19 11:07:20 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 19 Apr 2012 10:07:20 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8F38F9.7020008@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> Message-ID: On Wed, Apr 18, 2012 at 10:58 PM, Dag Sverre Seljebotn wrote: > On 04/18/2012 11:35 PM, Dag Sverre Seljebotn wrote: >> >> On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote: >>> >>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>>> >>>> Travis Oliphant recently raised the issue on the NumPy list of what >>>> mechanisms to use to box native functions produced by his Numba so >>>> that SciPy functions can call it, e.g. (I'm making the numba part >>>> up): >>>> >>>> @numba # Compiles function using LLVM def f(x): return 3 * x >>>> >>>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >>>> >>>> Obviously, we want something standard, so that Cython functions can >>>> also be called in a fast way. >>> >>> >>> OK, here's the benchmark code I've written: >>> >>> https://github.com/dagss/cep1000 >>> >>> Assumptions etc.: >>> >>> - (Very) warm cache case is tested >>> >>> - I compile and link libmycallable.so, libmycaller.so and ./bench; with >>> -fPIC, to emulate the Python environment >>> >>> - I use mostly pure C but use PyTypeObject in order to get the offsets >>> to tp_flags etc right (I emulate the checking that would happen on a >>> PyObject* according to CEP1000). >>> >>> - The test function is "double f(double x) { return x * x; } >>> >>> - The benchmark is run in a loop J=1000000 times (and time divided by >>> J). This is repeated K=10000 times and the minimum walltime of the K run >>> is used. This gave very stable readings on my system. >>> >>> Fixing loop iterations: >>> >>> In the initial results I just scanned the overload list until >>> NULL-termination. It seemed to me that the code generated for this >>> scanning was the most important factor. >>> >>> Therefore I fixed the number of overloads as a known compile-time macro >>> N *in the caller*. This is somewhat optimistic; however I didn't want to >>> play with figuring out loop unrolling etc. at the same time, and >>> hardcoding the length of the overload list sort of took that part out of >>> the equation. >>> >>> >>> Table explanation: >>> >>> - N: Number of overloads in list. For N=10, there's 9 non-matching >>> overloads in the list before the matching 10 (but caller doesn't know >>> this). For N=1, the caller knows this and optimize for a hit in the >>> first entry. >>> >>> - MISMATCHES: If set, the caller tries 4 non-matching signatures before >>> hitting the final one. If not set, only the correct signature is tried. >>> >>> - LIKELY: If set, a GCC likely() macro is used to expect that the >>> signature matches. >>> >>> >>> RESULTS: >>> >>> Direct call (and execution of!) the function in benchmark loop took >>> 4.8 ns. 
>>> >>> An indirect dispatch through a function pointer of known type took 5.4 ns >>> >>> Notation below is (intern key), in ns >>> >>> N=1: >>> MISMATCHES=False: >>> LIKELY=True: 6.44 6.44 >>> LIKELY=False: 7.52 8.06 >>> MISMATCHES=True: 8.59 8.59 >>> N=10: >>> MISMATCHES=False: 17.19 19.20 >>> MISMATCHES=True: 36.52 37.59 >>> >>> To be clear, "intern" is an interned "char*" (comparison with a 64 bits >>> global variable), while key is comparison of a size_t (comparison of a >>> 64-bit immediate in the instruction stream). >> >> >> First: My benchmarks today are a little inconsistent with earlier >> results. I think I have converged now in terms of number of iterations >> (higher than last time), but that doesn't explain why indirect dispatch >> through function pointer is now *higher*: >> >> Direct took 4.83 ns >> Dispatch took 5.91 ns >> >> Anyway, even if crude, hopefully this will tell us something. Order of >> benchmark numbers are: >> >> intern key get_func_intern get_func_key >> >> where the get_func_XX versions retrieve a function pointer taking either >> a single interned signature or a single key as argument (just see >> mycallable.c). >> >> In the MISMATCHES case, the get_func_XX is called 4 times with a miss >> and then with the match. >> >> N=1 >> - MISMATCHES=False: >> --- LIKELY=True: 5.91 6.44 8.59 9.13 >> --- LIKELY=False: 7.52 7.52 9.13 9.13 >> - MISMATCHES=True: 11.28 11.28 22.56 22.56 >> >> N=10 >> - MISMATCHES=False: 17.18 18.80 29.75 10.74(*) >> - MISMATCHES=True: 36.06 38.13 105.00 36.52 >> >> Benchmark comments: >> >> The one marked (*) is implemented as a switch statement with known keys >> compile-time. I tried shifting around the case label values a bit but >> the result persists; it could just be that the compiler does a very good >> job of the switch as well. > > > I should make this clearer: The issue is that the compiler may have > reordered the labels so that the hit came close to first; in the intern case > the code is written so that the hit is always after 9 mismatches. > > So I redid the (*) test using 10 cases with very different numeric values, > and then tried each 10 as the matching case. Timings were stable for each > choice of label (so this is not noise), with values: > > 13.4 11.8 11.8 12.3 10.7 11.2 12.3 > > Guess this is the binary decision tree Mark talked about... Yes, if you look at the ASM (this is easier to keep track of if you make the switch cases into round decimal numbers, like 1000, 2000, 3000...), then you can see that gcc is generating a fully unrolled binary search, as basically a set of nested if/else's, like: if (value < 5000) { if (value == 2000) { return &ptr_2000; } else if (value == 4000) { return &ptr_4000; } } else { if (value == 6000) { return &ptr_6000; } else if (value == 8000) { return &ptr_8000; } } I suppose if we're ambitious we could do the same with the intern table for Cython compile-time variants (we don't know the values ahead of time, but we know how many there will be, so we'd generate the list of intern values, sort it, and then replace the comparison values above with table[middle], etc.). 
-- Nathaniel From d.s.seljebotn at astro.uio.no Thu Apr 19 12:43:15 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 12:43:15 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> Message-ID: <4F8FEC43.2020801@astro.uio.no> On 04/19/2012 10:35 AM, Vitja Makarov wrote: > 2012/4/19 Dag Sverre Seljebotn: >> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>> >>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>> >>>> from numpy import sqrt, sin >>>> >>>> cdef double f(double x): >>>> return sqrt(x * x) # or sin(x * x) >>>> >>>> Of course, here one could get the pointer in the module at import time. >>> >>> >>> That optimisation would actually be very worthwhile all by itself. I mean, >>> we know what signatures we need for globally imported functions throughout >>> the module, so we can reduce the call to a single jump through a function >>> pointer (although likely with a preceding NULL check, which the branch >>> prediction would be happy to give us for free). At least as long as sqrt >>> is >>> not being reassigned, but that should hit the 99% case. >>> >>> >>>> However, here: >>>> >>>> from numpy import sqrt >> >> >> Correction: "import numpy as np" >> >>>> >>>> cdef double f(double x): >>>> return np.sqrt(x * x) # or np.sin(x * x) >>>> >>>> the __getattr__ on np sure is larger than any effect we discuss. >>> >>> >>> Yes, that would have to stay a .pxd case, I guess. >> >> >> How about this mini-CEP: >> >> Modules are allowed to specify __nomonkey__ (or __const__, or >> __notreassigned__), a list of strings naming module-level variables where >> "we don't hold you responsible if you assume no monkey-patching of these". >> >> When doing "import numpy as np", then (assuming "np" is never reassigned in >> the module), at import time we check all names looked up from it in >> __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'", >> i.e. the "np." is just a namespace mechanism. >> >> Needs a bit more work, it ignores the possibility that others could >> monkey-patch "np" in the Cython module. >> >> Problem with .pxd is that currently you need to pick one overload (np.sqrt >> works for n-dimensional arrays too, or takes a list and returns an array). >> And even after adding 3-4 language features to Cython to make this work, >> you're stuck with having to reimplement parts of NumPy in the pxd files just >> so that you can early bind from Cython. >> > > Sorry, I'm a bit late. > > When should __nomonkey__ be checked at compile time or at import time? At import time. At compile time we generate one (potential) function pointer per call-signature we might try. At import time we fill them in if they are in __nomonkey__ using CEP 1000. At call time we likely() around the pointer being non-empty, since the cost of a dict-lookup is so large anyway. > It seems to me that compiler must guess function signature at compile > time. And then check it at runtime. Yes. 
Just like Fortran 77, where you don't declare functions, just call them. At least with Cython it'll just go slower if you get them wrong; we won't get a crash :-)

If you want to help the compiler along explicitly, you would instead do something like

    cdef double (*sqrt_double)(double)
    from numpy import sqrt

> What if an integer signature is guessed for sqrt() based on the argument
> type, as in sqrt(16)? Should this call fall back to PyObject_Call() or
> cast the integer to a double at some point?

a) np.sqrt could export functions for all basic types (this is how NumPy currently works under the hood anyway)

b) It doesn't help here, but I also imagine Cython doing a 3-step or 4-step call down the line:

 - Direct call using the types given.
 - Promote all scalars to 64 bit, try again.
 [- (Optional if an FFI library or LLVM is available): Parse the signature string of the function and build the call dynamically using an FFI library]
 - Python call

Without an FFI library, I think giving the user a speedup if he/she writes sqrt(3.) rather than sqrt(3) is fine...

c) An optimize-for-host-libraries compilation flag could of course just probe for the signatures, similar to profile-guided optimization.

> I've tried to implement a trivial approach for CyFunction. Trivial means
> that the function accepts PyObjects as arguments and returns a PyObject,
> so the trivial signature is only one integer: 1 + len(args). If the
> signature matches, the C function is called directly, and PyObject_Call()
> is used otherwise. I didn't succeed because of the argument cloning
> problems we discussed before.
>
> About dict lookups: it's possible to speed up a dict lookup by a constant
> key if we have access to the dict's internal implementation. I've
> implemented it for module-level lookups here:
>
> https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9
>
> And it gave a 4x speedup for a dummy test:
>
>     def foo():
>         cdef int i, r = 0
>         o = foo
>         for i in range(10000000):
>             if o is foo:
>                 r += 1
>
>     %timeit foo()
>     1 loops, best of 3: 229 ms per loop
>
>     %timeit foo_optimized()
>     10 loops, best of 3: 54.1 ms per loop

Cool! Am I right that that translates to 5.4 ns? That's pretty good. (What CPU did you use?)

Still, a function pointer call done at import time appears to be roughly 1 ns, so a full sqrt bound the way I proposed would be 2-3 ns, so 5.4 ns on top of that is still relatively large. But it does mean that __nomonkey__, if not completely invalid, is perhaps not exactly high-priority. For a JIT that would consult it at compile time the gain would be higher though.

Dag

From markflorisson88 at gmail.com Thu Apr 19 12:53:55 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 19 Apr 2012 11:53:55 +0100 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F8FBBF5.4090709@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> Message-ID:

On 19 April 2012 08:17, Dag Sverre Seljebotn wrote:
> On 04/19/2012 08:41 AM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 18.04.2012 23:35:
>>> from numpy import sqrt, sin
>>>
>>> cdef double f(double x):
>>>     return sqrt(x * x)  # or sin(x * x)
>>>
>>> Of course, here one could get the pointer in the module at import time.
>>
>> That optimisation would actually be very worthwhile all by itself.
I mean, >> we know what signatures we need for globally imported functions throughout >> the module, so we can reduce the call to a single jump through a function >> pointer (although likely with a preceding NULL check, which the branch >> prediction would be happy to give us for free). At least as long as sqrt >> is >> not being reassigned, but that should hit the 99% case. >> >> >>> However, here: >>> >>> from numpy import sqrt > > > Correction: "import numpy as np" > >>> >>> cdef double f(double x): >>> ? ? return np.sqrt(x * x) # or np.sin(x * x) >>> >>> the __getattr__ on np sure is larger than any effect we discuss. >> >> >> Yes, that would have to stay a .pxd case, I guess. > > > How about this mini-CEP: > > Modules are allowed to specify __nomonkey__ (or __const__, or > __notreassigned__), a list of strings naming module-level variables where > "we don't hold you responsible if you assume no monkey-patching of these". > > When doing "import numpy as np", then (assuming "np" is never reassigned in > the module), at import time we check all names looked up from it in > __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'", > i.e. the "np." is just a namespace mechanism. I like the idea. I think this could be generalized to a 'final' keyword, that could also enable optimizations for cdef class attributes. So you'd say cdef final object np import numpy as np For class attributes this would tell the compiler that it will not be rebound, which means you could check if attributes are initialized in the initializer, or just pull such checks (as wel as bounds checks), at least for memoryviews, out of loops, without worrying whether it will be reassigned in the meantime. > Needs a bit more work, it ignores the possibility that others could > monkey-patch "np" in the Cython module. > > Problem with .pxd is that currently you need to pick one overload (np.sqrt > works for n-dimensional arrays too, or takes a list and returns an array). > And even after adding 3-4 language features to Cython to make this work, > you're stuck with having to reimplement parts of NumPy in the pxd files just > so that you can early bind from Cython. > > > Dag > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Thu Apr 19 12:56:58 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 12:56:58 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> Message-ID: <4F8FEF7A.2090501@astro.uio.no> On 04/19/2012 11:07 AM, Nathaniel Smith wrote: > On Wed, Apr 18, 2012 at 10:58 PM, Dag Sverre Seljebotn > wrote: >> On 04/18/2012 11:35 PM, Dag Sverre Seljebotn wrote: >>> >>> On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote: >>>> >>>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>>>> >>>>> Travis Oliphant recently raised the issue on the NumPy list of what >>>>> mechanisms to use to box native functions produced by his Numba so >>>>> that SciPy functions can call it, e.g. (I'm making the numba part >>>>> up): >>>>> >>>>> @numba # Compiles function using LLVM def f(x): return 3 * x >>>>> >>>>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! 
>>>>> >>>>> Obviously, we want something standard, so that Cython functions can >>>>> also be called in a fast way. >>>> >>>> >>>> OK, here's the benchmark code I've written: >>>> >>>> https://github.com/dagss/cep1000 >>>> >>>> Assumptions etc.: >>>> >>>> - (Very) warm cache case is tested >>>> >>>> - I compile and link libmycallable.so, libmycaller.so and ./bench; with >>>> -fPIC, to emulate the Python environment >>>> >>>> - I use mostly pure C but use PyTypeObject in order to get the offsets >>>> to tp_flags etc right (I emulate the checking that would happen on a >>>> PyObject* according to CEP1000). >>>> >>>> - The test function is "double f(double x) { return x * x; } >>>> >>>> - The benchmark is run in a loop J=1000000 times (and time divided by >>>> J). This is repeated K=10000 times and the minimum walltime of the K run >>>> is used. This gave very stable readings on my system. >>>> >>>> Fixing loop iterations: >>>> >>>> In the initial results I just scanned the overload list until >>>> NULL-termination. It seemed to me that the code generated for this >>>> scanning was the most important factor. >>>> >>>> Therefore I fixed the number of overloads as a known compile-time macro >>>> N *in the caller*. This is somewhat optimistic; however I didn't want to >>>> play with figuring out loop unrolling etc. at the same time, and >>>> hardcoding the length of the overload list sort of took that part out of >>>> the equation. >>>> >>>> >>>> Table explanation: >>>> >>>> - N: Number of overloads in list. For N=10, there's 9 non-matching >>>> overloads in the list before the matching 10 (but caller doesn't know >>>> this). For N=1, the caller knows this and optimize for a hit in the >>>> first entry. >>>> >>>> - MISMATCHES: If set, the caller tries 4 non-matching signatures before >>>> hitting the final one. If not set, only the correct signature is tried. >>>> >>>> - LIKELY: If set, a GCC likely() macro is used to expect that the >>>> signature matches. >>>> >>>> >>>> RESULTS: >>>> >>>> Direct call (and execution of!) the function in benchmark loop took >>>> 4.8 ns. >>>> >>>> An indirect dispatch through a function pointer of known type took 5.4 ns >>>> >>>> Notation below is (intern key), in ns >>>> >>>> N=1: >>>> MISMATCHES=False: >>>> LIKELY=True: 6.44 6.44 >>>> LIKELY=False: 7.52 8.06 >>>> MISMATCHES=True: 8.59 8.59 >>>> N=10: >>>> MISMATCHES=False: 17.19 19.20 >>>> MISMATCHES=True: 36.52 37.59 >>>> >>>> To be clear, "intern" is an interned "char*" (comparison with a 64 bits >>>> global variable), while key is comparison of a size_t (comparison of a >>>> 64-bit immediate in the instruction stream). >>> >>> >>> First: My benchmarks today are a little inconsistent with earlier >>> results. I think I have converged now in terms of number of iterations >>> (higher than last time), but that doesn't explain why indirect dispatch >>> through function pointer is now *higher*: >>> >>> Direct took 4.83 ns >>> Dispatch took 5.91 ns >>> >>> Anyway, even if crude, hopefully this will tell us something. Order of >>> benchmark numbers are: >>> >>> intern key get_func_intern get_func_key >>> >>> where the get_func_XX versions retrieve a function pointer taking either >>> a single interned signature or a single key as argument (just see >>> mycallable.c). >>> >>> In the MISMATCHES case, the get_func_XX is called 4 times with a miss >>> and then with the match. 
>>> >>> N=1 >>> - MISMATCHES=False: >>> --- LIKELY=True: 5.91 6.44 8.59 9.13 >>> --- LIKELY=False: 7.52 7.52 9.13 9.13 >>> - MISMATCHES=True: 11.28 11.28 22.56 22.56 >>> >>> N=10 >>> - MISMATCHES=False: 17.18 18.80 29.75 10.74(*) >>> - MISMATCHES=True: 36.06 38.13 105.00 36.52 >>> >>> Benchmark comments: >>> >>> The one marked (*) is implemented as a switch statement with keys known >>> at compile time. I tried shifting around the case label values a bit but >>> the result persists; it could just be that the compiler does a very good >>> job of the switch as well. >> >> >> I should make this clearer: The issue is that the compiler may have >> reordered the labels so that the hit came close to first; in the intern case >> the code is written so that the hit is always after 9 mismatches. >> >> So I redid the (*) test using 10 cases with very different numeric values, >> and then tried each of the 10 as the matching case. Timings were stable for each >> choice of label (so this is not noise), with values: >> >> 13.4 11.8 11.8 12.3 10.7 11.2 12.3 >> >> Guess this is the binary decision tree Mark talked about... > > Yes, if you look at the ASM (this is easier to keep track of if you > make the switch cases into round decimal numbers, like 1000, 2000, > 3000...), then you can see that gcc is generating a fully unrolled > binary search, as basically a set of nested if/else's, like: > > if (value < 5000) { > if (value == 2000) { > return &ptr_2000; > } else if (value == 4000) { > return &ptr_4000; > } > } else { > if (value == 6000) { > return &ptr_6000; > } else if (value == 8000) { > return &ptr_8000; > } > } > > I suppose if we're ambitious we could do the same with the intern > table for Cython compile-time variants (we don't know the values ahead > of time, but we know how many there will be, so we'd generate the list > of intern values, sort it, and then replace the comparison values > above with table[middle], etc.). Right. With everything being essentially equal, this isn't getting easier. I thought of some drawbacks of getfuncptr: - Important: Doesn't allow you to actually inspect the supported signatures, which is needed (or at least convenient) if you want to use an FFI library or do some JIT-ing. So an iteration mechanism is still needed in addition, meaning the number of things for the object to implement grows a bit large. Default implementations help -- OTOH there really wasn't a major drawback with the table approach as long as JITs can just replace it? - Minor: I've read that on Intel's Sandy Bridge, the micro-ops are cached after instruction decoding, and that micro-ops cache is so precious (and decoding so expensive) that the recommendation is no loop unrolling at all! So essentially sticking the table in unrolled instructions may not continue to be a good idea. (Of course, getfuncptr doesn't ).
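For concreteness, the two candidate interfaces being weighed in this email look roughly like the following in C. This is only a sketch: CEP 1000 was still in flux at this point, so the struct and function names here are invented for illustration rather than taken from the spec.

typedef struct {
    const char *signature;   /* e.g. an interned "d->d" */
    void       *funcptr;     /* cast to the matching C function type */
} sig_entry_t;

/* Variant 1 ("table"): the callable exposes a NULL-terminated array of
 * entries that the caller scans itself. */
typedef struct {
    sig_entry_t *entries;
} nativecall_table_t;

/* Variant 2 ("getfuncptr"): the callable exposes a lookup callback and
 * keeps the scanning strategy private. */
typedef void *(*getfuncptr_t)(void *callable, const char *interned_sig);

/* Caller-side scan for the table variant; because 'sig' is interned, a
 * pointer comparison is enough to test for a match. */
static void *lookup_in_table(const sig_entry_t *entries, const char *sig)
{
    for (; entries->signature != NULL; entries++)
        if (entries->signature == sig)
            return entries->funcptr;
    return NULL;  /* caller falls back to the boxed Python call */
}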
Dag From d.s.seljebotn at astro.uio.no Thu Apr 19 13:05:51 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 13:05:51 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8FEF7A.2090501@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> Message-ID: <4F8FF18F.1020909@astro.uio.no> On 04/19/2012 12:56 PM, Dag Sverre Seljebotn wrote: > On 04/19/2012 11:07 AM, Nathaniel Smith wrote: >> On Wed, Apr 18, 2012 at 10:58 PM, Dag Sverre Seljebotn >> wrote: >>> On 04/18/2012 11:35 PM, Dag Sverre Seljebotn wrote: >>>> >>>> On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote: >>>>> >>>>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>>>>> >>>>>> Travis Oliphant recently raised the issue on the NumPy list of what >>>>>> mechanisms to use to box native functions produced by his Numba so >>>>>> that SciPy functions can call it, e.g. (I'm making the numba part >>>>>> up): >>>>>> >>>>>> @numba # Compiles function using LLVM def f(x): return 3 * x >>>>>> >>>>>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >>>>>> >>>>>> Obviously, we want something standard, so that Cython functions can >>>>>> also be called in a fast way. >>>>> >>>>> >>>>> OK, here's the benchmark code I've written: >>>>> >>>>> https://github.com/dagss/cep1000 >>>>> >>>>> Assumptions etc.: >>>>> >>>>> - (Very) warm cache case is tested >>>>> >>>>> - I compile and link libmycallable.so, libmycaller.so and ./bench; >>>>> with >>>>> -fPIC, to emulate the Python environment >>>>> >>>>> - I use mostly pure C but use PyTypeObject in order to get the offsets >>>>> to tp_flags etc right (I emulate the checking that would happen on a >>>>> PyObject* according to CEP1000). >>>>> >>>>> - The test function is "double f(double x) { return x * x; } >>>>> >>>>> - The benchmark is run in a loop J=1000000 times (and time divided by >>>>> J). This is repeated K=10000 times and the minimum walltime of the >>>>> K run >>>>> is used. This gave very stable readings on my system. >>>>> >>>>> Fixing loop iterations: >>>>> >>>>> In the initial results I just scanned the overload list until >>>>> NULL-termination. It seemed to me that the code generated for this >>>>> scanning was the most important factor. >>>>> >>>>> Therefore I fixed the number of overloads as a known compile-time >>>>> macro >>>>> N *in the caller*. This is somewhat optimistic; however I didn't >>>>> want to >>>>> play with figuring out loop unrolling etc. at the same time, and >>>>> hardcoding the length of the overload list sort of took that part >>>>> out of >>>>> the equation. >>>>> >>>>> >>>>> Table explanation: >>>>> >>>>> - N: Number of overloads in list. For N=10, there's 9 non-matching >>>>> overloads in the list before the matching 10 (but caller doesn't know >>>>> this). For N=1, the caller knows this and optimize for a hit in the >>>>> first entry. >>>>> >>>>> - MISMATCHES: If set, the caller tries 4 non-matching signatures >>>>> before >>>>> hitting the final one. If not set, only the correct signature is >>>>> tried. >>>>> >>>>> - LIKELY: If set, a GCC likely() macro is used to expect that the >>>>> signature matches. >>>>> >>>>> >>>>> RESULTS: >>>>> >>>>> Direct call (and execution of!) the function in benchmark loop took >>>>> 4.8 ns. 
>>>>> >>>>> An indirect dispatch through a function pointer of known type took >>>>> 5.4 ns >>>>> >>>>> Notation below is (intern key), in ns >>>>> >>>>> N=1: >>>>> MISMATCHES=False: >>>>> LIKELY=True: 6.44 6.44 >>>>> LIKELY=False: 7.52 8.06 >>>>> MISMATCHES=True: 8.59 8.59 >>>>> N=10: >>>>> MISMATCHES=False: 17.19 19.20 >>>>> MISMATCHES=True: 36.52 37.59 >>>>> >>>>> To be clear, "intern" is an interned "char*" (comparison with a 64 >>>>> bits >>>>> global variable), while key is comparison of a size_t (comparison of a >>>>> 64-bit immediate in the instruction stream). >>>> >>>> >>>> First: My benchmarks today are a little inconsistent with earlier >>>> results. I think I have converged now in terms of number of iterations >>>> (higher than last time), but that doesn't explain why indirect dispatch >>>> through function pointer is now *higher*: >>>> >>>> Direct took 4.83 ns >>>> Dispatch took 5.91 ns >>>> >>>> Anyway, even if crude, hopefully this will tell us something. Order of >>>> benchmark numbers are: >>>> >>>> intern key get_func_intern get_func_key >>>> >>>> where the get_func_XX versions retrieve a function pointer taking >>>> either >>>> a single interned signature or a single key as argument (just see >>>> mycallable.c). >>>> >>>> In the MISMATCHES case, the get_func_XX is called 4 times with a miss >>>> and then with the match. >>>> >>>> N=1 >>>> - MISMATCHES=False: >>>> --- LIKELY=True: 5.91 6.44 8.59 9.13 >>>> --- LIKELY=False: 7.52 7.52 9.13 9.13 >>>> - MISMATCHES=True: 11.28 11.28 22.56 22.56 >>>> >>>> N=10 >>>> - MISMATCHES=False: 17.18 18.80 29.75 10.74(*) >>>> - MISMATCHES=True: 36.06 38.13 105.00 36.52 >>>> >>>> Benchmark comments: >>>> >>>> The one marked (*) is implemented as a switch statement with known keys >>>> compile-time. I tried shifting around the case label values a bit but >>>> the result persists; it could just be that the compiler does a very >>>> good >>>> job of the switch as well. >>> >>> >>> I should make this clearer: The issue is that the compiler may have >>> reordered the labels so that the hit came close to first; in the >>> intern case >>> the code is written so that the hit is always after 9 mismatches. >>> >>> So I redid the (*) test using 10 cases with very different numeric >>> values, >>> and then tried each 10 as the matching case. Timings were stable for >>> each >>> choice of label (so this is not noise), with values: >>> >>> 13.4 11.8 11.8 12.3 10.7 11.2 12.3 >>> >>> Guess this is the binary decision tree Mark talked about... >> >> Yes, if you look at the ASM (this is easier to keep track of if you >> make the switch cases into round decimal numbers, like 1000, 2000, >> 3000...), then you can see that gcc is generating a fully unrolled >> binary search, as basically a set of nested if/else's, like: >> >> if (value< 5000) { >> if (value == 2000) { >> return&ptr_2000; >> } else if (value == 4000) { >> return&ptr_4000; >> } >> } else { >> if (value == 6000) { >> return&ptr_6000; >> } else if (value == 8000) { >> return&ptr_8000; >> } >> } >> >> I suppose if we're ambitious we could do the same with the intern >> table for Cython compile-time variants (we don't know the values ahead >> of time, but we know how many there will be, so we'd generate the list >> of intern values, sort it, and then replace the comparison values >> above with table[middle], etc.). > > Right. With everything being essentially equal, this isn't getting easier. 
> > I thought of some drawbacks of getfuncptr: > > - Important: Doesn't allow you to actually inspect the supported > signatures, which is needed (or at least convenient) if you want to use > an FFI library or do some JIT-ing. So an iteration mechanism is still > needed in addition, meaning the number of things for the object to > implement grows a bit large. Default implementations help -- OTOH there > really wasn't a major drawback with the table approach as long as JIT's > can just replace it? > > - Minor: I've read that on Intel's Sandy Bridge, the micro-ops are > cached after instruction decoding, and that micro-ops cache is so > precious (and decoding so expensive) that the recommendation is no loop > unrolling at all! So essentially sticking the table in unrolled > instructions may not continue to be a good idea. (Of course, getfuncptr > doesn't ). ... (Of course, getfuncptr doesn't force you to do that, you could keep traversing data, but without getfuncptr you are forced to take a table-driven approach which may be better for the micro-op cache. Pure speculation though.). Dag From stefan_ml at behnel.de Thu Apr 19 13:11:00 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 19 Apr 2012 13:11:00 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8FF18F.1020909@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F8FF18F.1020909@astro.uio.no> Message-ID: <4F8FF2C4.4040601@behnel.de> Dag Sverre Seljebotn, 19.04.2012 13:05: > Pure speculation though I think we should leave speculation to the CPUs. They are quite good at it these days. Stefan From vitja.makarov at gmail.com Thu Apr 19 13:20:38 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Thu, 19 Apr 2012 15:20:38 +0400 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F8FEC43.2020801@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> <4F8FEC43.2020801@astro.uio.no> Message-ID: 2012/4/19 Dag Sverre Seljebotn : > On 04/19/2012 10:35 AM, Vitja Makarov wrote: >> >> 2012/4/19 Dag Sverre Seljebotn: >>> >>> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>>> >>>> >>>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>>> >>>>> >>>>> from numpy import sqrt, sin >>>>> >>>>> cdef double f(double x): >>>>> ? ? return sqrt(x * x) # or sin(x * x) >>>>> >>>>> Of course, here one could get the pointer in the module at import time. >>>> >>>> >>>> >>>> That optimisation would actually be very worthwhile all by itself. I >>>> mean, >>>> we know what signatures we need for globally imported functions >>>> throughout >>>> the module, so we can reduce the call to a single jump through a >>>> function >>>> pointer (although likely with a preceding NULL check, which the branch >>>> prediction would be happy to give us for free). At least as long as sqrt >>>> is >>>> not being reassigned, but that should hit the 99% case. >>>> >>>> >>>>> However, here: >>>>> >>>>> from numpy import sqrt >>> >>> >>> >>> Correction: "import numpy as np" >>> >>>>> >>>>> cdef double f(double x): >>>>> ? ? return np.sqrt(x * x) # or np.sin(x * x) >>>>> >>>>> the __getattr__ on np sure is larger than any effect we discuss. >>>> >>>> >>>> >>>> Yes, that would have to stay a .pxd case, I guess. 
>>> >>> >>> >>> How about this mini-CEP: >>> >>> Modules are allowed to specify __nomonkey__ (or __const__, or >>> __notreassigned__), a list of strings naming module-level variables where >>> "we don't hold you responsible if you assume no monkey-patching of >>> these". >>> >>> When doing "import numpy as np", then (assuming "np" is never reassigned >>> in >>> the module), at import time we check all names looked up from it in >>> __nomonkey__, and if so treat them as "from numpy import sqrt as >>> 'np.sqrt'", >>> i.e. the "np." is just a namespace mechanism. >>> >>> Needs a bit more work, it ignores the possibility that others could >>> monkey-patch "np" in the Cython module. >>> >>> Problem with .pxd is that currently you need to pick one overload >>> (np.sqrt >>> works for n-dimensional arrays too, or takes a list and returns an >>> array). >>> And even after adding 3-4 language features to Cython to make this work, >>> you're stuck with having to reimplement parts of NumPy in the pxd files >>> just >>> so that you can early bind from Cython. >>> >> >> Sorry, I'm a bit late. >> >> When should __nomonkey__ be checked at compile time or at import time? > > > At import time. At compile time we generate one (potential) function pointer > per call-signature we might try. At import time we fill them in if they are > in __nomonkey__ using CEP 1000. At call time we likely() around the pointer > being non-empty, since the cost of a dict-lookup is so large anyway. > > >> It seems to me that compiler must guess function signature at compile >> time. And then check it at runtime. > > > Yes. Just like Fortran 77, where you don't declare functions, just call > them.At least with Cython it'll just go slower if you get them wrong, we > won't get a crash :-) > > If you want to help the compiler along explicitly, you would instead do > something like > > cdef double (*sqrt_double)(double) > from numpy import sqrt > > >> What if integer signature is guessed for sqrt() based on the argument >> type sqrt(16) should this call fallback to PyObject_Call() or cast an >> integer to a double at some point? > > > a) np.sqrt could export functions for all basic types (this is how NumPy > currently works under the hood anyway) > > b) It doesn't help here, but I also imagine Cython doing a 3-step or 4-step > call down the line: > > - Direct call using the types given. > - Promote all scalars to 64 bit, try again. > [- (Optional if an FFI library or LLVM is available): Parse signature string > of function and build call dynamically using a FFI library)] > - Python call > > Without an FFI library, I think giving the user a speedup if he/she writes > sqrt(3.) rather than sqrt(3) is fine... > > c) An optimize-for-host-libraries compilation flag could of course just > probe for the signatures, similar to profile-guided optimization. > Ok, np.sqrt() supports different signatures how would cython know which C-function to use? > >> I've tried to implement trivial approach for CyFunction. Trivial means >> that function accepts PyObjects as arguments and returns an PyObject, >> so trivial signature is only one integer: 1 + len(args). If signature >> match occurs dirct C-function is called and PyObject_Call() is used >> otherwise. I didn't succeed because of argument cloning problems, we >> discussed before. >> >> About dict lookups: it's possible to speedup dict lookup by a constant >> key if we have access to dict's internal implementation. 
I've >> implemented it for module-level lookups here: >> >> >> https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9 >> >> And it gave a 4 times speedup for a dummy test: >> >> def foo(): >> cdef int i, r = 0 >> o = foo >> for i in range(10000000): >> if o is foo: >> r += 1 >> >> %timeit foo() >> 1 loops, best of 3: 229 ms per loop >> >> %timeit foo_optimized() >> 10 loops, best of 3: 54.1 ms per loop >> > > Cool! Am I right that that translates to 5.4 ns per iteration? That's pretty good. (What > CPU did you use?) > Yes, the test was run on an Intel i3 at 3.2 GHz > Still, a function pointer call done at import time appears to be roughly 1 > ns, so a full sqrt bound the way I proposed would be 2-3 ns, so 5.4 ns on top > of that is still relatively large. > > But it does mean that __nomonkey__, if not completely invalid, is perhaps > not exactly high-priority. For a JIT that would consult it at compile time > the gain would be higher, though. > > -- vitja. From njs at pobox.com Thu Apr 19 13:20:47 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 19 Apr 2012 12:20:47 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8FEF7A.2090501@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> Message-ID: On Thu, Apr 19, 2012 at 11:56 AM, Dag Sverre Seljebotn wrote: > I thought of some drawbacks of getfuncptr: > > - Important: Doesn't allow you to actually inspect the supported > signatures, which is needed (or at least convenient) if you want to use an > FFI library or do some JIT-ing. So an iteration mechanism is still needed in > addition, meaning the number of things for the object to implement grows a > bit large. Default implementations help -- OTOH there really wasn't a major > drawback with the table approach as long as JITs can just replace it? But this is orthogonal to the table vs. getfuncptr discussion. We're assuming that the table might be extended at runtime, which means you can't use it to determine which signatures are supported. So we need some sort of extra interface for the caller and callee to negotiate a type anyway. (I'm intentionally agnostic about whether it makes more sense for the caller or the callee to be doing the iterating... in general type negotiation could be quite complicated, and I don't think we know enough to get that interface right yet.) The other other option would be to go to the far other end of simplicity, and just forget for now about allowing multiple signatures in the same object. Do signature selection by having the user select one explicitly: @cython.inline def square(x): return x * x # .specialize is an un-standardized Cython interface # square_double is an object implementing the standardized C-callable interface square_double = square.specialize("d->d") scipy.integrate.quad(square_double) That'd be enough to get started, and doesn't rule out later extensions that do automatic type selection, once we have more experience.
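To make the single-signature idea above concrete, here is a hedged C sketch of the consumer side: an integrator that asks a C-callable object for its one advertised "d->d" pointer up front and then loops natively. nativecallable_get() and the "d->d" tag are hypothetical stand-ins for whatever the CEP would eventually standardize.

/* Hypothetical lookup provided by the C-callable protocol. */
extern void *nativecallable_get(void *callable, const char *interned_sig);

typedef double (*dd_func_t)(double);

/* Midpoint-rule integrator that binds the function pointer once,
 * outside the loop, so each evaluation is a plain C call. */
static double integrate_midpoint(void *callable, double a, double b, int n)
{
    dd_func_t f = (dd_func_t)nativecallable_get(callable, "d->d");
    double h = (b - a) / n;
    double sum = 0.0;
    int i;
    if (f == NULL)
        return 0.0;  /* real code would fall back to PyObject_Call */
    for (i = 0; i < n; i++)
        sum += f(a + (i + 0.5) * h);
    return sum * h;
}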
-- Nathaniel From stefan_ml at behnel.de Thu Apr 19 13:25:58 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 19 Apr 2012 13:25:58 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> <4F8FEC43.2020801@astro.uio.no> Message-ID: <4F8FF646.2080000@behnel.de> Vitja Makarov, 19.04.2012 13:20: > Ok, np.sqrt() supports different signatures -- how would Cython know > which C function to use? You might want to read through this mailing list thread and the CEP. Stefan From stefan_ml at behnel.de Thu Apr 19 13:51:16 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 19 Apr 2012 13:51:16 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> Message-ID: <4F8FFC34.9000205@behnel.de> Vitja Makarov, 19.04.2012 10:35: > I've tried to implement a trivial approach for CyFunction. Trivial means > that the function accepts PyObjects as arguments and returns a PyObject, > so the trivial signature is only one integer: 1 + len(args). If a signature > match occurs the direct C function is called and PyObject_Call() is used > otherwise. I didn't succeed because of the argument cloning problems we > discussed before. > > About dict lookups: it's possible to speed up dict lookup by a constant > key if we have access to the dict's internal implementation. I've > implemented it for module-level lookups here: > > https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9 My gut feeling tells me that this optimisation is seriously pushing it. Stefan From d.s.seljebotn at astro.uio.no Thu Apr 19 14:53:11 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 14:53:11 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8FF2C4.4040601@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F8FF18F.1020909@astro.uio.no> <4F8FF2C4.4040601@behnel.de> Message-ID: <4F900AB7.7050906@astro.uio.no> On 04/19/2012 01:11 PM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 19.04.2012 13:05: >> Pure speculation though > > I think we should leave speculation to the CPUs. They are quite good at it > these days. Yes, I agree, given these benchmarks, we should focus on a) Usability in C b) Simplicity c) Extensibility / probability of backwards incompatibility from future CEPs But even for these criteria, I don't really have an opinion about table vs. getfuncptr, or intern vs. strcmp. Where do others put their +1's?
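Since the thread keeps polling intern vs. key, this is what the two compare operations being voted on reduce to at a call site, in a rough C sketch. The key packing and variable names are invented for illustration only.

#include <stdint.h>

/* (a) Interned signature: the expected value is a global pointer filled
 * in once (e.g. at import), so a match is one load plus one compare. */
extern const char *interned_sig_dd;   /* set up by the interning machinery */

static int matches_intern(const char *entry_sig)
{
    return entry_sig == interned_sig_dd;
}

/* (b) Key: the signature is packed into an integer, so the expected
 * value can sit as a 64-bit immediate in the instruction stream. */
typedef uint64_t sigkey_t;
#define SIGKEY_DD ((sigkey_t)0x64643e64ULL)  /* made-up packing of "d->d" */

static int matches_key(sigkey_t entry_key)
{
    return entry_key == SIGKEY_DD;
}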
Dag From d.s.seljebotn at astro.uio.no Thu Apr 19 15:18:40 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 15:18:40 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> Message-ID: <4F9010B0.5080100@astro.uio.no> On 04/19/2012 01:20 PM, Nathaniel Smith wrote: > On Thu, Apr 19, 2012 at 11:56 AM, Dag Sverre Seljebotn > wrote: >> I thought of some drawbacks of getfuncptr: >> >> - Important: Doesn't allow you to actually inspect the supported >> signatures, which is needed (or at least convenient) if you want to use an >> FFI library or do some JIT-ing. So an iteration mechanism is still needed in >> addition, meaning the number of things for the object to implement grows a >> bit large. Default implementations help -- OTOH there really wasn't a major >> drawback with the table approach as long as JITs can just replace it? > > But this is orthogonal to the table vs. getfuncptr discussion. We're > assuming that the table might be extended at runtime, which means you > can't use it to determine which signatures are supported. So we need > some sort of extra interface for the caller and callee to negotiate a > type anyway. (I'm intentionally agnostic about whether it makes more > sense for the caller or the callee to be doing the iterating... in > general type negotiation could be quite complicated, and I don't think > we know enough to get that interface right yet.) Hmm. Right. Let's define an explicit goal for the CEP then. What I care about is getting the spec right enough such that, e.g., NumPy and SciPy, and other (mostly manually written) C extensions with a slow development pace, can be forward-compatible with whatever crazy things Cython or Numba does. There are 4 cases: 1) JIT calls JIT (ruled out straight away) 2) JIT calls static: Say that Numba wants to optimize calls to np.sin etc. without special-casing; this seems to require reading a table of static signatures 3) Static calls JIT: This is the case when scipy.integrate routines call a Numba callback and Numba generates a specialization for the dtype they explicitly need. This calls for getfuncptr (but perhaps in a form which we can't quite determine yet?). 4) Static calls static: Either table or getfuncptr works. My gut feeling is go for 2) and 4) in this round => table. The fact that the table can be extended at runtime is then not really relevant -- perhaps there will be an API to trigger that in the future, but it can't really be made use of today. > The other other option would be to go to the far other end of > simplicity, and just forget for now about allowing multiple signatures > in the same object. Do signature selection by having the user select > one explicitly: > > @cython.inline > def square(x): > return x * x > > # .specialize is an un-standardized Cython interface > # square_double is an object implementing the standardized C-callable interface > square_double = square.specialize("d->d") > scipy.integrate.quad(square_double) > > That'd be enough to get started, and doesn't rule out later extensions > that do automatic type selection, once we have more experience. Well, I want np.sin to replace "cdef extern from 'math.h'", and then this seems to be needed... at least the possibility to have both "d->d" and "O->O".
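What "both 'd->d' and 'O->O'" could look like from the callee's side: one callable advertising a boxed and an unboxed specialization of sin in the same table. A sketch under stated assumptions -- the signature strings and table layout are placeholders, not the CEP's, and error handling is elided.

#include <Python.h>
#include <math.h>

/* Unboxed "d->d" specialization. */
static double sin_dd(double x) { return sin(x); }

/* Boxed "O->O" specialization; PyFloat_AsDouble error checks elided. */
static PyObject *sin_OO(PyObject *x)
{
    return PyFloat_FromDouble(sin(PyFloat_AsDouble(x)));
}

typedef struct { const char *sig; void *ptr; } sig_entry_t;

static sig_entry_t sin_entries[] = {
    { "d->d", (void *)sin_dd },   /* fast path for typed callers */
    { "O->O", (void *)sin_OO },   /* generic object fallback */
    { NULL,   NULL }              /* terminator */
};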
Dag From njs at pobox.com Thu Apr 19 16:28:54 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 19 Apr 2012 15:28:54 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F9010B0.5080100@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F9010B0.5080100@astro.uio.no> Message-ID: On Thu, Apr 19, 2012 at 2:18 PM, Dag Sverre Seljebotn wrote: > On 04/19/2012 01:20 PM, Nathaniel Smith wrote: >> @cython.inline >> def square(x): >> ? ? return x * x >> >> # .specialize is an un-standardized Cython interface >> # square_double is an object implementing the standardized C-callable >> interface >> square_double = square.specialize("d->d") >> scipy.integrate.quad(square_double) >> >> That'd be enough to get started, and doesn't rule out later extensions >> that do automatic type selection, once we have more experience. > > > Well, I want np.sin to replace "cdef extern from 'math.h'", and then this > seems to be needed... at least the possibility to have both "d->d" and > "O->O". Except, the C function implementing np.sin on doubles actually has a signature that's something like "&&t&i&i&t->" (PyUFuncGenericFunction), not "d->d"... so maybe this is a good example to work through! It isn't at all obvious to me how this should be made to work in any of these proposals. (Isn't "O->O" just obj->ob_type->tp_call in any case?) -- Nathaniel From d.s.seljebotn at astro.uio.no Thu Apr 19 19:01:07 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 19:01:07 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F9010B0.5080100@astro.uio.no> Message-ID: Nathaniel Smith wrote: >On Thu, Apr 19, 2012 at 2:18 PM, Dag Sverre Seljebotn > wrote: >> On 04/19/2012 01:20 PM, Nathaniel Smith wrote: >>> @cython.inline >>> def square(x): >>> ? ? return x * x >>> >>> # .specialize is an un-standardized Cython interface >>> # square_double is an object implementing the standardized >C-callable >>> interface >>> square_double = square.specialize("d->d") >>> scipy.integrate.quad(square_double) >>> >>> That'd be enough to get started, and doesn't rule out later >extensions >>> that do automatic type selection, once we have more experience. >> >> >> Well, I want np.sin to replace "cdef extern from 'math.h'", and then >this >> seems to be needed... at least the possibility to have both "d->d" >and >> "O->O". > >Except, the C function implementing np.sin on doubles actually has a >signature that's something like "&&t&i&i&t->" >(PyUFuncGenericFunction), not "d->d"... so maybe this is a good >example to work through! It isn't at all obvious to me how this should >be made to work in any of these proposals. > >(Isn't "O->O" just obj->ob_type->tp_call in any case?) I just touched on this on the NumPy list -- the NumPy ufunc object could support the CEP in an optional way (each instance may, but doesn't have to), and then one could plug in fast scalar versions on a case by case basis, mainly using them as a namespace mechanism rather than reusing any implementation. So the C signature of the current implementation is irrelevant. I think you are right about the object case. 
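As a side note on the mismatch Nathaniel raises: the signature he quotes is the ufunc inner-loop shape, not a scalar one. A scalar "d->d" kernel relates to it roughly as below -- a sketch against the NumPy 1.x PyUFuncGenericFunction layout, with casting logic and error handling elided.

#include <Python.h>
#include <numpy/arrayobject.h>   /* npy_intp */
#include <math.h>

static double sin_dd(double x) { return sin(x); }

/* PyUFuncGenericFunction-shaped wrapper around the scalar kernel:
 * strided in/out arrays instead of one double in, one double out. */
static void sin_loop(char **args, npy_intp *dimensions,
                     npy_intp *steps, void *data)
{
    npy_intp i, n = dimensions[0];
    char *in = args[0], *out = args[1];
    (void)data;
    for (i = 0; i < n; i++, in += steps[0], out += steps[1])
        *(double *)out = sin_dd(*(double *)in);
}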
But there is still float/double/longdouble...with ufuncs a possible target it seems a good idea in general to support multiple specializations. Though I admit that with scalars, double-only gets one pretty far. np.add(1, 2) is a rather contrived example... Dag > >-- Nathaniel >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From robertwb at gmail.com Fri Apr 20 02:52:53 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 19 Apr 2012 17:52:53 -0700 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> Message-ID: On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote: > On 19 April 2012 08:17, Dag Sverre Seljebotn wrote: >> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>> >>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>> >>>> from numpy import sqrt, sin >>>> >>>> cdef double f(double x): >>>> ? ? return sqrt(x * x) # or sin(x * x) >>>> >>>> Of course, here one could get the pointer in the module at import time. >>> >>> >>> That optimisation would actually be very worthwhile all by itself. I mean, >>> we know what signatures we need for globally imported functions throughout >>> the module, so we can reduce the call to a single jump through a function >>> pointer (although likely with a preceding NULL check, which the branch >>> prediction would be happy to give us for free). At least as long as sqrt >>> is >>> not being reassigned, but that should hit the 99% case. >>> >>> >>>> However, here: >>>> >>>> from numpy import sqrt >> >> >> Correction: "import numpy as np" >> >>>> >>>> cdef double f(double x): >>>> ? ? return np.sqrt(x * x) # or np.sin(x * x) >>>> >>>> the __getattr__ on np sure is larger than any effect we discuss. >>> >>> >>> Yes, that would have to stay a .pxd case, I guess. >> >> >> How about this mini-CEP: >> >> Modules are allowed to specify __nomonkey__ (or __const__, or >> __notreassigned__), a list of strings naming module-level variables where >> "we don't hold you responsible if you assume no monkey-patching of these". >> >> When doing "import numpy as np", then (assuming "np" is never reassigned in >> the module), at import time we check all names looked up from it in >> __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'", >> i.e. the "np." is just a namespace mechanism. > > I like the idea. I think this could be generalized to a 'final' > keyword, that could also enable optimizations for cdef class > attributes. So you'd say > > cdef final object np > import numpy as np > > For class attributes this would tell the compiler that it will not be > rebound, which means you could check if attributes are initialized in > the initializer, or just pull such checks (as wel as bounds checks), > at least for memoryviews, out of loops, without worrying whether it > will be reassigned in the meantime. final is a nice way to describe this. If we were to introduce a new keyword, static might do as well. It seems more natural to do this in the numpy.pxd file (perhaps it could just be declared as a final object) and that would allow us to not worry about re-assignment. Cython could then try to keep that contract for any modules it compiles. 
(This is, however, a bit more restrictive, though one can always cimport and import modules under different names.) - Robert From stefan_ml at behnel.de Fri Apr 20 08:21:41 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 20 Apr 2012 08:21:41 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> Message-ID: <4F910075.6090904@behnel.de> Robert Bradshaw, 20.04.2012 02:52: > On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote: >> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote: >>> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>>> >>>>> from numpy import sqrt, sin >>>>> >>>>> cdef double f(double x): >>>>> return sqrt(x * x) # or sin(x * x) >>>>> >>>>> Of course, here one could get the pointer in the module at import time. >>>> >>>> That optimisation would actually be very worthwhile all by itself. I mean, >>>> we know what signatures we need for globally imported functions throughout >>>> the module, so we can reduce the call to a single jump through a function >>>> pointer (although likely with a preceding NULL check, which the branch >>>> prediction would be happy to give us for free). At least as long as sqrt >>>> is not being reassigned, but that should hit the 99% case. >>>> >>>>> However, here: >>>>> >>>>> from numpy import sqrt >>> Correction: "import numpy as np" >>>>> >>>>> cdef double f(double x): >>>>> return np.sqrt(x * x) # or np.sin(x * x) >>>>> >>>>> the __getattr__ on np sure is larger than any effect we discuss. >>>> >>>> Yes, that would have to stay a .pxd case, I guess. >>> >>> How about this mini-CEP: >>> >>> Modules are allowed to specify __nomonkey__ (or __const__, or >>> __notreassigned__), a list of strings naming module-level variables where >>> "we don't hold you responsible if you assume no monkey-patching of these". >>> >>> When doing "import numpy as np", then (assuming "np" is never reassigned in >>> the module), at import time we check all names looked up from it in >>> __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'", >>> i.e. the "np." is just a namespace mechanism. >> >> I like the idea. I think this could be generalized to a 'final' >> keyword, that could also enable optimizations for cdef class >> attributes. So you'd say >> >> cdef final object np >> import numpy as np >> >> For class attributes this would tell the compiler that it will not be >> rebound, which means you could check if attributes are initialized in >> the initializer, or just pull such checks (as wel as bounds checks), >> at least for memoryviews, out of loops, without worrying whether it >> will be reassigned in the meantime. > > final is a nice way to describe this. If we were to introduce a new > keyword, static might do as well. > > It seems more natural to do this in the numpy.pxd file (perhaps it > could just be declared as a final object) and that would allow us to > not worry about re-assignment. Cython could then try to keep that > contract for any modules it compiles. (This is, however, a bit more > restrictive, though one can always cimport and import modules under > different names.) However, it's actually not the module that's "final" in this regard but the functions it exports - *they* do not change and neither do their C signatures. 
So the "final" modifier should stick to the functions (possibly declared at the "cdef extern" line), which would then allow us to resolve and cache the C function pointers at import time. That mimics the case of the current "final" classes and methods, where we take off the method pointers at compile time. And inside of numpy.pxd is the perfect place to declare this, not as part of the import. Stefan From d.s.seljebotn at astro.uio.no Fri Apr 20 08:49:32 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 20 Apr 2012 08:49:32 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F910075.6090904@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> <4F910075.6090904@behnel.de> Message-ID: <4F9106FC.8030603@astro.uio.no> On 04/20/2012 08:21 AM, Stefan Behnel wrote: > Robert Bradshaw, 20.04.2012 02:52: >> On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote: >>> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote: >>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>>>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>>>> >>>>>> from numpy import sqrt, sin >>>>>> >>>>>> cdef double f(double x): >>>>>> return sqrt(x * x) # or sin(x * x) >>>>>> >>>>>> Of course, here one could get the pointer in the module at import time. >>>>> >>>>> That optimisation would actually be very worthwhile all by itself. I mean, >>>>> we know what signatures we need for globally imported functions throughout >>>>> the module, so we can reduce the call to a single jump through a function >>>>> pointer (although likely with a preceding NULL check, which the branch >>>>> prediction would be happy to give us for free). At least as long as sqrt >>>>> is not being reassigned, but that should hit the 99% case. >>>>> >>>>>> However, here: >>>>>> >>>>>> from numpy import sqrt >>>> Correction: "import numpy as np" >>>>>> >>>>>> cdef double f(double x): >>>>>> return np.sqrt(x * x) # or np.sin(x * x) >>>>>> >>>>>> the __getattr__ on np sure is larger than any effect we discuss. >>>>> >>>>> Yes, that would have to stay a .pxd case, I guess. >>>> >>>> How about this mini-CEP: >>>> >>>> Modules are allowed to specify __nomonkey__ (or __const__, or >>>> __notreassigned__), a list of strings naming module-level variables where >>>> "we don't hold you responsible if you assume no monkey-patching of these". >>>> >>>> When doing "import numpy as np", then (assuming "np" is never reassigned in >>>> the module), at import time we check all names looked up from it in >>>> __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'", >>>> i.e. the "np." is just a namespace mechanism. >>> >>> I like the idea. I think this could be generalized to a 'final' >>> keyword, that could also enable optimizations for cdef class >>> attributes. So you'd say >>> >>> cdef final object np >>> import numpy as np >>> >>> For class attributes this would tell the compiler that it will not be >>> rebound, which means you could check if attributes are initialized in >>> the initializer, or just pull such checks (as wel as bounds checks), >>> at least for memoryviews, out of loops, without worrying whether it >>> will be reassigned in the meantime. >> >> final is a nice way to describe this. If we were to introduce a new >> keyword, static might do as well. 
>> >> It seems more natural to do this in the numpy.pxd file (perhaps it >> could just be declared as a final object) and that would allow us to >> not worry about re-assignment. Cython could then try to keep that >> contract for any modules it compiles. (This is, however, a bit more >> restrictive, though one can always cimport and import modules under >> different names.) > > However, it's actually not the module that's "final" in this regard but the > functions it exports - *they* do not change and neither do their C > signatures. So the "final" modifier should stick to the functions (possibly > declared at the "cdef extern" line), which would then allow us to resolve > and cache the C function pointers at import time. Are there any advantages at getting this information at compile time rather than import time? If you got the full signature it would be a different matter (for type inference etc.); you could essentially do something like cdef final double sin(double) cdef final float sin(float) cdef final double cos(double) ...and you would know types at compile-time, and get pointers for those at import time. > > That mimics the case of the current "final" classes and methods, where we > take off the method pointers at compile time. And inside of numpy.pxd is > the perfect place to declare this, not as part of the import. However, a) a __finals__ in the NumPy Python module is something the NumPy project can maintain, and which can be different on different releases etc. (OK, NumPy is special because it is so high profile, but any other library) b) a __finals__ is something PyPy, Numba, etc. could benefit from as well Of course, one doesn't exclude the other. And if a library implements CEP1000 + provides __finals__, it would be trivial to run a pxd generator on it. Dag From d.s.seljebotn at astro.uio.no Fri Apr 20 08:55:40 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 20 Apr 2012 08:55:40 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F9106FC.8030603@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> <4F910075.6090904@behnel.de> <4F9106FC.8030603@astro.uio.no> Message-ID: <4F91086C.4080406@astro.uio.no> On 04/20/2012 08:49 AM, Dag Sverre Seljebotn wrote: > On 04/20/2012 08:21 AM, Stefan Behnel wrote: >> Robert Bradshaw, 20.04.2012 02:52: >>> On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote: >>>> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote: >>>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>>>>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>>>>> >>>>>>> from numpy import sqrt, sin >>>>>>> >>>>>>> cdef double f(double x): >>>>>>> return sqrt(x * x) # or sin(x * x) >>>>>>> >>>>>>> Of course, here one could get the pointer in the module at import >>>>>>> time. >>>>>> >>>>>> That optimisation would actually be very worthwhile all by itself. >>>>>> I mean, >>>>>> we know what signatures we need for globally imported functions >>>>>> throughout >>>>>> the module, so we can reduce the call to a single jump through a >>>>>> function >>>>>> pointer (although likely with a preceding NULL check, which the >>>>>> branch >>>>>> prediction would be happy to give us for free). At least as long >>>>>> as sqrt >>>>>> is not being reassigned, but that should hit the 99% case. 
>>>>>> >>>>>>> However, here: >>>>>>> >>>>>>> from numpy import sqrt >>>>> Correction: "import numpy as np" >>>>>>> >>>>>>> cdef double f(double x): >>>>>>> return np.sqrt(x * x) # or np.sin(x * x) >>>>>>> >>>>>>> the __getattr__ on np sure is larger than any effect we discuss. >>>>>> >>>>>> Yes, that would have to stay a .pxd case, I guess. >>>>> >>>>> How about this mini-CEP: >>>>> >>>>> Modules are allowed to specify __nomonkey__ (or __const__, or >>>>> __notreassigned__), a list of strings naming module-level variables >>>>> where >>>>> "we don't hold you responsible if you assume no monkey-patching of >>>>> these". >>>>> >>>>> When doing "import numpy as np", then (assuming "np" is never >>>>> reassigned in >>>>> the module), at import time we check all names looked up from it in >>>>> __nomonkey__, and if so treat them as "from numpy import sqrt as >>>>> 'np.sqrt'", >>>>> i.e. the "np." is just a namespace mechanism. >>>> >>>> I like the idea. I think this could be generalized to a 'final' >>>> keyword, that could also enable optimizations for cdef class >>>> attributes. So you'd say >>>> >>>> cdef final object np >>>> import numpy as np >>>> >>>> For class attributes this would tell the compiler that it will not be >>>> rebound, which means you could check if attributes are initialized in >>>> the initializer, or just pull such checks (as wel as bounds checks), >>>> at least for memoryviews, out of loops, without worrying whether it >>>> will be reassigned in the meantime. >>> >>> final is a nice way to describe this. If we were to introduce a new >>> keyword, static might do as well. >>> >>> It seems more natural to do this in the numpy.pxd file (perhaps it >>> could just be declared as a final object) and that would allow us to >>> not worry about re-assignment. Cython could then try to keep that >>> contract for any modules it compiles. (This is, however, a bit more >>> restrictive, though one can always cimport and import modules under >>> different names.) >> >> However, it's actually not the module that's "final" in this regard >> but the >> functions it exports - *they* do not change and neither do their C >> signatures. So the "final" modifier should stick to the functions >> (possibly >> declared at the "cdef extern" line), which would then allow us to resolve >> and cache the C function pointers at import time. > > Are there any advantages at getting this information at compile time > rather than import time? > > If you got the full signature it would be a different matter (for type > inference etc.); you could essentially do something like > > cdef final double sin(double) > cdef final float sin(float) > cdef final double cos(double) In fact, "final" is sort of implied whenever a pxd is implied. The mere act of providing a pxd means you expect early binding to happen. So I think this boils down to simply allowing to resolve ABIs declared in pxd files through CEP 1000 instead of assuming it is a Cython module: cdef double sin(double) cdef double cos(double) We could first look for the Cython ABI at import time, and if that isn't there, fall back to CEP 1000. And in time, deprecate the Cython ABI in favour of CEP 1000 (and follow-up CEPs to make it complete enough). The __nomonkey__ was something else, a proposal about a pxd-less approach. We can do both. Dag > > ...and you would know types at compile-time, and get pointers for those > at import time. 
> >> >> That mimics the case of the current "final" classes and methods, where we >> take off the method pointers at compile time. And inside of numpy.pxd is >> the perfect place to declare this, not as part of the import. > > However, > > a) a __finals__ in the NumPy Python module is something the NumPy > project can maintain, and which can be different on different releases > etc. (OK, NumPy is special because it is so high profile, but any other > library) > > b) a __finals__ is something PyPy, Numba, etc. could benefit from as well > > Of course, one doesn't exclude the other. And if a library implements > CEP1000 + provides __finals__, it would be trivial to run a pxd > generator on it. > > Dag From robertwb at gmail.com Fri Apr 20 08:58:18 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 19 Apr 2012 23:58:18 -0700 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F9106FC.8030603@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> <4F910075.6090904@behnel.de> <4F9106FC.8030603@astro.uio.no> Message-ID: On Thu, Apr 19, 2012 at 11:49 PM, Dag Sverre Seljebotn wrote: > On 04/20/2012 08:21 AM, Stefan Behnel wrote: >> >> Robert Bradshaw, 20.04.2012 02:52: >>> >>> On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote: >>>> >>>> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote: >>>>> >>>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>>>>> >>>>>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>>>>> >>>>>>> >>>>>>> from numpy import sqrt, sin >>>>>>> >>>>>>> cdef double f(double x): >>>>>>> ? ? return sqrt(x * x) # or sin(x * x) >>>>>>> >>>>>>> Of course, here one could get the pointer in the module at import >>>>>>> time. >>>>>> >>>>>> >>>>>> That optimisation would actually be very worthwhile all by itself. I >>>>>> mean, >>>>>> we know what signatures we need for globally imported functions >>>>>> throughout >>>>>> the module, so we can reduce the call to a single jump through a >>>>>> function >>>>>> pointer (although likely with a preceding NULL check, which the branch >>>>>> prediction would be happy to give us for free). At least as long as >>>>>> sqrt >>>>>> is not being reassigned, but that should hit the 99% case. >>>>>> >>>>>>> However, here: >>>>>>> >>>>>>> from numpy import sqrt >>>>> >>>>> Correction: "import numpy as np" >>>>>>> >>>>>>> >>>>>>> cdef double f(double x): >>>>>>> ? ? return np.sqrt(x * x) # or np.sin(x * x) >>>>>>> >>>>>>> the __getattr__ on np sure is larger than any effect we discuss. >>>>>> >>>>>> >>>>>> Yes, that would have to stay a .pxd case, I guess. >>>>> >>>>> >>>>> How about this mini-CEP: >>>>> >>>>> Modules are allowed to specify __nomonkey__ (or __const__, or >>>>> __notreassigned__), a list of strings naming module-level variables >>>>> where >>>>> "we don't hold you responsible if you assume no monkey-patching of >>>>> these". >>>>> >>>>> When doing "import numpy as np", then (assuming "np" is never >>>>> reassigned in >>>>> the module), at import time we check all names looked up from it in >>>>> __nomonkey__, and if so treat them as "from numpy import sqrt as >>>>> 'np.sqrt'", >>>>> i.e. the "np." is just a namespace mechanism. >>>> >>>> >>>> I like the idea. I think this could be generalized to a 'final' >>>> keyword, that could also enable optimizations for cdef class >>>> attributes. 
So you'd say >>>> >>>> cdef final object np >>>> import numpy as np >>>> >>>> For class attributes this would tell the compiler that it will not be >>>> rebound, which means you could check if attributes are initialized in >>>> the initializer, or just pull such checks (as wel as bounds checks), >>>> at least for memoryviews, out of loops, without worrying whether it >>>> will be reassigned in the meantime. >>> >>> >>> final is a nice way to describe this. If we were to introduce a new >>> keyword, static might do as well. >>> >>> It seems more natural to do this in the numpy.pxd file (perhaps it >>> could just be declared as a final object) and that would allow us to >>> not worry about re-assignment. Cython could then try to keep that >>> contract for any modules it compiles. (This is, however, a bit more >>> restrictive, though one can always cimport and import modules under >>> different names.) >> >> >> However, it's actually not the module that's "final" in this regard but >> the >> functions it exports - *they* do not change and neither do their C >> signatures. So the "final" modifier should stick to the functions >> (possibly >> declared at the "cdef extern" line), which would then allow us to resolve >> and cache the C function pointers at import time. Yes, I was thinking about decorating the functions, not the module. > Are there any advantages at getting this information at compile time rather > than import time? > > If you got the full signature it would be a different matter (for type > inference etc.); you could essentially do something like > > cdef final double sin(double) > cdef final float sin(float) > cdef final double cos(double) > > ...and you would know types at compile-time, and get pointers for those at > import time. > > >> >> That mimics the case of the current "final" classes and methods, where we >> take off the method pointers at compile time. And inside of numpy.pxd is >> the perfect place to declare this, not as part of the import. > > > However, > > a) a __finals__ in the NumPy Python module is something the NumPy project > can maintain, and which can be different on different releases etc. (OK, > NumPy is special because it is so high profile, but any other library) > > b) a __finals__ is something PyPy, Numba, etc. could benefit from as well > > Of course, one doesn't exclude the other. And if a library implements > CEP1000 + provides __finals__, it would be trivial to run a pxd generator on > it. This seems rather orthogonal to the CEP 1000 proposal; there are lots of optimizations that could be done by knowing a member of an object will not be re-assigned. One can currently write cdef np_sin from numpy import sin as np_sin which would accomplish the same thing, right? 
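In generated C, the "bind at import time, NULL-check at the call site" pattern that keeps coming up in this subthread might look roughly like this. __pyx_get_native_funcptr is an invented stand-in for a CEP-1000-style lookup, and error handling is abbreviated.

#include <Python.h>

/* Hypothetical CEP-1000-style lookup. */
extern void *__pyx_get_native_funcptr(PyObject *obj, const char *sig);

static double (*bound_np_sqrt)(double);  /* filled once at import */

static int bind_at_import(PyObject *numpy_module)
{
    PyObject *sqrt_obj = PyObject_GetAttrString(numpy_module, "sqrt");
    if (sqrt_obj == NULL)
        return -1;
    /* May legitimately stay NULL if no "d->d" specialization exists. */
    bound_np_sqrt = (double (*)(double))
        __pyx_get_native_funcptr(sqrt_obj, "d->d");
    Py_DECREF(sqrt_obj);
    return 0;
}

static double call_sqrt(PyObject *sqrt_obj, double x)
{
    if (bound_np_sqrt != NULL)        /* the likely() branch */
        return bound_np_sqrt(x);
    /* Boxed fallback; error checks elided. */
    {
        PyObject *r = PyObject_CallFunction(sqrt_obj, "d", x);
        double v = r ? PyFloat_AsDouble(r) : -1.0;
        Py_XDECREF(r);
        return v;
    }
}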
- Robert From robertwb at gmail.com Fri Apr 20 09:02:59 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 20 Apr 2012 00:02:59 -0700 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F91086C.4080406@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> <4F910075.6090904@behnel.de> <4F9106FC.8030603@astro.uio.no> <4F91086C.4080406@astro.uio.no> Message-ID: On Thu, Apr 19, 2012 at 11:55 PM, Dag Sverre Seljebotn wrote: > On 04/20/2012 08:49 AM, Dag Sverre Seljebotn wrote: >> >> On 04/20/2012 08:21 AM, Stefan Behnel wrote: >>> >>> Robert Bradshaw, 20.04.2012 02:52: >>>> >>>> On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote: >>>>> >>>>> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote: >>>>>> >>>>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>>>>>> >>>>>>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>>>>>> >>>>>>>> >>>>>>>> from numpy import sqrt, sin >>>>>>>> >>>>>>>> cdef double f(double x): >>>>>>>> return sqrt(x * x) # or sin(x * x) >>>>>>>> >>>>>>>> Of course, here one could get the pointer in the module at import >>>>>>>> time. >>>>>>> >>>>>>> >>>>>>> That optimisation would actually be very worthwhile all by itself. >>>>>>> I mean, >>>>>>> we know what signatures we need for globally imported functions >>>>>>> throughout >>>>>>> the module, so we can reduce the call to a single jump through a >>>>>>> function >>>>>>> pointer (although likely with a preceding NULL check, which the >>>>>>> branch >>>>>>> prediction would be happy to give us for free). At least as long >>>>>>> as sqrt >>>>>>> is not being reassigned, but that should hit the 99% case. >>>>>>> >>>>>>>> However, here: >>>>>>>> >>>>>>>> from numpy import sqrt >>>>>> >>>>>> Correction: "import numpy as np" >>>>>>>> >>>>>>>> >>>>>>>> cdef double f(double x): >>>>>>>> return np.sqrt(x * x) # or np.sin(x * x) >>>>>>>> >>>>>>>> the __getattr__ on np sure is larger than any effect we discuss. >>>>>>> >>>>>>> >>>>>>> Yes, that would have to stay a .pxd case, I guess. >>>>>> >>>>>> >>>>>> How about this mini-CEP: >>>>>> >>>>>> Modules are allowed to specify __nomonkey__ (or __const__, or >>>>>> __notreassigned__), a list of strings naming module-level variables >>>>>> where >>>>>> "we don't hold you responsible if you assume no monkey-patching of >>>>>> these". >>>>>> >>>>>> When doing "import numpy as np", then (assuming "np" is never >>>>>> reassigned in >>>>>> the module), at import time we check all names looked up from it in >>>>>> __nomonkey__, and if so treat them as "from numpy import sqrt as >>>>>> 'np.sqrt'", >>>>>> i.e. the "np." is just a namespace mechanism. >>>>> >>>>> >>>>> I like the idea. I think this could be generalized to a 'final' >>>>> keyword, that could also enable optimizations for cdef class >>>>> attributes. So you'd say >>>>> >>>>> cdef final object np >>>>> import numpy as np >>>>> >>>>> For class attributes this would tell the compiler that it will not be >>>>> rebound, which means you could check if attributes are initialized in >>>>> the initializer, or just pull such checks (as wel as bounds checks), >>>>> at least for memoryviews, out of loops, without worrying whether it >>>>> will be reassigned in the meantime. >>>> >>>> >>>> final is a nice way to describe this. If we were to introduce a new >>>> keyword, static might do as well. 
>>>> >>>> It seems more natural to do this in the numpy.pxd file (perhaps it >>>> could just be declared as a final object) and that would allow us to >>>> not worry about re-assignment. Cython could then try to keep that >>>> contract for any modules it compiles. (This is, however, a bit more >>>> restrictive, though one can always cimport and import modules under >>>> different names.) >>> >>> >>> However, it's actually not the module that's "final" in this regard >>> but the >>> functions it exports - *they* do not change and neither do their C >>> signatures. So the "final" modifier should stick to the functions >>> (possibly >>> declared at the "cdef extern" line), which would then allow us to resolve >>> and cache the C function pointers at import time. >> >> >> Are there any advantages to getting this information at compile time >> rather than import time? >> >> If you got the full signature it would be a different matter (for type >> inference etc.); you could essentially do something like >> >> cdef final double sin(double) >> cdef final float sin(float) >> cdef final double cos(double) > > > In fact, "final" is sort of implied whenever a pxd is provided. The mere act > of providing a pxd means you expect early binding to happen. So I think this > boils down to simply allowing us to resolve ABIs declared in pxd files through > CEP 1000 instead of assuming it is a Cython module: > > cdef double sin(double) > cdef double cos(double) > > We could first look for the Cython ABI at import time, and if that isn't > there, fall back to CEP 1000. And in time, deprecate the Cython ABI in > favour of CEP 1000 (and follow-up CEPs to make it complete enough). Makes sense. > The __nomonkey__ was something else, a proposal about a pxd-less approach. > We can do both. If __nomonkey__ is inspected at runtime, then the calling module would have to opportunistically guess what might be in that list at compile time, and still generate the lookup code just in case. I guess this idea doesn't seem very fleshed out yet; its advantages, caveats, and semantics are still quite fuzzy. From robertwb at gmail.com Fri Apr 20 09:30:54 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 20 Apr 2012 00:30:54 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F9010B0.5080100@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F9010B0.5080100@astro.uio.no> Message-ID: On Thu, Apr 19, 2012 at 6:18 AM, Dag Sverre Seljebotn wrote: > On 04/19/2012 01:20 PM, Nathaniel Smith wrote: >> >> On Thu, Apr 19, 2012 at 11:56 AM, Dag Sverre Seljebotn >> wrote: >>> >>> I thought of some drawbacks of getfuncptr: >>> >>> - Important: Doesn't allow you to actually inspect the supported >>> signatures, which is needed (or at least convenient) if you want to use >>> an >>> FFI library or do some JIT-ing. So an iteration mechanism is still needed >>> in >>> addition, meaning the number of things for the object to implement grows >>> a >>> bit large. Default implementations help -- OTOH there really wasn't a >>> major >>> drawback with the table approach as long as JITs can just replace it? >> >> >> But this is orthogonal to the table vs. getfuncptr discussion. We're >> assuming that the table might be extended at runtime, which means you >> can't use it to determine which signatures are supported.
So we need >> some sort of extra interface for the caller and callee to negotiate a >> type anyway. (I'm intentionally agnostic about whether it makes more >> sense for the caller or the callee to be doing the iterating... in >> general type negotiation could be quite complicated, and I don't think >> we know enough to get that interface right yet.) > > > Hmm. Right. Let's define an explicit goal for the CEP then. > > What I care about is getting the spec right enough such that, e.g., NumPy > and SciPy, and other (mostly manually written) C extensions with slow > development pace, can be forward-compatible with whatever crazy things > Cython or Numba does. > > There are 4 cases: > > 1) JIT calls JIT (ruled out straight away) > > 2) JIT calls static: Say that Numba wants to optimize calls to np.sin etc. > without special-casing; this seems to require reading a table of static > signatures > > 3) Static calls JIT: This is the case when scipy.integrate routines call a > Numba callback and Numba generates a specialization for the dtype they > explicitly need. This calls for getfuncptr (but perhaps in a form which we > can't quite determine yet?). > > 4) Static calls static: Either table or getfuncptr works. > > My gut feeling is go for 2) and 4) in this round => table. getfuncptr is really simple and flexible, but I'm with you on both of these two points, and the overhead was not trivial. Of course we could offer both, i.e. look at the table first, if it's not there call getfuncptr if it's non-null, then fall back to "slow" call or error. These are all opt-in depending on how hard you want to try to optimize things. As far as keys vs. interning, I'm also tempted to try to have my cake and eat it too. Define a space-friendly encoding for signatures and require interning for anything that doesn't fit into a single sizeof(void*). The fact that this cutoff would vary for 32 vs 64-bit would require some care, but could be done with macros in C. If the signatures produce non-aligned "pointer" values there won't be any collisions, and this way libraries only have to share in the global (Python-level?) interning scheme iff they want to expose/use "large" signatures. > The fact that the table can be extended at runtime is then not really > relevant -- perhaps there will be an API to trigger that in the future, but > it can't really be made use of today. > > >> The other other option would be to go to the far other end of >> simplicity, and just forget for now about allowing multiple signatures >> in the same object. Do signature selection by having the user select >> one explicitly: >> >> @cython.inline >> def square(x): >> return x * x >> >> # .specialize is an un-standardized Cython interface >> # square_double is an object implementing the standardized C-callable >> interface >> square_double = square.specialize("d->d") >> scipy.integrate.quad(square_double) >> >> That'd be enough to get started, and doesn't rule out later extensions >> that do automatic type selection, once we have more experience. > > > Well, I want np.sin to replace "cdef extern from 'math.h'", and then this > seems to be needed... at least the possibility to have both "d->d" and > "O->O". +1, not supporting this would be a huge defect.
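To make that cake-and-eat-it keying scheme concrete, here is a toy Python model of the idea; this is only an illustration, not part of any CEP draft, and the low-bit tag and the module-local intern table are assumptions of the sketch:

    import struct

    POINTER_SIZE = struct.calcsize('P')   # sizeof(void*): 4 or 8 bytes
    _interned = {}   # stand-in for the shared (Python-level?) intern table

    def signature_key(sig):
        # Signatures short enough to fit in a pointer-sized word are
        # packed directly, with the low bit set as a tag; interned-string
        # pointers are aligned (even), so the two key kinds never collide.
        data = sig.encode('ascii')
        if len(data) < POINTER_SIZE:
            return (int.from_bytes(data, 'little') << 1) | 1
        # "Large" signatures fall back to interning: equal strings share
        # one object, so key comparison stays a single pointer compare.
        return id(_interned.setdefault(sig, sig))

    # "d->d" (double -> double) fits inline on a 64-bit build, so equal
    # signatures always map to equal keys without any interning:
    assert signature_key("d->d") == signature_key("d->d")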
- Robert From stefan_ml at behnel.de Fri Apr 20 21:04:20 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 20 Apr 2012 21:04:20 +0200 Subject: [Cython] Cython 0.16 RC 2 In-Reply-To: References: Message-ID: <4F91B334.8020307@behnel.de> mark florisson, 15.04.2012 20:59: > Hopefully a final release candidate for the 0.16 release can be found > here: http://wiki.cython.org/ReleaseNotes-0.16 . This corresponds to > the 'release' branch of the cython repository on github. I pushed another couple of fixes related to the recent importlib changes in Py3k which also apply to older Py3 releases, as well as Robert's fix for C keywords. Jenkins is very happy with them. Is there anything left that would block the release now? Stefan From d.s.seljebotn at astro.uio.no Sat Apr 21 07:27:09 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 21 Apr 2012 07:27:09 +0200 Subject: [Cython] Julialang Message-ID: <4F92452D.7030507@astro.uio.no> Just heard about the Julia language and wanted to make sure it's on everybody's radar: http://julialang.org It's the first really decent language designed for scientists. Seems impressive to me, there's a few Cython features: - Dynamic typing with optional static types - Call C directly And then comes: - JIT - Templates - "Green" threading/coroutines - Multiple dispatch (yay!) - Lisp-like macros and other metaprogramming facilities - Designed for parallelism and distributed computation Dag From d.s.seljebotn at astro.uio.no Sat Apr 21 07:27:42 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 21 Apr 2012 07:27:42 +0200 Subject: [Cython] Julialang In-Reply-To: <4F92452D.7030507@astro.uio.no> References: <4F92452D.7030507@astro.uio.no> Message-ID: <4F92454E.3040000@astro.uio.no> On 04/21/2012 07:27 AM, Dag Sverre Seljebotn wrote: > Just heard about the Julia language and wanted to make sure it's on > everybody's radar: > > http://julialang.org > > It's the first really decent language designed for scientists. Seems ...that I've heard of, that is. Dag > impressive to me, there's a few Cython features: > > - Dynamic typing with optional static types > - Call C directly > > And then comes: > > - JIT > - Templates > - "Green" threading/coroutines > - Multiple dispatch (yay!) > - Lisp-like macros and other metaprogramming facilities > - Designed for parallelism and distributed computation > > Dag From stefan_ml at behnel.de Sat Apr 21 08:20:00 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 21 Apr 2012 08:20:00 +0200 Subject: [Cython] Julialang In-Reply-To: <4F92452D.7030507@astro.uio.no> References: <4F92452D.7030507@astro.uio.no> Message-ID: <4F925190.9020609@behnel.de> Dag Sverre Seljebotn, 21.04.2012 07:27: > Just heard about the Julia language and wanted to make sure it's on > everybody's radar: > > http://julialang.org > > It's the first really decent language designed for scientists. Seems > impressive to me, there's a few Cython features: > > - Dynamic typing with optional static types > - Call C directly > > And then comes: > > - JIT > - Templates > - "Green" threading/coroutines > - Multiple dispatch (yay!) > - Lisp-like macros and other metaprogramming facilities > - Designed for parallelism and distributed computation They say that it comes as a shared library, so it might work to wrap it in Cython. However, it's not clear to me how you would call into Julia code from C code.
They only emphasise the other direction, as seems to be usual for language implementations that try to advertise the beauties of their JIT compiler and their cool "we can call C" features. So it's hard to tell how much work it would be or how efficient it could become to call a wrapped Julia function based on a CEP1000 signature ID. At least the signature of the Julia function can easily be analysed, it seems. That's pretty cool already. Stefan From markflorisson88 at gmail.com Sat Apr 21 14:04:25 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 21 Apr 2012 13:04:25 +0100 Subject: [Cython] Cython 0.16 RC 2 In-Reply-To: <4F91B334.8020307@behnel.de> References: <4F91B334.8020307@behnel.de> Message-ID: On 20 April 2012 20:04, Stefan Behnel wrote: > mark florisson, 15.04.2012 20:59: >> Hopefully a final release candidate for the 0.16 release can be found >> here: http://wiki.cython.org/ReleaseNotes-0.16 . This corresponds to >> the 'release' branch of the cython repository on github. > > I pushed another couple of fixes related to the recent importlib changes in > Py3k which also apply to older Py3 releases, as well as Robert's fix for C > keywords. Jenkins is very happy with them. > > Is there anything left that would block the release now? > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel We're all set, but I can't upload the sdist to pypi: Upload failed (403): You are not allowed to edit 'Cython' package information If someone can give me those permissions or upload the sdist, then we can send updates to the mailing lists. The website is already updated. From markflorisson88 at gmail.com Sat Apr 21 14:05:57 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 21 Apr 2012 13:05:57 +0100 Subject: [Cython] Cython 0.16 RC 2 In-Reply-To: <4F91B334.8020307@behnel.de> References: <4F91B334.8020307@behnel.de> Message-ID: On 20 April 2012 20:04, Stefan Behnel wrote: > mark florisson, 15.04.2012 20:59: >> Hopefully a final release candidate for the 0.16 release can be found >> here: http://wiki.cython.org/ReleaseNotes-0.16 . This corresponds to >> the 'release' branch of the cython repository on github. > > I pushed another couple of fixes related to the recent importlib changes in > Py3k which also apply to older Py3 releases, as well as Robert's fix for C > keywords. Jenkins is very happy with them. If you want, you can update the release notes accordingly (I also wouldn't mind a double check to see if I haven't missed any important release information). > Is there anything left that would block the release now? > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Sat Apr 21 15:02:33 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 21 Apr 2012 14:02:33 +0100 Subject: [Cython] Cython 0.16 Message-ID: We are pleased to announce a new version of Cython, 0.16 (http://cython.org/release/Cython-0.16.tar.gz). It comes with new features, improvements and bug fixes, including - super() without arguments - fused types - memoryviews - more Python-like functions Many thanks to the many contributors of this release and to all bug reporters and supporting users!
A more comprehensive list of features and contributors can be found here: http://wiki.cython.org/ReleaseNotes-0.16 , and an overview of bug fixes can be found here: http://trac.cython.org/cython_trac/query?status=closed&group=component&order=id&col=id&col=summary&col=milestone&col=status&col=type&col=priority&col=component&milestone=0.16&desc=1 Enjoy! From stefan_ml at behnel.de Sat Apr 21 15:50:16 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 21 Apr 2012 15:50:16 +0200 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: <4F92BB18.60003@behnel.de> mark florisson, 21.04.2012 15:02: > We are pleased to announce a new version of Cython, 0.16 > Many thanks to the many contributors of this release and to all bug > reporters and supporting users! > Enjoy! Thanks, Mark! Stefan From vitja.makarov at gmail.com Sat Apr 21 17:50:53 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sat, 21 Apr 2012 19:50:53 +0400 Subject: [Cython] Cython 0.16 In-Reply-To: <4F92BB18.60003@behnel.de> References: <4F92BB18.60003@behnel.de> Message-ID: 2012/4/21 Stefan Behnel : > mark florisson, 21.04.2012 15:02: >> We are pleased to announce a new version of Cython, 0.16 >> Many thanks to the many contributors of this release and to all bug >> reporters and supporting users! >> Enjoy! > > Thanks, Mark! > Cool, thanks! -- vitja. From dtcaciuc at gmail.com Sat Apr 21 21:17:52 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Sat, 21 Apr 2012 12:17:52 -0700 Subject: [Cython] `cdef inline` and typed memory views Message-ID: Hey everyone, Congratulations on shipping 0.16! I think I found a problem which seems pretty straightforward. Say I want to factor out inner part of some N^2 loops over a flow array, I write something like cdef inline float _inner(size_t i, size_t j, float[:] x): cdef float d = x[i] - x[j] return sqrtf(d * d) In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and function is declared as inline, which is great. However, the memoryview structure is passed by value: static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i, size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) { ... This seems to hinder compiler's (in my case, GCC 4.3.4) ability to perform efficient inlining (although function does in fact get inlined). If I manually inline that distance calculation, I get 3x speedup. (in my case 0.324020147324 vs 1.43209195137 seconds for 10k elements). When I manually modified generated .c file to pass memory view slice by pointer, slowdown was eliminated completely. On a somewhat relevant note, have you considered enabling Issues page on Github? Thanks! Dimitri. From stefan_ml at behnel.de Sat Apr 21 23:48:02 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 21 Apr 2012 23:48:02 +0200 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: References: Message-ID: <4F932B12.5020507@behnel.de> Dimitri Tcaciuc, 21.04.2012 21:17: > On a somewhat relevant note, have you considered enabling Issues page on Github? It was discussed, but the drawback of having two separate bug trackers is non-negligible.
Stefan From nmd at illinois.edu Sun Apr 22 05:21:55 2012 From: nmd at illinois.edu (Nathan Dunfield) Date: Sat, 21 Apr 2012 22:21:55 -0500 Subject: [Cython] Cython 0.16: problems with "easy_install" Message-ID: Dear all, On OS X Snow Leopard with XCode 3.2.*, I encountered the following issues when using "easy_install" to install the new Cython 0.16: (a) With Python 2.7 where Cython 0.15.1 had previously been installed, "easy_install" failed with the below error message; looks like it's somehow using the existing Cython for part of the compilation and then failing. After I deleted the existing egg in site-packages it easy_installed fine. (b) With Python 3.2 and no Cython installed in site-packages, it chokes with the following error: haken ~ (8) py3 -m easy_install -U cython Searching for cython Reading http://pypi.python.org/simple/cython/ Couldn't find index page for 'cython' (maybe misspelled?) Scanning index of all packages (this may take a while) Reading http://pypi.python.org/simple/ Reading http://pypi.python.org/simple/Cython/ Reading http://www.cython.org Reading http://cython.org Best match: Cython 0.16 Downloading http://www.cython.org/release/Cython-0.16.zip Processing Cython-0.16.zip Running Cython-0.16/setup.py -q bdist_egg --dist-dir /tmp/easy_install-ers4by/Cython-0.16/egg-dist-tmp-a6gz6a warning: no files found matching '*.pyx' under directory 'Cython/Debugger/Tests' warning: no files found matching '*.pxd' under directory 'Cython/Debugger/Tests' warning: no files found matching '*.h' under directory 'Cython/Debugger/Tests' warning: no files found matching '*.pxd' under directory 'Cython/Utility' warning: no files found matching '*.h' under directory 'Cython/Utility' warning: no files found matching '.cpp' under directory 'Cython/Utility' i686-apple-darwin10-gcc-4.2.1: /tmp/easy_install-ers4by/Cython-0.16/Cython/Runtime/refnanny.c: No such file or directory i686-apple-darwin10-gcc-4.2.1: no input files i686-apple-darwin10-gcc-4.2.1: /tmp/easy_install-ers4by/Cython-0.16/Cython/Runtime/refnanny.c: No such file or directory i686-apple-darwin10-gcc-4.2.1: no input files lipo: can't figure out the architecture type of: /var/tmp//ccvgdiS6.out error: Setup script exited with error: command 'gcc' failed with exit status 1 If I download the .zip file and run setup.py by hand it installs fine. Best, Nathan Error when easy_installing with Python 2.7: haken ~ (1) py -m easy_install -U cython Searching for cython Reading http://pypi.python.org/simple/cython/ Reading http://www.cython.org Reading http://cython.org Best match: Cython 0.16 Downloading http://www.cython.org/release/Cython-0.16.zip Processing Cython-0.16.zip Running Cython-0.16/setup.py -q bdist_egg --dist-dir /tmp/easy_install-1x8vwP/Cython-0.16/egg-dist-tmp-dx99kU Compiling module Cython.Plex.Scanners ... Compiling module Cython.Plex.Actions ... Compiling module Cython.Compiler.Lexicon ... Compiling module Cython.Compiler.Scanning ... Compiling module Cython.Compiler.Parsing ... Compiling module Cython.Compiler.Visitor ... Error compiling Cython file: ------------------------------------------------------------ ... 
raise Errors.CompilerCrash( getattr(last_node, 'pos', None), self.__class__.__name__, u'\n'.join(trace), e, stacktrace) @cython.final def find_handler(self, obj): ^ ------------------------------------------------------------ /tmp/easy_install-1x8vwP/Cython-0.16/Cython/Compiler/Visitor.py:138:4: The final compiler directive is not allowed in function scope Error compiling Cython file: ------------------------------------------------------------ ... def visit(self, obj): return self._visit(obj) @cython.final def _visit(self, obj): ^ ------------------------------------------------------------ /tmp/easy_install-1x8vwP/Cython-0.16/Cython/Compiler/Visitor.py:159:4: The final compiler directive is not allowed in function scope Error compiling Cython file: ------------------------------------------------------------ ... handler_method = self.find_handler(obj) self.dispatch_table[type(obj)] = handler_method return handler_method(obj) @cython.final def _visitchild(self, child, parent, attrname, idx): ^ ------------------------------------------------------------ /tmp/easy_install-1x8vwP/Cython-0.16/Cython/Compiler/Visitor.py:168:4: The final compiler directive is not allowed in function scope Error compiling Cython file: ------------------------------------------------------------ ... def visitchildren(self, parent, attrs=None): return self._visitchildren(parent, attrs) @cython.final def _visitchildren(self, parent, attrs): ^ ------------------------------------------------------------ /tmp/easy_install-1x8vwP/Cython-0.16/Cython/Compiler/Visitor.py:192:4: The final compiler directive is not allowed in function scope Compilation failed From robertwb at gmail.com Sun Apr 22 07:10:13 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Sat, 21 Apr 2012 22:10:13 -0700 Subject: [Cython] Julialang In-Reply-To: <4F92452D.7030507@astro.uio.no> References: <4F92452D.7030507@astro.uio.no> Message-ID: Yes, Julia looks really cool. It's been on my radar for a while, but I haven't had a chance to really try it out for anything yet. But I hadn't thought about low-level Python/Cython <-> Julia integration. That sounds very interesting. I wonder if Jython could give any insight into the tight interaction between two languages that are usually used in isolation but have been made to call each other (though there are a lot of differences too, e.g. we're not targeting replacing the CPython interpreter (on first pass at least...)). - Robert On Fri, Apr 20, 2012 at 10:27 PM, Dag Sverre Seljebotn wrote: > Just heard about the Julia language and wanted to make sure it's on > everybody's radar: > > http://julialang.org > > It's the first really decent language designed for scientists. Seems > impressive to me, there's a few Cython features: > > - Dynamic typing with optional static types > - Call C directly > > And then comes: > > - JIT > - Templates > - "Green" threading/coroutines > - Multiple dispatch (yay!) > - Lisp-like macros and other metaprogramming facilities > - Designed for parallelism and distributed computation > > Dag > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel
It's been on my radar for a while, but I > haven't had a chance to really try it out for anything yet. But I > hadn't thought about low-level Python/Cython <-> Julia integration. > That sounds very interesting. I wonder if Jython could give any > insight into the tight interaction between two languages that are > usually used in isolation but have been made to call each other > (though there are a lot of differences too, e.g. we're not targeting > replacing the CPython interpreter (on first pass at least...)). > Are you all aware that "calling C" actually means a ctypes-like functionality based on dlopen()/dlsym() ? http://julialang.org/manual/calling-c-and-fortran-code/. -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo 3000 Santa Fe, Argentina Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From stefan_ml at behnel.de Sun Apr 22 12:57:17 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 22 Apr 2012 12:57:17 +0200 Subject: [Cython] test crash in Py3.2 (NumPy/memoryview/refcounting related?) Message-ID: <4F93E40D.3030703@behnel.de> Hi, I keep seeing test crashes in Py3.2 debug builds on Jenkins with the latest master, referring to ref-counting problems. Here, for example: https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py32-ext/328/console and likewise in the series of builds starting at #312.
> > I'm not sure what changed in that corner, it looks like one of those > > problems that was there for a while and suddenly starts showing when you > > touch that innocent looking brick at the other end of the house that was > > holding the balance, until now. In this case, that brick was this commit: > > > > https://github.com/cython/cython/commit/c8f61a668d0ce8af1020f520253e1b5f623cf349 > > > > I reverted it here and that fixed the problem at least temporarily: > > > > https://github.com/cython/cython/commit/5aac8caf1ba21933cc85a51af3319c78fc08d675 > > > > but it seems to be back now (after my refnanny optimisations). Before > > reverting my changes, I was able to reproduce it somewhat reliably on > > sage.math by running these three tests together (non-forking): > > > > memslice numpy_bufacc numpy_memoryview > > > > None of the tests shows the problem when run by itself. I can't tell if > > it's also in the latest py3k because I don't have a NumPy lying around that > > works there. So 3.2 is basically the latest Python version this can be > > tested with, and it doesn't occur in the 3.1 tests. The tests use NumPy > > 1.6.1, i.e. the latest official release. > > > > Mark, Dag, could any of you take a look to see if it appears in any way > > more obvious to you than to me? > > > > Stefan > > _______________________________________________ > > cython-devel mailing list > > cython-devel at python.org > > http://mail.python.org/mailman/listinfo/cython-devel > > Hm, I think you can try to disable the numpy_memoryview test and just > continue development as normal. numpy_memoryview has a testcase > function which inserts the test into __test__, so you could just > comment out that line. The test seems to fail before it runs though? Is it possible to obtain a backtrace? From stefan_ml at behnel.de Sun Apr 22 14:34:20 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 22 Apr 2012 14:34:20 +0200 Subject: [Cython] test crash in Py3.2 (NumPy/memoryview/refcounting related?) In-Reply-To: References: <4F93E40D.3030703@behnel.de> Message-ID: <4F93FACC.8040506@behnel.de> mark florisson, 22.04.2012 13:54: > On 22 April 2012 11:57, Stefan Behnel wrote: >> I keep seeing test crashes in Py3.2 debug builds on Jenkins with the latest >> master, referring to ref-counting problems. Here, for example: >> >> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py32-ext/328/console >> >> and likewise in the series of builds starting at #312. >> >> I'm not sure what changed in that corner, it looks like one of those >> problems that was there for a while and suddenly starts showing when you >> touch that innocent looking brick at the other end of the house that was >> holding the balance, until now. In this case, that brick was this commit: >> >> https://github.com/cython/cython/commit/c8f61a668d0ce8af1020f520253e1b5f623cf349 >> >> I reverted it here and that fixed the problem at least temporarily: >> >> https://github.com/cython/cython/commit/5aac8caf1ba21933cc85a51af3319c78fc08d675 >> >> but it seems to be back now (after my refnanny optimisations). Before >> reverting my changes, I was able to reproduce it somewhat reliably on >> sage.math by running these three tests together (non-forking): >> >> memslice numpy_bufacc numpy_memoryview >> >> None of the tests shows the problem when run by itself. I can't tell if >> it's also in the latest py3k because I don't have a NumPy lying around that >> works there. So 3.2 is basically the latest Python version this can be >> tested with, and it doesn't occur in the 3.1 tests.
The tests use NumPy >> 1.6.1, i.e. the latest official release. >> >> Mark, Dag, could any of you take a look to see if it appears in any way >> more obvious to you than to me? > > Hm, I think you can try to disable the numpy_memoryview test and just > continue development as normal. numpy_memoryview has a testcase > function which inserts the test into __test__, so you could just > comment out that line. ... as long as we remember to put it back in ;) > The test seems to fail before it runs though? Is it possible to obtain > a backtrace? When I reproduced it in the Jenkins workspace, it crashed while trying to clean up objects to free memory, specifically in the deallocation visitor function of one of the memory views of numpy_memoryview (IIRC). That didn't really tell me where that object was created or what happened to it to get it to crash. Needs some more investigation. Stefan From markflorisson88 at gmail.com Sun Apr 22 16:31:38 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 22 Apr 2012 15:31:38 +0100 Subject: [Cython] test crash in Py3.2 (NumPy/memoryview/refcounting related?) In-Reply-To: <4F93FACC.8040506@behnel.de> References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> Message-ID: On 22 April 2012 13:34, Stefan Behnel wrote: > mark florisson, 22.04.2012 13:54: >> On 22 April 2012 11:57, Stefan Behnel wrote: >>> I keep seeing test crashes in Py3.2 debug builds on Jenkins with the latest >>> master, referring to ref-counting problems. Here, for example: >>> >>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py32-ext/328/console >>> >>> and likewise in the series of builds starting at #312. >>> >>> I'm not sure what changed in that corner, it looks like one of those >>> problems that was there for a while and suddenly starts showing when you >>> touch that innocent looking brick at the other end of the house that was >>> holding the balance, until now. In this case, that brick was this commit: >>> >>> https://github.com/cython/cython/commit/c8f61a668d0ce8af1020f520253e1b5f623cf349 >>> >>> I reverted it here and that fixed the problem at least temporarily: >>> >>> https://github.com/cython/cython/commit/5aac8caf1ba21933cc85a51af3319c78fc08d675 >>> >>> but it seems to be back now (after my refnanny optimisations). Before >>> reverting my changes, I was able to reproduce it somewhat reliably on >>> sage.math by running these three tests together (non-forking): >>> >>> memslice numpy_bufacc numpy_memoryview >>> >>> None of the tests shows the problem when run by itself. I can't tell if >>> it's also in the latest py3k because I don't have a NumPy lying around that >>> works there. So 3.2 is basically the latest Python version this can be >>> tested with, and it doesn't occur in the 3.1 tests.
> > When I reproduced it in the Jenkins workspace, it crashed while trying to > clean up objects to free memory, specifically in the deallocation visitor > function of one of the memory views of numpy_memoryview (IIRC). That didn't > really tell me where that object was created or what happened to it to get > it to crash. Needs some more investigation. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Hm, I can't reproduce the issue, but that assertion triggers only for update_refs and subtract_refs (when the gc is trying to determine which object may be potentially unreachable). I think the problem here is that a single memoryview object is traversed multiple times through different traverse functions, and that the refcount doesn't match the number of traverses. Indeed, the refcount is only one, as the actual count is the acquisition count. So we shouldn't traverse the memoryview objects in memoryview slices, i.e. not _memoryviewslice.from_slice.memview. I'll come up with a commit shortly, would you be willing to test it? From markflorisson88 at gmail.com Sun Apr 22 16:41:07 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 22 Apr 2012 15:41:07 +0100 Subject: [Cython] test crash in Py3.2 (NumPy/memoryview/refcounting related?) In-Reply-To: References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> Message-ID: On 22 April 2012 15:31, mark florisson wrote: > On 22 April 2012 13:34, Stefan Behnel wrote: >> mark florisson, 22.04.2012 13:54: >>> On 22 April 2012 11:57, Stefan Behnel wrote: >>>> I keep seeing test crashes in Py3.2 debug builds on Jenkins with the latest >>>> master, referring to ref-counting problems. Here, for example: >>>> >>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py32-ext/328/console >>>> >>>> and likewise in the series of builds starting at #312. >>>> >>>> I'm not sure what changed in that corner, it looks like one of those >>>> problems that was there for a while and suddenly starts showing when you >>>> touch that innocent looking brick at the other end of the house that was >>>> holding the balance, until now. In this case, that brick was this commit: >>>> >>>> https://github.com/cython/cython/commit/c8f61a668d0ce8af1020f520253e1b5f623cf349 >>>> >>>> I reverted it here and that fixed the problem at least temporarily: >>>> >>>> https://github.com/cython/cython/commit/5aac8caf1ba21933cc85a51af3319c78fc08d675 >>>> >>>> but it seems to be back now (after my refnanny optimisations). Before >>>> reverting my changes, I was able to reproduce it somewhat reliably on >>>> sage.math by running these three tests together (non-forking): >>>> >>>> memslice numpy_bufacc numpy_memoryview >>>> >>>> None of the tests shows the problem when run by itself. I can't tell if >>>> it's also in the latest py3k because I don't have a NumPy lying around that >>>> works there. So 3.2 is basically the latest Python version this can be >>>> tested with, and it doesn't occur in the 3.1 tests. The tests use NumPy >>>> 1.6.1, i.e. the latest official release. >>>> >>>> Mark, Dag, could any of you take a look to see if it appears in any way >>>> more obvious to you than to me? >>> >>> Hm, I think you can try to disable the numpy_memoryview test and just
numpy_memoryview has a testcase >>> function which inserts the test into __test__, so you could just >>> comment out that line. >> >> ... as long as we remember to put it back in ;) >> >> >>> The test seems to fail before it runs though? Is it possible to obtain >>> a backtrace? >> >> When I reproduced it in the Jenkins workspace, it crashed while trying to >> clean up objects to free memory, specifically in the deallocation visitor >> function of one of the memory views of numpy_memoryview (IIRC). That didn't >> really tell me where that object was created or what happened to it to get >> it to crash. Needs some more investigation. >> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Hm, I can't reproduce the issue, but that assertion triggers only for > update_refs and subtract_refs (when the gc is trying to determine > which object may be potentially unreachable). I think the problem here > is that a single memoryview object is traversed multiple times through > different traverse functions, and that the refcount doesn't match the > number of traverses. Indeed, the refcount is only one, as the actual > count is the acquisition count. So we shouldn't traverse the > memoryview objects in memoryview slices, i.e. not > _memoryviewslice.from_slice.memview. I'll come up with a commit > shortly, would you be willing to test it? A fix is here: https://github.com/markflorisson88/cython/commit/cd32184f3f782b6d7275cf430694b59801ce642a Let's see what jenkins has to say :) BTW, tp_clear calls Py_CLEAR on Py_buffer.obj, shouldn't it call releasebuffer instead? From nmd at illinois.edu Sun Apr 22 19:59:09 2012 From: nmd at illinois.edu (Nathan Dunfield) Date: Sun, 22 Apr 2012 12:59:09 -0500 Subject: [Cython] Cython 0.16: "eval" problem Message-ID: <83F1345A-FE56-451C-B17B-B41042E02038@illinois.edu> With Cython 0.15, the following works with Python 2.7: ### start file: prob.pyx def f(x): cdef int* p return eval(x) ### end file >>> import pyximport; pyximport.install() >>> import prob >>> prob.f("5") 5 but with Cython 0.16 it doesn't even compile: >>> import prob Error compiling Cython file: ------------------------------------------------------------ ... def f(x): cdef int* p return eval(x) ^ ------------------------------------------------------------ prob.pyx:3:15: Cannot convert 'int *' to Python object If I comment out the (unused) line "cdef int* p" then it works with Cython 0.16. The issue is the pointer declaration; something like: def f(x): cdef int p p = eval(x) return p*p works fine with Cython 0.16. Thanks, Nathan From dtcaciuc at gmail.com Sun Apr 22 20:14:28 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Sun, 22 Apr 2012 11:14:28 -0700 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: <4F932B12.5020507@behnel.de> References: <4F932B12.5020507@behnel.de> Message-ID: On Sat, Apr 21, 2012 at 2:48 PM, Stefan Behnel wrote: > Dimitri Tcaciuc, 21.04.2012 21:17: >> On a somewhat relevant note, have you considered enabling Issues page on Github? > > It was discussed, but the drawback of having two separate bug trackers is > non-negligible. Ok. I was wondering since it would make it much easier to connect issue/patch/discussion together without, say, me needlessly adding to the development mailing list and/or manually registering for trac and sending htpasswd digest over the mail.
Here's something to consider if you ever want to migrate over from trac: https://github.com/adamcik/github-trac-ticket-import Cheers, Dimitri. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From vitja.makarov at gmail.com Sun Apr 22 20:22:44 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sun, 22 Apr 2012 22:22:44 +0400 Subject: [Cython] Cython 0.16: "eval" problem In-Reply-To: <83F1345A-FE56-451C-B17B-B41042E02038@illinois.edu> References: <83F1345A-FE56-451C-B17B-B41042E02038@illinois.edu> Message-ID: 2012/4/22 Nathan Dunfield : > With Cython 0.15, the following works with Python 2.7: > > ### start file: prob.pyx > > def f(x): > cdef int* p > return eval(x) > > ### end file > >>>> import pyximport; pyximport.install() >>>> import prob >>>> prob.f("5") > 5 > > but with Cython 0.16 it doesn't even compile: > >>>> import prob > > Error compiling Cython file: > ------------------------------------------------------------ > ... > def f(x): > cdef int* p > return eval(x) > ^ > ------------------------------------------------------------ > > prob.pyx:3:15: Cannot convert 'int *' to Python object > > If I comment out the (unused) line "cdef int* p" then it works with Cython 0.16. The issue is the pointer declaration; something like: > > def f(x): > cdef int p > p = eval(x) > return p*p > > works fine with Cython 0.16. > > Thanks, > > Nathan > Oops, it seems to be a problem with locals() dict creation. Perhaps it should ignore variables that can't be converted to PyObject. -- vitja. From nmd at illinois.edu Sun Apr 22 20:33:27 2012 From: nmd at illinois.edu (Nathan Dunfield) Date: Sun, 22 Apr 2012 13:33:27 -0500 Subject: [Cython] Cython 0.16: "eval" problem In-Reply-To: <67296eb070e34ffeb88cfe65fd3b9471@CITESHT4.ad.uillinois.edu> References: <83F1345A-FE56-451C-B17B-B41042E02038@illinois.edu> <67296eb070e34ffeb88cfe65fd3b9471@CITESHT4.ad.uillinois.edu> Message-ID: <0E437224-7481-4033-A0CA-EB2D8CED1BBF@illinois.edu> On Apr 22, 2012, at 1:22 PM, Vitja Makarov wrote: > Oops, it seems to be a problem with locals() dict creation. Yes it does. The following variants of my original example both work: ## prob.pyx version 1 def cy_eval(s): return eval(s) def f(x): cdef int* p return cy_eval(x) ## prob.pyx version 2 def f(x): cdef int* p return eval(x, {}) Best, Nathan From vitja.makarov at gmail.com Sun Apr 22 20:50:23 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sun, 22 Apr 2012 22:50:23 +0400 Subject: [Cython] Cython 0.16: "eval" problem In-Reply-To: <0E437224-7481-4033-A0CA-EB2D8CED1BBF@illinois.edu> References: <83F1345A-FE56-451C-B17B-B41042E02038@illinois.edu> <67296eb070e34ffeb88cfe65fd3b9471@CITESHT4.ad.uillinois.edu> <0E437224-7481-4033-A0CA-EB2D8CED1BBF@illinois.edu> Message-ID: 2012/4/22 Nathan Dunfield : > On Apr 22, 2012, at 1:22 PM, Vitja Makarov wrote: >> Oops, it seems to be a problem with locals() dict creation. > > Yes it does. The following variants of my original example both work: > > ## prob.pyx version 1 > > def cy_eval(s): > return eval(s) > > def f(x): > cdef int* p > return cy_eval(x) > > ## prob.pyx version 2 > > def f(x): > cdef int* p > return eval(x, {}) > > Best, > > Nathan > I've fixed it here: https://github.com/vitek/cython/commit/6dc132731b8f3f7eaabf55e51d89bcbc7b8f4eb7 Now waiting for jenkins, then I'll push it into upstream.
As a workaround you can manually pass a locals dictionary, e.g.: eval(x, None, {'a': a}) -- vitja. From markflorisson88 at gmail.com Sun Apr 22 22:20:00 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 22 Apr 2012 21:20:00 +0100 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: References: Message-ID: On 21 April 2012 20:17, Dimitri Tcaciuc wrote: > Hey everyone, > > Congratulations on shipping 0.16! I think I found a problem which > seems pretty straightforward. Say I want to factor out inner part of > some N^2 loops over a flow array, I write something like > > cdef inline float _inner(size_t i, size_t j, float[:] x): > cdef float d = x[i] - x[j] > return sqrtf(d * d) > > In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and > function is declared as inline, which is great. However, the > memoryview structure is passed by value: > > static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i, > size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) { > ... > > This seems to hinder compiler's (in my case, GCC 4.3.4) ability to > perform efficient inlining (although function does in fact get > inlined). If I manually inline that distance calculation, I get 3x > speedup. (in my case 0.324020147324 vs 1.43209195137 seconds for 10k > elements). When I manually modified generated .c file to pass memory > view slice by pointer, slowdown was eliminated completely. > > On a somewhat relevant note, have you considered enabling Issues page on Github? > > > Thanks! > > > Dimitri. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Although it is neither documented nor tested, it works if you just take the address of the memoryview. You can then index it using memoryview_pointer[0][i]. One should be careful, as taking the pointer and passing that around means that pointer is not acquisition counted, and will point to invalid memory if the memoryview goes out of scope (e.g. if it's a local variable, when you return). Cython could manually inline functions though, which could greatly reduce argument passing and unpacking overhead in some situations (like buffers). From markflorisson88 at gmail.com Sun Apr 22 22:21:16 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 22 Apr 2012 21:21:16 +0100 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: References: <4F932B12.5020507@behnel.de> Message-ID: On 22 April 2012 19:14, Dimitri Tcaciuc wrote: > On Sat, Apr 21, 2012 at 2:48 PM, Stefan Behnel wrote: >> Dimitri Tcaciuc, 21.04.2012 21:17: >>> On a somewhat relevant note, have you considered enabling Issues page on Github? >> >> It was discussed, but the drawback of having two separate bug trackers is >> non-negligible. > > Ok. I was wondering since it would make it much easier to connect > issue/patch/discussion together without, say, me needlessly adding to > the development mailing list and/or manually registering for trac and > sending htpasswd digest over the mail. Here's something to consider if > you ever want to migrate over from trac: > https://github.com/adamcik/github-trac-ticket-import > > Cheers, > > Dimitri. I haven't heard very good things about github issues, but I like to have everything in one place, and I'm not too fond of trac in any regard. It's also quite a barrier to get trac access, so I'd be in favour of moving tickets.
>> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From dtcaciuc at gmail.com Sun Apr 22 23:45:52 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Sun, 22 Apr 2012 14:45:52 -0700 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: References: Message-ID: On Sun, Apr 22, 2012 at 1:20 PM, mark florisson wrote: > On 21 April 2012 20:17, Dimitri Tcaciuc wrote: >> Hey everyone, >> >> Congratulations on shipping 0.16! I think I found a problem which >> seems pretty straightforward. Say I want to factor out inner part of >> some N^2 loops over a flow array, I write something like >> >> cdef inline float _inner(size_t i, size_t j, float[:] x): >> cdef float d = x[i] - x[j] >> return sqrtf(d * d) >> >> In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and >> function is declared as inline, which is great. However, the >> memoryview structure is passed by value: >> >> static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i, >> size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) { >> ... >> >> This seems to hinder compiler's (in my case, GCC 4.3.4) ability to >> perform efficient inlining (although function does in fact get >> inlined). If I manually inline that distance calculation, I get 3x >> speedup. (in my case 0.324020147324 vs 1.43209195137 seconds for 10k >> elements). When I manually modified generated .c file to pass memory >> view slice by pointer, slowdown was eliminated completely. >> >> On a somewhat relevant note, have you considered enabling Issues page on Github? >> >> >> Thanks! >> >> >> Dimitri. >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Although it is neither documented nor tested, it works if you just > take the address of the memoryview. You can then index it using > memoryview_pointer[0][i]. One should be careful, as taking the pointer > and passing that around means that pointer is not acquisition counted, > and will point to invalid memory if the memoryview goes out of scope > (e.g. if it's a local variable, when you return). Nice, passing by pointer did the trick! As an observation, I tried using `cython.operator.dereference(x)` and in this case it's way less efficient than `x[0]`. Dereferencing actually allocates an empty memory view slice and copies the contents of `x`, even if the `dereference(x)` result is never assigned anywhere and is only a temporary value in the expression. Dimitri. > Cython could manually inline functions though, which could greatly > reduce argument passing and unpacking overhead in some situations > (like buffers).
> _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Mon Apr 23 08:19:02 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2012 08:19:02 +0200 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: References: <4F932B12.5020507@behnel.de> Message-ID: <4F94F456.2010602@behnel.de> mark florisson, 22.04.2012 22:21: > On 22 April 2012 19:14, Dimitri Tcaciuc wrote: >> On Sat, Apr 21, 2012 at 2:48 PM, Stefan Behnel wrote: >>> Dimitri Tcaciuc, 21.04.2012 21:17: >>>> On a somewhat relevant note, have you considered enabling Issues page on Github? >>> >>> It was discussed, but the drawback of having two separate bug trackers is >>> non-negligible. >> >> Ok. I was wondering since it would make it much easier to connect >> issue/patch/discussion together without, say, me needlessly adding to >> the development mailing list and/or manually registering for trac and >> sending htpasswd digest over the mail. Here's something to consider if >> you ever want to migrate over from trac: >> https://github.com/adamcik/github-trac-ticket-import > > I haven't heard very good things about github issues I find them nicely accessible from the user side, but hardly usable for the developers. All you get is basically a blog-style comment system with bare tag support. Sure, you can build many of the necessary features on top of tags, but trac (or any other real issue tracker) already provides a lot more. Pull request tracking works well in github, but I consider their general issue tracker a last resort if you don't have anything else. Stefan From stefan_ml at behnel.de Mon Apr 23 08:24:32 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2012 08:24:32 +0200 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: References: Message-ID: <4F94F5A0.5000806@behnel.de> mark florisson, 22.04.2012 22:20: > On 21 April 2012 20:17, Dimitri Tcaciuc wrote: >> Say I want to factor out inner part of >> some N^2 loops over a flow array, I write something like >> >> cdef inline float _inner(size_t i, size_t j, float[:] x): >> cdef float d = x[i] - x[j] >> return sqrtf(d * d) >> >> In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and >> function is declared as inline, which is great. However, the >> memoryview structure is passed by value: >> >> static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i, >> size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) { >> ... >> >> This seems to hinder compiler's (in my case, GCC 4.3.4) ability to >> perform efficient inlining (although function does in fact get >> inlined). If I manually inline that distance calculation, I get 3x >> speedup. (in my case 0.324020147324 vs 1.43209195137 seconds for 10k >> elements). When I manually modified generated .c file to pass memory >> view slice by pointer, slowdown was eliminated completely. > > Although it is neither documented nor tested, it works if you just > take the address of the memoryview. You can then index it using > memoryview_pointer[0][i]. Are you advertising this as an actual feature here? I'm just asking because supporting hacks can be nasty in the long run. What if we ever want to make a change to the internal way memoryviews work that would break this?
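For reference, the trick in question would look roughly like this in user code. This is a sketch only: the thread confirms that &x and memoryview_pointer[0][i] work in 0.16, but the float[:] * parameter spelling and the explicit sqrtf declaration are assumptions here, and since the pointer is not acquisition counted, the view must outlive every call:

    cdef extern from "math.h":
        float sqrtf(float)

    cdef inline float _inner(size_t i, size_t j, float[:] *x):
        # x is a plain C pointer to the slice struct, so nothing is
        # copied per call; x[0] dereferences back to the typed view.
        cdef float d = x[0][i] - x[0][j]
        return sqrtf(d * d)

    def pairwise(float[:] x):
        cdef size_t i, j
        cdef float total = 0
        for i in range(<size_t>x.shape[0]):
            for j in range(<size_t>x.shape[0]):
                total += _inner(i, j, &x)   # x outlives the call
        return total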
Stefan From vitja.makarov at gmail.com Mon Apr 23 09:26:11 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Mon, 23 Apr 2012 11:26:11 +0400 Subject: [Cython] Cython 0.16: "eval" problem In-Reply-To: References: <83F1345A-FE56-451C-B17B-B41042E02038@illinois.edu> <67296eb070e34ffeb88cfe65fd3b9471@CITESHT4.ad.uillinois.edu> <0E437224-7481-4033-A0CA-EB2D8CED1BBF@illinois.edu> Message-ID: 2012/4/22 Vitja Makarov : > 2012/4/22 Nathan Dunfield : >> On Apr 22, 2012, at 1:22 PM, Vitja Makarov wrote: >>> Oops, it seems to be a problem with locals() dict creation. >> >> Yes it does. The following variants of my original example both work: >> >> ## prob.pyx version 1 >> >> def cy_eval(s): >> return eval(s) >> >> def f(x): >> cdef int* p >> return cy_eval(x) >> >> ## prob.pyx version 2 >> >> def f(x): >> cdef int* p >> return eval(x, {}) >> >> Best, >> >> Nathan >> > > I've fixed it here: > https://github.com/vitek/cython/commit/6dc132731b8f3f7eaabf55e51d89bcbc7b8f4eb7 > > Now waiting for jenkins, then I'll push it into upstream. As a > workaround you can manually pass a locals dictionary, e.g.: > > eval(x, None, {'a': a}) > Btw before 0.16 locals() weren't passed to eval. I've tried the following code and it fails with 0.15: def foo(): cdef int *a return locals() I've pushed the fix to upstream/master https://github.com/cython/cython/commit/0b133e00a7bc3c53ea60d3cf4ae8eb3e20ef49ec -- vitja. From stefan_ml at behnel.de Mon Apr 23 10:23:21 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2012 10:23:21 +0200 Subject: [Cython] test crash in Py3.2 (NumPy/memoryview/refcounting related?) In-Reply-To: References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> Message-ID: <4F951179.8000506@behnel.de> mark florisson, 22.04.2012 16:41: > On 22 April 2012 15:31, mark florisson wrote: >> I think the problem here >> is that a single memoryview object is traversed multiple times through >> different traverse functions, and that the refcount doesn't match the >> number of traverses. Indeed, the refcount is only one, as the actual >> count is the acquisition count. So we shouldn't traverse the >> memoryview objects in memoryview slices, i.e. not >> _memoryviewslice.from_slice.memview. I'll come up with a commit >> shortly, would you be willing to test it? > > A fix is here: https://github.com/markflorisson88/cython/commit/cd32184f3f782b6d7275cf430694b59801ce642a > > Lets see what jenkins has to say :) Seems to like it. > BTW, tp_clear calls Py_CLEAR on Py_buffer.obj, shouldn't it call > releasebuffer instead? Where is that? The memoryview class calls __Pyx_ReleaseBuffer() here. BTW, why are some of the self arguments in MemoryView.pyx explicitly typed as "memoryview"? Stefan From markflorisson88 at gmail.com Mon Apr 23 10:32:46 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 23 Apr 2012 09:32:46 +0100 Subject: [Cython] test crash in Py3.2 (NumPy/memoryview/refcounting related?) In-Reply-To: <4F951179.8000506@behnel.de> References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> <4F951179.8000506@behnel.de> Message-ID: On 23 April 2012 09:23, Stefan Behnel wrote: > mark florisson, 22.04.2012 16:41: >> On 22 April 2012 15:31, mark florisson wrote: >>> I think the problem here >>> is that a single memoryview object is traversed multiple times through >>> different traverse functions, and that the refcount doesn't match the >>> number of traverses. Indeed, the refcount is only one, as the actual >>> count is the acquisition count.
So we shouldn't traverse the >>> memoryview objects in memoryview slices, i.e. not >>> _memoryviewslice.from_slice.memview. I'll come up with a commit >>> shortly, would you be willing to test it? >> >> A fix is here: https://github.com/markflorisson88/cython/commit/cd32184f3f782b6d7275cf430694b59801ce642a >> >> Let's see what jenkins has to say :) > > Seems to like it. > > >> BTW, tp_clear calls Py_CLEAR on Py_buffer.obj, shouldn't it call >> releasebuffer instead? > > Where is that? The memoryview class calls __Pyx_ReleaseBuffer() here. Yes, but ModuleNode generates a tp_clear for Py_buffer cdef class attributes that clears Py_buffer.obj, which means a subsequent __dealloc__ calling release buffer cannot call the release buffer function on the original object. > BTW, why are some of the self arguments in MemoryView.pyx explicitly typed > as "memoryview"? I suppose because the original code had that before I started, and then I continued with it a bit for consistency. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Mon Apr 23 10:39:53 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 23 Apr 2012 09:39:53 +0100 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: <4F94F5A0.5000806@behnel.de> References: <4F94F5A0.5000806@behnel.de> Message-ID: On 23 April 2012 07:24, Stefan Behnel wrote: > mark florisson, 22.04.2012 22:20: >> On 21 April 2012 20:17, Dimitri Tcaciuc wrote: >>> Say I want to factor out the inner part of >>> some N^2 loops over a flow array, I write something like >>> >>> cdef inline float _inner(size_t i, size_t j, float[:] x): >>> cdef float d = x[i] - x[j] >>> return sqrtf(d * d) >>> >>> In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and the >>> function is declared as inline, which is great. However, the >>> memoryview structure is passed by value: >>> >>> static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i, >>> size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) { >>> ... >>> >>> This seems to hinder the compiler's (in my case, GCC 4.3.4) ability to >>> perform efficient inlining (although the function does in fact get >>> inlined). If I manually inline that distance calculation, I get a 3x >>> speedup (in my case, 0.324020147324 vs 1.43209195137 seconds for 10k >>> elements). When I manually modified the generated .c file to pass the memory >>> view slice by pointer, the slowdown was eliminated completely. >> >> Although it is neither documented nor tested, it works if you just >> take the address of the memoryview. You can then index it using >> memoryview_pointer[0][i]. > > Are you advertising this as an actual feature here? I'm just asking because > supporting hacks can be nasty in the long run. What if we ever want to make > a change to the internal way memoryviews work that would break this? > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Yeah, I'm not entirely sure if this is a hack or a feature. It doesn't really matter how memoryviews are represented, or where they are stored, as dereferencing the pointer gives you the same situation as before. The only difference is when a) memoryviews would be relocated or b) go out of scope.
If we're ever planning to support garbage collection (and I doubt we are) or if we're ever going to allocate them on the heap and have a variable-sized representation, a) could be a case. As for b), it's really the same as automatic C variables. So I suppose I wouldn't be opposed to officially supporting this. From stefan_ml at behnel.de Mon Apr 23 10:42:15 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2012 10:42:15 +0200 Subject: [Cython] tp_clear() of buffer client objects (was: Re: test crash in Py3.2 (NumPy/memoryview/refcounting related?)) In-Reply-To: <4F951179.8000506@behnel.de> References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> <4F951179.8000506@behnel.de> Message-ID: <4F9515E7.4050208@behnel.de> Stefan Behnel, 23.04.2012 10:23: > mark florisson, 22.04.2012 16:41: >> On 22 April 2012 15:31, mark florisson wrote: >>> I think the problem here >>> is that a single memoryview object is traversed multiple times through >>> different traverse functions, and that the refcount doesn't match the >>> number of traverses. Indeed, the refcount is only one, as the actual >>> count is the acquisition count. So we shouldn't traverse the >>> memoryview objects in memoryview slices, i.e. not >>> _memoryviewslice.from_slice.memview. I'll come up with a commit >>> shortly, would you be willing to test it? >> >> BTW, tp_clear calls Py_CLEAR on Py_buffer.obj, shouldn't it call >> releasebuffer instead? > > Where is that? The memoryview class calls __Pyx_ReleaseBuffer() here. Ah, found it. Yes, tp_clear() would be called before __dealloc__() in reference cycles, so that's a problem. I'm not sure tp_clear should do something as (potentially) involved as freeing the buffer, but if the Py_buffer is owned by the object, then I guess it just has to do that. Otherwise, it would leak the buffer. The problem is that this also impacts user code, though, so a change might break code, e.g. when it needs to do some cleanup on its own before freeing the buffer. It would make the code more correct, sure, but it would still break it. Guess we have to take that route, though... Stefan From markflorisson88 at gmail.com Mon Apr 23 11:10:02 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 23 Apr 2012 10:10:02 +0100 Subject: [Cython] tp_clear() of buffer client objects (was: Re: test crash in Py3.2 (NumPy/memoryview/refcounting related?)) In-Reply-To: <4F9515E7.4050208@behnel.de> References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> <4F951179.8000506@behnel.de> <4F9515E7.4050208@behnel.de> Message-ID: On 23 April 2012 09:42, Stefan Behnel wrote: > Stefan Behnel, 23.04.2012 10:23: >> mark florisson, 22.04.2012 16:41: >>> On 22 April 2012 15:31, mark florisson wrote: >>>> I think the problem here >>>> is that a single memoryview object is traversed multiple times through >>>> different traverse functions, and that the refcount doesn't match the >>>> number of traverses. Indeed, the refcount is only one, as the actual >>>> count is the acquisition count. So we shouldn't traverse the >>>> memoryview objects in memoryview slices, i.e. not >>>> _memoryviewslice.from_slice.memview. I'll come up with a commit >>>> shortly, would you be willing to test it? >>> >>> BTW, tp_clear calls Py_CLEAR on Py_buffer.obj, shouldn't it call >>> releasebuffer instead? >> >> Where is that? The memoryview class calls __Pyx_ReleaseBuffer() here. > > Ah, found it. Yes, tp_clear() would be called before __dealloc__() in > reference cycles, so that's a problem. 
I'm not sure tp_clear should do > something as (potentially) involved as freeing the buffer, but if the > Py_buffer is owned by the object, then I guess it just has to do that. > Otherwise, it would leak the buffer. > > The problem is that this also impacts user code, though, so a change might > break code, e.g. when it needs to do some cleanup on its own before freeing > the buffer. It would make the code more correct, sure, but it would still > break it. Guess we have to take that route, though... > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel It also seems that tp_clear is only generated for object attributes, and then it includes freeing the object of Py_buffers (so only a Py_buffer attribute doesn't result in a tp_clear function). Finally, tp_dealloc doesn't deallocate buffers either, which it should. So both tp_clear and tp_dealloc should call release buffer (it can be called multiple times on the same buffer). From stefan_ml at behnel.de Mon Apr 23 11:32:37 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2012 11:32:37 +0200 Subject: [Cython] tp_clear() of buffer client objects In-Reply-To: References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> <4F951179.8000506@behnel.de> <4F9515E7.4050208@behnel.de> Message-ID: <4F9521B5.3040405@behnel.de> mark florisson, 23.04.2012 11:10: > On 23 April 2012 09:42, Stefan Behnel wrote: >> Stefan Behnel, 23.04.2012 10:23: >>> mark florisson, 22.04.2012 16:41: >>>> On 22 April 2012 15:31, mark florisson wrote: >>>>> I think the problem here >>>>> is that a single memoryview object is traversed multiple times through >>>>> different traverse functions, and that the refcount doesn't match the >>>>> number of traverses. Indeed, the refcount is only one, as the actual >>>>> count is the acquisition count. So we shouldn't traverse the >>>>> memoryview objects in memoryview slices, i.e. not >>>>> _memoryviewslice.from_slice.memview. I'll come up with a commit >>>>> shortly, would you be willing to test it? >>>> >>>> BTW, tp_clear calls Py_CLEAR on Py_buffer.obj, shouldn't it call >>>> releasebuffer instead? >>> >>> Where is that? The memoryview class calls __Pyx_ReleaseBuffer() here. >> >> Ah, found it. Yes, tp_clear() would be called before __dealloc__() in >> reference cycles, so that's a problem. I'm not sure tp_clear should do >> something as (potentially) involved as freeing the buffer, but if the >> Py_buffer is owned by the object, then I guess it just has to do that. >> Otherwise, it would leak the buffer. >> >> The problem is that this also impacts user code, though, so a change might >> break code, e.g. when it needs to do some cleanup on its own before freeing >> the buffer. It would make the code more correct, sure, but it would still >> break it. Guess we have to take that route, though... > > It also seems that tp_clear is only generated for object attributes, > and then it includes freeing the object of Py_buffers (so only a > Py_buffer attribute doesn't result in a tp_clear function). Finally, > tp_dealloc doesn't deallocate buffers either, which it should. So both > tp_clear and tp_dealloc should call release buffer Good call. > (it can be called > multiple times on the same buffer). I guess calling it the first time would set "obj" to NULL and the second time would recognise it and do nothing, right? That's fine. 
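For reference, the situation under discussion reduces to a buffer-owning cdef class like the following minimal sketch (Holder is a hypothetical example, not code from the Cython sources). The point is that PyBuffer_Release() is idempotent: the first call sets view.obj to NULL and later calls are no-ops, so both the generated tp_clear and tp_dealloc can safely release the buffer.

    from cpython.buffer cimport PyObject_GetBuffer, PyBuffer_Release, PyBUF_SIMPLE

    cdef class Holder:
        # Cython generates buffer-aware tp_traverse/tp_clear code for this
        cdef Py_buffer view

        def __cinit__(self, obj):
            # acquire a buffer from any exporter (bytes, array.array, NumPy, ...)
            PyObject_GetBuffer(obj, &self.view, PyBUF_SIMPLE)

        def __dealloc__(self):
            # safe even if the buffer was already released earlier:
            # once view.obj is NULL, this call does nothing
            PyBuffer_Release(&self.view)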
Stefan From nmd at illinois.edu Mon Apr 23 17:58:00 2012 From: nmd at illinois.edu (Nathan Dunfield) Date: Mon, 23 Apr 2012 10:58:00 -0500 Subject: [Cython] Cython 0.16 issue Windows with Mingw32 Message-ID: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> I've encountered the following issue with Cython 0.16 on Windows using the Mingw32 compiler (I'm using Python 3.2 here, but I don't think that's the issue): $ python3 setup.py build -c mingw32 running build running build_py running build_ext skipping 'SnapPy.c' Cython extension (up-to-date) building 'snappy.SnapPy' extension c:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -Iheaders -Iunix_kit -Iaddl_code -I. -Ipari/pari-2.3.4/include/ -Ipari/pari-2.3.4/include/pari -Ic:\Python32\include -Ic:\Python32\PC -c SnapPy.c -o build\temp.win32-3.2\Release\snappy.o SnapPy.c: In function `__pyx_f_6snappy_6SnapPy_13Triangulation_build_rep_into_Sn': SnapPy.c:25187: warning: implicit declaration of function `fg_get_num_orig_gens' SnapPy.c:25423: warning: implicit declaration of function `candidateSn_is_valid' SnapPy.c:25435: warning: implicit declaration of function `candidateSn_is_transitive' SnapPy.c: At top level: SnapPy.c:76434: error: initializer element is not constant SnapPy.c:76434: error: (near initialization for `__pyx_CyFunctionType_type.tp_call') error: command 'gcc' failed with exit status 1 The problem seems to be in the code that's pulled in from CythonFunction.c. I apologize for not providing a more minimal example (the above code is available at "hg clone static-http://math.uic.edu/t3m/hg/SnapPy") but the small module I tried didn't pull in the CythonFunction.c code. Thanks, Nathan From robertwb at gmail.com Mon Apr 23 18:30:41 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Mon, 23 Apr 2012 09:30:41 -0700 Subject: [Cython] Julialang In-Reply-To: References: <4F92452D.7030507@astro.uio.no> Message-ID: On Sun, Apr 22, 2012 at 1:59 AM, Lisandro Dalcin wrote: > On 22 April 2012 08:10, Robert Bradshaw wrote: >> Yes, Julia looks really cool. It's been on my radar for a while, but I >> haven't had a chance to really try it out for anything yet. But I >> hadn't thought about low-level Python/Cython <-> Julia integration. >> That sounds very interesting. I wonder if Jython could give any >> insight into to the tight interaction between two languages that are >> usually used in isolation but have been made to call each other >> (though there are a lot of differences too, e.g. we're not targeting >> replacing the CPython interpreter (on first pass at least...)). >> > > Are you all aware that "calling C" actually means a ctypes-like > functionality based in dlopen()/dlsym() ? > http://julialang.org/manual/calling-c-and-fortran-code/. Yes, with all its drawbacks, but the fact that it's JIT'ed at least cuts into the overhead issues. - Robert From dtcaciuc at gmail.com Mon Apr 23 19:09:08 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Mon, 23 Apr 2012 10:09:08 -0700 Subject: [Cython] Julialang In-Reply-To: References: <4F92452D.7030507@astro.uio.no> Message-ID: I may be misunderstanding the intent here, but here it goes. If the main idea is to be able to call functions that are written in Julia or other languages, I think an effort to create an LLVM backend for Cython would go a long way towards inter-language connections such as the one discussed here. It should be possible to take Cython- and Julia- produced LLVM bytecode and assemble it all together, applying whatever bytecode optimizers that are available (eg.
SSE vectorization). A big advantage of that approach is that there's no need for one language to know syntax conventions of the other one (or at least not to full extent). Continuing the effort, it should be possible to eliminate the need for writing an intermediate .c/.cpp file if Clang compiler is used, which is also LLVM based. Dimitri. On Mon, Apr 23, 2012 at 9:30 AM, Robert Bradshaw wrote: > On Sun, Apr 22, 2012 at 1:59 AM, Lisandro Dalcin wrote: >> On 22 April 2012 08:10, Robert Bradshaw wrote: >>> Yes, Julia looks really cool. It's been on my radar for a while, but I >>> haven't had a chance to really try it out for anything yet. But I >>> hadn't thought about low-level Python/Cython <-> Julia integration. >>> That sounds very interesting. I wonder if Jython could give any >>> insight into to the tight interaction between two languages that are >>> usually used in isolation but have been made to call each other >>> (though there are a lot of differences too, e.g. we're not targeting >>> replacing the CPython interpreter (on first pass at least...)). >>> >> >> Are you all aware that "calling C" actually means a ctypes-like >> functionality based in dlopen()/dlsym() ? >> http://julialang.org/manual/calling-c-and-fortran-code/. > > Yes, with all its drawbacks, but the fact that it's JIT'ed at least > cuts into the overhead issues. > > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From dtcaciuc at gmail.com Mon Apr 23 19:12:10 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Mon, 23 Apr 2012 10:12:10 -0700 Subject: [Cython] Julialang In-Reply-To: References: <4F92452D.7030507@astro.uio.no> Message-ID: On Mon, Apr 23, 2012 at 10:09 AM, Dimitri Tcaciuc wrote: > I may be misuderstanding the intent here, but here it goes. > > If the main idea is to be able to call functions that are written in > Julia or other languages, I think an effort to create an LLVM backend > for Cython would go a long way towards inter-language connections as > the one discussed here. It should be possible to take Cython- and > Julia- produced LLVM bytecode and assemble it all together, applying > whatever bytecode optimizers that are available (eg. SSE > vectorization). A big advantage of that approach is that there's no > need for one language to know syntax conventions of the other one (or > at least not to full extent). Continuing the effort, it should be > possible to eliminate the need for writing an intermediate .c/.cpp > file if Clang compiler is used, which is also LLVM based. I should clarify myself to avoid a gross mistake; a .c module would still be necessary for CPython integration. > Dimitri. > > On Mon, Apr 23, 2012 at 9:30 AM, Robert Bradshaw wrote: >> On Sun, Apr 22, 2012 at 1:59 AM, Lisandro Dalcin wrote: >>> On 22 April 2012 08:10, Robert Bradshaw wrote: >>>> Yes, Julia looks really cool. It's been on my radar for a while, but I >>>> haven't had a chance to really try it out for anything yet. But I >>>> hadn't thought about low-level Python/Cython <-> Julia integration. >>>> That sounds very interesting. I wonder if Jython could give any >>>> insight into to the tight interaction between two languages that are >>>> usually used in isolation but have been made to call each other >>>> (though there are a lot of differences too, e.g. we're not targeting >>>> replacing the CPython interpreter (on first pass at least...)). 
>>>> >>> >>> Are you all aware that "calling C" actually means a ctypes-like >>> functionality based in dlopen()/dlsym() ? >>> http://julialang.org/manual/calling-c-and-fortran-code/. >> >> Yes, with all its drawbacks, but the fact that it's JIT'ed at least >> cuts into the overhead issues. >> >> - Robert >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel From njs at pobox.com Mon Apr 23 20:17:50 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 23 Apr 2012 19:17:50 +0100 Subject: [Cython] Julialang In-Reply-To: References: <4F92452D.7030507@astro.uio.no> Message-ID: On Mon, Apr 23, 2012 at 6:09 PM, Dimitri Tcaciuc wrote: > I may be misuderstanding the intent here, but here it goes. > > If the main idea is to be able to call functions that are written in > Julia or other languages, I think an effort to create an LLVM backend > for Cython would go a long way towards inter-language connections as > the one discussed here. It should be possible to take Cython- and > Julia- produced LLVM bytecode and assemble it all together, applying > whatever bytecode optimizers that are available (eg. SSE > vectorization). A big advantage of that approach is that there's no > need for one language to know syntax conventions of the other one (or > at least not to full extent). Continuing the effort, it should be > possible to eliminate the need for writing an intermediate .c/.cpp > file if Clang compiler is used, which is also LLVM based. You'd still need some way to translate between the Cython and Julia calling conventions, runtimes, error handling, garbage collection regimes, etc. IIUC, LLVM IR isn't like the CLR -- it doesn't force languages into a common system for these things. Which might be great and worth the effort, I don't know, and don't want to discourage anyone. But there are literally hundreds of new languages designed every year, and a new *successful* language comes along maybe twice in a decade? And one of those recent ones was PHP, which shows you how important pure technical quality is in determining which ones survive (i.e., not much). Building a self-sustaining ecosystem requires a ton of work and a ton of luck. And here I'm still trying to *reduce* the number of languages I need in each analysis pipeline... so even though there are a number of really exciting things about Julia, and its author seems to know what he's doing, I'm still in wait-and-see mode. -- Nathaniel From stefan_ml at behnel.de Mon Apr 23 20:28:47 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2012 20:28:47 +0200 Subject: [Cython] Cython 0.16 issue Windows with Mingw32 In-Reply-To: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> References: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> Message-ID: <4F959F5F.3020907@behnel.de> Nathan Dunfield, 23.04.2012 17:58: > I've encountered the following issue with Cython 0.16 on Windows with > using the Mingw32 compiler (I'm using Python 3.2 here, but I don't think > that's the issue): > > $ python3 setup.py build -c mingw32 > running build > running build_py > running build_ext > skipping 'SnapPy.c' Cython extension (up-to-date) > building 'snappy.SnapPy' extension > c:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -Iheaders -Iunix_kit -Iaddl_code -I. 
-Ipari/pari-2.3.4/include/ -Ipari/pari-2.3.4/include/pari -Ic:\Python32\include -Ic:\Python32\PC -c SnapPy.c -o build\temp.win32-3.2\Release\snappy.o > SnapPy.c: In function `__pyx_f_6snappy_6SnapPy_13Triangulation_build_rep_into_Sn': > SnapPy.c:25187: warning: implicit declaration of function `fg_get_num_orig_gens' > SnapPy.c:25423: warning: implicit declaration of function `candidateSn_is_valid' > SnapPy.c:25435: warning: implicit declaration of function `candidateSn_is_transitive' > SnapPy.c: At top level: > SnapPy.c:76434: error: initializer element is not constant > SnapPy.c:76434: error: (near initialization for `__pyx_CyFunctionType_type.tp_call') > error: command 'gcc' failed with exit status 1 Hmm, that line basically just says "PyCFunction_Call", which is a function exported by CPython. I wonder why gcc would consider this "not a constant". Could you check if the preprocessor (gcc -E, with all the above includes) also sees that on your side? Stefan From d.s.seljebotn at astro.uio.no Mon Apr 23 20:55:35 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 23 Apr 2012 20:55:35 +0200 Subject: [Cython] Julialang In-Reply-To: References: <4F92452D.7030507@astro.uio.no> Message-ID: <4F95A5A7.9030205@astro.uio.no> On 04/23/2012 08:17 PM, Nathaniel Smith wrote: > On Mon, Apr 23, 2012 at 6:09 PM, Dimitri Tcaciuc wrote: >> I may be misuderstanding the intent here, but here it goes. >> >> If the main idea is to be able to call functions that are written in >> Julia or other languages, I think an effort to create an LLVM backend >> for Cython would go a long way towards inter-language connections as >> the one discussed here. It should be possible to take Cython- and >> Julia- produced LLVM bytecode and assemble it all together, applying >> whatever bytecode optimizers that are available (eg. SSE >> vectorization). A big advantage of that approach is that there's no >> need for one language to know syntax conventions of the other one (or >> at least not to full extent). Continuing the effort, it should be >> possible to eliminate the need for writing an intermediate .c/.cpp >> file if Clang compiler is used, which is also LLVM based. > > You'd still need some way to translate between the Cython and Julia > calling conventions, runtimes, error handling, garbage collection > regimes, etc. IIUC, LLVM IR isn't like the CLR -- it doesn't force > languages into a common system for these things. > > Which might be great and worth the effort, I don't know, and don't > want to discourage anyone. But there are literally hundreds of new > languages designed every year, and a new *successful* language comes > along maybe twice in a decade? And one of those recent ones was PHP, > which shows you how important pure technical quality is in determining > which ones survive (i.e., not much). Building a self-sustaining > ecosystem requires a ton of work and a ton of luck. And here I'm still > trying to *reduce* the number of languages I need in each analysis > pipeline... so even though there are a number of really exciting > things about Julia, and its author seems to know what he's doing, I'm > still in wait-and-see mode. I'm excited about Julia because it's basically what I'd *like* to program in. My current mode of development for much stuff is Jinja2 or Tempita used for generating C code; Julia would be a real step forward. 
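(For readers who have not seen that workflow: template-driven C generation amounts to something like the following pure-Python sketch. The template and file names are invented for illustration; only Jinja2's basic Template/render API is assumed.)

    from jinja2 import Template

    # Hypothetical template: emit one specialized C function per element type.
    tmpl = Template("""
    static {{ ctype }} sum_{{ ctype }}(const {{ ctype }} *data, size_t n) {
        {{ ctype }} total = 0;
        for (size_t i = 0; i < n; i++)
            total += data[i];
        return total;
    }
    """)

    with open("generated_sums.c", "w") as f:
        for ctype in ("float", "double"):
            f.write(tmpl.render(ctype=ctype))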
I recently started a thread on julia-dev, primarily to encourage them to focus on binding to Python and use Python libraries rather than focusing on creating their own libraries (though I wasn't that blunt). The response is positive and I'm hopeful. The thing is, I really hope we've moved beyond CPython in 10 years -- in fact I'd go as far as saying that the reliance on CPython (specifically the lack of a decent JIT) is a real danger for the survival of the scientific Python ecosystem long-term! And I have my doubts about PyPy too (though I'm really happy for Stefan's efforts to bring some sanity with fixing cpyext). If Julia gets into a mode where they bootstrap by piggy-backing on Python's libraries, and gets that working transparently and builds a userbase around that, the next natural step is to implement Python in Julia, with CPython C-API compatibility. Which would be great. A very, very, very long shot, of course. Dag From greg.ewing at canterbury.ac.nz Tue Apr 24 00:32:51 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 24 Apr 2012 10:32:51 +1200 Subject: [Cython] Julialang In-Reply-To: <4F95A5A7.9030205@astro.uio.no> References: <4F92452D.7030507@astro.uio.no> <4F95A5A7.9030205@astro.uio.no> Message-ID: <4F95D893.7050709@canterbury.ac.nz> Dag Sverre Seljebotn wrote: > I'm excited about Julia because it's basically what I'd *like* to > program in. My current mode of development for much stuff is Jinja2 or > Tempita used for generating C code; Julia would be a real step forward. It looks interesting, but I have a few reservations about it as it stands: * No modules, just one big global namespace. This makes it unsuitable for large projects, IMO. * Multiple dispatch... I have mixed feelings about it. When methods belong to classes, the class serves as a namespace, and as we all know, namespaces are a honking great idea. Putting methods outside of classes throws away one kind of namespace. * One-based indexing? Yuck. I suppose it's what Fortran and Matlab users are familiar with, but it's not the best technical decision, IMO. On the plus side, it does seem to have a very nice and unobtrusive type system. > the next natural step is to implement Python in > Julia, with CPython C-API compatibility. Which would be great. That would indeed be an interesting thing to explore. -- Greg From d.s.seljebotn at astro.uio.no Tue Apr 24 07:15:08 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 24 Apr 2012 07:15:08 +0200 Subject: [Cython] Julialang In-Reply-To: <4F95D893.7050709@canterbury.ac.nz> References: <4F92452D.7030507@astro.uio.no> <4F95A5A7.9030205@astro.uio.no> <4F95D893.7050709@canterbury.ac.nz> Message-ID: <4F9636DC.2020704@astro.uio.no> On 04/24/2012 12:32 AM, Greg Ewing wrote: > Dag Sverre Seljebotn wrote: > >> I'm excited about Julia because it's basically what I'd *like* to >> program in. My current mode of development for much stuff is Jinja2 or >> Tempita used for generating C code; Julia would be a real step forward. > > It looks interesting, but I have a few reservations about > it as it stands: > > * No modules, just one big global namespace. This makes it > unsuitable for large projects, IMO. As far as I know it seems like namespaces are on their TODO-list. But of course, that also means it's undecided. > * Multiple dispatch... I have mixed feelings about it. When > methods belong to classes, the class serves as a namespace, > and as we all know, namespaces are a honking great idea.
> Putting methods outside of classes throws away one kind of > namespace. Well, there's still the namespace of the argument type. I think it is really a syntactic rewrite of obj->foo(bar) to foo(obj, bar) If Julia gets namespace support then the version ("method") of "foo" to use is determined by the namespace of obj and bar. And in Python there's all sorts of problems with who wins the battle over __add__ and __radd__ and so on (though it's a rather minor point and not something that by itself merits a new language IMO...). > * One-based indexing? Yuck. I suppose it's what Fortran and > Matlab users are familiar with, but it's not the best > technical decision, IMO. > > On the plus side, it does seem to have a very nice and > unobtrusive type system. > >> the next natural step is to implement Python in Julia, with CPython >> C-API compatability. Which would be great. > > That would indeed be an interesting thing to explore. Dag From stefan_ml at behnel.de Tue Apr 24 07:44:34 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 24 Apr 2012 07:44:34 +0200 Subject: [Cython] Julialang In-Reply-To: <4F95D893.7050709@canterbury.ac.nz> References: <4F92452D.7030507@astro.uio.no> <4F95A5A7.9030205@astro.uio.no> <4F95D893.7050709@canterbury.ac.nz> Message-ID: <4F963DC2.7030805@behnel.de> Greg Ewing, 24.04.2012 00:32: > Dag Sverre Seljebotn wrote: >> I'm excited about Julia because it's basically what I'd *like* to program >> in. My current mode of development for much stuff is Jinja2 or Tempita >> used for generating C code; Julia would be a real step forward. > > It looks interesting, but I have a few reservations about > it as it stands: > > * No modules, just one big global namespace. This makes it > unsuitable for large projects, IMO. > > * Multiple dispatch... I have mixed feelings about it. When > methods belong to classes, the class serves as a namespace, > and as we all know, namespaces are a honking great idea. > Putting methods outside of classes throws away one kind of > namespace. > > * One-based indexing? Yuck. I suppose it's what Fortran and > Matlab users are familiar with, but it's not the best > technical decision, IMO. > > On the plus side, it does seem to have a very nice and > unobtrusive type system. I totally agree. They might have been inspired by Lua and tried to make the type system more usable. There are/were many languages that started off with the design goal of being simple, beautiful and avoiding "all that overhead", before they got to the point of becoming usable and consequently quite complex. Even if Julia stays a niche language (and there is nothing that indicates that it won't be so), I think Dag is right in that it is an interesting niche for a certain user group (whether that is enough for it to prevail, well...). It could certainly make a nice addition to CPython. Whether reimplementing Python in it is a good idea and worth the effort - well, there are lots of incomplete special-purpose Python-like language implementations already, so why not have one more. Stefan From greg.ewing at canterbury.ac.nz Tue Apr 24 08:29:18 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 24 Apr 2012 18:29:18 +1200 Subject: [Cython] Julialang In-Reply-To: <4F9636DC.2020704@astro.uio.no> References: <4F92452D.7030507@astro.uio.no> <4F95A5A7.9030205@astro.uio.no> <4F95D893.7050709@canterbury.ac.nz> <4F9636DC.2020704@astro.uio.no> Message-ID: <4F96483E.30302@canterbury.ac.nz> Dag Sverre Seljebotn wrote: > Well, there's still the namespace of the argument type. 
I think it is > really a syntactic rewrite of > > obj->foo(bar) > > to > > foo(obj, bar) This is where I disagree. It's *not* just a syntactic rewrite, it's a lot more than that. With a Python method, I have a fairly good idea of where to start looking for a definition, documentation, etc. Or if it's a stand-alone function, I can follow the imports and find out which module it's defined in. But with generic functions, the implementation for the particular combination of argument types concerned could be *anywhere*, even in a file that I haven't explicitly imported myself. It's monkeypatching on steroids. I acknowledge that multiple dispatch is very powerful and lets you do all sorts of wonderful things. But that power comes at the expense of some features that I particularly value about Python's namespace system. > And in Python there's all sorts of problems with who wins the battle > over __add__ and __radd__ I think Python's solution to this is rather good, actually. It seems to work quite well in practice. And multiple dispatch systems have all the same problems when the "best" match to the argument types is ambiguous, so it's not as if multimethods are a magic solution to this. -- Greg From stefan_ml at behnel.de Tue Apr 24 09:34:16 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 24 Apr 2012 09:34:16 +0200 Subject: [Cython] Cython 0.16 issue Windows with Mingw32 In-Reply-To: <4F959F5F.3020907@behnel.de> References: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> <4F959F5F.3020907@behnel.de> Message-ID: <4F965778.3090707@behnel.de> Stefan Behnel, 23.04.2012 20:28: > Nathan Dunfield, 23.04.2012 17:58: >> I've encountered the following issue with Cython 0.16 on Windows with >> using the Mingw32 compiler (I'm using Python 3.2 here, but I don't think >> that's the issue): >> >> $ python3 setup.py build -c mingw32 >> running build >> running build_py >> running build_ext >> skipping 'SnapPy.c' Cython extension (up-to-date) >> building 'snappy.SnapPy' extension >> c:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -Iheaders -Iunix_kit -Iaddl_code -I. -Ipari/pari-2.3.4/include/ -Ipari/pari-2.3.4/include/pari -Ic:\Python32\include -Ic:\Python32\PC -c SnapPy.c -o build\temp.win32-3.2\Release\snappy.o >> SnapPy.c: In function `__pyx_f_6snappy_6SnapPy_13Triangulation_build_rep_into_Sn': >> SnapPy.c:25187: warning: implicit declaration of function `fg_get_num_orig_gens' >> SnapPy.c:25423: warning: implicit declaration of function `candidateSn_is_valid' >> SnapPy.c:25435: warning: implicit declaration of function `candidateSn_is_transitive' >> SnapPy.c: At top level: >> SnapPy.c:76434: error: initializer element is not constant >> SnapPy.c:76434: error: (near initialization for `__pyx_CyFunctionType_type.tp_call') >> error: command 'gcc' failed with exit status 1 > > Hmm, that line basically just says "PyCFunction_Call", which is a function > exported by CPython. I wonder why gcc would consider this "not a constant". My guess is that it's a Windows-DLL issue. Maybe symbols exported by Windows-DLLs simply aren't "constant". We should be able to fix this by indirecting the slot through a static function. 
Stefan From vitja.makarov at gmail.com Tue Apr 24 09:45:14 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Tue, 24 Apr 2012 11:45:14 +0400 Subject: [Cython] Cython 0.16 issue Windows with Mingw32 In-Reply-To: <4F965778.3090707@behnel.de> References: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> <4F959F5F.3020907@behnel.de> <4F965778.3090707@behnel.de> Message-ID: 2012/4/24 Stefan Behnel : > Stefan Behnel, 23.04.2012 20:28: >> Nathan Dunfield, 23.04.2012 17:58: >>> I've encountered the following issue with Cython 0.16 on Windows with >>> using the Mingw32 compiler (I'm using Python 3.2 here, but I don't think >>> that's the issue): >>> >>> $ python3 setup.py build -c mingw32 >>> running build >>> running build_py >>> running build_ext >>> skipping 'SnapPy.c' Cython extension (up-to-date) >>> building 'snappy.SnapPy' extension >>> c:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -Iheaders -Iunix_kit -Iaddl_code -I. -Ipari/pari-2.3.4/include/ -Ipari/pari-2.3.4/include/pari -Ic:\Python32\include -Ic:\Python32\PC -c SnapPy.c -o build\temp.win32-3.2\Release\snappy.o >>> SnapPy.c: In function `__pyx_f_6snappy_6SnapPy_13Triangulation_build_rep_into_Sn': >>> SnapPy.c:25187: warning: implicit declaration of function `fg_get_num_orig_gens' >>> SnapPy.c:25423: warning: implicit declaration of function `candidateSn_is_valid' >>> SnapPy.c:25435: warning: implicit declaration of function `candidateSn_is_transitive' >>> SnapPy.c: At top level: >>> SnapPy.c:76434: error: initializer element is not constant >>> SnapPy.c:76434: error: (near initialization for `__pyx_CyFunctionType_type.tp_call') >>> error: command 'gcc' failed with exit status 1 >> >> Hmm, that line basically just says "PyCFunction_Call", which is a function >> exported by CPython. I wonder why gcc would consider this "not a constant". > > My guess is that it's a Windows-DLL issue. Maybe symbols exported by > Windows-DLLs simply aren't "constant". We should be able to fix this by > indirecting the slot through a static function. > We can also fill it later at type initialization. But since we already have #define __Pyx_CyFunction_Call PyCFunction_Call and PyPy version of __Pyx_CyFunction_Call I think it's better to replace define with static function you mentioned above. -- vitja. From nmd at illinois.edu Tue Apr 24 14:22:33 2012 From: nmd at illinois.edu (Nathan Dunfield) Date: Tue, 24 Apr 2012 07:22:33 -0500 Subject: [Cython] Cython 0.16 issue Windows with Mingw32 In-Reply-To: <041194cede2147d18e21b07936093d8e@CITESHT4.ad.uillinois.edu> References: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> <041194cede2147d18e21b07936093d8e@CITESHT4.ad.uillinois.edu> Message-ID: <29C698CD-797C-4A8B-93B0-B123FF198582@illinois.edu> On Apr 23, 2012, at 1:28 PM, Stefan Behnel wrote: > Hmm, that line basically just says "PyCFunction_Call", which is a function > exported by CPython. I wonder why gcc would consider this "not a constant". > > Could you check if the preprocessor (gcc -E, with all the above includes) > also sees that on your side? It turns out I was running a version of MinGW that, while only a few years old, had a quite elderly version of gcc, namely 3.4 (yes, that's 3.4 and not 4.3). Once I updated to the most recent MinGW, which has gcc 4.6, and worked around a known bug in distutils ( http://bugs.python.org/issue12641 ), the issue with PyCFunction_Call went away and the entire 76k line Cython generated C file compiled without a hitch. 
So probably this is only a problem on very old compilers, and so perhaps not worth investigating further. Best, Nathan From stefan_ml at behnel.de Tue Apr 24 14:43:43 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 24 Apr 2012 14:43:43 +0200 Subject: [Cython] Cython 0.16 issue Windows with Mingw32 In-Reply-To: <29C698CD-797C-4A8B-93B0-B123FF198582@illinois.edu> References: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> <041194cede2147d18e21b07936093d8e@CITESHT4.ad.uillinois.edu> <29C698CD-797C-4A8B-93B0-B123FF198582@illinois.edu> Message-ID: <4F969FFF.8090702@behnel.de> Nathan Dunfield, 24.04.2012 14:22: > On Apr 23, 2012, at 1:28 PM, Stefan Behnel wrote: >> Hmm, that line basically just says "PyCFunction_Call", which is a >> function exported by CPython. I wonder why gcc would consider this >> "not a constant". >> >> Could you check if the preprocessor (gcc -E, with all the above >> includes) also sees that on your side? > > It turns out I was running a version of MinGW that, while only a few > years old, had a quite elderly version of gcc, namely 3.4 (yes, that's > 3.4 and not 4.3). Once I updated to the most recent MinGW, which has > gcc 4.6, and worked around a known bug in distutils ( > http://bugs.python.org/issue12641 ), the issue with PyCFunction_Call > went away and the entire 76k line Cython generated C file compiled > without a hitch. So probably this is only a problem on very old > compilers, and so perhaps not worth investigating further. Thanks for reporting this back. Given that this is "only" MinGW, which requires some work on user side anyway to get it properly installed and used by Python builds, I agree that we can expect those who use it to also be able to install a somewhat recent version if they run into this. After all, gcc4.x is very likely to produce faster code for recent processors than gcc 3.4, so upgrading is a way better alternative than just "making it work" on our side. Stefan From markflorisson88 at gmail.com Fri Apr 27 21:16:25 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 27 Apr 2012 20:16:25 +0100 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: <4F6B868A.1000802@behnel.de> References: <4F6B868A.1000802@behnel.de> Message-ID: On 22 March 2012 20:07, Stefan Behnel wrote: > mark florisson, 22.03.2012 19:50: >> For the fused type runtime dispatch I found it very convenient to use >> the with statement, but that is not supported in Python 2.4. However, >> the compiler could dynamically compile compiler code with the compiler >> itself and import it (pyximport), if it is not needed to compile >> Cython itself. I gave it a try and it seems to work like a charm (but >> probably needs more testing :): >> https://github.com/markflorisson88/cython/commit/0c2983056919f7f4d30a809724d7db0ace99d89b#diff-2 > > The advantages are limited, so I'm leaning towards seeing the drawbacks, of > which there are at least some. For one, *running* Cython (as opposed to > installing it) becomes more complex and involves a (much) higher first time > overhead. We'd also start putting shared libraries into user directories > without asking them first. Might be a problem on shared installations with > many users. The overhead would only be for certain python versions that try to use certain functionality, in this case, python2.4 and fused types. To be honest, the overhead isn't very large. 
As for compiling shared libraries, I don't think people will complain about shared libraries, that's the only way in which Python and Cython can be used. > Note also that Cython no longer compiles itself when installing in PyPy (at > all), but that would be easy to special case here (and PyPy obviously has > features like the "with" statement). > > Next, I think it would tempt us to split source files into separate modules > just because that way we can use a specific feature in one of them because > it'll get compiled (and the other half is needed for bootstrapping). That > would be bad design. Possibly. In the case of fused types, the code of the fused node is nearly 800 lines, which is probably good to separate from the other, typically smaller nodes, especially considering it's kind of a specific feature. In my case, and I wouldn't mind limiting the functionality until further discussion to that case only, using the with statement really helps keeping track of blocks, and the resulting code is much more readable than it would otherwise be. > OTOH, it might be worth taking a general look at the > code base to see what's really required for bootstrapping, or maybe for > compiling pure Python code in general. Factoring that out, and thus > separating the Python compiler from the Cython specific language features > might (might!) be an actual improvement. (Then again, there's .pxd > overriding and the "cython" module, which add Cython features back in, and > those two make Cython much more attactive...) > > I also started to dislike pyximport because it's way outdated, fragile, > complicated and its features are overly hacked together (and I'm not > entirely innocent in that regard). I would love to see a rewrite that also > supports compiling packages properly. Not a GSoC by itself, but it's > certainly a worthy project. What about a project that aims for separating > out a Python compiler and rewriting pyximport as a jitty frontend for it? > Maybe not the greatest use case ever, but a fun project, I'd say. Yeah, the code is pretty terrible, but it seems to work. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Fri Apr 27 22:16:38 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 27 Apr 2012 22:16:38 +0200 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: References: <4F6B868A.1000802@behnel.de> Message-ID: <4F9AFEA6.8030003@behnel.de> mark florisson, 27.04.2012 21:16: > On 22 March 2012 20:07, Stefan Behnel wrote: >> mark florisson, 22.03.2012 19:50: >>> For the fused type runtime dispatch I found it very convenient to use >>> the with statement, but that is not supported in Python 2.4. However, >>> the compiler could dynamically compile compiler code with the compiler >>> itself and import it (pyximport), if it is not needed to compile >>> Cython itself. I gave it a try and it seems to work like a charm (but >>> probably needs more testing :): >>> https://github.com/markflorisson88/cython/commit/0c2983056919f7f4d30a809724d7db0ace99d89b#diff-2 >> >> The advantages are limited, so I'm leaning towards seeing the drawbacks, of >> which there are at least some. For one, *running* Cython (as opposed to >> installing it) becomes more complex and involves a (much) higher first time >> overhead. We'd also start putting shared libraries into user directories >> without asking them first. 
Might be a problem on shared installations with >> many users. > > The overhead would only be for certain python versions that try to use > certain functionality, in this case, python2.4 and fused types. To be > honest, the overhead isn't very large. As for compiling shared > libraries, I don't think people will complain about shared libraries, > that's the only way in which Python and Cython can be used. > >> Note also that Cython no longer compiles itself when installing in PyPy (at >> all), but that would be easy to special case here (and PyPy obviously has >> features like the "with" statement). >> >> Next, I think it would tempt us to split source files into separate modules >> just because that way we can use a specific feature in one of them because >> it'll get compiled (and the other half is needed for bootstrapping). That >> would be bad design. > > Possibly. In the case of fused types, the code of the fused node is > nearly 800 lines, which is probably good to separate from the other, > typically smaller nodes, especially considering it's kind of a > specific feature. In my case, and I wouldn't mind limiting the > functionality until further discussion to that case only, using the > with statement really helps keeping track of blocks, and the resulting > code is much more readable than it would otherwise be. What about this deal: we remove the hard bootstrap dependency on the fused types code (and maybe other Cython specific features) and require its compilation at install time in Py2.4 (and maybe even 2.5). That would allow us to use newer Python syntax (and even Cython supported syntax) there (except for fused types, obviously). Failure to compile the module in Python 2.4/5 at install time would then abort the installation. Bad luck for the user, but easy to fix by installing a newer Python version. That would give us the advantage of not needing to pollute user home directories with shared libraries at runtime (which I would consider a very annoying property). Making the dependency optional can be as simple as using a try-except conditional import at the module level and setting the module name to None in the failure case. If we want to prevent weird error messages, we can just ignore the import failure during Cython's own installation (by setting a flag somewhere) and during a normal run, we use a guard that checks that the import worked (i.e. the module is non-None) before starting the compilation and raises an error otherwise. We should then clearly document in the module comment that this source file can use newer syntax while all other files cannot. Oh, and we should still take care *not* to split our source base by the "can be compiled or not" predicate. What do you think? Stefan From markflorisson88 at gmail.com Fri Apr 27 22:38:41 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 27 Apr 2012 21:38:41 +0100 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: <4F9AFEA6.8030003@behnel.de> References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> Message-ID: On 27 April 2012 21:16, Stefan Behnel wrote: > mark florisson, 27.04.2012 21:16: >> On 22 March 2012 20:07, Stefan Behnel wrote: >>> mark florisson, 22.03.2012 19:50: >>>> For the fused type runtime dispatch I found it very convenient to use >>>> the with statement, but that is not supported in Python 2.4. 
However, >>>> the compiler could dynamically compile compiler code with the compiler >>>> itself and import it (pyximport), if it is not needed to compile >>>> Cython itself. I gave it a try and it seems to work like a charm (but >>>> probably needs more testing :): >>>> https://github.com/markflorisson88/cython/commit/0c2983056919f7f4d30a809724d7db0ace99d89b#diff-2 >>> >>> The advantages are limited, so I'm leaning towards seeing the drawbacks, of >>> which there are at least some. For one, *running* Cython (as opposed to >>> installing it) becomes more complex and involves a (much) higher first time >>> overhead. We'd also start putting shared libraries into user directories >>> without asking them first. Might be a problem on shared installations with >>> many users. >> >> The overhead would only be for certain python versions that try to use >> certain functionality, in this case, python2.4 and fused types. To be >> honest, the overhead isn't very large. As for compiling shared >> libraries, I don't think people will complain about shared libraries, >> that's the only way in which Python and Cython can be used. >> >>> Note also that Cython no longer compiles itself when installing in PyPy (at >>> all), but that would be easy to special case here (and PyPy obviously has >>> features like the "with" statement). >>> >>> Next, I think it would tempt us to split source files into separate modules >>> just because that way we can use a specific feature in one of them because >>> it'll get compiled (and the other half is needed for bootstrapping). That >>> would be bad design. >> >> Possibly. In the case of fused types, the code of the fused node is >> nearly 800 lines, which is probably good to separate from the other, >> typically smaller nodes, especially considering it's kind of a >> specific feature. In my case, and I wouldn't mind limiting the >> functionality until further discussion to that case only, using the >> with statement really helps keeping track of blocks, and the resulting >> code is much more readable than it would otherwise be. > > What about this deal: we remove the hard bootstrap dependency on the fused > types code (and maybe other Cython specific features) and require its > compilation at install time in Py2.4 (and maybe even 2.5). That would allow > us to use newer Python syntax (and even Cython supported syntax) there > (except for fused types, obviously). Failure to compile the module in > Python 2.4/5 at install time would then abort the installation. Bad luck > for the user, but easy to fix by installing a newer Python version. > > That would give us the advantage of not needing to pollute user home > directories with shared libraries at runtime (which I would consider a very > annoying property). I think it's fine to require compiling in the installed case (or will that be a problem for some package managers?). In the non-installed case with python versions smaller than needed, would you prefer a pyximport or an error message telling you to install Cython? Because for development you really don't want to install every time. > Making the dependency optional can be as simple as using a try-except > conditional import at the module level and setting the module name to None > in the failure case. If we want to prevent weird error messages, we can > just ignore the import failure during Cython's own installation (by setting > a flag somewhere) and during a normal run, we use a guard that checks that > the import worked (i.e. 
the module is non-None) before starting the > compilation and raises an error otherwise. > > We should then clearly document in the module comment that this source file > can use newer syntax while all other files cannot. Oh, and we should still > take care *not* to split our source base by the "can be compiled or not" > predicate. Right. > What do you think? > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Sat Apr 28 19:55:33 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 28 Apr 2012 19:55:33 +0200 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> Message-ID: <4F9C2F15.3010309@behnel.de> mark florisson, 27.04.2012 22:38: > On 27 April 2012 21:16, Stefan Behnel wrote: >> What about this deal: we remove the hard bootstrap dependency on the fused >> types code (and maybe other Cython specific features) and require its >> compilation at install time in Py2.4 (and maybe even 2.5). That would allow >> us to use newer Python syntax (and even Cython supported syntax) there >> (except for fused types, obviously). Failure to compile the module in >> Python 2.4/5 at install time would then abort the installation. Bad luck >> for the user, but easy to fix by installing a newer Python version. >> >> That would give us the advantage of not needing to pollute user home >> directories with shared libraries at runtime (which I would consider a very >> annoying property). > > I think it's fine to require compiling in the installed case (or will > that be a problem for some package managers?). In the non-installed > case with python versions smaller than needed, would you prefer a > pyximport or an error message telling you to install Cython? Because > for development you really don't want to install every time. I think it's fine to require at least Python 2.6 for Cython core development. Just the installation (basically, what we test in Jenkins anyway) should work in Py2.4 and shouldn't require any rebuilds at runtime. Stefan From markflorisson88 at gmail.com Sat Apr 28 20:18:09 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 28 Apr 2012 19:18:09 +0100 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: <4F9C2F15.3010309@behnel.de> References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> <4F9C2F15.3010309@behnel.de> Message-ID: On 28 April 2012 18:55, Stefan Behnel wrote: > mark florisson, 27.04.2012 22:38: >> On 27 April 2012 21:16, Stefan Behnel wrote: >>> What about this deal: we remove the hard bootstrap dependency on the fused >>> types code (and maybe other Cython specific features) and require its >>> compilation at install time in Py2.4 (and maybe even 2.5). That would allow >>> us to use newer Python syntax (and even Cython supported syntax) there >>> (except for fused types, obviously). Failure to compile the module in >>> Python 2.4/5 at install time would then abort the installation. Bad luck >>> for the user, but easy to fix by installing a newer Python version. >>> >>> That would give us the advantage of not needing to pollute user home >>> directories with shared libraries at runtime (which I would consider a very >>> annoying property). >> >> I think it's fine to require compiling in the installed case (or will >> that be a problem for some package managers?). 
In the non-installed >> case with python versions smaller than needed, would you prefer a >> pyximport or an error message telling you to install Cython? Because >> for development you really don't want to install every time. > > I think it's fine to require at least Python 2.6 for Cython core > development. Just the installation (basically, what we test in Jenkins > anyway) should work in Py2.4 and shouldn't require any rebuilds at runtime. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Well, sometimes stuff works in say, 2.7 but fails in 2.4. In that case you really have to test with the failing python versions, which means you'd have to reinstall every time you want to try the tests again (this is handled automatically for py3k, which runs the 2to3 tool). I'm also sure many users just clone from git, add the directory to PYTHONPATH and work from there. So I guess what I'm saying is, it's fine to mandate compilation at compile time (don't allow flags to disable compilation), and (for me), pyximport is totally fine, but Cython must be workable (all functionality), without needing to install or build, in all versions. That means either not using the with statement, or compiling with pyximport in certain versions in certain situations only (i.e., only in incompatible python version in case the user neither built nor installed Cython). I don't think that's a problem, if people don't like to have those shared library modules (will they even notice?) in their user directories, they can install or build Cython. Finally, my attachment to the with statement here is mostly to the aesthetics of the resulting code, rewriting my pull request is not so much work, so we can leave that out of consideration here. From stefan_ml at behnel.de Sat Apr 28 20:39:28 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 28 Apr 2012 20:39:28 +0200 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> <4F9C2F15.3010309@behnel.de> Message-ID: <4F9C3960.2080004@behnel.de> mark florisson, 28.04.2012 20:18: > On 28 April 2012 18:55, Stefan Behnel wrote: >> mark florisson, 27.04.2012 22:38: >>> On 27 April 2012 21:16, Stefan Behnel wrote: >>>> What about this deal: we remove the hard bootstrap dependency on the fused >>>> types code (and maybe other Cython specific features) and require its >>>> compilation at install time in Py2.4 (and maybe even 2.5). That would allow >>>> us to use newer Python syntax (and even Cython supported syntax) there >>>> (except for fused types, obviously). Failure to compile the module in >>>> Python 2.4/5 at install time would then abort the installation. Bad luck >>>> for the user, but easy to fix by installing a newer Python version. >>>> >>>> That would give us the advantage of not needing to pollute user home >>>> directories with shared libraries at runtime (which I would consider a very >>>> annoying property). >>> >>> I think it's fine to require compiling in the installed case (or will >>> that be a problem for some package managers?). In the non-installed >>> case with python versions smaller than needed, would you prefer a >>> pyximport or an error message telling you to install Cython? Because >>> for development you really don't want to install every time. >> >> I think it's fine to require at least Python 2.6 for Cython core >> development. 
Just the installation (basically, what we test in Jenkins >> anyway) should work in Py2.4 and shouldn't require any rebuilds at runtime. > > Well, sometimes stuff works in say, 2.7 but fails in 2.4. In that case > you really have to test with the failing python versions, which means > you'd have to reinstall every time you want to try the tests again > (this is handled automatically for py3k, which runs the 2to3 tool). The number of times I recently ran tests in Py2.4 myself is really not worth mentioning. Most of the time, when something fails there, the error I get in Jenkins is so obvious that I just commit an untested fix for it. I think it's really acceptable to require a run of "setup.py build_ext -i" for local developer testing in Py2.4. > I'm also sure many users just clone from git, add the directory to > PYTHONPATH and work from there. I'm sure there are close to no users who try to do that with Py2.4 these days. Maybe there are some who do it with Py2.5, but we are not currently considering to break a plain Python run there, AFAICT. I think the normal way users employ Cython is after a proper installation. > So I guess what I'm saying is, it's fine to mandate compilation at > compile time (don't allow flags to disable compilation), and (for me), > pyximport is totally fine, but Cython must be workable (all > functionality), without needing to install or build, in all versions. Workable, ok. But if fused types are only available in a compiled installed version under Python 2.4, that's maybe not optimal but certainly acceptable. Users of Py2.4 should be used to suffering anyway. > That means either not using the with statement, or compiling with > pyximport in certain versions in certain situations only (i.e., only > in incompatible python version in case the user neither built nor > installed Cython). I don't think that's a problem, if people don't > like to have those shared library modules (will they even notice?) in > their user directories, they can install or build Cython. Why require the use of pyximport at runtime when we can do everything during installation? I really don't see an advantage. > Finally, my attachment to the with statement here is mostly to the > aesthetics of the resulting code, rewriting my pull request is not so > much work, so we can leave that out of consideration here. If it's not too much work, that would obviously make things go a lot smoother. Stefan From markflorisson88 at gmail.com Sat Apr 28 21:55:33 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 28 Apr 2012 20:55:33 +0100 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: <4F9C3960.2080004@behnel.de> References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> <4F9C2F15.3010309@behnel.de> <4F9C3960.2080004@behnel.de> Message-ID: On 28 April 2012 19:39, Stefan Behnel wrote: > mark florisson, 28.04.2012 20:18: >> On 28 April 2012 18:55, Stefan Behnel wrote: >>> mark florisson, 27.04.2012 22:38: >>>> On 27 April 2012 21:16, Stefan Behnel wrote: >>>>> What about this deal: we remove the hard bootstrap dependency on the fused >>>>> types code (and maybe other Cython specific features) and require its >>>>> compilation at install time in Py2.4 (and maybe even 2.5). That would allow >>>>> us to use newer Python syntax (and even Cython supported syntax) there >>>>> (except for fused types, obviously). Failure to compile the module in >>>>> Python 2.4/5 at install time would then abort the installation. 
Bad luck >>>>> for the user, but easy to fix by installing a newer Python version. >>>>> >>>>> That would give us the advantage of not needing to pollute user home >>>>> directories with shared libraries at runtime (which I would consider a very >>>>> annoying property). >>>> >>>> I think it's fine to require compiling in the installed case (or will >>>> that be a problem for some package managers?). In the non-installed >>>> case with python versions smaller than needed, would you prefer a >>>> pyximport or an error message telling you to install Cython? Because >>>> for development you really don't want to install every time. >>> >>> I think it's fine to require at least Python 2.6 for Cython core >>> development. Just the installation (basically, what we test in Jenkins >>> anyway) should work in Py2.4 and shouldn't require any rebuilds at runtime. >> >> Well, sometimes stuff works in say, 2.7 but fails in 2.4. In that case >> you really have to test with the failing python versions, which means >> you'd have to reinstall every time you want to try the tests again >> (this is handled automatically for py3k, which runs the 2to3 tool). > > The number of times I recently ran tests in Py2.4 myself is really not > worth mentioning. Most of the time, when something fails there, the error I > get in Jenkins is so obvious that I just commit an untested fix for it. In my experience that still fails quite often, you may still forget some test or accidentally add some whitespace, and then you're going to build everything on Jenkins only to realize one hour later that something is still broken. Maybe it's because of the buffer differences and because 2.4 is the first thing that runs on Jenkins, but I test quite often in 2.4. > I think it's really acceptable to require a run of "setup.py build_ext -i" > for local developer testing in Py2.4. > > >> I'm also sure many users just clone from git, add the directory to >> PYTHONPATH and work from there. > > I'm sure there are close to no users who try to do that with Py2.4 these > days. Maybe there are some who do it with Py2.5, but we are not currently > considering to break a plain Python run there, AFAICT. > > I think the normal way users employ Cython is after a proper installation. > > >> So I guess what I'm saying is, it's fine to mandate compilation at >> compile time (don't allow flags to disable compilation), and (for me), >> pyximport is totally fine, but Cython must be workable (all >> functionality), without needing to install or build, in all versions. > > Workable, ok. But if fused types are only available in a compiled installed > version under Python 2.4, that's maybe not optimal but certainly > acceptable. Users of Py2.4 should be used to suffering anyway. > That's a really confusing statement. If they are used to suffering, then why can't they bear a runtime-compiled module if they didn't install? :) If a user installs she will have at least one compiled module in the system, but if she doesn't install, she will also have one compiled module. I'm kind of wondering though, if this is really such a big problem, then it means we can also never cache any user JIT-compiled code (like e.g. OpenCL)? >> That means either not using the with statement, or compiling with >> pyximport in certain versions in certain situations only (i.e., only >> in incompatible python version in case the user neither built nor >> installed Cython). I don't think that's a problem, if people don't >> like to have those shared library modules (will they even notice?)
in >> their user directories, they can install or build Cython. > > Why require the use of pyximport at runtime when we can do everything > during installation? I really don't see an advantage. > We will do everything during installation, but not mandate an installation. >> Finally, my attachment to the with statement here is mostly to the >> aesthetics of the resulting code, rewriting my pull request is not so >> much work, so we can leave that out of consideration here. > > If it's not too much work, that would obviously make things go a lot smoother. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I'm not sure why we're always making such a fuss over the little things, but I suppose I'd prefer mandating a compile in 2.4 over not having the with statement. Unless you want to mandate installing in every version, this still means we can't write any parts of the compiler in actual Cython code. From njs at pobox.com Sat Apr 28 23:04:26 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 28 Apr 2012 22:04:26 +0100 Subject: [Cython] Wacky idea: proper macros Message-ID: Was chatting with Wes today about the usual problem many of us have encountered with needing to use some sort of templating system to generate code handling multiple types, operations, etc., and a wacky idea occurred to me. So I thought I'd throw it out here. What if we added a simple macro facility to Cython, that worked at the AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) Basically some way to write arbitrary Python code into a .pyx file that gets executed at compile time and can transform the AST, plus some nice convenience APIs for simple transformations. E.g., if we steal the illegal token sequence @@ as our marker, we could have something like:

@@ # alone on a line, starts a block of Python code
from Cython.MacroUtil import replace_ctype
def expand_types(placeholder, typelist):
  def my_decorator(function_name, ast):
    functions = {}
    for typename in typelist:
      new_name = "%s_%s" % (function_name, typename)
      functions[new_name] = replace_ctype(ast, placeholder, typename)
    return functions
  return my_decorator
@@ # this token sequence cannot occur in Python, so it's a safe end-marker

# Compile-time function decorator
# Results in two cdef functions named sum_double and sum_int
@@expand_types("T", ["double", "int"])
cdef T sum(np.ndarray[T] arr):
  cdef T start = 0
  for i in range(arr.size):
    start += arr[i]
  return start

I don't know if this is a good idea, but it seems like it'd be very easy to do on the Cython side, fairly clean, and be dramatically less horrible than all the ad-hoc templating stuff people do now. Presumably there'd be strict limits on how much backwards compatibility we'd be willing to guarantee for code that went poking around in the AST by hand, but a small handful of functions like my notional "replace_ctype" would go a long way, and wouldn't impose much of a compatibility burden.
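For concreteness, the intent is that the decorated definition above would behave as if you had written the two specializations out by hand, roughly like this (an illustration of the intended expansion, not real compiler output; it assumes the usual "cimport numpy as np"):

# hand-written equivalents of the sum_double / sum_int the macro would emit
cdef double sum_double(np.ndarray[double] arr):
  cdef double start = 0
  for i in range(arr.size):
    start += arr[i]
  return start

cdef int sum_int(np.ndarray[int] arr):
  cdef int start = 0
  for i in range(arr.size):
    start += arr[i]
  return start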
-- Nathaniel From markflorisson88 at gmail.com Sat Apr 28 23:25:03 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 28 Apr 2012 22:25:03 +0100 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: On 28 April 2012 22:04, Nathaniel Smith wrote: > Was chatting with Wes today about the usual problem many of us have > encountered with needing to use some sort of templating system to > generate code handling multiple types, operations, etc., and a wacky > idea occurred to me. So I thought I'd throw it out here. > > What if we added a simple macro facility to Cython, that worked at the > AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) > Basically some way to write arbitrary Python code into a .pyx file > that gets executed at compile time and can transform the AST, plus > some nice convenience APIs for simple transformations. > > E.g., if we steal the illegal token sequence @@ as our marker, we > could have something like: >
> @@ # alone on a line, starts a block of Python code
> from Cython.MacroUtil import replace_ctype
> def expand_types(placeholder, typelist):
>   def my_decorator(function_name, ast):
>     functions = {}
>     for typename in typelist:
>       new_name = "%s_%s" % (function_name, typename)
>       functions[new_name] = replace_ctype(ast, placeholder, typename)
>     return functions
>   return my_decorator
> @@ # this token sequence cannot occur in Python, so it's a safe end-marker
>
> # Compile-time function decorator
> # Results in two cdef functions named sum_double and sum_int
> @@expand_types("T", ["double", "int"])
> cdef T sum(np.ndarray[T] arr):
>   cdef T start = 0
>   for i in range(arr.size):
>     start += arr[i]
>   return start
>
> I don't know if this is a good idea, but it seems like it'd be very > easy to do on the Cython side, fairly clean, and be dramatically less > horrible than all the ad-hoc templating stuff people do now. > Presumably there'd be strict limits on how much backwards > compatibility we'd be willing to guarantee for code that went poking > around in the AST by hand, but a small handful of functions like my > notional "replace_ctype" would go a long way, and wouldn't impose much > of a compatibility burden. > > -- Nathaniel > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? In general I would like better meta-programming support, maybe even allow defining new operators (although I'm not sure any of it is very pythonic), but for templates I think fused types should be used, or improved when they fall short. Maybe a plugin system could also help people. From wesmckinn at gmail.com Sun Apr 29 03:14:51 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Sat, 28 Apr 2012 21:14:51 -0400 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: On Sat, Apr 28, 2012 at 5:25 PM, mark florisson wrote: > On 28 April 2012 22:04, Nathaniel Smith wrote: >> Was chatting with Wes today about the usual problem many of us have >> encountered with needing to use some sort of templating system to >> generate code handling multiple types, operations, etc., and a wacky >> idea occurred to me. So I thought I'd throw it out here. >> >> What if we added a simple macro facility to Cython, that worked at the >> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.)
>> Basically some way to write arbitrary Python code into a .pyx file >> that gets executed at compile time and can transform the AST, plus >> some nice convenience APIs for simple transformations. >> >> E.g., if we steal the illegal token sequence @@ as our marker, we >> could have something like: >>
>> @@ # alone on a line, starts a block of Python code
>> from Cython.MacroUtil import replace_ctype
>> def expand_types(placeholder, typelist):
>>   def my_decorator(function_name, ast):
>>     functions = {}
>>     for typename in typelist:
>>       new_name = "%s_%s" % (function_name, typename)
>>       functions[new_name] = replace_ctype(ast, placeholder, typename)
>>     return functions
>>   return my_decorator
>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker
>>
>> # Compile-time function decorator
>> # Results in two cdef functions named sum_double and sum_int
>> @@expand_types("T", ["double", "int"])
>> cdef T sum(np.ndarray[T] arr):
>>   cdef T start = 0
>>   for i in range(arr.size):
>>     start += arr[i]
>>   return start
>>
>> I don't know if this is a good idea, but it seems like it'd be very >> easy to do on the Cython side, fairly clean, and be dramatically less >> horrible than all the ad-hoc templating stuff people do now. >> Presumably there'd be strict limits on how much backwards >> compatibility we'd be willing to guarantee for code that went poking >> around in the AST by hand, but a small handful of functions like my >> notional "replace_ctype" would go a long way, and wouldn't impose much >> of a compatibility burden. >> >> -- Nathaniel >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? > > In general I would like better meta-programming support, maybe even > allow defining new operators (although I'm not sure any of it is very > pythonic), but for templates I think fused types should be used, or > improved when they fall short. Maybe a plugin system could also help > people. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I referenced this problem recently in a blog post (http://wesmckinney.com/blog/?p=467). My main interest these days is in expressing data algorithms. I've unfortunately found myself working around performance problems with fundamental array operations in NumPy so a lot of the Cython work I've done has been in and around this. In lieu of some kind of macro system it seems inevitable that I'm going to need to create some kind of mini array language or otherwise code generation framework (targeting C, Cython, or Fortran). I worry that this is going to end with me creating "yet another APL [or Haskell] implementation" but I really need something that runs inside CPython. And why not? Most of these algorithms could be expressed at a very high level and lead to pretty clean generated C with many of the special cases (contiguous memory, or low dimensions in the case of n-dimensional algorithms) checked and handled in simplified loops.
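To make "code generation framework" concrete, the simplest thing I mean is plain string templating that writes one specialized Cython loop per (operation, dtype) pair out to a .pyx file. A minimal, standalone sketch (hypothetical; the template, names and output file are made up for illustration):

from string import Template

# one specialized reduction loop per (operation, dtype) pair
tmpl = Template('''
cdef $ctype ${opname}_$suffix(np.ndarray[$ctype] arr):
  cdef $ctype acc = $identity
  cdef Py_ssize_t i
  for i in range(arr.shape[0]):
    acc $op arr[i]
  return acc
''')

ops = [("sum", "+=", "0"), ("prod", "*=", "1")]
dtypes = [("np.float64_t", "float64"), ("np.int64_t", "int64")]

with open("reductions.pyx", "w") as f:
  f.write("cimport numpy as np\n")
  for opname, op, identity in ops:
    for ctype, suffix in dtypes:
      f.write(tmpl.substitute(opname=opname, suffix=suffix,
                              ctype=ctype, op=op, identity=identity))

Each generated loop is fully typed, so the C compiler sees a plain specialized loop for every case.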
- Wes From stefan_ml at behnel.de Sun Apr 29 06:50:54 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 29 Apr 2012 06:50:54 +0200 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> <4F9C2F15.3010309@behnel.de> <4F9C3960.2080004@behnel.de> Message-ID: <4F9CC8AE.1050901@behnel.de> mark florisson, 28.04.2012 21:55: > On 28 April 2012 19:39, Stefan Behnel wrote: >> mark florisson, 28.04.2012 20:18: >>> On 28 April 2012 18:55, Stefan Behnel wrote: >>>> mark florisson, 27.04.2012 22:38: >>>>> On 27 April 2012 21:16, Stefan Behnel wrote: >>>>>> What about this deal: we remove the hard bootstrap dependency on the fused >>>>>> types code (and maybe other Cython specific features) and require its >>>>>> compilation at install time in Py2.4 (and maybe even 2.5). That would allow >>>>>> us to use newer Python syntax (and even Cython supported syntax) there >>>>>> (except for fused types, obviously). Failure to compile the module in >>>>>> Python 2.4/5 at install time would then abort the installation. Bad luck >>>>>> for the user, but easy to fix by installing a newer Python version. >>>>>> >>>>>> That would give us the advantage of not needing to pollute user home >>>>>> directories with shared libraries at runtime (which I would consider a very >>>>>> annoying property). >>>>> >>>>> I think it's fine to require compiling in the installed case (or will >>>>> that be a problem for some package managers?). In the non-installed >>>>> case with python versions smaller than needed, would you prefer a >>>>> pyximport or an error message telling you to install Cython? Because >>>>> for development you really don't want to install every time. >>>> >>>> I think it's fine to require at least Python 2.6 for Cython core >>>> development. Just the installation (basically, what we test in Jenkins >>>> anyway) should work in Py2.4 and shouldn't require any rebuilds at runtime. >>> >>> Well, sometimes stuff works in say, 2.7 but fails in 2.4. In that case >>> you really have to test with the failing python versions, which means >>> you'd have to reinstall every time you want to try the tests again >>> (this is handled automatically for py3k, which runs the 2to3 tool). >> >> The number of times I recently ran tests in Py2.4 myself is really not >> worth mentioning. Most of the time, when something fails there, the error I >> get in Jenkins is so obvious that I just commit an untested fix for it. > > In my experience that still fails quite often, you may still forget > some test or accidentally add some whitespace, and then you're going > to build everything on Jenkins only to realize one hour later that > something is still broken. Maybe it's because of the buffer > differences and because 2.4 is the first thing that runs on Jenkins, > but I test quite often in 2.4. That may be the reason. Still, how much overhead is it really to run "setup.py build_ext -i"? Especially compared to getting the exact same compilation overhead with an import time triggered rebuild? >> I think it's really acceptable to require a run of "setup.py build_ext -i" >> for local developer testing in Py2.4. >> >>> I'm also sure many users just clone from git, add the directory to >>> PYTHONPATH and work from there. >> >> I'm sure there are close to no users who try to do that with Py2.4 these >> days. Maybe there are some who do it with Py2.5, but we are not currently >> considering to break a plain Python run there, AFAICT. 
>> >> I think the normal way users employ Cython is after a proper installation. >> >>> So I guess what I'm saying is, it's fine to mandate compilation at >>> compile time (don't allow flags to disable compilation), and (for me), >>> pyximport is totally fine, but Cython must be workable (all >>> functionality), without needing to install or build, in all versions. >> >> Workable, ok. But if fused types are only available in a compiled installed >> version under Python 2.4, that's maybe not optimal but certainly >> acceptable. Users of Py2.4 should be used to suffering anyway. > > That's a really confusing statement. If they are used to suffering, > they why can't they bear a runtime-compiled module if they didn't > install? :) If a user installs she will have at least one compiled > module in the system, but if she doesn't install, she will also have > one compiled module. > > I'm kind of wondering though, if this is really such a big problem, > then it means we can also never cache any user JIT-compiled code (like > e.g. OpenCL)? That's a different situation because it's the user's decision to use a (caching) JIT compiler in the first place. We don't enforce that. Basically, what I'm saying is: why do something complicated at runtime when we can just do everything normally at install time and be done? Once Cython is installed, it should just work, without needing to rebuild parts of itself. I'm convinced that that will make it a lot more acceptable for package maintainers, sys admins and users. Keep in mind that the installation under PyPy doesn't compile anything and just installs the plain Python modules. Enforcing compilation in Py2.4 is just saying that sys admins of very old systems (which we still want to support, at least in the generated C code) will no longer get the benefit of the pure Python installation feature. It doesn't impact the users at all, that's a feature. >>> That means either not using the with statement, or compiling with >>> pyximport in certain versions in certain situations only (i.e., only >>> in incompatible python version in case the user neither built nor >>> installed Cython). I don't think that's a problem, if people don't >>> like to have those shared library modules (will they even notice?) in >>> their user directories, they can install or build Cython. >> >> Why require the use of pyximport at runtime when we can do everything >> during installation? I really don't see an advantage. > > We will do everything during installation, but not mandate an installation. I think we should in the case at hand. >>> Finally, my attachment to the with statement here is mostly to the >>> aesthetics of the resulting code, rewriting my pull request is not so >>> much work, so we can leave that out of consideration here. >> >> If it's not too much work, that would obviously make things go a lot smoother. > > I'm not sure why we're always making such a fuss over the little > things, but I suppose I'd prefer mandating a compile in 2.4 over not > having the with statement. Ok. > Unless you want to mandate installing in > every version, this still means we can't write any parts of the > compiler in actual Cython code. We're doing just that for Py2.4 now by starting to use illegal syntax. However, since you're changing topics here already, do you actually see any advantage in starting to use non-Python code? I don't think that would be a good idea at all. 
I think we've gotten along quite happily with plain Python code for the compiler itself so far and I don't see an interest in changing that. In particular, I can't see the compiler being in need of any of its own non-Python language features. Stefan From stefan_ml at behnel.de Sun Apr 29 07:11:46 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 29 Apr 2012 07:11:46 +0200 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: <4F9CCD92.5090602@behnel.de> Wes McKinney, 29.04.2012 03:14: > On Sat, Apr 28, 2012 at 5:25 PM, mark florisson wrote: >> On 28 April 2012 22:04, Nathaniel Smith wrote: >>> Was chatting with Wes today about the usual problem many of us have >>> encountered with needing to use some sort of templating system to >>> generate code handling multiple types, operations, etc., and a wacky >>> idea occurred to me. So I thought I'd through it out here. >>> >>> What if we added a simple macro facility to Cython, that worked at the >>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) >>> Basically some way to write arbitrary Python code into a .pyx file >>> that gets executed at compile time and can transform the AST, plus >>> some nice convenience APIs for simple transformations. >>> >>> E.g., if we steal the illegal token sequence @@ as our marker, we >>> could have something like: >>> >>> @@ # alone on a line, starts a block of Python code >>> from Cython.MacroUtil import replace_ctype >>> def expand_types(placeholder, typelist): >>> def my_decorator(function_name, ast): >>> functions = {} >>> for typename in typelist: >>> new_name = "%s_%s" % (function_name, typename) >>> functions[name] = replace_ctype(ast, placeholder, typename) >>> return functions >>> return function_decorator >>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker >>> >>> # Compile-time function decorator >>> # Results in two cdef functions named sum_double and sum_int >>> @@expand_types("T", ["double", "int"]) >>> cdef T sum(np.ndarray[T] arr): >>> cdef T start = 0; >>> for i in range(arr.size): >>> start += arr[i] >>> return start >>> >>> I don't know if this is a good idea, but it seems like it'd be very >>> easy to do on the Cython side, fairly clean, and be dramatically less >>> horrible than all the ad-hoc templating stuff people do now. >>> Presumably there'd be strict limits on how much backwards >>> compatibility we'd be willing to guarantee for code that went poking >>> around in the AST by hand, but a small handful of functions like my >>> notional "replace_ctype" would go a long way, and wouldn't impose much >>> of a compatibility burden. >> >> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? >> >> In general I would like better meta-programming support, maybe even >> allow defining new operators (although I'm not sure any of it is very >> pythonic), but for templates I think fused types should be used, or >> improved when they fall short. Maybe a plugin system could also help >> people. > > I referenced this problem recently in a blog post > (http://wesmckinney.com/blog/?p=467). My main interest these days is > in expressing data algorithms. I've unfortunately found myself working > around performance problems with fundamental array operations in NumPy > so a lot of the Cython work I've done has been in and around this. 
In > lieu of some kind of macro system it seems inevitable that I'm going > to need to create some kind of mini array language or otherwise code > generation framework (targeting C, Cython, or Fortran). I worry that > this is going to end with me creating "yet another APL [or Haskell] > implementation" but I really need something that runs inside CPython. Generally speaking, it's always better to collect and describe use cases first before adding a language feature, especially one that is as complex and far reaching as this. It might well be that fused types can (be made to) work for them, and it might be that a (non AST based) preprocessor step would work. Keeping metaprogramming facilities out of the compiler makes it both more generally versatile (and easier to replace or disable) and keeps both sides simpler. > And why not? Most of these algorithms could be expressed at a very > high level and lead to pretty clean generated C with many of the > special cases (contiguous memory, or low dimensions in the case of > n-dimensional algorithms) checked and handled in simplified loops. That sounds like what you want is a preprocessor that spells out NumPy array configuration options into separate code paths. However, I'm not sure a generic approach would work well (enough?) here. And it also doesn't sound like this needs to be done inside of the compiler. It should be possible to build that on top of fused types as a separate preprocessor. Stefan From d.s.seljebotn at astro.uio.no Sun Apr 29 09:08:59 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 29 Apr 2012 09:08:59 +0200 Subject: [Cython] Wacky idea: proper macros In-Reply-To: <4F9CCD92.5090602@behnel.de> References: <4F9CCD92.5090602@behnel.de> Message-ID: <3e0b9999-b1d2-42e2-aa62-4a4b05a7deac@email.android.com> Stefan Behnel wrote: >Wes McKinney, 29.04.2012 03:14: >> On Sat, Apr 28, 2012 at 5:25 PM, mark florisson wrote: >>> On 28 April 2012 22:04, Nathaniel Smith wrote: >>>> Was chatting with Wes today about the usual problem many of us have >>>> encountered with needing to use some sort of templating system to >>>> generate code handling multiple types, operations, etc., and a >wacky >>>> idea occurred to me. So I thought I'd through it out here. >>>> >>>> What if we added a simple macro facility to Cython, that worked at >the >>>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style >macros.) >>>> Basically some way to write arbitrary Python code into a .pyx file >>>> that gets executed at compile time and can transform the AST, plus >>>> some nice convenience APIs for simple transformations. 
>>>> >>>> E.g., if we steal the illegal token sequence @@ as our marker, we >>>> could have something like: >>>> >>>> @@ # alone on a line, starts a block of Python code >>>> from Cython.MacroUtil import replace_ctype >>>> def expand_types(placeholder, typelist): >>>> def my_decorator(function_name, ast): >>>> functions = {} >>>> for typename in typelist: >>>> new_name = "%s_%s" % (function_name, typename) >>>> functions[name] = replace_ctype(ast, placeholder, typename) >>>> return functions >>>> return function_decorator >>>> @@ # this token sequence cannot occur in Python, so it's a safe >end-marker >>>> >>>> # Compile-time function decorator >>>> # Results in two cdef functions named sum_double and sum_int >>>> @@expand_types("T", ["double", "int"]) >>>> cdef T sum(np.ndarray[T] arr): >>>> cdef T start = 0; >>>> for i in range(arr.size): >>>> start += arr[i] >>>> return start >>>> >>>> I don't know if this is a good idea, but it seems like it'd be very >>>> easy to do on the Cython side, fairly clean, and be dramatically >less >>>> horrible than all the ad-hoc templating stuff people do now. >>>> Presumably there'd be strict limits on how much backwards >>>> compatibility we'd be willing to guarantee for code that went >poking >>>> around in the AST by hand, but a small handful of functions like my >>>> notional "replace_ctype" would go a long way, and wouldn't impose >much >>>> of a compatibility burden. >>> >>> Have you looked at >http://wiki.cython.org/enhancements/metaprogramming ? >>> >>> In general I would like better meta-programming support, maybe even >>> allow defining new operators (although I'm not sure any of it is >very >>> pythonic), but for templates I think fused types should be used, or >>> improved when they fall short. Maybe a plugin system could also help >>> people. >> >> I referenced this problem recently in a blog post >> (http://wesmckinney.com/blog/?p=467). My main interest these days is >> in expressing data algorithms. I've unfortunately found myself >working >> around performance problems with fundamental array operations in >NumPy >> so a lot of the Cython work I've done has been in and around this. In >> lieu of some kind of macro system it seems inevitable that I'm going >> to need to create some kind of mini array language or otherwise code >> generation framework (targeting C, Cython, or Fortran). I worry that >> this is going to end with me creating "yet another APL [or Haskell] >> implementation" but I really need something that runs inside >CPython. > >Generally speaking, it's always better to collect and describe use >cases >first before adding a language feature, especially one that is as >complex >and far reaching as this. > >It might well be that fused types can (be made to) work for them, and >it >might be that a (non AST based) preprocessor step would work. Keeping >metaprogramming facilities out of the compiler makes it both more >generally >versatile (and easier to replace or disable) and keeps both sides >simpler. > > >> And why not? Most of these algorithms could be expressed at a very >> high level and lead to pretty clean generated C with many of the >> special cases (contiguous memory, or low dimensions in the case of >> n-dimensional algorithms) checked and handled in simplified loops. > >That sounds like what you want is a preprocessor that spells out NumPy >array configuration options into separate code paths. However, I'm not >sure >a generic approach would work well (enough?) here. 
And it also doesn't >sound like this needs to be done inside of the compiler. It should be >possible to build that on top of fused types as a separate >preprocessor. Well, *of* *course* it is possible to do it using a preprocessor. The question is whether it would be a nicer experience if Cython had macros, and whether the feature would fit well with Cython. I know I would use Cython more if it had metaprogramming, but you can't cater to everyone. Please, everybody, look at the metaprogramming in Julia for a well done example before we discuss this further (julialang.org)... When done at that level, you don't need to worry about AST APIs either. Wes: for the things you talk about I simply use jinja2 or Tempita to generate C code. If you need to stay closer to NumPy and Python though, I guess you'd generate Cython code. But I guess this is what you do already and you want something nicer? Would somebody be likely to work on this, or is it just a feature request? (I wish I could work on it, but I just can't....I am +1 on macros done right, but -1 on them done badly) Dag > >Stefan >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From njs at pobox.com Sun Apr 29 09:42:14 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 29 Apr 2012 08:42:14 +0100 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: On Sat, Apr 28, 2012 at 10:25 PM, mark florisson wrote: > On 28 April 2012 22:04, Nathaniel Smith wrote: >> Was chatting with Wes today about the usual problem many of us have >> encountered with needing to use some sort of templating system to >> generate code handling multiple types, operations, etc., and a wacky >> idea occurred to me. So I thought I'd through it out here. >> >> What if we added a simple macro facility to Cython, that worked at the >> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) >> Basically some way to write arbitrary Python code into a .pyx file >> that gets executed at compile time and can transform the AST, plus >> some nice convenience APIs for simple transformations. >> >> E.g., if we steal the illegal token sequence @@ as our marker, we >> could have something like: >> >> @@ # alone on a line, starts a block of Python code >> from Cython.MacroUtil import replace_ctype >> def expand_types(placeholder, typelist): >> ?def my_decorator(function_name, ast): >> ? ?functions = {} >> ? ?for typename in typelist: >> ? ? ?new_name = "%s_%s" % (function_name, typename) >> ? ? ?functions[name] = replace_ctype(ast, placeholder, typename) >> ? ?return functions >> ?return function_decorator >> @@ # this token sequence cannot occur in Python, so it's a safe end-marker >> >> # Compile-time function decorator >> # Results in two cdef functions named sum_double and sum_int >> @@expand_types("T", ["double", "int"]) >> cdef T sum(np.ndarray[T] arr): >> ?cdef T start = 0; >> ?for i in range(arr.size): >> ? ?start += arr[i] >> ?return start >> >> I don't know if this is a good idea, but it seems like it'd be very >> easy to do on the Cython side, fairly clean, and be dramatically less >> horrible than all the ad-hoc templating stuff people do now. 
>> Presumably there'd be strict limits on how much backwards >> compatibility we'd be willing to guarantee for code that went poking >> around in the AST by hand, but a small handful of functions like my >> notional "replace_ctype" would go a long way, and wouldn't impose much >> of a compatibility burden. >> >> -- Nathaniel >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? > > In general I would like better meta-programming support, maybe even > allow defining new operators (although I'm not sure any of it is very > pythonic), but for templates I think fused types should be used, or > improved when they fall short. Maybe a plugin system could also help > people. I hadn't seen that, no -- thanks for the link. I have to say that the examples in that link, though, give me the impression of a cool solution looking for a problem. I've never wished I could symbolically differentiate Python expressions at compile time, or create a mutant Python+SQL hybrid language. Actually I guess I've only missed define-syntax once in maybe 10 years of hacking in Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it will peek at the caller's syntax tree to automagically label the axes as "x" and "log(y)", and that can't be done in Python. But that's not exactly a convincing argument for a macro system. But generating optimized code is Cython's whole selling point, and people really are doing klugey tricks with string-based preprocessors just to generate multiple copies of loops in Cython and C. Also, fused types are great, but: (1) IIUC you can't actually do ndarray[fused_type] yet, which speaks to the feature's complexity, and (2) to handle Wes's original example on his blog (duplicating a bunch of code between a "sum" path and a "product" path), you'd actually need something like "fused operators", which aren't even on the horizon. So it seems unlikely that fused types will grow to cover all these cases in the near future. Of course some experimentation would be needed to find the right syntax and convenience functions for this feature too, so maybe I'm just being over-optimistic and it would also turn out to be very complicated :-). But it seems like some simple AST search/replace functions would get you a long way. - N From markflorisson88 at gmail.com Sun Apr 29 11:56:23 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 29 Apr 2012 10:56:23 +0100 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: On 29 April 2012 08:42, Nathaniel Smith wrote: > On Sat, Apr 28, 2012 at 10:25 PM, mark florisson > wrote: >> On 28 April 2012 22:04, Nathaniel Smith wrote: >>> Was chatting with Wes today about the usual problem many of us have >>> encountered with needing to use some sort of templating system to >>> generate code handling multiple types, operations, etc., and a wacky >>> idea occurred to me. So I thought I'd through it out here. >>> >>> What if we added a simple macro facility to Cython, that worked at the >>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) >>> Basically some way to write arbitrary Python code into a .pyx file >>> that gets executed at compile time and can transform the AST, plus >>> some nice convenience APIs for simple transformations. 
>>> E.g., if we steal the illegal token sequence @@ as our marker, we >>> could have something like: >>>
>>> @@ # alone on a line, starts a block of Python code
>>> from Cython.MacroUtil import replace_ctype
>>> def expand_types(placeholder, typelist):
>>>   def my_decorator(function_name, ast):
>>>     functions = {}
>>>     for typename in typelist:
>>>       new_name = "%s_%s" % (function_name, typename)
>>>       functions[new_name] = replace_ctype(ast, placeholder, typename)
>>>     return functions
>>>   return my_decorator
>>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker
>>>
>>> # Compile-time function decorator
>>> # Results in two cdef functions named sum_double and sum_int
>>> @@expand_types("T", ["double", "int"])
>>> cdef T sum(np.ndarray[T] arr):
>>>   cdef T start = 0
>>>   for i in range(arr.size):
>>>     start += arr[i]
>>>   return start
>>>
>>> I don't know if this is a good idea, but it seems like it'd be very >>> easy to do on the Cython side, fairly clean, and be dramatically less >>> horrible than all the ad-hoc templating stuff people do now. >>> Presumably there'd be strict limits on how much backwards >>> compatibility we'd be willing to guarantee for code that went poking >>> around in the AST by hand, but a small handful of functions like my >>> notional "replace_ctype" would go a long way, and wouldn't impose much >>> of a compatibility burden. >>> >>> -- Nathaniel >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? >> >> In general I would like better meta-programming support, maybe even >> allow defining new operators (although I'm not sure any of it is very >> pythonic), but for templates I think fused types should be used, or >> improved when they fall short. Maybe a plugin system could also help >> people. > > I hadn't seen that, no -- thanks for the link. > > I have to say that the examples in that link, though, give me the > impression of a cool solution looking for a problem. I've never wished > I could symbolically differentiate Python expressions at compile time, > or create a mutant Python+SQL hybrid language. Actually I guess I've > only missed define-syntax once in maybe 10 years of hacking in > Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it > will peek at the caller's syntax tree to automagically label the axes > as "x" and "log(y)", and that can't be done in Python. But that's not > exactly a convincing argument for a macro system. > > But generating optimized code is Cython's whole selling point, and > people really are doing klugey tricks with string-based preprocessors > just to generate multiple copies of loops in Cython and C. > > Also, fused types are great, but: (1) IIUC you can't actually do > ndarray[fused_type] yet, which speaks to the feature's complexity, and What? Yes you can do that. > (2) to handle Wes's original example on his blog (duplicating a bunch > of code between a "sum" path and a "product" path), you'd actually > need something like "fused operators", which aren't even on the > horizon. So it seems unlikely that fused types will grow to cover all > these cases in the near future. Although it doesn't handle contiguity or dimensional differences, currently the efficient fused operator is a function pointer.
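Something along these lines, spelled out for the float64 case only (an illustrative sketch, not code from an actual project):

cimport numpy as np

cdef np.float64_t add(np.float64_t a, np.float64_t b):
  return a + b

cdef np.float64_t mul(np.float64_t a, np.float64_t b):
  return a * b

# a single loop, parametrized by the reducer instead of duplicated per operator
cdef np.float64_t reduce1d(np.ndarray[np.float64_t] arr,
                           np.float64_t (*reducer)(np.float64_t, np.float64_t),
                           np.float64_t initial):
  cdef np.float64_t acc = initial
  cdef Py_ssize_t i
  for i in range(arr.shape[0]):
    acc = reducer(acc, arr[i])
  return acc

# sum: reduce1d(arr, add, 0.0); product: reduce1d(arr, mul, 1.0)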
Wouldn't passing in a float64_t (*reducer)(float64_t, float64_t) work in this case (in the face of multiple types, you can have fused parameters in the function pointer as well)? I agree with Dag that Julia has nice metaprogramming support, maybe functions could take arbitrary compile time expressions as extra arguments. > Of course some experimentation would be needed to find the right > syntax and convenience functions for this feature too, so maybe I'm > just being over-optimistic and it would also turn out to be very > complicated :-). But it seems like some simple AST search/replace > functions would get you a long way. > > - N > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Sun Apr 29 12:12:08 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 29 Apr 2012 11:12:08 +0100 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: <4F9CC8AE.1050901@behnel.de> References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> <4F9C2F15.3010309@behnel.de> <4F9C3960.2080004@behnel.de> <4F9CC8AE.1050901@behnel.de> Message-ID: On 29 April 2012 05:50, Stefan Behnel wrote: > mark florisson, 28.04.2012 21:55: >> On 28 April 2012 19:39, Stefan Behnel wrote: >>> mark florisson, 28.04.2012 20:18: >>>> On 28 April 2012 18:55, Stefan Behnel wrote: >>>>> mark florisson, 27.04.2012 22:38: >>>>>> On 27 April 2012 21:16, Stefan Behnel wrote: >>>>>>> What about this deal: we remove the hard bootstrap dependency on the fused >>>>>>> types code (and maybe other Cython specific features) and require its >>>>>>> compilation at install time in Py2.4 (and maybe even 2.5). That would allow >>>>>>> us to use newer Python syntax (and even Cython supported syntax) there >>>>>>> (except for fused types, obviously). Failure to compile the module in >>>>>>> Python 2.4/5 at install time would then abort the installation. Bad luck >>>>>>> for the user, but easy to fix by installing a newer Python version. >>>>>>> >>>>>>> That would give us the advantage of not needing to pollute user home >>>>>>> directories with shared libraries at runtime (which I would consider a very >>>>>>> annoying property). >>>>>> >>>>>> I think it's fine to require compiling in the installed case (or will >>>>>> that be a problem for some package managers?). In the non-installed >>>>>> case with python versions smaller than needed, would you prefer a >>>>>> pyximport or an error message telling you to install Cython? Because >>>>>> for development you really don't want to install every time. >>>>> >>>>> I think it's fine to require at least Python 2.6 for Cython core >>>>> development. Just the installation (basically, what we test in Jenkins >>>>> anyway) should work in Py2.4 and shouldn't require any rebuilds at runtime. >>>> >>>> Well, sometimes stuff works in say, 2.7 but fails in 2.4. In that case >>>> you really have to test with the failing python versions, which means >>>> you'd have to reinstall every time you want to try the tests again >>>> (this is handled automatically for py3k, which runs the 2to3 tool). >>> >>> The number of times I recently ran tests in Py2.4 myself is really not >>> worth mentioning. Most of the time, when something fails there, the error I >>> get in Jenkins is so obvious that I just commit an untested fix for it. 
>> In my experience that still fails quite often, you may still forget >> some test or accidentally add some whitespace, and then you're going >> to build everything on Jenkins only to realize one hour later that >> something is still broken. Maybe it's because of the buffer >> differences and because 2.4 is the first thing that runs on Jenkins, >> but I test quite often in 2.4. > > That may be the reason. > > Still, how much overhead is it really to run "setup.py build_ext -i"? > Especially compared to getting the exact same compilation overhead with an > import time triggered rebuild? > The compilation overhead is irrelevant, the important difference is that one is *automatic* and hence preferable. Knowing myself, I will forget that module X, Y and Z need to be compiled, which means when I change the code I don't see the changes take effect, because I forgot to rebuild the modules. After a while I will start to question my sanity and insert a print statement at module level, only to realize that I had to rebuild after every single change. >>> I think it's really acceptable to require a run of "setup.py build_ext -i" >>> for local developer testing in Py2.4. >>> >>>> I'm also sure many users just clone from git, add the directory to >>>> PYTHONPATH and work from there. >>> >>> I'm sure there are close to no users who try to do that with Py2.4 these >>> days. Maybe there are some who do it with Py2.5, but we are not currently >>> considering to break a plain Python run there, AFAICT. >>> >>> I think the normal way users employ Cython is after a proper installation. >>> >>>> So I guess what I'm saying is, it's fine to mandate compilation at >>>> compile time (don't allow flags to disable compilation), and (for me), >>>> pyximport is totally fine, but Cython must be workable (all >>>> functionality), without needing to install or build, in all versions. >>> >>> Workable, ok. But if fused types are only available in a compiled installed >>> version under Python 2.4, that's maybe not optimal but certainly >>> acceptable. Users of Py2.4 should be used to suffering anyway. >> >> That's a really confusing statement. If they are used to suffering, >> then why can't they bear a runtime-compiled module if they didn't >> install? :) If a user installs she will have at least one compiled >> module in the system, but if she doesn't install, she will also have >> one compiled module. >> >> I'm kind of wondering though, if this is really such a big problem, >> then it means we can also never cache any user JIT-compiled code (like >> e.g. OpenCL)? > That's a different situation because it's the user's decision to use a > (caching) JIT compiler in the first place. We don't enforce that. > Basically, what I'm saying is: why do something complicated at runtime when > we can just do everything normally at install time and be done? Once Cython > is installed, it should just work, without needing to rebuild parts of > itself. I'm convinced that that will make it a lot more acceptable for > package maintainers, sys admins and users. > Keep in mind that the installation under PyPy doesn't compile anything and > just installs the plain Python modules. Enforcing compilation in Py2.4 is > just saying that sys admins of very old systems (which we still want to > support, at least in the generated C code) will no longer get the benefit > of the pure Python installation feature. It doesn't impact the users at > all, that's a feature.
> > >>>> That means either not using the with statement, or compiling with >>>> pyximport in certain versions in certain situations only (i.e., only >>>> in incompatible python version in case the user neither built nor >>>> installed Cython). I don't think that's a problem, if people don't >>>> like to have those shared library modules (will they even notice?) in >>>> their user directories, they can install or build Cython. >>> >>> Why require the use of pyximport at runtime when we can do everything >>> during installation? I really don't see an advantage. >> >> We will do everything during installation, but not mandate an installation. > > I think we should in the case at hand. > > >>>> Finally, my attachment to the with statement here is mostly to the >>>> aesthetics of the resulting code, rewriting my pull request is not so >>>> much work, so we can leave that out of consideration here. >>> >>> If it's not too much work, that would obviously make things go a lot smoother. >> >> I'm not sure why we're always making such a fuss over the little >> things, but I suppose I'd prefer mandating a compile in 2.4 over not >> having the with statement. > > Ok. > > >> Unless you want to mandate installing in >> every version, this still means we can't write any parts of the >> compiler in actual Cython code. > > We're doing just that for Py2.4 now by starting to use illegal syntax. > > However, since you're changing topics here already, do you actually see any > advantage in starting to use non-Python code? I don't think that would be a > good idea at all. I think we've gotten along quite happily with plain > Python code for the compiler itself so far and I don't see an interest in > changing that. In particular, I can't see the compiler being in need of any > of its own non-Python language features. I don't currently either, although we do compile the compiler for speed. So I can see use cases where pxd overlays (which are imho one of the worst features of Cython as they suddenly can turn your classes into cdef classes without you being aware when working in the .py file) fall short for more time-consuming parts of the compiler. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Anyway, we clearly don't agree, so I suppose I'll change my code as I think installing is worse than writing 2.4-compatible code, and hope to retain the readability. From stefan_ml at behnel.de Mon Apr 30 08:48:15 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 30 Apr 2012 08:48:15 +0200 Subject: [Cython] [cython-users] cimport numpy fails with Python 3 semantics In-Reply-To: References: <15456913.1041.1335634688690.JavaMail.geo-discussion-forums@vbli11> <4F9C3BDC.9040108@behnel.de> Message-ID: <4F9E35AF.6050209@behnel.de> mark florisson, 28.04.2012 21:57: > On 28 April 2012 19:50, Stefan Behnel wrote: >> mark florisson, 28.04.2012 20:33: >>> I think each module should have its own language level, so I think >>> that's a bug. I think the rules should be: >>> >>> - if passed as command line argument, use that for all cimported >>> modules, unless they define their own language level through the >>> directive >>> - if set as a directive, the language level will apply only to that module >> >> That's how it works. We don't run the tests with language level 3 in >> Jenkins because the majority of the tests is not meant to be run with Py3 >> semantics. Maybe it's time to add a numpy_cy3 test.
>> >> If there are more problems than just this (which was a bug in numpy.pxd), >> we may consider setting language level 2 explicitly in numpy.pxd. > > Ah, great. Do we have any documentation for that? We do now. ;) However, I'm not sure cimported .pxd files should always inherit the language_level setting. It's somewhat of a grey area because user provided .pxd files would benefit from it since they likely all use the same language level as the main module, whereas the Cython shipped (and otherwise globally installed) .pxd files wouldn't gain anything and could potentially break. I think we may want to keep the current behaviour and set the language level explicitly in the few shipped .pxd files that are not language level agnostic (i.e. those that actually contain code). Stefan From markflorisson88 at gmail.com Mon Apr 30 15:24:20 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 30 Apr 2012 14:24:20 +0100 Subject: [Cython] [cython-users] Conditional import in pure Python mode In-Reply-To: References: Message-ID: On 30 April 2012 13:14, Ian Bell wrote: > > On Sun, Apr 29, 2012 at 10:58 PM, mark florisson > wrote: >> >> On 29 April 2012 01:33, Ian Bell wrote: >> > Hello Cython users, >> > >> > I haven't the foggiest idea how easy this would be to implement, or how >> > to >> > do it in the first place, but my idea is to be able to have some sort of >> > idiom like >> > >> > if cython.compiled: >> > ??? cython.import('from libc.math cimport sin') >> > else: >> > ??? from math import sin >> > >> > that would allow for a pure Python file to still have 100% CPython code >> > in >> > it, but when it goes through the .py+.pxd-->.pyd compilation, use the >> > math.h >> > version instead.? Is there any way that this could work?? I can (and >> > will) >> > hack together a really nasty preprocessor that can edit the .py file at >> > Cython build time, but it is a nasty hack, and very un-Pythonic.? For >> > me, >> > the computational efficiency using the math.h trig functions justify a >> > bit >> > of nastiness. >> > >> > Regards, >> > Ian >> >> I think a cython.cimport would be nice. If you want to have a crack at >> it, you may want to look in Cython/Compiler/ParseTreeTransforms.py at >> the TransformBuiltinMethods transform, and also at the >> Cython/Shadow.py module. > > > Mark, > > Any chance you can give me a wee bit more help?? I can see how you need to > add a cimport function in Shadow.py.? Supposing I pass it an import string > that it will parse for the import, whereto from there?? There's quite a lot > of code to dig through. > > Ian > Certainly, it's great to see some enthusiasm. So first, it's important to determine how you want it. You can either simulate importing the rightmost module from the package, or you can simulate importing everything from the package. Personally I think if you say cython.cimport_module("libc.stdio") you want conceptually the stdio module to be returned. Maybe a 'cimport_star' function could cimport everything from the scope, but I'm not sure how great an idea that is in pure mode (likely not a good one). So lets assume we want to use the following syntax: stdio = cython.cimport_module("libc.stdio"). In the TransformBuiltinMethods you add another case to the visit_SimpleCallNode, which will match for the function name "cimport_module" (which is already known to be used as an attribute of the Cython module). TransformBuiltinMethods is a subclass of Visitor.EnvTransform, which keeps track of the scope (e.g. 
(e.g. if the node is in a function, the local function scope, etc). This class is a subclass of CythonTransform, which has a Main.Context set up in Pipeline.py, accessible through the 'self.context' attribute. Main.Context has methods to find and process pxd files, i.e. through the find_pxd_file and the process_pxd methods (or maybe the find_module method will work here).

So you want to validate that the string that gets passed to the 'cimport_module' function call is a string literal, and that this is in fact happening from the module scope. You can then create an Entry with an as_module attribute and put it in the current scope. The as_module attribute holds the scope of the cimported pxd file, which means it's a "cimported module". You can do this through the 'declare_module' method of the current scope (self.current_env().declare_module(...)) (see Symtab.py for that method).

I hope that helps you get set up somewhat; don't hesitate to bug the cython-devel mailing list with any further questions. BTW, we probably want to formally discuss the exact syntax and semantics before implementing it, so I think it will be a good idea to summarize what you want and how you want it on the cython-dev mailing list.

Good luck :)

From wesmckinn at gmail.com  Mon Apr 30 15:49:32 2012
From: wesmckinn at gmail.com (Wes McKinney)
Date: Mon, 30 Apr 2012 09:49:32 -0400
Subject: [Cython] Wacky idea: proper macros
In-Reply-To: 
References: 
Message-ID: 

On Sun, Apr 29, 2012 at 5:56 AM, mark florisson wrote:
> On 29 April 2012 08:42, Nathaniel Smith wrote:
>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson wrote:
>>> On 28 April 2012 22:04, Nathaniel Smith wrote:
>>>> Was chatting with Wes today about the usual problem many of us have encountered with needing to use some sort of templating system to generate code handling multiple types, operations, etc., and a wacky idea occurred to me. So I thought I'd throw it out here.
>>>>
>>>> What if we added a simple macro facility to Cython, that worked at the AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) Basically some way to write arbitrary Python code into a .pyx file that gets executed at compile time and can transform the AST, plus some nice convenience APIs for simple transformations.
>>>>
>>>> E.g., if we steal the illegal token sequence @@ as our marker, we could have something like:
>>>>
>>>> @@ # alone on a line, starts a block of Python code
>>>> from Cython.MacroUtil import replace_ctype
>>>> def expand_types(placeholder, typelist):
>>>>   def my_decorator(function_name, ast):
>>>>     functions = {}
>>>>     for typename in typelist:
>>>>       new_name = "%s_%s" % (function_name, typename)
>>>>       functions[new_name] = replace_ctype(ast, placeholder, typename)
>>>>     return functions
>>>>   return my_decorator
>>>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker
>>>>
>>>> # Compile-time function decorator
>>>> # Results in two cdef functions named sum_double and sum_int
>>>> @@expand_types("T", ["double", "int"])
>>>> cdef T sum(np.ndarray[T] arr):
>>>>   cdef T start = 0
>>>>   for i in range(arr.size):
>>>>     start += arr[i]
>>>>   return start
>>>>
>>>> I don't know if this is a good idea, but it seems like it'd be very easy to do on the Cython side, fairly clean, and be dramatically less horrible than all the ad-hoc templating stuff people do now.
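(As a point of reference, the "ad-hoc templating stuff" being replaced typically looks something like the following -- a made-up sketch of a string-based preprocessor, not code from this thread; the file and variable names are invented:)

    # gen_sums.py -- stamp out one Cython function per C type and write
    # them to an include file; purely illustrative names throughout.
    sum_template = '''
    cdef %(ctype)s sum_%(name)s(np.ndarray[%(ctype)s] arr):
        cdef %(ctype)s start = 0
        cdef Py_ssize_t i
        for i in range(arr.shape[0]):
            start += arr[i]
        return start
    '''

    with open("sums.pxi", "w") as f:
        for name, ctype in [("double", "double"), ("int", "int")]:
            f.write(sum_template % {"name": name, "ctype": ctype})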
>>>> Presumably there'd be strict limits on how much backwards >>>> compatibility we'd be willing to guarantee for code that went poking >>>> around in the AST by hand, but a small handful of functions like my >>>> notional "replace_ctype" would go a long way, and wouldn't impose much >>>> of a compatibility burden. >>>> >>>> -- Nathaniel >>>> _______________________________________________ >>>> cython-devel mailing list >>>> cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>> >>> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? >>> >>> In general I would like better meta-programming support, maybe even >>> allow defining new operators (although I'm not sure any of it is very >>> pythonic), but for templates I think fused types should be used, or >>> improved when they fall short. Maybe a plugin system could also help >>> people. >> >> I hadn't seen that, no -- thanks for the link. >> >> I have to say that the examples in that link, though, give me the >> impression of a cool solution looking for a problem. I've never wished >> I could symbolically differentiate Python expressions at compile time, >> or create a mutant Python+SQL hybrid language. Actually I guess I've >> only missed define-syntax once in maybe 10 years of hacking in >> Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it >> will peek at the caller's syntax tree to automagically label the axes >> as "x" and "log(y)", and that can't be done in Python. But that's not >> exactly a convincing argument for a macro system. >> >> But generating optimized code is Cython's whole selling point, and >> people really are doing klugey tricks with string-based preprocessors >> just to generate multiple copies of loops in Cython and C. >> >> Also, fused types are great, but: (1) IIUC you can't actually do >> ndarray[fused_type] yet, which speaks to the feature's complexity, and > > What? Yes you can do that. I haven't been able to get ndarray[fused_t] to work as we've discussed off-list. In your own words "Unfortunately, the automatic buffer dispatch didn't make it into 0.16, so you need to manually specialize". I'm a bit hamstrung by other users needing to be able to compile pandas using the latest released Cython. >> (2) to handle Wes's original example on his blog (duplicating a bunch >> of code between a "sum" path and a "product" path), you'd actually >> need something like "fused operators", which aren't even on the >> horizon. So it seems unlikely that fused types will grow to cover all >> these cases in the near future. > > Although it doesn't handle contiguity or dimensional differences, > currently the efficient fused operator is a function pointer. Wouldn't > passing in a float64_t (*reducer)(float64_t, float64_t) work in this > case (in the face of multiple types, you can have fused parameters in > the function pointer as well)? I have to think that using function pointers everywhere is going to lose out to "inlined" C. Maybe gcc is smart enough to optimize observed code paths. 
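(For concreteness, the function-pointer pattern mark suggests above would look roughly like this in Cython -- a sketch with invented names such as group_reduce, not code from the thread:)

    cimport numpy as np
    import numpy as np

    ctypedef np.float64_t float64_t

    cdef float64_t add(float64_t a, float64_t b):
        return a + b

    cdef void group_reduce(np.ndarray[float64_t] result,   # zero-initialized
                           np.ndarray[np.int64_t] labels,
                           np.ndarray[float64_t] data,
                           float64_t (*reducer)(float64_t, float64_t)):
        # The "fused operator" is the reducer pointer; every element goes
        # through an indirect call, which is exactly what Wes worries the
        # C compiler will not inline.
        cdef Py_ssize_t i
        for i in range(labels.shape[0]):
            result[labels[i]] = reducer(result[labels[i]], data[i])

(Called as group_reduce(result, labels, data, add) to get group sums.)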
In other words, you don't want

for (i = 0; i < nlabels; i++) {
    lab = labels[i];
    result[lab] = reducer(sumx[i], data[i]);
}

when you can have

for (i = 0; i < nlabels; i++) {
    lab = labels[i];
    result[lab] = sumx[i] + data[i];
}

I guess I should start writing some C code and actually measuring the performance gap, as I might be completely off-base here; what you want eventually is to look at the array data graph and "rewrite" it to better leverage data parallelism (parallelize the pieces that you can) and cache efficiency.

The bigger problem with all this is that I want to avoid an accumulation of ad hoc solutions.

>
> I agree with Dag that Julia has nice metaprogramming support, maybe functions could take arbitrary compile time expressions as extra arguments.
>
>> Of course some experimentation would be needed to find the right syntax and convenience functions for this feature too, so maybe I'm just being over-optimistic and it would also turn out to be very complicated :-). But it seems like some simple AST search/replace functions would get you a long way.
>>
>> - N
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From stefan_ml at behnel.de  Mon Apr 30 15:55:18 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 30 Apr 2012 15:55:18 +0200
Subject: [Cython] [cython-users] Conditional import in pure Python mode
In-Reply-To: 
References: 
Message-ID: <4F9E99C6.1010703@behnel.de>

mark florisson, 30.04.2012 15:24:
> So let's assume we want to use the following syntax: stdio = cython.cimport_module("libc.stdio").
>
> In the TransformBuiltinMethods you add another case to the visit_SimpleCallNode

That seems way too late in the pipeline to me (see Pipeline.py). I think this is better (although maybe still not perfectly) suited for "InterpretCompilerDirectives".

> we probably want to formally discuss the exact syntax and semantics before implementing it, so I think it will be a good idea to summarize what you want and how you want it on the cython-dev mailing list.

Absolutely.

Stefan

From markflorisson88 at gmail.com  Mon Apr 30 17:10:31 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 30 Apr 2012 16:10:31 +0100
Subject: [Cython] [cython-users] Conditional import in pure Python mode
In-Reply-To: <4F9E99C6.1010703@behnel.de>
References: <4F9E99C6.1010703@behnel.de>
Message-ID: 

On 30 April 2012 14:55, Stefan Behnel wrote:
> mark florisson, 30.04.2012 15:24:
>> So let's assume we want to use the following syntax: stdio = cython.cimport_module("libc.stdio").
>>
>> In the TransformBuiltinMethods you add another case to the visit_SimpleCallNode
>
> That seems way too late in the pipeline to me (see Pipeline.py). I think this is better (although maybe still not perfectly) suited for "InterpretCompilerDirectives".
>

Ah good point, it seems to run pretty late. It needs to be before the declarations are analyzed.

>> we probably want to formally discuss the exact syntax and semantics before implementing it, so I think it will be a good idea to summarize what you want and how you want it on the cython-dev mailing list.
>
> Absolutely.
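(Putting the thread's proposal together, the user-facing idiom would read roughly as follows -- cython.compiled is an existing pure-mode attribute, while cython.cimport_module is the hypothetical API under discussion here, not an existing function:)

    import cython

    if cython.compiled:
        # hypothetical API from this thread
        libc_math = cython.cimport_module("libc.math")
        sin = libc_math.sin
    else:
        from math import sin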
> > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Mon Apr 30 17:19:59 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 30 Apr 2012 16:19:59 +0100 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: On 30 April 2012 14:49, Wes McKinney wrote: > On Sun, Apr 29, 2012 at 5:56 AM, mark florisson > wrote: >> On 29 April 2012 08:42, Nathaniel Smith wrote: >>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson >>> wrote: >>>> On 28 April 2012 22:04, Nathaniel Smith wrote: >>>>> Was chatting with Wes today about the usual problem many of us have >>>>> encountered with needing to use some sort of templating system to >>>>> generate code handling multiple types, operations, etc., and a wacky >>>>> idea occurred to me. So I thought I'd through it out here. >>>>> >>>>> What if we added a simple macro facility to Cython, that worked at the >>>>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) >>>>> Basically some way to write arbitrary Python code into a .pyx file >>>>> that gets executed at compile time and can transform the AST, plus >>>>> some nice convenience APIs for simple transformations. >>>>> >>>>> E.g., if we steal the illegal token sequence @@ as our marker, we >>>>> could have something like: >>>>> >>>>> @@ # alone on a line, starts a block of Python code >>>>> from Cython.MacroUtil import replace_ctype >>>>> def expand_types(placeholder, typelist): >>>>> ?def my_decorator(function_name, ast): >>>>> ? ?functions = {} >>>>> ? ?for typename in typelist: >>>>> ? ? ?new_name = "%s_%s" % (function_name, typename) >>>>> ? ? ?functions[name] = replace_ctype(ast, placeholder, typename) >>>>> ? ?return functions >>>>> ?return function_decorator >>>>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker >>>>> >>>>> # Compile-time function decorator >>>>> # Results in two cdef functions named sum_double and sum_int >>>>> @@expand_types("T", ["double", "int"]) >>>>> cdef T sum(np.ndarray[T] arr): >>>>> ?cdef T start = 0; >>>>> ?for i in range(arr.size): >>>>> ? ?start += arr[i] >>>>> ?return start >>>>> >>>>> I don't know if this is a good idea, but it seems like it'd be very >>>>> easy to do on the Cython side, fairly clean, and be dramatically less >>>>> horrible than all the ad-hoc templating stuff people do now. >>>>> Presumably there'd be strict limits on how much backwards >>>>> compatibility we'd be willing to guarantee for code that went poking >>>>> around in the AST by hand, but a small handful of functions like my >>>>> notional "replace_ctype" would go a long way, and wouldn't impose much >>>>> of a compatibility burden. >>>>> >>>>> -- Nathaniel >>>>> _______________________________________________ >>>>> cython-devel mailing list >>>>> cython-devel at python.org >>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>> >>>> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? >>>> >>>> In general I would like better meta-programming support, maybe even >>>> allow defining new operators (although I'm not sure any of it is very >>>> pythonic), but for templates I think fused types should be used, or >>>> improved when they fall short. Maybe a plugin system could also help >>>> people. >>> >>> I hadn't seen that, no -- thanks for the link. 
>>> >>> I have to say that the examples in that link, though, give me the >>> impression of a cool solution looking for a problem. I've never wished >>> I could symbolically differentiate Python expressions at compile time, >>> or create a mutant Python+SQL hybrid language. Actually I guess I've >>> only missed define-syntax once in maybe 10 years of hacking in >>> Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it >>> will peek at the caller's syntax tree to automagically label the axes >>> as "x" and "log(y)", and that can't be done in Python. But that's not >>> exactly a convincing argument for a macro system. >>> >>> But generating optimized code is Cython's whole selling point, and >>> people really are doing klugey tricks with string-based preprocessors >>> just to generate multiple copies of loops in Cython and C. >>> >>> Also, fused types are great, but: (1) IIUC you can't actually do >>> ndarray[fused_type] yet, which speaks to the feature's complexity, and >> >> What? Yes you can do that. > > I haven't been able to get ndarray[fused_t] to work as we've discussed > off-list. In your own words "Unfortunately, the automatic buffer > dispatch didn't make it into 0.16, so you need to manually > specialize". I'm a bit hamstrung by other users needing to be able to > compile pandas using the latest released Cython. Well, as I said, it does work, but you need to tell Cython which type you meant. If you don't want to do that, you have to use this branch: https://github.com/markflorisson88/cython/tree/_fused_dispatch_rebased . This never made it in since we had no consensus on whether to allow the compiler to bootstrap itself and because of possible immaturity of the branch. So what doesn't work is automatic dispatch for Python functions (def functions and the object version of a cpdef function). They don't automatically select the right specialization for buffer arguments. Anything else should work, otherwise it's a bug. Note also that figuring out which specialization to call dynamically (i.e. not from Cython space at compile time, but from Python space at runtime) has non-trivial overhead on top of just argument unpacking. But you can't say "doesn't work" without giving a concrete example of what doesn't work besides automatic dispatch, and how it fails. >>> (2) to handle Wes's original example on his blog (duplicating a bunch >>> of code between a "sum" path and a "product" path), you'd actually >>> need something like "fused operators", which aren't even on the >>> horizon. So it seems unlikely that fused types will grow to cover all >>> these cases in the near future. >> >> Although it doesn't handle contiguity or dimensional differences, >> currently the efficient fused operator is a function pointer. Wouldn't >> passing in a float64_t (*reducer)(float64_t, float64_t) work in this >> case (in the face of multiple types, you can have fused parameters in >> the function pointer as well)? > > I have to think that using function pointers everywhere is going to > lose out to "inlined" C. Maybe gcc is smart enough to optimize > observed code paths. In other words, you don't want > > for (i = 0; i < nlabels; i++) { > ? ?lab = labels[i]; > ? ?result[lab] = reducer(sumx[i], data[i]); > } > > when you can have > > for (i = 0; i < nlabels; i++) { > ? ?lab = labels[i]; > ? 
result[lab] = sumx[i] + data[i];
> }
>
> I guess I should start writing some C code and actually measuring the performance gap as I might be completely off-base here; what you want eventually is to look at the array data graph and "rewrite" it to better leverage data parallelism (parallelize pieces that you can) and cache efficiency.
>
> The bigger problem with all this is that I want to avoid an accumulation of ad hoc solutions.

It's probably a good idea to check the overhead first, but I agree it will likely be non-trivial. In that sense, I would like something similar to Julia, like

def func(runtime_arguments, etc, $compile_time_expr):
    ...
    use $compile_time_expr(sumx[i], data[i])
    # or maybe cython.ceval(compile_time_expr, {'op1': ..., 'op2': ...})

Maybe such functions should be restricted to Cython space, though. Fused types weren't designed to handle this in any way; they are only there to support different types, not operators. If you would have objects you could give them all different methods, so this is really somewhat of a special case.

>>
>> I agree with Dag that Julia has nice metaprogramming support, maybe functions could take arbitrary compile time expressions as extra arguments.
>>
>>> Of course some experimentation would be needed to find the right syntax and convenience functions for this feature too, so maybe I'm just being over-optimistic and it would also turn out to be very complicated :-). But it seems like some simple AST search/replace functions would get you a long way.
>>>
>>> - N
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel

From markflorisson88 at gmail.com  Mon Apr 30 17:22:05 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 30 Apr 2012 16:22:05 +0100
Subject: [Cython] Wacky idea: proper macros
In-Reply-To: 
References: 
Message-ID: 

On 30 April 2012 14:49, Wes McKinney wrote:
> On Sun, Apr 29, 2012 at 5:56 AM, mark florisson wrote:
>> On 29 April 2012 08:42, Nathaniel Smith wrote:
>>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson wrote:
>>>> On 28 April 2012 22:04, Nathaniel Smith wrote:
>>>>> Was chatting with Wes today about the usual problem many of us have encountered with needing to use some sort of templating system to generate code handling multiple types, operations, etc., and a wacky idea occurred to me. So I thought I'd throw it out here.
>>>>>
>>>>> What if we added a simple macro facility to Cython, that worked at the AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) Basically some way to write arbitrary Python code into a .pyx file that gets executed at compile time and can transform the AST, plus some nice convenience APIs for simple transformations.
>>>>>
>>>>> E.g., if we steal the illegal token sequence @@ as our marker, we could have something like:
>>>>>
>>>>> @@ # alone on a line, starts a block of Python code
>>>>> from Cython.MacroUtil import replace_ctype
>>>>> def expand_types(placeholder, typelist):
>>>>>   def my_decorator(function_name, ast):
>>>>>     functions = {}
>>>>>     for typename in typelist:
>>>>>       new_name = "%s_%s" % (function_name, typename)
>>>>>       functions[new_name] = replace_ctype(ast, placeholder, typename)
>>>>>     return functions
>>>>>   return my_decorator
>>>>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker
>>>>>
>>>>> # Compile-time function decorator
>>>>> # Results in two cdef functions named sum_double and sum_int
>>>>> @@expand_types("T", ["double", "int"])
>>>>> cdef T sum(np.ndarray[T] arr):
>>>>>   cdef T start = 0
>>>>>   for i in range(arr.size):
>>>>>     start += arr[i]
>>>>>   return start
>>>>>
>>>>> I don't know if this is a good idea, but it seems like it'd be very easy to do on the Cython side, fairly clean, and be dramatically less horrible than all the ad-hoc templating stuff people do now. Presumably there'd be strict limits on how much backwards compatibility we'd be willing to guarantee for code that went poking around in the AST by hand, but a small handful of functions like my notional "replace_ctype" would go a long way, and wouldn't impose much of a compatibility burden.
>>>>>
>>>>> -- Nathaniel
>>>>> _______________________________________________
>>>>> cython-devel mailing list
>>>>> cython-devel at python.org
>>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>>
>>>> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ?
>>>>
>>>> In general I would like better meta-programming support, maybe even allow defining new operators (although I'm not sure any of it is very pythonic), but for templates I think fused types should be used, or improved when they fall short. Maybe a plugin system could also help people.
>>>
>>> I hadn't seen that, no -- thanks for the link.
>>>
>>> I have to say that the examples in that link, though, give me the impression of a cool solution looking for a problem. I've never wished I could symbolically differentiate Python expressions at compile time, or create a mutant Python+SQL hybrid language. Actually I guess I've only missed define-syntax once in maybe 10 years of hacking in Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it will peek at the caller's syntax tree to automagically label the axes as "x" and "log(y)", and that can't be done in Python. But that's not exactly a convincing argument for a macro system.
>>>
>>> But generating optimized code is Cython's whole selling point, and people really are doing klugey tricks with string-based preprocessors just to generate multiple copies of loops in Cython and C.
>>>
>>> Also, fused types are great, but: (1) IIUC you can't actually do ndarray[fused_type] yet, which speaks to the feature's complexity, and
>>
>> What? Yes you can do that.
>
> I haven't been able to get ndarray[fused_t] to work as we've discussed off-list. In your own words "Unfortunately, the automatic buffer dispatch didn't make it into 0.16, so you need to manually specialize". I'm a bit hamstrung by other users needing to be able to compile pandas using the latest released Cython.

Also, if something doesn't work in 0.16, please report it. We reverted the numpy changes that broke pandas, so if there are other changes that break stuff, please inform us.
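(For context, the "manual specialization" being discussed means indexing the fused function with a concrete type instead of relying on automatic buffer dispatch -- roughly like this sketch, with invented names:)

    cimport numpy as np
    import numpy as np

    ctypedef fused join_t:
        np.float64_t
        np.int64_t

    def head(np.ndarray[join_t] arr):
        # one body, compiled once per type listed in join_t
        return arr[0]

    # The caller picks the specialization explicitly by indexing with a type:
    #     head[np.float64_t](np.arange(3.0))
    # whereas automatic dispatch would let you simply write head(np.arange(3.0)).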
>>> (2) to handle Wes's original example on his blog (duplicating a bunch >>> of code between a "sum" path and a "product" path), you'd actually >>> need something like "fused operators", which aren't even on the >>> horizon. So it seems unlikely that fused types will grow to cover all >>> these cases in the near future. >> >> Although it doesn't handle contiguity or dimensional differences, >> currently the efficient fused operator is a function pointer. Wouldn't >> passing in a float64_t (*reducer)(float64_t, float64_t) work in this >> case (in the face of multiple types, you can have fused parameters in >> the function pointer as well)? > > I have to think that using function pointers everywhere is going to > lose out to "inlined" C. Maybe gcc is smart enough to optimize > observed code paths. In other words, you don't want > > for (i = 0; i < nlabels; i++) { > ? ?lab = labels[i]; > ? ?result[lab] = reducer(sumx[i], data[i]); > } > > when you can have > > for (i = 0; i < nlabels; i++) { > ? ?lab = labels[i]; > ? ?result[lab] = sumx[i] + data[i]; > } > > I guess I should start writing some C code and actually measuring the > performance gap as I might completely be off-base here; what you want > eventually is to look at the array data graph and "rewrite" it to > better leverage data parallelism (parallelize pieces that you can) and > cache efficiency. > > The bigger problem with all this is that I want to avoid an > accumulation of ad hoc solutions. > >> >> I agree with Dag that Julia has nice metaprogramming support, maybe >> functions could take arbitrary compile time expressions as extra >> arguments. >> >>> Of course some experimentation would be needed to find the right >>> syntax and convenience functions for this feature too, so maybe I'm >>> just being over-optimistic and it would also turn out to be very >>> complicated :-). But it seems like some simple AST search/replace >>> functions would get you a long way. >>> >>> - N >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From wesmckinn at gmail.com Mon Apr 30 18:30:32 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 30 Apr 2012 12:30:32 -0400 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: On Mon, Apr 30, 2012 at 11:19 AM, mark florisson wrote: > On 30 April 2012 14:49, Wes McKinney wrote: >> On Sun, Apr 29, 2012 at 5:56 AM, mark florisson >> wrote: >>> On 29 April 2012 08:42, Nathaniel Smith wrote: >>>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson >>>> wrote: >>>>> On 28 April 2012 22:04, Nathaniel Smith wrote: >>>>>> Was chatting with Wes today about the usual problem many of us have >>>>>> encountered with needing to use some sort of templating system to >>>>>> generate code handling multiple types, operations, etc., and a wacky >>>>>> idea occurred to me. So I thought I'd through it out here. >>>>>> >>>>>> What if we added a simple macro facility to Cython, that worked at the >>>>>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) 
>>>>>> Basically some way to write arbitrary Python code into a .pyx file >>>>>> that gets executed at compile time and can transform the AST, plus >>>>>> some nice convenience APIs for simple transformations. >>>>>> >>>>>> E.g., if we steal the illegal token sequence @@ as our marker, we >>>>>> could have something like: >>>>>> >>>>>> @@ # alone on a line, starts a block of Python code >>>>>> from Cython.MacroUtil import replace_ctype >>>>>> def expand_types(placeholder, typelist): >>>>>> ?def my_decorator(function_name, ast): >>>>>> ? ?functions = {} >>>>>> ? ?for typename in typelist: >>>>>> ? ? ?new_name = "%s_%s" % (function_name, typename) >>>>>> ? ? ?functions[name] = replace_ctype(ast, placeholder, typename) >>>>>> ? ?return functions >>>>>> ?return function_decorator >>>>>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker >>>>>> >>>>>> # Compile-time function decorator >>>>>> # Results in two cdef functions named sum_double and sum_int >>>>>> @@expand_types("T", ["double", "int"]) >>>>>> cdef T sum(np.ndarray[T] arr): >>>>>> ?cdef T start = 0; >>>>>> ?for i in range(arr.size): >>>>>> ? ?start += arr[i] >>>>>> ?return start >>>>>> >>>>>> I don't know if this is a good idea, but it seems like it'd be very >>>>>> easy to do on the Cython side, fairly clean, and be dramatically less >>>>>> horrible than all the ad-hoc templating stuff people do now. >>>>>> Presumably there'd be strict limits on how much backwards >>>>>> compatibility we'd be willing to guarantee for code that went poking >>>>>> around in the AST by hand, but a small handful of functions like my >>>>>> notional "replace_ctype" would go a long way, and wouldn't impose much >>>>>> of a compatibility burden. >>>>>> >>>>>> -- Nathaniel >>>>>> _______________________________________________ >>>>>> cython-devel mailing list >>>>>> cython-devel at python.org >>>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>>> >>>>> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? >>>>> >>>>> In general I would like better meta-programming support, maybe even >>>>> allow defining new operators (although I'm not sure any of it is very >>>>> pythonic), but for templates I think fused types should be used, or >>>>> improved when they fall short. Maybe a plugin system could also help >>>>> people. >>>> >>>> I hadn't seen that, no -- thanks for the link. >>>> >>>> I have to say that the examples in that link, though, give me the >>>> impression of a cool solution looking for a problem. I've never wished >>>> I could symbolically differentiate Python expressions at compile time, >>>> or create a mutant Python+SQL hybrid language. Actually I guess I've >>>> only missed define-syntax once in maybe 10 years of hacking in >>>> Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it >>>> will peek at the caller's syntax tree to automagically label the axes >>>> as "x" and "log(y)", and that can't be done in Python. But that's not >>>> exactly a convincing argument for a macro system. >>>> >>>> But generating optimized code is Cython's whole selling point, and >>>> people really are doing klugey tricks with string-based preprocessors >>>> just to generate multiple copies of loops in Cython and C. >>>> >>>> Also, fused types are great, but: (1) IIUC you can't actually do >>>> ndarray[fused_type] yet, which speaks to the feature's complexity, and >>> >>> What? Yes you can do that. >> >> I haven't been able to get ndarray[fused_t] to work as we've discussed >> off-list. 
In your own words "Unfortunately, the automatic buffer dispatch didn't make it into 0.16, so you need to manually specialize". I'm a bit hamstrung by other users needing to be able to compile pandas using the latest released Cython.
>
> Well, as I said, it does work, but you need to tell Cython which type you meant. If you don't want to do that, you have to use this branch: https://github.com/markflorisson88/cython/tree/_fused_dispatch_rebased . This never made it in since we had no consensus on whether to allow the compiler to bootstrap itself and because of possible immaturity of the branch.
>
> So what doesn't work is automatic dispatch for Python functions (def functions and the object version of a cpdef function). They don't automatically select the right specialization for buffer arguments. Anything else should work; otherwise it's a bug.
>
> Note also that figuring out which specialization to call dynamically (i.e. not from Cython space at compile time, but from Python space at runtime) has non-trivial overhead on top of just argument unpacking. But you can't say "doesn't work" without giving a concrete example of what doesn't work besides automatic dispatch, and how it fails.
>

Sorry, I meant automatic dispatch re "doesn't work", and I want to reiterate how much I appreciate the work you're doing. To give some context, my code is riddled with stuff like this:

lib.inner_join_indexer_float64
lib.inner_join_indexer_int32
lib.inner_join_indexer_int64
lib.inner_join_indexer_object

where the only difference between these functions is the type of the buffer in the two arrays passed in. I have a template string for these functions that looks like this:

inner_join_template = """@cython.wraparound(False)
@cython.boundscheck(False)
def inner_join_indexer_%(name)s(ndarray[%(c_type)s] left,
                                ndarray[%(c_type)s] right):
    '''
...

I would _love_ to replace this with fused types.

In any case, lately I've been sort of yearning for the kinds of things you can do with an APL-variant like J. Like here's a groupby in J:

   labels
1 1 2 2 2 3 1
   data
3 4 5.5 6 7.5 _2 8.3
   labels </. data
┌───────┬─────────┬──┐
│3 4 8.3│5.5 6 7.5│_2│
└───────┴─────────┴──┘

Here < is box and /. is categorize.

Replacing the box < operator with +/ (sum), I get the group sums:

   labels +/ /. data
15.3 19 _2

Have 2-dimensional data?

   data
 0  1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31 32 33 34
35 36 37 38 39 40 41
42 43 44 45 46 47 48
   labels </."1 data
┌────────┬────────┬──┐
│0 1 6   │2 3 4   │5 │
├────────┼────────┼──┤
│7 8 13  │9 10 11 │12│
├────────┼────────┼──┤
│14 15 20│16 17 18│19│
├────────┼────────┼──┤
│21 22 27│23 24 25│26│
├────────┼────────┼──┤
│28 29 34│30 31 32│33│
├────────┼────────┼──┤
│35 36 41│37 38 39│40│
├────────┼────────┼──┤
│42 43 48│44 45 46│47│
└────────┴────────┴──┘
   labels +//."1 data
  7   9  5
 28  30 12
 49  51 19
 70  72 26
 91  93 33
112 114 40
133 135 47

However, J and other APLs are interpreted. If you generate C or JIT-compile, I think you can do really well performance-wise and have very expressive code for writing data algorithms without all this boilerplate.

>>>> (2) to handle Wes's original example on his blog (duplicating a bunch of code between a "sum" path and a "product" path), you'd actually need something like "fused operators", which aren't even on the horizon. So it seems unlikely that fused types will grow to cover all these cases in the near future.
>>>
>>> Although it doesn't handle contiguity or dimensional differences, currently the efficient fused operator is a function pointer. Wouldn't passing in a float64_t (*reducer)(float64_t, float64_t) work in this case (in the face of multiple types, you can have fused parameters in the function pointer as well)?
>>
>> I have to think that using function pointers everywhere is going to lose out to "inlined" C. Maybe gcc is smart enough to optimize observed code paths. In other words, you don't want
>>
>> for (i = 0; i < nlabels; i++) {
>>     lab = labels[i];
>>     result[lab] = reducer(sumx[i], data[i]);
>> }
>>
>> when you can have
>>
>> for (i = 0; i < nlabels; i++) {
>>     lab = labels[i];
>>     result[lab] = sumx[i] + data[i];
>> }
>>
>> I guess I should start writing some C code and actually measuring the performance gap as I might be completely off-base here; what you want eventually is to look at the array data graph and "rewrite" it to better leverage data parallelism (parallelize pieces that you can) and cache efficiency.
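(For readers who don't speak J, the 1-D groupby above corresponds to this NumPy sketch:)

    import numpy as np

    labels = np.array([1, 1, 2, 2, 2, 3, 1])
    data = np.array([3, 4, 5.5, 6, 7.5, -2, 8.3])   # J's _2 is -2

    # labels </. data -- box (group) the data by label
    groups = [data[labels == k] for k in np.unique(labels)]

    # labels +/ /. data -- sum within each group
    sums = np.array([g.sum() for g in groups])
    print(sums)   # [ 15.3  19.   -2. ]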
>> >> The bigger problem with all this is that I want to avoid an >> accumulation of ad hoc solutions. > > It's probably a good idea to check the overhead first, but I agree it > will likely be non-trivial. In that sense, I would like something > similar to julia like > > def ?func(runtime_arguments, etc, $compile_time_expr): > ? ?... > ? ? ? ? use $compile_time_expr(sumx[i], data[i]) # or maybe > cython.ceval(compile_time_expr, {'op1': ..., 'op2': ...}) > > Maybe such functions should only be restricted to Cython space, > though. Fused types weren't designed to handle this in any way, they > are only there to support different types, not operators. If you would > have objects you could give them all different methods, so this is > really rather somewhat of a special case. > >>> >>> I agree with Dag that Julia has nice metaprogramming support, maybe >>> functions could take arbitrary compile time expressions as extra >>> arguments. >>> >>>> Of course some experimentation would be needed to find the right >>>> syntax and convenience functions for this feature too, so maybe I'm >>>> just being over-optimistic and it would also turn out to be very >>>> complicated :-). But it seems like some simple AST search/replace >>>> functions would get you a long way. >>>> >>>> - N >>>> _______________________________________________ >>>> cython-devel mailing list >>>> cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Mon Apr 30 22:49:01 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 30 Apr 2012 22:49:01 +0200 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: <4F9EFABD.5080408@astro.uio.no> On 04/30/2012 06:30 PM, Wes McKinney wrote: > On Mon, Apr 30, 2012 at 11:19 AM, mark florisson > wrote: >> On 30 April 2012 14:49, Wes McKinney wrote: >>> On Sun, Apr 29, 2012 at 5:56 AM, mark florisson >>> wrote: >>>> On 29 April 2012 08:42, Nathaniel Smith wrote: >>>>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson >>>>> wrote: >>>>>> On 28 April 2012 22:04, Nathaniel Smith wrote: >>>>>>> Was chatting with Wes today about the usual problem many of us have >>>>>>> encountered with needing to use some sort of templating system to >>>>>>> generate code handling multiple types, operations, etc., and a wacky >>>>>>> idea occurred to me. So I thought I'd through it out here. >>>>>>> >>>>>>> What if we added a simple macro facility to Cython, that worked at the >>>>>>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) >>>>>>> Basically some way to write arbitrary Python code into a .pyx file >>>>>>> that gets executed at compile time and can transform the AST, plus >>>>>>> some nice convenience APIs for simple transformations. 
>>>>>>> >>>>>>> E.g., if we steal the illegal token sequence @@ as our marker, we >>>>>>> could have something like: >>>>>>> >>>>>>> @@ # alone on a line, starts a block of Python code >>>>>>> from Cython.MacroUtil import replace_ctype >>>>>>> def expand_types(placeholder, typelist): >>>>>>> def my_decorator(function_name, ast): >>>>>>> functions = {} >>>>>>> for typename in typelist: >>>>>>> new_name = "%s_%s" % (function_name, typename) >>>>>>> functions[name] = replace_ctype(ast, placeholder, typename) >>>>>>> return functions >>>>>>> return function_decorator >>>>>>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker >>>>>>> >>>>>>> # Compile-time function decorator >>>>>>> # Results in two cdef functions named sum_double and sum_int >>>>>>> @@expand_types("T", ["double", "int"]) >>>>>>> cdef T sum(np.ndarray[T] arr): >>>>>>> cdef T start = 0; >>>>>>> for i in range(arr.size): >>>>>>> start += arr[i] >>>>>>> return start >>>>>>> >>>>>>> I don't know if this is a good idea, but it seems like it'd be very >>>>>>> easy to do on the Cython side, fairly clean, and be dramatically less >>>>>>> horrible than all the ad-hoc templating stuff people do now. >>>>>>> Presumably there'd be strict limits on how much backwards >>>>>>> compatibility we'd be willing to guarantee for code that went poking >>>>>>> around in the AST by hand, but a small handful of functions like my >>>>>>> notional "replace_ctype" would go a long way, and wouldn't impose much >>>>>>> of a compatibility burden. >>>>>>> >>>>>>> -- Nathaniel >>>>>>> _______________________________________________ >>>>>>> cython-devel mailing list >>>>>>> cython-devel at python.org >>>>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>>>> >>>>>> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? >>>>>> >>>>>> In general I would like better meta-programming support, maybe even >>>>>> allow defining new operators (although I'm not sure any of it is very >>>>>> pythonic), but for templates I think fused types should be used, or >>>>>> improved when they fall short. Maybe a plugin system could also help >>>>>> people. >>>>> >>>>> I hadn't seen that, no -- thanks for the link. >>>>> >>>>> I have to say that the examples in that link, though, give me the >>>>> impression of a cool solution looking for a problem. I've never wished >>>>> I could symbolically differentiate Python expressions at compile time, >>>>> or create a mutant Python+SQL hybrid language. Actually I guess I've >>>>> only missed define-syntax once in maybe 10 years of hacking in >>>>> Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it >>>>> will peek at the caller's syntax tree to automagically label the axes >>>>> as "x" and "log(y)", and that can't be done in Python. But that's not >>>>> exactly a convincing argument for a macro system. >>>>> >>>>> But generating optimized code is Cython's whole selling point, and >>>>> people really are doing klugey tricks with string-based preprocessors >>>>> just to generate multiple copies of loops in Cython and C. >>>>> >>>>> Also, fused types are great, but: (1) IIUC you can't actually do >>>>> ndarray[fused_type] yet, which speaks to the feature's complexity, and >>>> >>>> What? Yes you can do that. >>> >>> I haven't been able to get ndarray[fused_t] to work as we've discussed >>> off-list. In your own words "Unfortunately, the automatic buffer >>> dispatch didn't make it into 0.16, so you need to manually >>> specialize". 
I'm a bit hamstrung by other users needing to be able to >>> compile pandas using the latest released Cython. >> >> Well, as I said, it does work, but you need to tell Cython which type >> you meant. If you don't want to do that, you have to use this branch: >> https://github.com/markflorisson88/cython/tree/_fused_dispatch_rebased >> . This never made it in since we had no consensus on whether to allow >> the compiler to bootstrap itself and because of possible immaturity of >> the branch. >> >> So what doesn't work is automatic dispatch for Python functions (def >> functions and the object version of a cpdef function). They don't >> automatically select the right specialization for buffer arguments. >> Anything else should work, otherwise it's a bug. >> >> Note also that figuring out which specialization to call dynamically >> (i.e. not from Cython space at compile time, but from Python space at >> runtime) has non-trivial overhead on top of just argument unpacking. >> But you can't say "doesn't work" without giving a concrete example of >> what doesn't work besides automatic dispatch, and how it fails. >> > > Sorry, I meant automatic dispatch re "doesn't work" and want to > reiterate how much I appreciate the work you're doing. To give some > context, my code is riddled with stuff like this: > > lib.inner_join_indexer_float64 > lib.inner_join_indexer_int32 > lib.inner_join_indexer_int64 > lib.inner_join_indexer_object > > where the only difference between these functions is the type of the > buffer in the two arrays passed in. I have a template string for these > functions that looks like this: > > inner_join_template = """@cython.wraparound(False) > @cython.boundscheck(False) > def inner_join_indexer_%(name)s(ndarray[%(c_type)s] left, > ndarray[%(c_type)s] right): > ''' > ... > > I would _love_ to replace this with fused types. > > In any case, lately I've been sort of yearning for the kinds of things > you can do with an APL-variant like J. Like here's a groupby in J: > > labels > 1 1 2 2 2 3 1 > data > 3 4 5.5 6 7.5 _2 8.3 > labels ?????????????????????? > ?3 4 8.3?5.5 6 7.5?_2? > ?????????????????????? > > Here< is box and /. is categorize. > > Replacing the box< operator with +/ (sum), I get the group sums: > > labels +/ /. data > 15.3 19 _2 > > Have 2-dimensional data? > > data > 0 1 2 3 4 5 6 > 7 8 9 10 11 12 13 > 14 15 16 17 18 19 20 > 21 22 23 24 25 26 27 > 28 29 30 31 32 33 34 > 35 36 37 38 39 40 41 > 42 43 44 45 46 47 48 > labels ?????????????????????? > ?0 1 6 ?2 3 4 ?5 ? > ?????????????????????? > ?7 8 13 ?9 10 11 ?12? > ?????????????????????? > ?14 15 20?16 17 18?19? > ?????????????????????? > ?21 22 27?23 24 25?26? > ?????????????????????? > ?28 29 34?30 31 32?33? > ?????????????????????? > ?35 36 41?37 38 39?40? > ?????????????????????? > ?42 43 48?44 45 46?47? > ?????????????????????? > > labels +//."1 data > 7 9 5 > 28 30 12 > 49 51 19 > 70 72 26 > 91 93 33 > 112 114 40 > 133 135 47 > > However, J and other APLs are interpreted. If you generate C or > JIT-compile I think you can do really well performance wise and have > very expressive code for writing data algorithms without all this > boilerplate. I know how you feel. On one hand I really like metaprogramming; on the other hand I think it is very difficult to get right when done compile-time (just look at C++ -- I've heard that D is a bit better though I didn't really try it). JIT is really the way to go. 
It is one thing that a JIT could optimize the case where you pass a callback to a function and inline it run-time. But even if it doesn't get that fancy, it'd be great to just be able to write something like "cython.eval(s)" and have that be compiled (I guess you could do that now, but the sheer overhead of the C compiler and all the .so files involved means nobody would sanely use that as the main way of stringing together something like pandas). I think that's really the way to get "Pythonic" metaprogramming, where you mix and match runtime and compile-time, and can hook into arbitrary Python code in the meta-programming step. Guido may even accept some syntax hooks to more easily express macros without resorting to strings, if I got the right impression at PyData on his take on DSLs. Without a JIT, all we seem to come up with will be kludges. So I'm sceptical about more metaprogramming features in Cython now since it will be outdated in five years -- by then, somebody will either have gotten our stuff together and JIT-ed both Python and Cython, or we'll all be using something else (like R or Julia). Dag From njs at pobox.com Mon Apr 30 22:55:15 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 30 Apr 2012 21:55:15 +0100 Subject: [Cython] Wacky idea: proper macros In-Reply-To: <4F9EFABD.5080408@astro.uio.no> References: <4F9EFABD.5080408@astro.uio.no> Message-ID: On Mon, Apr 30, 2012 at 9:49 PM, Dag Sverre Seljebotn wrote: > JIT is really the way to go. It is one thing that a JIT could optimize the > case where you pass a callback to a function and inline it run-time. But > even if it doesn't get that fancy, it'd be great to just be able to write > something like "cython.eval(s)" and have that be compiled (I guess you could > do that now, but the sheer overhead of the C compiler and all the .so files > involved means nobody would sanely use that as the main way of stringing > together something like pandas). The overhead of running a fully optimizing compiler over pandas on every import is pretty high, though. You can come up with various caching mechanisms, but they all mean introducing some kind of compile time/run time distinction. So I'm skeptical we'll just be able to get rid of that concept, even in a brave new LLVM/PyPy/Julia world. -- Nathaniel From wesmckinn at gmail.com Mon Apr 30 22:56:13 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 30 Apr 2012 16:56:13 -0400 Subject: [Cython] Wacky idea: proper macros In-Reply-To: <4F9EFABD.5080408@astro.uio.no> References: <4F9EFABD.5080408@astro.uio.no> Message-ID: On Mon, Apr 30, 2012 at 4:49 PM, Dag Sverre Seljebotn wrote: > On 04/30/2012 06:30 PM, Wes McKinney wrote: >> >> On Mon, Apr 30, 2012 at 11:19 AM, mark florisson >> ?wrote: >>> >>> On 30 April 2012 14:49, Wes McKinney ?wrote: >>>> >>>> On Sun, Apr 29, 2012 at 5:56 AM, mark florisson >>>> ?wrote: >>>>> >>>>> On 29 April 2012 08:42, Nathaniel Smith ?wrote: >>>>>> >>>>>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson >>>>>> ?wrote: >>>>>>> >>>>>>> On 28 April 2012 22:04, Nathaniel Smith ?wrote: >>>>>>>> >>>>>>>> Was chatting with Wes today about the usual problem many of us have >>>>>>>> encountered with needing to use some sort of templating system to >>>>>>>> generate code handling multiple types, operations, etc., and a wacky >>>>>>>> idea occurred to me. So I thought I'd through it out here. >>>>>>>> >>>>>>>> What if we added a simple macro facility to Cython, that worked at >>>>>>>> the >>>>>>>> AST level? (I.e. 
I'm talking lisp-style macros, *not* C-style >>>>>>>> macros.) >>>>>>>> Basically some way to write arbitrary Python code into a .pyx file >>>>>>>> that gets executed at compile time and can transform the AST, plus >>>>>>>> some nice convenience APIs for simple transformations. >>>>>>>> >>>>>>>> E.g., if we steal the illegal token sequence @@ as our marker, we >>>>>>>> could have something like: >>>>>>>> >>>>>>>> @@ # alone on a line, starts a block of Python code >>>>>>>> from Cython.MacroUtil import replace_ctype >>>>>>>> def expand_types(placeholder, typelist): >>>>>>>> ?def my_decorator(function_name, ast): >>>>>>>> ? ?functions = {} >>>>>>>> ? ?for typename in typelist: >>>>>>>> ? ? ?new_name = "%s_%s" % (function_name, typename) >>>>>>>> ? ? ?functions[name] = replace_ctype(ast, placeholder, typename) >>>>>>>> ? ?return functions >>>>>>>> ?return function_decorator >>>>>>>> @@ # this token sequence cannot occur in Python, so it's a safe >>>>>>>> end-marker >>>>>>>> >>>>>>>> # Compile-time function decorator >>>>>>>> # Results in two cdef functions named sum_double and sum_int >>>>>>>> @@expand_types("T", ["double", "int"]) >>>>>>>> cdef T sum(np.ndarray[T] arr): >>>>>>>> ?cdef T start = 0; >>>>>>>> ?for i in range(arr.size): >>>>>>>> ? ?start += arr[i] >>>>>>>> ?return start >>>>>>>> >>>>>>>> I don't know if this is a good idea, but it seems like it'd be very >>>>>>>> easy to do on the Cython side, fairly clean, and be dramatically >>>>>>>> less >>>>>>>> horrible than all the ad-hoc templating stuff people do now. >>>>>>>> Presumably there'd be strict limits on how much backwards >>>>>>>> compatibility we'd be willing to guarantee for code that went poking >>>>>>>> around in the AST by hand, but a small handful of functions like my >>>>>>>> notional "replace_ctype" would go a long way, and wouldn't impose >>>>>>>> much >>>>>>>> of a compatibility burden. >>>>>>>> >>>>>>>> -- Nathaniel >>>>>>>> _______________________________________________ >>>>>>>> cython-devel mailing list >>>>>>>> cython-devel at python.org >>>>>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>>>>> >>>>>>> >>>>>>> Have you looked at >>>>>>> http://wiki.cython.org/enhancements/metaprogramming ? >>>>>>> >>>>>>> In general I would like better meta-programming support, maybe even >>>>>>> allow defining new operators (although I'm not sure any of it is very >>>>>>> pythonic), but for templates I think fused types should be used, or >>>>>>> improved when they fall short. Maybe a plugin system could also help >>>>>>> people. >>>>>> >>>>>> >>>>>> I hadn't seen that, no -- thanks for the link. >>>>>> >>>>>> I have to say that the examples in that link, though, give me the >>>>>> impression of a cool solution looking for a problem. I've never wished >>>>>> I could symbolically differentiate Python expressions at compile time, >>>>>> or create a mutant Python+SQL hybrid language. Actually I guess I've >>>>>> only missed define-syntax once in maybe 10 years of hacking in >>>>>> Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it >>>>>> will peek at the caller's syntax tree to automagically label the axes >>>>>> as "x" and "log(y)", and that can't be done in Python. But that's not >>>>>> exactly a convincing argument for a macro system. >>>>>> >>>>>> But generating optimized code is Cython's whole selling point, and >>>>>> people really are doing klugey tricks with string-based preprocessors >>>>>> just to generate multiple copies of loops in Cython and C. 
>>>>>> >>>>>> Also, fused types are great, but: (1) IIUC you can't actually do >>>>>> ndarray[fused_type] yet, which speaks to the feature's complexity, and >>>>> >>>>> >>>>> What? Yes you can do that. >>>> >>>> >>>> I haven't been able to get ndarray[fused_t] to work as we've discussed >>>> off-list. In your own words "Unfortunately, the automatic buffer >>>> dispatch didn't make it into 0.16, so you need to manually >>>> specialize". I'm a bit hamstrung by other users needing to be able to >>>> compile pandas using the latest released Cython. >>> >>> >>> Well, as I said, it does work, but you need to tell Cython which type >>> you meant. If you don't want to do that, you have to use this branch: >>> https://github.com/markflorisson88/cython/tree/_fused_dispatch_rebased >>> . This never made it in since we had no consensus on whether to allow >>> the compiler to bootstrap itself and because of possible immaturity of >>> the branch. >>> >>> So what doesn't work is automatic dispatch for Python functions (def >>> functions and the object version of a cpdef function). They don't >>> automatically select the right specialization for buffer arguments. >>> Anything else should work, otherwise it's a bug. >>> >>> Note also that figuring out which specialization to call dynamically >>> (i.e. not from Cython space at compile time, but from Python space at >>> runtime) has non-trivial overhead on top of just argument unpacking. >>> But you can't say "doesn't work" without giving a concrete example of >>> what doesn't work besides automatic dispatch, and how it fails. >>> >> >> Sorry, I meant automatic dispatch re "doesn't work" and want to >> reiterate how much I appreciate the work you're doing. To give some >> context, my code is riddled with stuff like this: >> >> lib.inner_join_indexer_float64 >> lib.inner_join_indexer_int32 >> lib.inner_join_indexer_int64 >> lib.inner_join_indexer_object >> >> where the only difference between these functions is the type of the >> buffer in the two arrays passed in. I have a template string for these >> functions that looks like this: >> >> inner_join_template = """@cython.wraparound(False) >> @cython.boundscheck(False) >> def inner_join_indexer_%(name)s(ndarray[%(c_type)s] left, >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ndarray[%(c_type)s] right): >> ? ? ''' >> ... >> >> I would _love_ to replace this with fused types. >> >> In any case, lately I've been sort of yearning for the kinds of things >> you can do with an APL-variant like J. Like here's a groupby in J: >> >> ? ?labels >> 1 1 2 2 2 3 1 >> ? ?data >> 3 4 5.5 6 7.5 _2 8.3 >> ? ?labels> ?????????????????????? >> ?3 4 8.3?5.5 6 7.5?_2? >> ?????????????????????? >> >> Here< ?is box and /. is categorize. >> >> Replacing the box< ?operator with +/ (sum), I get the group sums: >> >> ? ?labels +/ /. data >> 15.3 19 _2 >> >> Have 2-dimensional data? >> >> ? ?data >> ?0 ?1 ?2 ?3 ?4 ?5 ?6 >> ?7 ?8 ?9 10 11 12 13 >> 14 15 16 17 18 19 20 >> 21 22 23 24 25 26 27 >> 28 29 30 31 32 33 34 >> 35 36 37 38 39 40 41 >> 42 43 44 45 46 47 48 >> ? ?labels> ?????????????????????? >> ?0 1 6 ? ?2 3 4 ? ?5 ? >> ?????????????????????? >> ?7 8 13 ??9 10 11 ?12? >> ?????????????????????? >> ?14 15 20?16 17 18?19? >> ?????????????????????? >> ?21 22 27?23 24 25?26? >> ?????????????????????? >> ?28 29 34?30 31 32?33? >> ?????????????????????? >> ?35 36 41?37 38 39?40? >> ?????????????????????? >> ?42 43 48?44 45 46?47? >> ?????????????????????? >> >> ? ?labels +//."1 data >> ? 7 ? 
9 ?5 >> ?28 ?30 12 >> ?49 ?51 19 >> ?70 ?72 26 >> ?91 ?93 33 >> 112 114 40 >> 133 135 47 >> >> However, J and other APLs are interpreted. If you generate C or >> JIT-compile I think you can do really well performance wise and have >> very expressive code for writing data algorithms without all this >> boilerplate. > > > I know how you feel. On one hand I really like metaprogramming; on the other > hand I think it is very difficult to get right when done compile-time (just > look at C++ -- I've heard that D is a bit better though I didn't really try > it). > > JIT is really the way to go. It is one thing that a JIT could optimize the > case where you pass a callback to a function and inline it run-time. But > even if it doesn't get that fancy, it'd be great to just be able to write > something like "cython.eval(s)" and have that be compiled (I guess you could > do that now, but the sheer overhead of the C compiler and all the .so files > involved means nobody would sanely use that as the main way of stringing > together something like pandas). > > I think that's really the way to get "Pythonic" metaprogramming, where you > mix and match runtime and compile-time, and can hook into arbitrary Python > code in the meta-programming step. > > Guido may even accept some syntax hooks to more easily express macros > without resorting to strings, if I got the right impression at PyData on his > take on DSLs. > > Without a JIT, all we seem to come up with will be kludges. So I'm sceptical > about more metaprogramming features in Cython now since it will be outdated > in five years -- by then, somebody will either have gotten our stuff > together and JIT-ed both Python and Cython, or we'll all be using something > else (like R or Julia). > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I feel pretty strongly that we need a JIT, but it really needs to run inside CPython. The PyPy approach doesn't seem right to me, and Julia is nice but you're essentially starting from scratch (I am going to do some benchmarking / explorations to see how good Julia's JIT is). I don't have the JIT-fu to do this myself, and I probably won't have the bandwidth to work on it for a couple of years. I might be able to fund its development sooner rather than later, though. - Wes From wesmckinn at gmail.com Mon Apr 30 22:57:26 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 30 Apr 2012 16:57:26 -0400 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: <4F9EFABD.5080408@astro.uio.no> Message-ID: On Mon, Apr 30, 2012 at 4:55 PM, Nathaniel Smith wrote: > On Mon, Apr 30, 2012 at 9:49 PM, Dag Sverre Seljebotn > wrote: >> JIT is really the way to go. It is one thing that a JIT could optimize the >> case where you pass a callback to a function and inline it run-time. But >> even if it doesn't get that fancy, it'd be great to just be able to write >> something like "cython.eval(s)" and have that be compiled (I guess you could >> do that now, but the sheer overhead of the C compiler and all the .so files >> involved means nobody would sanely use that as the main way of stringing >> together something like pandas). > > The overhead of running a fully optimizing compiler over pandas on > every import is pretty high, though. You can come up with various > caching mechanisms, but they all mean introducing some kind of compile > time/run time distinction. 

I'd be perfectly OK with just having to compile pandas's "data engine"
and generate loads of C/C++ code. JIT-compiling little array
expressions would be cool too. I've got enough of an itch that I might
have to start scratching pretty soon.

From d.s.seljebotn at astro.uio.no  Mon Apr 30 23:32:58 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Mon, 30 Apr 2012 23:32:58 +0200
Subject: [Cython] Wacky idea: proper macros
In-Reply-To: 
References: <4F9EFABD.5080408@astro.uio.no>
Message-ID: <339e72d3-3ef5-44d9-ac89-79dd494cc460@email.android.com>

Wes McKinney wrote:

>On Mon, Apr 30, 2012 at 4:55 PM, Nathaniel Smith wrote:
>> On Mon, Apr 30, 2012 at 9:49 PM, Dag Sverre Seljebotn
>> wrote:
>>> JIT is really the way to go. It is one thing that a JIT could
>>> optimize the case where you pass a callback to a function and inline
>>> it at run-time. But even if it doesn't get that fancy, it'd be great
>>> to just be able to write something like "cython.eval(s)" and have
>>> that be compiled (I guess you could do that now, but the sheer
>>> overhead of the C compiler and all the .so files involved means
>>> nobody would sanely use that as the main way of stringing together
>>> something like pandas).
>>
>> The overhead of running a fully optimizing compiler over pandas on
>> every import is pretty high, though. You can come up with various
>> caching mechanisms, but they all mean introducing some kind of
>> compile time/run time distinction. So I'm skeptical we'll just be
>> able to get rid of that concept, even in a brave new LLVM/PyPy/Julia
>> world.
>>
>> -- Nathaniel
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>
>I'd be perfectly OK with just having to compile pandas's "data engine"
>and generate loads of C/C++ code. JIT-compiling little array
>expressions would be cool too. I've got enough of an itch that I might
>have to start scratching pretty soon.

I think a good start is this: myself, I'd look into just using Jinja2
to generate all the Cython code, rather than those horrible Python
interpolated strings... That should give you something that's at least
rather pleasant for you to work with once you are used to it (even if
it is a bit horrible to newcomers to the code base).

You can even check in the generated sources.

And we've discussed letting Cython be smart with templating languages
and report errors on the corresponding line in the original template;
such features will certainly be accepted once somebody codes them up.

(I can give you my breakdown of how I eliminated templating languages
other than Jinja2 for this purpose tomorrow, if you are interested.)

Dag

>_______________________________________________
>cython-devel mailing list
>cython-devel at python.org
>http://mail.python.org/mailman/listinfo/cython-devel

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
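To make the suggestion concrete, here is a minimal sketch of generating the `inner_join_indexer_*` specializations with Jinja2 instead of %-interpolated strings. The template body, dtype list, and output filename are illustrative assumptions, not pandas's actual generator:

```python
# Hypothetical generator: render one Cython function per dtype from a
# single Jinja2 template, then write a .pyx that can be checked in.
from jinja2 import Template

inner_join_template = Template("""\
{% for name, c_type in dtypes %}
@cython.wraparound(False)
@cython.boundscheck(False)
def inner_join_indexer_{{ name }}(ndarray[{{ c_type }}] left,
                                  ndarray[{{ c_type }}] right):
    '''
    ...
    '''
{% endfor %}
""")

dtypes = [
    ("float64", "float64_t"),
    ("int32", "int32_t"),
    ("int64", "int64_t"),
    ("object", "object"),
]

# Checking the rendered source into the repository, as suggested above,
# keeps the build working for users on a released Cython.
with open("generated_joins.pyx", "w") as f:
    f.write(inner_join_template.render(dtypes=dtypes))
```

Unlike bare string interpolation, the template has real loops and inheritance, and the line-oriented output is what would let Cython map an error back to the originating template line, as discussed above.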
From wstein at gmail.com  Mon Apr 30 23:36:42 2012
From: wstein at gmail.com (William Stein)
Date: Mon, 30 Apr 2012 14:36:42 -0700
Subject: [Cython] Wacky idea: proper macros
In-Reply-To: <339e72d3-3ef5-44d9-ac89-79dd494cc460@email.android.com>
References: <4F9EFABD.5080408@astro.uio.no>
	<339e72d3-3ef5-44d9-ac89-79dd494cc460@email.android.com>
Message-ID: 

On Mon, Apr 30, 2012 at 2:32 PM, Dag Sverre Seljebotn wrote:
>
> Wes McKinney wrote:
>
>>On Mon, Apr 30, 2012 at 4:55 PM, Nathaniel Smith wrote:
>>> On Mon, Apr 30, 2012 at 9:49 PM, Dag Sverre Seljebotn
>>> wrote:
>>>> JIT is really the way to go. It is one thing that a JIT could
>>>> optimize the case where you pass a callback to a function and
>>>> inline it at run-time. But even if it doesn't get that fancy, it'd
>>>> be great to just be able to write something like "cython.eval(s)"
>>>> and have that be compiled (I guess you could do that now, but the
>>>> sheer overhead of the C compiler and all the .so files involved
>>>> means nobody would sanely use that as the main way of stringing
>>>> together something like pandas).
>>>
>>> The overhead of running a fully optimizing compiler over pandas on
>>> every import is pretty high, though. You can come up with various
>>> caching mechanisms, but they all mean introducing some kind of
>>> compile time/run time distinction. So I'm skeptical we'll just be
>>> able to get rid of that concept, even in a brave new
>>> LLVM/PyPy/Julia world.
>>>
>>> -- Nathaniel
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>>I'd be perfectly OK with just having to compile pandas's "data engine"
>>and generate loads of C/C++ code. JIT-compiling little array
>>expressions would be cool too. I've got enough of an itch that I might
>>have to start scratching pretty soon.
>
> I think a good start is this: myself, I'd look into just using Jinja2
> to generate all the Cython code, rather than those horrible Python
> interpolated strings... That should give you something that's at least
> rather pleasant for you to work with once you are used to it (even if
> it is a bit horrible to newcomers to the code base).
>
> You can even check in the generated sources.
>
> And we've discussed letting Cython be smart with templating languages
> and report errors on the corresponding line in the original template;
> such features will certainly be accepted once somebody codes them up.
>
> (I can give you my breakdown of how I eliminated templating languages
> other than Jinja2 for this purpose tomorrow, if you are interested.)

Can you point us to a good example of you using Jinja2 for this
purpose?  I'm a big fan of Jinja2 in general (e.g., for HTML)...

> Dag
>
>>_______________________________________________
>>cython-devel mailing list
>>cython-devel at python.org
>>http://mail.python.org/mailman/listinfo/cython-devel
>
> --
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org