From arfrever.fta at gmail.com Sun Apr 1 20:23:27 2012 From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis) Date: Sun, 1 Apr 2012 20:23:27 +0200 Subject: [Cython] Cython 0.16 Release Candidate In-Reply-To: References: Message-ID: <201204012023.30671.Arfrever.FTA@gmail.com> All tests pass with Python 2.6 (2.6.7 release). All tests pass with Python 2.7 (snapshot of 2.7 branch, revision 3623c3e6c049). All tests pass with Python 3.1 (3.1.4 release). 4 failures with Python 3.2 (snapshot of 3.2 branch, revision 0a4a6f98bd8e). Failures with Python 3.2: ====================================================================== FAIL: NestedWith (withstat) Doctest: withstat.NestedWith ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/c/withstat.cpython-32.so", line unknown line number, in NestedWith ---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/c/withstat.cpython-32.so", line ?, in withstat.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat.pyx", line 183, in withstat.NestedWith.runTest (withstat.c:5574) File "withstat.pyx", line 222, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:8101) File "withstat.pyx", line 223, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:7989) File "withstat.pyx", line 224, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:7838) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 40: invalid start byte ====================================================================== FAIL: NestedWith (withstat) Doctest: withstat.NestedWith ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/cpp/withstat.cpython-32.so", line unknown line number, in NestedWith 
---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/cpp/withstat.cpython-32.so", line ?, in withstat.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat.pyx", line 183, in withstat.NestedWith.runTest (withstat.cpp:5574) File "withstat.pyx", line 222, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:8101) File "withstat.pyx", line 223, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:7989) File "withstat.pyx", line 224, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:7838) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 24: invalid continuation byte ====================================================================== FAIL: NestedWith (withstat_py) Doctest: withstat_py.NestedWith ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat_py.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/c/withstat_py.cpython-32.so", line unknown line number, in NestedWith ---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/c/withstat_py.cpython-32.so", line ?, in withstat_py.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat_py.py", line 250, in withstat_py.NestedWith.runTest (withstat_py.c:7262) File "withstat_py.py", line 289, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.c:9789) File "withstat_py.py", line 290, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.c:9677) File "withstat_py.py", line 291, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.c:9526) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 
25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte ====================================================================== FAIL: NestedWith (withstat_py) Doctest: withstat_py.NestedWith ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat_py.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/cpp/withstat_py.cpython-32.so", line unknown line number, in NestedWith ---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc0/work/Cython-0.16rc0/tests-3.2/run/cpp/withstat_py.cpython-32.so", line ?, in withstat_py.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat_py.py", line 250, in withstat_py.NestedWith.runTest (withstat_py.cpp:7262) File "withstat_py.py", line 289, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.cpp:9789) File "withstat_py.py", line 290, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.cpp:9677) File "withstat_py.py", line 291, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.cpp:9526) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 24: invalid start byte ---------------------------------------------------------------------- Ran 6475 tests in 2225.023s FAILED (failures=4) ALL DONE -- Arfrever Frehtes Taifersar Arahesis -------------- next part -------------- A non-text attachment was scrubbed... 
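All four failures share one mechanism, visible in the tracebacks: the test raises a DeprecationWarning, warnings.formatwarning() asks linecache for the offending source line, and for a compiled test module the recorded filename is the binary .so itself, which is not valid UTF-8. A minimal sketch of that mechanism, using a stand-in file rather than a real extension module:

    import linecache
    import tempfile

    # Stand-in for withstat.cpython-32.so: a first line that decodes as
    # UTF-8 (so encoding detection gets past it), then bytes that do not.
    with tempfile.NamedTemporaryFile(suffix=".so", delete=False) as f:
        f.write(b"fake shared object\n\xf8\xf0\xa0\x90\n")
        path = f.name

    # warnings.formatwarning() ends up in this call when quoting the
    # warning's source line; on Python 3.2 the UnicodeDecodeError
    # propagates into the doctest, exactly as in the failures above.
    lines = linecache.getlines(path)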
From stefan_ml at behnel.de Mon Apr 2 13:13:29 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 02 Apr 2012 13:13:29 +0200 Subject: [Cython] [cython-users] GSoC 2012 In-Reply-To: References: <4F58B3DA.5070602@behnel.de> <4F5C5D62.20003@behnel.de> Message-ID: <4F7989D9.5030508@behnel.de> Vitja Makarov, 11.03.2012 09:51: > 2012/3/11 Stefan Behnel: >> mark florisson, 11.03.2012 07:44: >>> - better type inference, that would be enabled by default and again >>> handle things like reassignments of variables and fallbacks to the >>> default object type. With entry caching Cython could build a database >>> of types ((extension) classes, functions, variables) used in the >>> modules and functions that are compiled (also def functions), and >>> infer the types used and specialize on those. Maybe a switch should be >>> added to cython to handle circular dependencies, or maybe with the >>> distutils preprocessing it can run all the type inference first and >>> keep track of unresolved entries, and try to fill those in after >>> building the database. For bonus points the user can be allowed to >>> write plugins to aid the process. >> >> That would be my favourite. We definitely need control flow driven type >> inference, local type specialisation, variable renaming, etc. Maybe even >> whole program (or at least module) analysis, like ShedSkin and PyPy do for >> their restricted Python dialects. Any serious step towards that goal would >> be a good outcome of a GSoC. > > I think we should be careful here and try to avoid making Cython code > more complicated. I agree that WPA is probably way out of scope. However, control flow driven type inference would allow us to infer the type of a variable in a given block, e.g. for code like this:

  if isinstance(x, list):
      ...
  else:
      ...

or handle cases like this:

  def test(x):
      x = list(x)
      # ... do read-only stuff with x below this point ...

Here, we currently infer that x is an unknown object that is being assigned to twice, even though it's obviously a list in all interesting parts of the function. Stefan From vitja.makarov at gmail.com Mon Apr 2 14:14:20 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Mon, 2 Apr 2012 16:14:20 +0400 Subject: [Cython] [cython-users] GSoC 2012 In-Reply-To: <4F7989D9.5030508@behnel.de> References: <4F58B3DA.5070602@behnel.de> <4F5C5D62.20003@behnel.de> <4F7989D9.5030508@behnel.de> Message-ID: 2012/4/2 Stefan Behnel : > Vitja Makarov, 11.03.2012 09:51: >> 2012/3/11 Stefan Behnel: >>> mark florisson, 11.03.2012 07:44: >>>> - better type inference, that would be enabled by default and again >>>> handle things like reassignments of variables and fallbacks to the >>>> default object type. With entry caching Cython could build a database >>>> of types ((extension) classes, functions, variables) used in the >>>> modules and functions that are compiled (also def functions), and >>>> infer the types used and specialize on those. Maybe a switch should be >>>> added to cython to handle circular dependencies, or maybe with the >>>> distutils preprocessing it can run all the type inference first and >>>> keep track of unresolved entries, and try to fill those in after >>>> building the database. For bonus points the user can be allowed to >>>> write plugins to aid the process. >>> >>> That would be my favourite.
We definitely need control flow driven type >>> inference, local type specialisation, variable renaming, etc. Maybe even >>> whole program (or at least module) analysis, like ShedSkin and PyPy do for >>> their restricted Python dialects. Any serious step towards that goal would >>> be a good outcome of a GSoC. >> >> I think we should be careful here and try to avoid making Cython code >> more complicated. > > I agree that WPA is probably way out of scope. However, control flow driven > type inference would allow us to infer the type of a variable in a given > block, e.g. for code like this:
>
>  if isinstance(x, list):
>      ...
>  else:
>      ...
>
> or handle cases like this:
>
>  def test(x):
>      x = list(x)
>      # ... do read-only stuff with x below this point ...
>
> Here, we currently infer that x is an unknown object that is being assigned > to twice, even though it's obviously a list in all interesting parts of the > function. > What to do if an entry is of PyObject type in some block and of some C-type in another? Should it be split into two different entries? -- vitja. From stefan_ml at behnel.de Mon Apr 2 14:23:22 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 02 Apr 2012 14:23:22 +0200 Subject: [Cython] [cython-users] GSoC 2012 In-Reply-To: References: <4F58B3DA.5070602@behnel.de> <4F5C5D62.20003@behnel.de> <4F7989D9.5030508@behnel.de> Message-ID: <4F799A3A.80006@behnel.de> Vitja Makarov, 02.04.2012 14:14: > 2012/4/2 Stefan Behnel: >> Vitja Makarov, 11.03.2012 09:51: >>> 2012/3/11 Stefan Behnel: >>>> mark florisson, 11.03.2012 07:44: >>>>> - better type inference, that would be enabled by default and again >>>>> handle things like reassignments of variables and fallbacks to the >>>>> default object type. With entry caching Cython could build a database >>>>> of types ((extension) classes, functions, variables) used in the >>>>> modules and functions that are compiled (also def functions), and >>>>> infer the types used and specialize on those. Maybe a switch should be >>>>> added to cython to handle circular dependencies, or maybe with the >>>>> distutils preprocessing it can run all the type inference first and >>>>> keep track of unresolved entries, and try to fill those in after >>>>> building the database. For bonus points the user can be allowed to >>>>> write plugins to aid the process. >>>> >>>> That would be my favourite. We definitely need control flow driven type >>>> inference, local type specialisation, variable renaming, etc. Maybe even >>>> whole program (or at least module) analysis, like ShedSkin and PyPy do for >>>> their restricted Python dialects. Any serious step towards that goal would >>>> be a good outcome of a GSoC. >>> >>> I think we should be careful here and try to avoid making Cython code >>> more complicated. >> >> I agree that WPA is probably way out of scope. However, control flow driven >> type inference would allow us to infer the type of a variable in a given >> block, e.g. for code like this:
>>
>>  if isinstance(x, list):
>>      ...
>>  else:
>>      ...
>>
>> or handle cases like this:
>>
>>  def test(x):
>>      x = list(x)
>>      # ... do read-only stuff with x below this point ...
>>
>> Here, we currently infer that x is an unknown object that is being assigned >> to twice, even though it's obviously a list in all interesting parts of the >> function. >> > > What to do if an entry is of PyObject type in some block and of some > C-type in another? > > Should it be split into two different entries?
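A minimal sketch of the case being asked about — one variable that flow-sensitive inference would want to treat as two entries, an unboxed C double in one branch and a generic Python object in the other (an illustrative example, not code from the thread):

    def scale(x):
        if isinstance(x, float):
            # this branch could be given a C-typed entry, say x_1: double
            return x * 2.0
        else:
            # this branch keeps a generic object entry, say x_2
            return [item * 2 for item in x]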
Yes, that's what I meant with "variable renaming". I admit that I have no idea how complex that would be, though... Stefan From stefan_ml at behnel.de Tue Apr 3 13:59:56 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 03 Apr 2012 13:59:56 +0200 Subject: [Cython] class optimisations (Re: [cython-users] How to pass Cython flags from Distutils?) In-Reply-To: References: <4F683927.10408@mnw-scan.com> <4F798797.2070807@behnel.de> <4F7A6813.3080709@mnw-scan.com> <4F7A921D.1010700@behnel.de> Message-ID: <4F7AE63C.6090805@behnel.de> [moving this discussion from cython-users to cython-devel] Robert Bradshaw, 03.04.2012 09:43: > On Mon, Apr 2, 2012 at 11:01 PM, Stefan Behnel wrote: >> Robert Bradshaw, 03.04.2012 07:51: >>> auto_cpdef is expiremental >> >> Is that another word for "deprecated"? > > No, it's another word for "incomplete." Ah, just a typo then. > Can something be deprecated if > it was never even finished? It's probably something we should > eventually do by default as an optimization, at least for methods, as > well as letting compiled classes become cdef classes (minus the > semantic idiosyncrasies) whenever possible (can we always detect this? We can at least start with the "obviously safe" cases, assuming we find any. A "__slots__" field would be a good indicator, for example. And when we get extension types to have a __dict__, that should fix a lot of the differences already. > What about subclasses that want to multiply-inherit? You can inherit from multiple extension types in a Python type, and classes with more than one parent aren't candidates anyway. So this doesn't restrict us. > It may still be > an option worth finishing up, making it easy to automatically take a > (slightly-incompatible) step towards static binding. Yes, the option could be extended to include classes at some point, before we go for more automatic default optimisations. That would keep the changes explicit at the beginning. Stefan From wesmckinn at gmail.com Tue Apr 3 23:18:41 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 3 Apr 2012 17:18:41 -0400 Subject: [Cython] Bug report with 0.16 RC Message-ID: I don't have a Trac account yet, but wanted to report this bug with the 0.16 RC. This function worked fine under 0.15.1:

    @cython.wraparound(False)
    @cython.boundscheck(False)
    def is_lexsorted(list list_of_arrays):
        cdef:
            int i
            Py_ssize_t n, nlevels
            int32_t k, cur, pre
            ndarray arr

        nlevels = len(list_of_arrays)
        n = len(list_of_arrays[0])

        cdef int32_t **vecs = <int32_t**> malloc(nlevels * sizeof(int32_t*))
        for i from 0 <= i < nlevels:
            vecs[i] = <int32_t *> (<ndarray> list_of_arrays[i]).data

        # assume uniqueness??
        for i from 1 <= i < n:
            for k from 0 <= k < nlevels:
                cur = vecs[k][i]
                pre = vecs[k][i-1]
                if cur == pre:
                    continue
                elif cur > pre:
                    break
                else:
                    return False

        free(vecs)
        return True

gives this error: python setup.py build_ext --inplace running build_ext cythoning pandas/src/tseries.pyx to pandas/src/tseries.c Error compiling Cython file: ------------------------------------------------------------ ...
nlevels = len(list_of_arrays) n = len(list_of_arrays[0]) cdef int32_t **vecs = <int32_t**> malloc(nlevels * sizeof(int32_t*)) for i from 0 <= i < nlevels: vecs[i] = <int32_t *> (<ndarray> list_of_arrays[i]).data ^ ------------------------------------------------------------ pandas/src/groupby.pyx:120:59: Compiler crash in AnalyseExpressionsTransform ModuleNode.body = StatListNode(tseries.pyx:1:0) StatListNode.stats[52] = StatListNode(groupby.pyx:4:0) StatListNode.stats[6] = CompilerDirectivesNode(groupby.pyx:109:0) CompilerDirectivesNode.body = StatListNode(groupby.pyx:109:0) StatListNode.stats[0] = DefNode(groupby.pyx:109:0, modifiers = [...]/0, name = u'is_lexsorted', num_required_args = 1, py_wrapper_required = True, reqd_kw_flags_cname = '0', used = True) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(groupby.pyx:110:4, is_terminator = True) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(groupby.pyx:119:4) File 'Nodes.py', line 6054, in analyse_expressions: ForFromStatNode(groupby.pyx:119:4, relation1 = u'<=', relation2 = u'<') File 'Nodes.py', line 342, in analyse_expressions: StatListNode(groupby.pyx:120:18) File 'Nodes.py', line 4778, in analyse_expressions: SingleAssignmentNode(groupby.pyx:120:18) File 'Nodes.py', line 4883, in analyse_types: SingleAssignmentNode(groupby.pyx:120:18) File 'ExprNodes.py', line 7079, in analyse_types: TypecastNode(groupby.pyx:120:18, result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4274, in analyse_types: AttributeNode(groupby.pyx:120:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4360, in analyse_as_ordinary_attribute: AttributeNode(groupby.pyx:120:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4436, in analyse_attribute: AttributeNode(groupby.pyx:120:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) Compiler crash traceback from this point on: File "/home/wesm/code/repos/cython/Cython/Compiler/ExprNodes.py", line 4436, in analyse_attribute replacement_node = numpy_transform_attribute_node(self) File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py", line 18, in numpy_transform_attribute_node numpy_pxd_scope = node.obj.entry.type.scope.parent_scope AttributeError: 'TypecastNode' object has no attribute 'entry' From stefan_ml at behnel.de Mon Apr 9 09:16:10 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 09 Apr 2012 09:16:10 +0200 Subject: [Cython] Cython on PyPy is (mostly) functional Message-ID: <4F828CBA.7030805@behnel.de> Hi, Cython is now mostly functional on the latest PyPy nightly builds. https://sage.math.washington.edu:8091/hudson/job/cython-scoder-pypy-nightly/ There are still crashers and a number of tests are disabled for that reason, but the number of passing tests makes it fair to consider it usable (if it works, it works). Most of the failing tests are due to bugs in PyPy's cpyext (the C-API compatibility layer), and most of the crashers as well. Some doctests just fail due to different exception messages, PyPy has a famous history of that.
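For doctests, the standard way to stay portable across such message differences is to stop comparing the exception text altogether, e.g. (a generic doctest pattern, not a change proposed here):

    def head(items):
        """
        >>> head([])   # doctest: +IGNORE_EXCEPTION_DETAIL
        Traceback (most recent call last):
        IndexError: list index out of range
        """
        return items[0]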
Also, basically any test for __dealloc__() methods is bound to fail because PyPy's garbage collector has no way of making sure that they have been called at a given point. Still, it's worth taking another look through the test results, because Cython can sometimes work around problems in cpyext more easily than they could really be fixed on the PyPy side. One major source of problems is borrowed references, because PyPy cannot easily guarantee that they stay alive in C space when all owned references are in Python space. Their memory management can move objects around, for example, and cpyext can't block that because it can't know when a borrowed reference dies. That means that something as ubiquitous in Cython as PyTuple_GET_ITEM() may not always work well, and is also far from being as fast in cpyext as in CPython. The crashers can be seen in a forked complete test run in addition to the stripped test job above: https://sage.math.washington.edu:8091/hudson/job/cython-scoder-pypy-nightly-safe/lastBuild/consoleFull Interestingly, specifically the new features, i.e. memory views and fused functions, currently account for a number of crashes. Likely not that hard to fix on our side, but needs investigation. I put up a pull request with the current changes: https://github.com/cython/cython/pull/110 Nothing controversial, really, but worth looking through to see what kind of problems to expect when writing code for PyPy. Once these are merged, I'll copy over my PyPy test jobs in Jenkins to the cython-devel jobs. That means that you should start taking a look at those after pushing your changes and get used to either writing your code in a portable way or providing a fallback code path for PyPy. Use the CYTHON_COMPILING_IN_(PYPY|CPYTHON) macros for that. At some point, it'll become interesting to revisit these specialisations and to start benchmarking them in PyPy. However, given that PyPy's entire cpyext hasn't received any major optimisation yet, it'd be a waste of time to optimise for it on our side right now. The emphasis is clearly on making it safely work at all. Stefan From dalcinl at gmail.com Tue Apr 10 20:32:37 2012 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Tue, 10 Apr 2012 21:32:37 +0300 Subject: [Cython] never used numpy.pxd, but now my code is failing Message-ID: Is there any way to disable special-casing of numpy arrays? IMHO, if I'm not using Cython's numpy.pxd file, Cython should let me decide how to manage the beast. Error compiling Cython file: ------------------------------------------------------------ ... if ((nm != PyArray_DIM(aj, 0)) or (nm != PyArray_DIM(av, 0)) or (si*bs * sj*bs != sv)): raise ValueError( ("input arrays have incompatible shapes: " "rows.shape=%s, cols.shape=%s, vals.shape=%s") % (ai.shape, aj.shape, av.shape)) ^ ------------------------------------------------------------ PETSc/petscmat.pxi:683:11: Cannot convert 'npy_intp *' to Python object -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo 3000 Santa Fe, Argentina Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From d.s.seljebotn at astro.uio.no Tue Apr 10 21:52:39 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 10 Apr 2012 21:52:39 +0200 Subject: [Cython] never used numpy.pxd, but now my code is failing In-Reply-To: References: Message-ID: <4F848F87.1090608@astro.uio.no> On 04/10/2012 08:32 PM, Lisandro Dalcin wrote: > Is there any way to disable special-casing of numpy arrays?
IMHO, if > I'm not using Cython's numpy.pxd file, Cython should let me decide how > to manage the beast. > > > Error compiling Cython file: > ------------------------------------------------------------ > ... > if ((nm != PyArray_DIM(aj, 0)) or > (nm != PyArray_DIM(av, 0)) or > (si*bs * sj*bs != sv)): raise ValueError( > ("input arrays have incompatible shapes: " > "rows.shape=%s, cols.shape=%s, vals.shape=%s") % > (ai.shape, aj.shape, av.shape)) > ^ > ------------------------------------------------------------ > > PETSc/petscmat.pxi:683:11: Cannot convert 'npy_intp *' to Python object > Whoops, sorry about that. I patched on yet another hack here: https://github.com/dagss/cython/commit/6f2271d2b3390d869a53d15b2b70769df029b218 Even if there's been a lot of trouble with these hacks I hope it can still go in; it is important in order to keep a significant part of the Cython userbase happy. Dag From d.s.seljebotn at astro.uio.no Tue Apr 10 21:53:14 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 10 Apr 2012 21:53:14 +0200 Subject: [Cython] never used numpy.pxd, but now my code is failing In-Reply-To: <4F848F87.1090608@astro.uio.no> References: <4F848F87.1090608@astro.uio.no> Message-ID: <4F848FAA.6080205@astro.uio.no> On 04/10/2012 09:52 PM, Dag Sverre Seljebotn wrote: > On 04/10/2012 08:32 PM, Lisandro Dalcin wrote: >> Is there any way to disable special-casing of numpy arrays? IMHO, if >> I'm not using Cython's numpy.pxd file, Cython should let me decide how >> to manage the beast. >> >> >> Error compiling Cython file: >> ------------------------------------------------------------ >> ... >> if ((nm != PyArray_DIM(aj, 0)) or >> (nm != PyArray_DIM(av, 0)) or >> (si*bs * sj*bs != sv)): raise ValueError( >> ("input arrays have incompatible shapes: " >> "rows.shape=%s, cols.shape=%s, vals.shape=%s") % >> (ai.shape, aj.shape, av.shape)) >> ^ >> ------------------------------------------------------------ >> >> PETSc/petscmat.pxi:683:11: Cannot convert 'npy_intp *' to Python object >> > > Whoops, sorry about that. I patched on yet another hack here: > > https://github.com/dagss/cython/commit/6f2271d2b3390d869a53d15b2b70769df029b218 BTW, that's the _numpy branch. Dag > > > Even if there's been a lot of trouble with these hacks I hope it can > still go in; it is important in order to keep a significant part of the > Cython userbase happy. From dalcinl at gmail.com Tue Apr 10 23:41:46 2012 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Wed, 11 Apr 2012 00:41:46 +0300 Subject: [Cython] never used numpy.pxd, but now my code is failing In-Reply-To: <4F848FAA.6080205@astro.uio.no> References: <4F848F87.1090608@astro.uio.no> <4F848FAA.6080205@astro.uio.no> Message-ID: On 10 April 2012 22:53, Dag Sverre Seljebotn wrote: > On 04/10/2012 09:52 PM, Dag Sverre Seljebotn wrote: >> >> On 04/10/2012 08:32 PM, Lisandro Dalcin wrote: >>> >>> Is there any way to disable special-casing of numpy arrays? IMHO, if >>> I'm not using Cython's numpy.pxd file, Cython should let me decide how >>> to manage the beast. >>> >>> >>> Error compiling Cython file: >>> ------------------------------------------------------------ >>> ... 
>>> if ((nm != PyArray_DIM(aj, 0)) or >>> (nm != PyArray_DIM(av, 0)) or >>> (si*bs * sj*bs != sv)): raise ValueError( >>> ("input arrays have incompatible shapes: " >>> "rows.shape=%s, cols.shape=%s, vals.shape=%s") % >>> (ai.shape, aj.shape, av.shape)) >>> ^ >>> ------------------------------------------------------------ >>> >>> PETSc/petscmat.pxi:683:11: Cannot convert 'npy_intp *' to Python object >>> >> >> Whoops, sorry about that. I patched on yet another hack here: >> >> >> https://github.com/dagss/cython/commit/6f2271d2b3390d869a53d15b2b70769df029b218 > > > BTW, that's the _numpy branch. > The fix worked for me. -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo 3000 Santa Fe, Argentina Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From markflorisson88 at gmail.com Wed Apr 11 17:19:40 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Wed, 11 Apr 2012 16:19:40 +0100 Subject: [Cython] NumPy dependency in Jenkins builds In-Reply-To: <4F6CD389.6000808@behnel.de> References: <4F6C880D.6010208@behnel.de> <4F6CD389.6000808@behnel.de> Message-ID: On 23 March 2012 19:48, Stefan Behnel wrote: > Stefan Behnel, 23.03.2012 15:26: > > mark florisson, 23.03.2012 14:26: > >> This may be OT for this thread > > > > ... in which case it's quite common to start a new one ... > > > >> but was numpy removed at some point from Jenkins? I'm seeing this for > >> all python versions since February 25: > >> > >> Following tests excluded because of missing dependencies on your > >> system: > >> run.memoryviewattrs > >> run.numpy_ValueError_T172 > >> run.numpy_bufacc_T155 > >> run.numpy_cimport > >> run.numpy_memoryview > >> run.numpy_parallel > >> run.numpy_test > >> ALL DONE > > > > May be my fault. I think when I unified the build jobs, I might have > > disabled it because we didn't have a NumPy version for Py3 at the time. > > > > I'll look into it. > > I've re-enabled them for all CPython builds except for the latest py3k > branch. NumPy 1.6.1 doesn't compile there due to the new Unicode buffer > layout (PEP 383). > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel > How do I enable numpy in my own build configuration? Should I re-clone the cython-devel ones? From stefan_ml at behnel.de Wed Apr 11 19:04:31 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 11 Apr 2012 19:04:31 +0200 Subject: [Cython] NumPy dependency in Jenkins builds In-Reply-To: References: <4F6C880D.6010208@behnel.de> <4F6CD389.6000808@behnel.de> Message-ID: <4F85B99F.6010107@behnel.de> mark florisson, 11.04.2012 17:19: > On 23 March 2012 19:48, Stefan Behnel wrote: >> Stefan Behnel, 23.03.2012 15:26: >>> mark florisson, 23.03.2012 14:26: >>>> Following tests excluded because of missing dependencies on your >>>> system: >>>> run.memoryviewattrs >>>> run.numpy_ValueError_T172 >>>> run.numpy_bufacc_T155 >>>> run.numpy_cimport >>>> run.numpy_memoryview >>>> run.numpy_parallel >>>> run.numpy_test >>>> ALL DONE >>> >>> May be my fault. I think when I unified the build jobs, I might have >>> disabled it because we didn't have a NumPy version for Py3 at the time. >>> >>> I'll look into it. >> >> I've re-enabled them for all CPython builds except for the latest py3k >> branch.
NumPy 1.6.1 doesn't compile there due to the new Unicode buffer >> layout (PEP 383). > > How do I enable numpy in my own build configuration? Should I re-clone the > cython-devel ones? You can do that, yes. Alternatively, just write "pyXY-ext" instead of "pyXY" in the PYVERSION axis. Doing that in the test jobs is enough, the build jobs don't need it. Stefan From markflorisson88 at gmail.com Thu Apr 12 14:08:00 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 12 Apr 2012 13:08:00 +0100 Subject: [Cython] NumPy dependency in Jenkins builds In-Reply-To: <4F85B99F.6010107@behnel.de> References: <4F6C880D.6010208@behnel.de> <4F6CD389.6000808@behnel.de> <4F85B99F.6010107@behnel.de> Message-ID: On 11 April 2012 18:04, Stefan Behnel wrote: > mark florisson, 11.04.2012 17:19: >> On 23 March 2012 19:48, Stefan Behnel wrote: >>> Stefan Behnel, 23.03.2012 15:26: >>>> mark florisson, 23.03.2012 14:26: >>>>> Following tests excluded because of missing dependencies on your >>>>> system: >>>>> run.memoryviewattrs >>>>> run.numpy_ValueError_T172 >>>>> run.numpy_bufacc_T155 >>>>> run.numpy_cimport >>>>> run.numpy_memoryview >>>>> run.numpy_parallel >>>>> run.numpy_test >>>>> ALL DONE >>>> >>>> May be my fault. I think when I unified the build jobs, I might have >>>> disabled it because we didn't have a NumPy version for Py3 at the time. >>>> >>>> I'll look into it. >>> >>> I've re-enabled them for all CPython builds except for the latest py3k >>> branch. NumPy 1.6.1 doesn't compile there due to the new Unicode buffer >>> layout (PEP 383). >> >> How do I enable numpy in my own build configuration? Should I re-clone the >> cython-devel ones? > > You can do that, yes. Alternatively, just write "pyXY-ext" instead of > "pyXY" in the PYVERSION axis. Doing that in the test jobs is enough, the > build jobs don't need it. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Ok, thanks Stefan. From markflorisson88 at gmail.com Thu Apr 12 16:38:37 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 12 Apr 2012 15:38:37 +0100 Subject: [Cython] Cython 0.16 RC 1 Message-ID: Yet another release candidate, this will hopefully be the last before the 0.16 release. You can grab it from here: http://wiki.cython.org/ReleaseNotes-0.16 There were several fixes for the numpy attribute rewrite, memoryviews and fused types. Accessing the 'base' attribute of a typed ndarray now goes through the object layer, which means direct assignment is no longer supported. If there are any problems, please let us know. From markflorisson88 at gmail.com Thu Apr 12 20:21:24 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 12 Apr 2012 19:21:24 +0100 Subject: [Cython] pyregr test suite Message-ID: Hey, Could we run the pyregr test suite manually instead of automatically? It takes a lot of resources to build, and a single simple push to the cython-devel branch results in the build slots being hogged for hours, making the continuous development a lot less 'continuous'. We could just decide to run the pyregr suite every so often, or whenever we make an addition or change that could actually affect Python code (if one updates a test then there is no use in running pyregr for instance).
Mark From robertwb at gmail.com Thu Apr 12 22:21:11 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 12 Apr 2012 13:21:11 -0700 Subject: [Cython] pyregr test suite In-Reply-To: References: Message-ID: On Thu, Apr 12, 2012 at 11:21 AM, mark florisson wrote: > Hey, > > Could we run the pyregr test suite manually instead of automatically? > It takes a lot of resources to build, and a single simple push to the > cython-devel branch results in the build slots being hogged for hours, > making the continuous development a lot less 'continuous'. We could > just decide to run the pyregr suite every so often, or whenever we > make an addition or change that could actually affect Python code (if > one updates a test then there is no use in running pyregr for > instance). +1 to manual + periodic for these tests. Alternatively we could make them depend on each other, so at most one core is consumed. - Robert From wesmckinn at gmail.com Thu Apr 12 23:00:29 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 12 Apr 2012 17:00:29 -0400 Subject: [Cython] Cython 0.16 RC 1 In-Reply-To: References: Message-ID: On Thu, Apr 12, 2012 at 10:38 AM, mark florisson wrote: > Yet another release candidate, this will hopefully be the last before > the 0.16 release. You can grab it from here: > http://wiki.cython.org/ReleaseNotes-0.16 > > There were several fixes for the numpy attribute rewrite, memoryviews > and fused types. Accessing the 'base' attribute of a typed ndarray now > goes through the object layer, which means direct assignment is no > longer supported. > > If there are any problems, please let us know. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I'm unable to build pandas using git master Cython. I just released pandas 0.7.3 today which has no issues at all with 0.15.1: http://pypi.python.org/pypi/pandas For example: 16:57 ~/code/pandas (master)$ python setup.py build_ext --inplace running build_ext cythoning pandas/src/tseries.pyx to pandas/src/tseries.c Error compiling Cython file: ------------------------------------------------------------ ... 
self.store = {} ptr = <int32_t**> malloc(self.depth * sizeof(int32_t*)) for i in range(self.depth): ptr[i] = <int32_t*> (<ndarray> label_arrays[i]).data ^ ------------------------------------------------------------ pandas/src/tseries.pyx:107:59: Compiler crash in AnalyseExpressionsTransform ModuleNode.body = StatListNode(tseries.pyx:1:0) StatListNode.stats[23] = StatListNode(tseries.pyx:86:5) StatListNode.stats[0] = CClassDefNode(tseries.pyx:86:5, as_name = u'MultiMap', class_name = u'MultiMap', doc = u'\n Need to come up with a better data structure for multi-level indexing\n ', module_name = u'', visibility = u'private') CClassDefNode.body = StatListNode(tseries.pyx:91:4) StatListNode.stats[1] = StatListNode(tseries.pyx:95:4) StatListNode.stats[0] = DefNode(tseries.pyx:95:4, modifiers = [...]/0, name = u'__init__', num_required_args = 2, py_wrapper_required = True, reqd_kw_flags_cname = '0', used = True) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(tseries.pyx:96:8) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(tseries.pyx:106:8) File 'Nodes.py', line 5903, in analyse_expressions: ForInStatNode(tseries.pyx:106:8) File 'Nodes.py', line 342, in analyse_expressions: StatListNode(tseries.pyx:107:21) File 'Nodes.py', line 4767, in analyse_expressions: SingleAssignmentNode(tseries.pyx:107:21) File 'Nodes.py', line 4872, in analyse_types: SingleAssignmentNode(tseries.pyx:107:21) File 'ExprNodes.py', line 7082, in analyse_types: TypecastNode(tseries.pyx:107:21, result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4274, in analyse_types: AttributeNode(tseries.pyx:107:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4360, in analyse_as_ordinary_attribute: AttributeNode(tseries.pyx:107:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) File 'ExprNodes.py', line 4436, in analyse_attribute: AttributeNode(tseries.pyx:107:59, attribute = u'data', initialized_check = True, is_attribute = 1, member = u'data', needs_none_check = True, op = '->', result_is_used = True, use_managed_ref = True) Compiler crash traceback from this point on: File "/home/wesm/code/repos/cython/Cython/Compiler/ExprNodes.py", line 4436, in analyse_attribute replacement_node = numpy_transform_attribute_node(self) File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py", line 18, in numpy_transform_attribute_node numpy_pxd_scope = node.obj.entry.type.scope.parent_scope AttributeError: 'TypecastNode' object has no attribute 'entry' building 'pandas._tseries' extension creating build creating build/temp.linux-x86_64-2.7 creating build/temp.linux-x86_64-2.7/pandas creating build/temp.linux-x86_64-2.7/pandas/src gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -O2 -fPIC -I/home/wesm/epd/lib/python2.7/site-packages/numpy/core/include -I/home/wesm/epd/include/python2.7 -c pandas/src/tseries.c -o build/temp.linux-x86_64-2.7/pandas/src/tseries.o pandas/src/tseries.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation. error: command 'gcc' failed with exit status 1 ----- I kludged this particular line in the pandas/timeseries branch so it will build on git master Cython, but I was treated to dozens of failures, errors, and finally a segfault in the middle of the test suite.
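One workaround for this class of crash — assuming, as both tracebacks suggest, that it is the attribute lookup on a bare typecast that trips the new NumpySupport code — is to bind the cast result to a typed variable first, so that .data is looked up on a name rather than on a TypecastNode. A hypothetical reduction of the failing code (names invented for illustration):

    from numpy cimport ndarray, int32_t
    from libc.stdlib cimport malloc, free

    def grab_pointers(list label_arrays):
        cdef Py_ssize_t i, depth = len(label_arrays)
        cdef int32_t **ptr = <int32_t**> malloc(depth * sizeof(int32_t*))
        cdef ndarray arr
        for i in range(depth):
            arr = label_arrays[i]          # typed assignment instead of an inline cast
            ptr[i] = <int32_t*> arr.data   # attribute access now happens on a typed name
        # ... use ptr ...
        free(ptr)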
Suffice to say I'm not sure I would advise you to release the library in its current state until all of this is resolved. Happy to help however I can but I'm back to 0.15.1 for now. - Wes From d.s.seljebotn at astro.uio.no Thu Apr 12 23:32:11 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 12 Apr 2012 23:32:11 +0200 Subject: [Cython] Cython 0.16 RC 1 In-Reply-To: References: Message-ID: <4F8749DB.2080403@astro.uio.no> On 04/12/2012 11:00 PM, Wes McKinney wrote: > On Thu, Apr 12, 2012 at 10:38 AM, mark florisson > wrote: >> Yet another release candidate, this will hopefully be the last before >> the 0.16 release. You can grab it from here: >> http://wiki.cython.org/ReleaseNotes-0.16 >> >> There were several fixes for the numpy attribute rewrite, memoryviews >> and fused types. Accessing the 'base' attribute of a typed ndarray now >> goes through the object layer, which means direct assignment is no >> longer supported. >> >> If there are any problems, please let us know. >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > I'm unable to build pandas using git master Cython. I just released > pandas 0.7.3 today which has no issues at all with 0.15.1: It is no surprise that master doesn't work. Can you try again with the "release" branch? (We should obviously start to tell people which git branch to fetch in addition to the tarball. And perhaps create a "devel" branch and let master be betas and release candidates.) Dag From d.s.seljebotn at astro.uio.no Fri Apr 13 00:11:27 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 00:11:27 +0200 Subject: [Cython] CEP1000: Native dispatch through callables Message-ID: <4F87530F.7050000@astro.uio.no> Travis Oliphant recently raised the issue on the NumPy list of what mechanisms to use to box native functions produced by his Numba so that SciPy functions can call it, e.g. (I'm making the numba part up): @numba # Compiles function using LLVM def f(x): return 3 * x print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! Obviously, we want something standard, so that Cython functions can also be called in a fast way. This is very similar to CEP 523 (http://wiki.cython.org/enhancements/nativecall), but rather than Cython-to-Cython, we want something that both SciPy, NumPy, numba, Cython, f2py, fwrap can implement. Here's my proposal; Travis seems happy to implement something like it for numba and parts of SciPy: http://wiki.cython.org/enhancements/nativecall Obviously this is (in a modified form) PEP-material, but I think it is much better to just get it working with a nice range of tools first (makes the PEP application stronger as well). Feedback most welcome! Dag From d.s.seljebotn at astro.uio.no Fri Apr 13 00:34:15 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 00:34:15 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87530F.7050000@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> Message-ID: <4F875867.3070401@astro.uio.no> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: > Travis Oliphant recently raised the issue on the NumPy list of what > mechanisms to use to box native functions produced by his Numba so that > SciPy functions can call it, e.g. 
(I'm making the numba part up): > > @numba # Compiles function using LLVM > def f(x): > return 3 * x > > print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! > > Obviously, we want something standard, so that Cython functions can also > be called in a fast way. > > This is very similar to CEP 523 > (http://wiki.cython.org/enhancements/nativecall), but rather than > Cython-to-Cython, we want something that both SciPy, NumPy, numba, > Cython, f2py, fwrap can implement. > > Here's my proposal; Travis seems happy to implement something like it > for numba and parts of SciPy: > > http://wiki.cython.org/enhancements/nativecall I'm sorry. HERE is the CEP: http://wiki.cython.org/enhancements/cep1000 Since writing that yesterday, I've moved more in the direction of wanting a zero-terminated list of overloads instead of providing a count, and have the fast protocol jump over the header (since version is available elsewhere), and just demand that the structure is sizeof(void*)-aligned in the first place rather than the complicated padding. Dag From robertwb at gmail.com Fri Apr 13 01:38:33 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 12 Apr 2012 16:38:33 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F875867.3070401@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> Message-ID: On Thu, Apr 12, 2012 at 3:34 PM, Dag Sverre Seljebotn wrote: > On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >> >> Travis Oliphant recently raised the issue on the NumPy list of what >> mechanisms to use to box native functions produced by his Numba so that >> SciPy functions can call it, e.g. (I'm making the numba part up): >> >> @numba # Compiles function using LLVM >> def f(x): >> return 3 * x >> >> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >> >> Obviously, we want something standard, so that Cython functions can also >> be called in a fast way. >> >> This is very similar to CEP 523 >> (http://wiki.cython.org/enhancements/nativecall), but rather than >> Cython-to-Cython, we want something that both SciPy, NumPy, numba, >> Cython, f2py, fwrap can implement. >> >> Here's my proposal; Travis seems happy to implement something like it >> for numba and parts of SciPy: >> >> http://wiki.cython.org/enhancements/nativecall > > > I'm sorry. HERE is the CEP: > > http://wiki.cython.org/enhancements/cep1000 > > Since writing that yesterday, I've moved more in the direction of wanting a > zero-terminated list of overloads instead of providing a count, and have the > fast protocol jump over the header (since version is available elsewhere), > and just demand that the structure is sizeof(void*)-aligned in the first > place rather than the complicated padding. Great idea to coordinate with the many other projects here. Eventually this could maybe even be a PEP. Somewhat related, I'd like to add support for Go-style interfaces. These would essentially be vtables of pre-fetched function pointers, and could play very nicely with this interface. Have you given any thought as to what happens if __call__ is re-assigned for an object (or subclass of an object) supporting this interface? Or is this out of scope? Minor nit: I don't think should_dereference is worth branching on, if one wants to save the allocation one can still use a variable-sized type and point to oneself. Yes, that's an extra dereference, but the memory is already likely close and it greatly simplifies the logic. But I could be wrong here. 
Also, I'm not sure the type registration will scale, especially if every callable type wanted to get registered. (E.g. currently closures and generators are new types...) Where to draw the line? (Perhaps things could get registered lazily on the first __nativecall__ lookup, as they're likely to be looked up again?) - Robert From stefan_ml at behnel.de Fri Apr 13 07:11:23 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 07:11:23 +0200 Subject: [Cython] pyregr test suite In-Reply-To: References: Message-ID: <4F87B57B.6000807@behnel.de> Robert Bradshaw, 12.04.2012 22:21: > On Thu, Apr 12, 2012 at 11:21 AM, mark florisson wrote: >> Could we run the pyregr test suite manually instead of automatically? >> It takes a lot of resources to build, and a single simple push to the >> cython-devel branch results in the build slots being hogged for hours, >> making the continuous development a lot less 'continuous'. We could >> just decide to run the pyregr suite every so often, or whenever we >> make an addition or change that could actually affect Python code (if >> one updates a test then there is no use in running pyregr for >> instance). > > +1 to manual + periodic for these tests. Alternatively we could make > them depend on each other, so at most one core is consumed. Ok, I'll set it up. Stefan From stefan_ml at behnel.de Fri Apr 13 07:17:37 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 07:17:37 +0200 Subject: [Cython] pyregr test suite In-Reply-To: References: Message-ID: <4F87B6F1.8090804@behnel.de> mark florisson, 12.04.2012 20:21: > Could we run the pyregr test suite manually instead of automatically? > It takes a lot of resources to build, and a single simple push to the > cython-devel branch results in the build slots being hogged for hours, Careful here. It takes a lot of time, yes, but the reason it currently takes ages is that the Py3k tests have started to hang in the test_tempfile tests at some point and don't terminate any more. May be a bug in the tests or a problem on our side, don't know. Stefan From stefan_ml at behnel.de Fri Apr 13 07:22:52 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 07:22:52 +0200 Subject: [Cython] Cython 0.16 RC 1 In-Reply-To: <4F8749DB.2080403@astro.uio.no> References: <4F8749DB.2080403@astro.uio.no> Message-ID: <4F87B82C.3060405@behnel.de> Dag Sverre Seljebotn, 12.04.2012 23:32: > (We should obviously start to tell people which git branch to fetch in > addition to the tarball. +1 > And perhaps create a "devel" branch and let master > be betas and release candidates.) -1 I think we should just always merge release branches back into the master, especially when we make a release. Alternatively, we could start naming release branches "Cython-0.16" etc. and leave them open - but I find the way we currently do it ok. A tag is usually better than an open branch for the way we make our releases. 
Stefan From stefan_ml at behnel.de Fri Apr 13 07:24:38 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 07:24:38 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F875867.3070401@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> Message-ID: <4F87B896.5050000@behnel.de> Dag Sverre Seljebotn, 13.04.2012 00:34: > On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >> Travis Oliphant recently raised the issue on the NumPy list of what >> mechanisms to use to box native functions produced by his Numba so that >> SciPy functions can call it, e.g. (I'm making the numba part up): >> >> @numba # Compiles function using LLVM >> def f(x): >> return 3 * x >> >> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >> >> Obviously, we want something standard, so that Cython functions can also >> be called in a fast way. >> >> This is very similar to CEP 523 >> (http://wiki.cython.org/enhancements/nativecall), but rather than >> Cython-to-Cython, we want something that both SciPy, NumPy, numba, >> Cython, f2py, fwrap can implement. >> >> Here's my proposal; Travis seems happy to implement something like it >> for numba and parts of SciPy: >> >> http://wiki.cython.org/enhancements/nativecall > > I'm sorry. HERE is the CEP: > > http://wiki.cython.org/enhancements/cep1000 Some general remarks: I'm all for doing something in this direction and have been hinting at it on the PyPy mailing list for a while, without reaction so far. I'll trigger them again, with a pointer to this discussion and the CEP. PyPy should be totally interested in a generic way to do fast calls into wrapped C code in general and Cython implemented functions specifically. Their JIT would then look at the function at runtime and unwrap it. There's PEP 362 which proposes a Signature object. It seems to have attracted some interest lately and Guido seems to like it also. I think we should come up with a way to add a C level interface to that, instead of designing something entirely separate. http://www.python.org/dev/peps/pep-0362/ Stefan From d.s.seljebotn at astro.uio.no Fri Apr 13 10:52:07 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 10:52:07 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> Message-ID: <4F87E937.9050705@astro.uio.no> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: > On Thu, Apr 12, 2012 at 3:34 PM, Dag Sverre Seljebotn > wrote: >> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>> >>> Travis Oliphant recently raised the issue on the NumPy list of what >>> mechanisms to use to box native functions produced by his Numba so that >>> SciPy functions can call it, e.g. (I'm making the numba part up): >>> >>> @numba # Compiles function using LLVM >>> def f(x): >>> return 3 * x >>> >>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >>> >>> Obviously, we want something standard, so that Cython functions can also >>> be called in a fast way. >>> >>> This is very similar to CEP 523 >>> (http://wiki.cython.org/enhancements/nativecall), but rather than >>> Cython-to-Cython, we want something that both SciPy, NumPy, numba, >>> Cython, f2py, fwrap can implement. >>> >>> Here's my proposal; Travis seems happy to implement something like it >>> for numba and parts of SciPy: >>> >>> http://wiki.cython.org/enhancements/nativecall >> >> >> I'm sorry.
HERE is the CEP: >> >> http://wiki.cython.org/enhancements/cep1000 >> >> Since writing that yesterday, I've moved more in the direction of wanting a >> zero-terminated list of overloads instead of providing a count, and have the >> fast protocol jump over the header (since version is available elsewhere), >> and just demand that the structure is sizeof(void*)-aligned in the first >> place rather than the complicated padding. > > Great idea to coordinate with the many other projects here. Eventually > this could maybe even be a PEP. > > Somewhat related, I'd like to add support for Go-style interfaces. > These would essentially be vtables of pre-fetched function pointers, > and could play very nicely with this interface. Yep; but you agree that this can be done in isolation without considering vtables first? > Have you given any thought as to what happens if __call__ is > re-assigned for an object (or subclass of an object) supporting this > interface? Or is this out of scope? Out-of-scope, I'd say. Though you can always write an object that detects if you assign to __call__... > Minor nit: I don't think should_dereference is worth branching on, if > one wants to save the allocation one can still use a variable-sized > type and point to oneself. Yes, that's an extra dereference, but the > memory is already likely close and it greatly simplifies the logic. > But I could be wrong here. Those minor nits are exactly what I seek; since Travis will have the first implementation in numba<->SciPy, I just want to make sure that what he does will work efficiently with Cython. Can we perhaps just require that the information is embedded in the object? I must admit that when I wrote that I was mostly thinking of JIT-style code generation, where you only use should_dereference for code-generation. But yes, by converting the table to a C structure you can do without a JIT. > > Also, I'm not sure the type registration will scale, especially if > every callable type wanted to get registered. (E.g. currently closures > and generators are new types...) Where to draw the line? (Perhaps > things could get registered lazily on the first __nativecall__ lookup, > as they're likely to be looked up again?) Right... if we do some work to synchronize the types for Cython modules generated by the same version of Cython, we're left with 3-4 types for Cython, right? Then a couple for numba and one for f2py; so on the order of 10? An alternative is to do something funny in the type object to get across the offset-in-object information (abusing the docstring, or introduce our own flag which means that the type object has an additional non-standard field at the end). Dag From d.s.seljebotn at astro.uio.no Fri Apr 13 11:13:15 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 11:13:15 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87B896.5050000@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87B896.5050000@behnel.de> Message-ID: <4F87EE2B.20702@astro.uio.no> On 04/13/2012 07:24 AM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 13.04.2012 00:34: >> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>> Travis Oliphant recently raised the issue on the NumPy list of what >>> mechanisms to use to box native functions produced by his Numba so that >>> SciPy functions can call it, e.g.
(I'm making the numba part up): >>> >>> @numba # Compiles function using LLVM >>> def f(x): >>> return 3 * x >>> >>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >>> >>> Obviously, we want something standard, so that Cython functions can also >>> be called in a fast way. >>> >>> This is very similar to CEP 523 >>> (http://wiki.cython.org/enhancements/nativecall), but rather than >>> Cython-to-Cython, we want something that both SciPy, NumPy, numba, >>> Cython, f2py, fwrap can implement. >>> >>> Here's my proposal; Travis seems happy to implement something like it >>> for numba and parts of SciPy: >>> >>> http://wiki.cython.org/enhancements/nativecall >> >> I'm sorry. HERE is the CEP: >> >> http://wiki.cython.org/enhancements/cep1000 > > Some general remarks: > > I'm all for doing something in this direction and have been hinting at it > on the PyPy mailing list for a while, without reaction so far. I'll trigger > them again, with a pointer to this discussion and the CEP. PyPy should be > totally interested in a generic way to do fast calls into wrapped C code in > general and Cython implemented functions specifically. Their JIT would then > look at the function at runtime and unwrap it. > > There's PEP 362 which proposes a Signature object. It seems to have > attracted some interest lately and Guido seems to like it also. I think we > should come up with a way to add a C level interface to that, instead of > designing something entirely separate. > > http://www.python.org/dev/peps/pep-0362/ Well, provided that you still want an efficient representation that can be strcmp-ed in dispatch codes, this seems to boil down to using a Signature object rather than a capsule (with a C interface), and store it in __signature__ rather than __fastcall__, and perhaps provide a slot in the type object for a function returning it. I really think the right approach is to prove the concept outside of the standardization process first; a) by the time a PEP would be accepted it will have been years since Travis had time to work on this, b) as far as the slot in the type object goes, we're left with users on Python 2.4 today; a Python 3.4+ solution is not really a solution. Dag From stefan_ml at behnel.de Fri Apr 13 11:15:11 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 11:15:11 +0200 Subject: [Cython] PyPy sprint in Leipzig, June 22-27 (was: Re: CEP1000: Native dispatch through callables) In-Reply-To: <4F87B896.5050000@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87B896.5050000@behnel.de> Message-ID: <4F87EE9F.4070205@behnel.de> Stefan Behnel, 13.04.2012 07:24: > Dag Sverre Seljebotn, 13.04.2012 00:34: >> http://wiki.cython.org/enhancements/cep1000 > > I'm all for doing something in this direction and have been hinting at it > on the PyPy mailing list for a while, without reaction so far. I'll trigger > them again, with a pointer to this discussion and the CEP. PyPy should be > totally interested in a generic way to do fast calls into wrapped C code in > general and Cython implemented functions specifically. Their JIT would then > look at the function at runtime and unwrap it. BTW, there will be a PyPy sprint in Leipzig from June 22-27. If anyone's interested in coordinating with PyPy on this and other topics, that might be a good place to go for a day or two. 
http://permalink.gmane.org/gmane.comp.python.pypy/9896 Stefan From robertwb at gmail.com Fri Apr 13 12:17:09 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 03:17:09 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87E937.9050705@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> Message-ID: On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: > On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >> >> On Thu, Apr 12, 2012 at 3:34 PM, Dag Sverre Seljebotn >> ?wrote: >>> >>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>>> >>>> >>>> Travis Oliphant recently raised the issue on the NumPy list of what >>>> mechanisms to use to box native functions produced by his Numba so that >>>> SciPy functions can call it, e.g. (I'm making the numba part up): >>>> >>>> @numba # Compiles function using LLVM >>>> def f(x): >>>> return 3 * x >>>> >>>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >>>> >>>> Obviously, we want something standard, so that Cython functions can also >>>> be called in a fast way. >>>> >>>> This is very similar to CEP 523 >>>> (http://wiki.cython.org/enhancements/nativecall), but rather than >>>> Cython-to-Cython, we want something that both SciPy, NumPy, numba, >>>> Cython, f2py, fwrap can implement. >>>> >>>> Here's my proposal; Travis seems happy to implement something like it >>>> for numba and parts of SciPy: >>>> >>>> http://wiki.cython.org/enhancements/nativecall >>> >>> >>> >>> I'm sorry. HERE is the CEP: >>> >>> http://wiki.cython.org/enhancements/cep1000 >>> >>> Since writing that yesterday, I've moved more in the direction of wanting >>> a >>> zero-terminated list of overloads instead of providing a count, and have >>> the >>> fast protocol jump over the header (since version is available >>> elsewhere), >>> and just demand that the structure is sizeof(void*)-aligned in the first >>> place rather than the complicated padding. >> >> >> Great idea to coordinate with the many other projects here. Eventually >> this could maybe even be a PEP. >> >> Somewhat related, I'd like to add support for Go-style interfaces. >> These would essentially be vtables of pre-fetched function pointers, >> and could play very nicely with this interface. > > > Yep; but you agree that this can be done in isolation without considering > vtables first? Yes, for sure. >> Have you given any thought as to what happens if __call__ is >> re-assigned for an object (or subclass of an object) supporting this >> interface? Or is this out of scope? > > > Out-of-scope, I'd say. Though you can always write an object that detects if > you assign to __call__... > > >> Minor nit: I don't think should_dereference is worth branching on, if >> one wants to save the allocation one can still use a variable-sized >> type and point to oneself. Yes, that's an extra dereference, but the >> memory is already likely close and it greatly simplifies the logic. >> But I could be wrong here. > > > Those minor nits are exactly what I seek; since Travis will have the first > implementation in numba<->SciPy, I just want to make sure that what he does > will work efficiently work Cython. +1 I have to admit building/invoking these var-arg-sized __nativecall__ records seems painful. 
Here's another suggestion: struct { void* pointer; size_t signature; // compressed binary representation, 95% coverage char* long_signature; // used if signature is not representable in a size_t, as indicated by signature = 0 } record; These char* could optionally be allocated at the end of the record* for optimal locality. We could even dispense with the binary signature, but having that option allows us to avoid strcmp for stuff like d)d and ffi)f. > Can we perhaps just require that the information is embedded in the object? I think not, this would require variably-sized objects (and also use up the variable sized nature). Given that this is in a portion of the program that is iterating over a Python tuple, I think the extra deference here is non-consequential. > I must admit that when I wrote that I was mostly thinking of JIT-style code > generation, where you only use should_dereference for code-generation. But > yes, by converting the table to a C structure you can do without a JIT. > > >> >> Also, I'm not sure the type registration will scale, especially if >> every callable type wanted to get registered. (E.g. currently closures >> and generators are new types...) Where to draw the line? (Perhaps >> things could get registered lazily on the first __nativecall__ lookup, >> as they're likely to be looked up again?) > > > Right... if we do some work to synchronize the types for Cython modules > generated by the same version of Cython, we're left with 3-4 types for > Cython, right? Then a couple for numba and one for f2py; so on the order of > 10? No, I think each closure is its own type. > An alternative is do something funny in the type object to get across the > offset-in-object information (abusing the docstring, or introduce our own > flag which means that the type object has an additional non-standard field > at the end). It's a hack, but the flag + non-standard field idea might just work... Ah, don't you just love C :) - Robert From stefan_ml at behnel.de Fri Apr 13 11:35:18 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 11:35:18 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87EE2B.20702@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87B896.5050000@behnel.de> <4F87EE2B.20702@astro.uio.no> Message-ID: <4F87F356.9040607@behnel.de> Dag Sverre Seljebotn, 13.04.2012 11:13: > On 04/13/2012 07:24 AM, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 13.04.2012 00:34: >>> http://wiki.cython.org/enhancements/cep1000 >> >> There's PEP 362 which proposes a Signature object. It seems to have >> attracted some interest lately and Guido seems to like it also. I think we >> should come up with a way to add a C level interface to that, instead of >> designing something entirely separate. >> >> http://www.python.org/dev/peps/pep-0362/ > > Well, provided that you still want an efficient representation that can be > strcmp-ed in dispatch codes, this seems to boil down to using a Signature > object rather than a capsule (with a C interface), and store it in > __signature__ rather than __fastcall__, and perhaps provide a slot in the > type object for a function returning it. Basically, yes. I was just bringing it up because we should keep it in mind when designing a solution. Moving it into the Signature object would also allow C signature introspection from Python code, for example. It would obviously need a straight C level way to access it. I'm not sure it has to be a function, though. 
I would prefer a simple array of structs that map signature strings to function pointers. Like the PyMethodDef struct. > I really think the right approach is to prove the concept outside of the > standardization process first; a) by the time a PEP would be accepted it > will have been years since Travis had time to work on this, b) as far as > the slot in the type object goes, we're left with users on Python 2.4 > today; a Python 3.4+ solution is not really a solution. Sure. But nothing keeps us from backporting at least parts of it to older Pythons, like we did for so many other things. Stefan From stefan_ml at behnel.de Fri Apr 13 13:38:56 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 13:38:56 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> Message-ID: <4F881050.7000302@behnel.de> Robert Bradshaw, 13.04.2012 12:17: > On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>> Have you given any thought as to what happens if __call__ is >>> re-assigned for an object (or subclass of an object) supporting this >>> interface? Or is this out of scope? >> >> Out-of-scope, I'd say. Though you can always write an object that detects if >> you assign to __call__... +1 for out of scope. This is a pure C level feature. >>> Minor nit: I don't think should_dereference is worth branching on, if >>> one wants to save the allocation one can still use a variable-sized >>> type and point to oneself. Yes, that's an extra dereference, but the >>> memory is already likely close and it greatly simplifies the logic. >>> But I could be wrong here. >> >> >> Those minor nits are exactly what I seek; since Travis will have the first >> implementation in numba<->SciPy, I just want to make sure that what he does >> will work efficiently work Cython. > > +1 > > I have to admit building/invoking these var-arg-sized __nativecall__ > records seems painful. Here's another suggestion: > > struct { > void* pointer; > size_t signature; // compressed binary representation, 95% coverage > char* long_signature; // used if signature is not representable in > a size_t, as indicated by signature = 0 > } record; > > These char* could optionally be allocated at the end of the record* > for optimal locality. We could even dispense with the binary > signature, but having that option allows us to avoid strcmp for stuff > like d)d and ffi)f. Assuming we use literals and a const char* for the signature, the C compiler would cut down the number of signature strings automatically for us. And a pointer comparison is the same as a size_t comparison. That would only apply at a per-module level, though, so it would require an indirection for the signature IDs. But it would avoid a global registry. Another idea would be to set the signature ID field to 0 at the beginning and call a C-API function to let the current runtime assign an ID > 0, unique for the currently running application. Then every user would only have to parse the signature once to adapt to the respective ID and could otherwise branch based on it directly. For Cython, we could generate a static ID variable for each typed call that we found in the sources. When encountering a C signature on a callable, either a) the ID variable is still empty (initial case), then we parse the signature to see if it matches the expected signature. 
If it does, we assign the corresponding ID to the static ID variable and issue a direct call. If b) the ID field is already set (normal case), we compare the signature IDs directly and issue a C call it they match. If the IDs do not match, we issue a normal Python call. >> Right... if we do some work to synchronize the types for Cython modules >> generated by the same version of Cython, we're left with 3-4 types for >> Cython, right? Then a couple for numba and one for f2py; so on the order of >> 10? > > No, I think each closure is its own type. And that even applies to fused functions, right? They'd have one closure for each type combination. >> An alternative is do something funny in the type object to get across the >> offset-in-object information (abusing the docstring, or introduce our own >> flag which means that the type object has an additional non-standard field >> at the end). > > It's a hack, but the flag + non-standard field idea might just work... Plus, it wouldn't have to stay a non-standard field. If it's accepted into CPython 3.4, we could safely use it in all existing versions of CPython. Stefan From d.s.seljebotn at astro.uio.no Fri Apr 13 13:59:45 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 13:59:45 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881050.7000302@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> Message-ID: <4F881531.4090406@astro.uio.no> On 04/13/2012 01:38 PM, Stefan Behnel wrote: > Robert Bradshaw, 13.04.2012 12:17: >> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>> Have you given any thought as to what happens if __call__ is >>>> re-assigned for an object (or subclass of an object) supporting this >>>> interface? Or is this out of scope? >>> >>> Out-of-scope, I'd say. Though you can always write an object that detects if >>> you assign to __call__... > > +1 for out of scope. This is a pure C level feature. > > >>>> Minor nit: I don't think should_dereference is worth branching on, if >>>> one wants to save the allocation one can still use a variable-sized >>>> type and point to oneself. Yes, that's an extra dereference, but the >>>> memory is already likely close and it greatly simplifies the logic. >>>> But I could be wrong here. >>> >>> >>> Those minor nits are exactly what I seek; since Travis will have the first >>> implementation in numba<->SciPy, I just want to make sure that what he does >>> will work efficiently work Cython. >> >> +1 >> >> I have to admit building/invoking these var-arg-sized __nativecall__ >> records seems painful. Here's another suggestion: >> >> struct { >> void* pointer; >> size_t signature; // compressed binary representation, 95% coverage Once you start passing around functions that take memory view slices as arguments, that 95% estimate will be off I think. >> char* long_signature; // used if signature is not representable in >> a size_t, as indicated by signature = 0 >> } record; >> >> These char* could optionally be allocated at the end of the record* >> for optimal locality. We could even dispense with the binary >> signature, but having that option allows us to avoid strcmp for stuff >> like d)d and ffi)f. > > Assuming we use literals and a const char* for the signature, the C > compiler would cut down the number of signature strings automatically for > us. 
And a pointer comparison is the same as a size_t comparison.

I'll go one further: Intern Python bytes objects. It's just a PyObject*, but it's *required* (or just strongly encouraged) to have gone through

    sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig)

Obviously in a PEP you'd have a C-API function for such interning (completely standalone utility). Performance of the interning operation itself doesn't matter...

Unless CPython has interning features itself, like in Java? Was that present back in the day and then ripped out?

Requiring interning is somewhat less elegant in one way, but it makes a lot of other stuff much simpler.

That gives us

    struct {
        void *pointer;
        PyBytesObject *signature;
    } record;

and then you allocate a NULL-terminated array of these for all the overloads.

> > That would only apply at a per-module level, though, so it would require an > indirection for the signature IDs. But it would avoid a global registry. > > Another idea would be to set the signature ID field to 0 at the beginning > and call a C-API function to let the current runtime assign an ID > 0, > unique for the currently running application. Then every user would only > have to parse the signature once to adapt to the respective ID and could > otherwise branch based on it directly. > > For Cython, we could generate a static ID variable for each typed call that > we found in the sources. When encountering a C signature on a callable, > either a) the ID variable is still empty (initial case), then we parse the > signature to see if it matches the expected signature. If it does, we > assign the corresponding ID to the static ID variable and issue a direct > call. If b) the ID field is already set (normal case), we compare the > signature IDs directly and issue a C call if they match. If the IDs do not > match, we issue a normal Python call.

> >>> Right... if we do some work to synchronize the types for Cython modules >>> generated by the same version of Cython, we're left with 3-4 types for >>> Cython, right? Then a couple for numba and one for f2py; so on the order of >>> 10? >> >> No, I think each closure is its own type. > > And that even applies to fused functions, right? They'd have one closure > for each type combination. > >>> An alternative is to do something funny in the type object to get across the >>> offset-in-object information (abusing the docstring, or introduce our own >>> flag which means that the type object has an additional non-standard field >>> at the end). >> >> It's a hack, but the flag + non-standard field idea might just work... > > Plus, it wouldn't have to stay a non-standard field. If it's accepted into > CPython 3.4, we could safely use it in all existing versions of CPython.

Sounds good. Perhaps just find a single "extended" flag, then add a new flag field in our payload, in case we need to extend the type object yet again later and run out of unused flag bits (TBD: figure out how many unused flag bits there are).
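With interned signatures, a caller-side scan over such a NULL-terminated table reduces to pointer comparisons. A minimal sketch, assuming the table has already been obtained from the callable and that b"d)d" was interned through the C-API function described above; the names here are illustrative, not part of the CEP:

    #include <Python.h>

    typedef struct {
        void *pointer;        /* C function pointer for one overload */
        PyObject *signature;  /* interned bytes object, e.g. b"d)d" */
    } nativecall_record;

    static double
    call_as_d_d(PyObject *callable, nativecall_record *table,
                PyObject *interned_d_d, double x)
    {
        nativecall_record *rec;
        for (rec = table; rec->pointer != NULL; rec++) {
            /* interning makes this an identity check, no strcmp */
            if (rec->signature == interned_d_d) {
                double (*f)(double) = (double (*)(double)) rec->pointer;
                return f(x);
            }
        }
        /* no native match: fall back to a normal Python-level call
           (error checking elided for brevity) */
        PyObject *res = PyObject_CallFunction(callable, "d", x);
        double r = PyFloat_AsDouble(res);
        Py_XDECREF(res);
        return r;
    }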
Dag From njs at pobox.com Fri Apr 13 14:19:38 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 13 Apr 2012 13:19:38 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87E937.9050705@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> Message-ID: On Fri, Apr 13, 2012 at 9:52 AM, Dag Sverre Seljebotn wrote: > On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >> Also, I'm not sure the type registration will scale, especially if >> every callable type wanted to get registered. (E.g. currently closures >> and generators are new types...) Where to draw the line? (Perhaps >> things could get registered lazily on the first __nativecall__ lookup, >> as they're likely to be looked up again?) > > > Right... if we do some work to synchronize the types for Cython modules > generated by the same version of Cython, we're left with 3-4 types for > Cython, right? Then a couple for numba and one for f2py; so on the order of > 10? > > An alternative is do something funny in the type object to get across the > offset-in-object information (abusing the docstring, or introduce our own > flag which means that the type object has an additional non-standard field > at the end). In Python 2.7, it looks like there may be a few TP_FLAG bits free -- 15 and 16 are labeled "reserved for stackless python", and 2, 11, 22 don't have anything defined. There may also be an unused ssize_t field ob_size at the beginning of the type object -- for some reason PyTypeObject is declared as variable size (using PyObject_VAR_HEAD), but I don't see any variable-size fields in it, the docs claim that the ob_size field is a "historical artifact that is maintained for binary compatibility...Always set this field to zero", and Include/object.h has a definition for a PyHeapTypeObject which has a PyTypeObject as its first member, which would not work if PyTypeObject had variable size. Grep says that the only place where ob_type->ob_size is accessed is in Objects/typeobject.c:object_sizeof(), which at first glance appears to be a bug, and anyway I don't think anyone cares whether __sizeof__ on C-callable objects is exactly correct. One could use this for an offset, or even a pointer. One could also add a field easily by just subclassing PyTypeObject. The Signature thing seems like a distraction to me. Signature is intended as just a nice convenient format for looking up stuff that's otherwise stored in more obscure ways -- the API equivalent of pretty-printing. The important thing here is getting the C-level dispatch right. -- Nathaniel From njs at pobox.com Fri Apr 13 14:25:38 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 13 Apr 2012 13:25:38 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881531.4090406@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: On Fri, Apr 13, 2012 at 12:59 PM, Dag Sverre Seljebotn wrote: > I'll go one further: Intern Python bytes objects. It's just a PyObject*, but > it's *required* (or just strongly encouraged) to have gone through > > sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig) > > Obviously in a PEP you'd have a C-API function for such interning > (completely standalone utility). Performance of interning operation itself > doesn't matter... 
> > Unless CPython has interning features itself, like in Java? Was that present > back in the day and then ripped out? http://docs.python.org/library/functions.html#intern ? (C API: PyString_InternInPlace, moved from __builtin__.intern to sys.intern in Py3.) - N From stefan_ml at behnel.de Fri Apr 13 14:27:48 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 14:27:48 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881531.4090406@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: <4F881BC4.1070004@behnel.de> Dag Sverre Seljebotn, 13.04.2012 13:59: > On 04/13/2012 01:38 PM, Stefan Behnel wrote: >> Robert Bradshaw, 13.04.2012 12:17: >>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>>> Minor nit: I don't think should_dereference is worth branching on, if >>>>> one wants to save the allocation one can still use a variable-sized >>>>> type and point to oneself. Yes, that's an extra dereference, but the >>>>> memory is already likely close and it greatly simplifies the logic. >>>>> But I could be wrong here. >>>> >>>> >>>> Those minor nits are exactly what I seek; since Travis will have the first >>>> implementation in numba<->SciPy, I just want to make sure that what he >>>> does will work efficiently work Cython. >>> >>> I have to admit building/invoking these var-arg-sized __nativecall__ >>> records seems painful. Here's another suggestion: >>> >>> struct { >>> void* pointer; >>> size_t signature; // compressed binary representation, 95% coverage > > Once you start passing around functions that take memory view slices as > arguments, that 95% estimate will be off I think. Yes, I really think it makes sense to keeps IDs unique only over the runtime of the application. (Note that using ssize_t instead of size_t would allow setting the ID to -1 to disable signature matching, in case that's ever needed.) >>> char* long_signature; // used if signature is not representable in >>> a size_t, as indicated by signature = 0 >>> } record; >>> >>> These char* could optionally be allocated at the end of the record* >>> for optimal locality. We could even dispense with the binary >>> signature, but having that option allows us to avoid strcmp for stuff >>> like d)d and ffi)f. >> >> Assuming we use literals and a const char* for the signature, the C >> compiler would cut down the number of signature strings automatically for >> us. And a pointer comparison is the same as a size_t comparison. > > I'll go one further: Intern Python bytes objects. It's just a PyObject*, > but it's *required* (or just strongly encouraged) to have gone through > > sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig) > > Obviously in a PEP you'd have a C-API function for such interning > (completely standalone utility). Performance of interning operation itself > doesn't matter... > > Unless CPython has interning features itself, like in Java? Was that > present back in the day and then ripped out? AFAIR, it always had to be done explicitly and is only available for unicode objects in Py3 (and only for bytes objects in Py2). The CPython parser also does it for identifiers, but it's not done automatically for anything else. 
It's also not cheap to do - it would require a weakref dict to accommodate for the temporary allocation of large strings, and weak references have a certain overhead. In any case, this is an entirely different use case that should be handled differently from normal string interning. > Requiring interning is somewhat less elegant in one way, but it makes a lot > of other stuff much simpler. > > That gives us > > struct { > void *pointer; > PyBytesObject *signature; > } record; > > and then you allocate a NULL-terminated arrays of these for all the overloads. However, the problem is the setup. These references will have to be created at init time and discarded during runtime termination. Not a problem for Cython generated code, but some overhead for hand written code. Since the size of these structs is not a problem, I'd prefer keeping Python objects out of the game and using an ssize_t ID instead, inferred from a char* signature at module init time by calling a C-API function. That avoids the need for any cleanup. Stefan From markflorisson88 at gmail.com Fri Apr 13 14:46:27 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 13 Apr 2012 13:46:27 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881050.7000302@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> Message-ID: On 13 April 2012 12:38, Stefan Behnel wrote: > Robert Bradshaw, 13.04.2012 12:17: >> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>> Have you given any thought as to what happens if __call__ is >>>> re-assigned for an object (or subclass of an object) supporting this >>>> interface? Or is this out of scope? >>> >>> Out-of-scope, I'd say. Though you can always write an object that detects if >>> you assign to __call__... > > +1 for out of scope. This is a pure C level feature. > > >>>> Minor nit: I don't think should_dereference is worth branching on, if >>>> one wants to save the allocation one can still use a variable-sized >>>> type and point to oneself. Yes, that's an extra dereference, but the >>>> memory is already likely close and it greatly simplifies the logic. >>>> But I could be wrong here. >>> >>> >>> Those minor nits are exactly what I seek; since Travis will have the first >>> implementation in numba<->SciPy, I just want to make sure that what he does >>> will work efficiently work Cython. >> >> +1 >> >> I have to admit building/invoking these var-arg-sized __nativecall__ >> records seems painful. Here's another suggestion: >> >> struct { >> ? ? void* pointer; >> ? ? size_t signature; // compressed binary representation, 95% coverage >> ? ? char* long_signature; // used if signature is not representable in >> a size_t, as indicated by signature = 0 >> } record; >> >> These char* could optionally be allocated at the end of the record* >> for optimal locality. We could even dispense with the binary >> signature, but having that option allows us to avoid strcmp for stuff >> like d)d and ffi)f. > > Assuming we use literals and a const char* for the signature, the C > compiler would cut down the number of signature strings automatically for > us. And a pointer comparison is the same as a size_t comparison. > > That would only apply at a per-module level, though, so it would require an > indirection for the signature IDs. But it would avoid a global registry. 
> > Another idea would be to set the signature ID field to 0 at the beginning > and call a C-API function to let the current runtime assign an ID > 0, > unique for the currently running application. Then every user would only > have to parse the signature once to adapt to the respective ID and could > otherwise branch based on it directly. > > For Cython, we could generate a static ID variable for each typed call that > we found in the sources. When encountering a C signature on a callable, > either a) the ID variable is still empty (initial case), then we parse the > signature to see if it matches the expected signature. If it does, we > assign the corresponding ID to the static ID variable and issue a direct > call. If b) the ID field is already set (normal case), we compare the > signature IDs directly and issue a C call it they match. If the IDs do not > match, we issue a normal Python call. > > >>> Right... if we do some work to synchronize the types for Cython modules >>> generated by the same version of Cython, we're left with 3-4 types for >>> Cython, right? Then a couple for numba and one for f2py; so on the order of >>> 10? >> >> No, I think each closure is its own type. > > And that even applies to fused functions, right? They'd have one closure > for each type combination. > Hm, there is only one type for the function (CyFunction), but there is a different type for the closure scope for each closure. The same goes for FusedFunction, there is only one type, and each instance contains a dict of specializations (mapping signatures to PyCFunctions). (But each module still has different function types of course). >>> An alternative is do something funny in the type object to get across the >>> offset-in-object information (abusing the docstring, or introduce our own >>> flag which means that the type object has an additional non-standard field >>> at the end). >> >> It's a hack, but the flag + non-standard field idea might just work... > > Plus, it wouldn't have to stay a non-standard field. If it's accepted into > CPython 3.4, we could safely use it in all existing versions of CPython. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Fri Apr 13 14:48:54 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 13 Apr 2012 13:48:54 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881531.4090406@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: On 13 April 2012 12:59, Dag Sverre Seljebotn wrote: > On 04/13/2012 01:38 PM, Stefan Behnel wrote: >> >> Robert Bradshaw, 13.04.2012 12:17: >>> >>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>>> >>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>>> >>>>> Have you given any thought as to what happens if __call__ is >>>>> re-assigned for an object (or subclass of an object) supporting this >>>>> interface? Or is this out of scope? >>>> >>>> >>>> Out-of-scope, I'd say. Though you can always write an object that >>>> detects if >>>> you assign to __call__... >> >> >> +1 for out of scope. This is a pure C level feature. 
>> >> >>>>> Minor nit: I don't think should_dereference is worth branching on, if >>>>> one wants to save the allocation one can still use a variable-sized >>>>> type and point to oneself. Yes, that's an extra dereference, but the >>>>> memory is already likely close and it greatly simplifies the logic. >>>>> But I could be wrong here. >>>> >>>> >>>> >>>> Those minor nits are exactly what I seek; since Travis will have the >>>> first >>>> implementation in numba<->SciPy, I just want to make sure that what he >>>> does >>>> will work efficiently work Cython. >>> >>> >>> +1 >>> >>> I have to admit building/invoking these var-arg-sized __nativecall__ >>> records seems painful. Here's another suggestion: >>> >>> struct { >>> ? ? void* pointer; >>> ? ? size_t signature; // compressed binary representation, 95% coverage > > > Once you start passing around functions that take memory view slices as > arguments, that 95% estimate will be off I think. > It kind of depends on which arguments types and how many arguments you will allow, and whether or not collisions would be fine (which would imply ID comparison + strcmp()). >>> ? ? char* long_signature; // used if signature is not representable in >>> a size_t, as indicated by signature = 0 >>> } record; >>> >>> These char* could optionally be allocated at the end of the record* >>> for optimal locality. We could even dispense with the binary >>> signature, but having that option allows us to avoid strcmp for stuff >>> like d)d and ffi)f. >> >> >> Assuming we use literals and a const char* for the signature, the C >> compiler would cut down the number of signature strings automatically for >> us. And a pointer comparison is the same as a size_t comparison. > > > I'll go one further: Intern Python bytes objects. It's just a PyObject*, but > it's *required* (or just strongly encouraged) to have gone through > > sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig) > > Obviously in a PEP you'd have a C-API function for such interning > (completely standalone utility). Performance of interning operation itself > doesn't matter... > > Unless CPython has interning features itself, like in Java? Was that present > back in the day and then ripped out? > > Requiring interning is somewhat less elegant in one way, but it makes a lot > of other stuff much simpler. > > That gives us > > struct { > ? ?void *pointer; > ? ?PyBytesObject *signature; > } record; > > and then you allocate a NULL-terminated arrays of these for all the > overloads. > Interesting. What I like about size_t it that it could define a deterministic ordering, which means specializations could be stored in a binary search tree in array form. Cython would precompute the size_t for the specialization it needs (and maybe account for promotions as well). >> >> That would only apply at a per-module level, though, so it would require >> an >> indirection for the signature IDs. But it would avoid a global registry. >> >> Another idea would be to set the signature ID field to 0 at the beginning >> and call a C-API function to let the current runtime assign an ID> ?0, >> unique for the currently running application. Then every user would only >> have to parse the signature once to adapt to the respective ID and could >> otherwise branch based on it directly. >> >> For Cython, we could generate a static ID variable for each typed call >> that >> we found in the sources. 
When encountering a C signature on a callable, >> either a) the ID variable is still empty (initial case), then we parse the >> signature to see if it matches the expected signature. If it does, we >> assign the corresponding ID to the static ID variable and issue a direct >> call. If b) the ID field is already set (normal case), we compare the >> signature IDs directly and issue a C call it they match. If the IDs do not >> match, we issue a normal Python call. >> >> >>>> Right... if we do some work to synchronize the types for Cython modules >>>> generated by the same version of Cython, we're left with 3-4 types for >>>> Cython, right? Then a couple for numba and one for f2py; so on the order >>>> of >>>> 10? >>> >>> >>> No, I think each closure is its own type. >> >> >> And that even applies to fused functions, right? They'd have one closure >> for each type combination. >> >> >>>> An alternative is do something funny in the type object to get across >>>> the >>>> offset-in-object information (abusing the docstring, or introduce our >>>> own >>>> flag which means that the type object has an additional non-standard >>>> field >>>> at the end). >>> >>> >>> It's a hack, but the flag + non-standard field idea might just work... >> >> >> Plus, it wouldn't have to stay a non-standard field. If it's accepted into >> CPython 3.4, we could safely use it in all existing versions of CPython. > > > Sounds good. Perhaps just find a single "extended", then add a new flag > field in our payload, in case we need to extend the types object yet again > later and run out of unused flag bits (TBD: figure out how many unused flag > bits there are). > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Maybe it would be a good idea if there was a third project that defined this functionality in header files which projects could include (or in case of Cython directly inject into the generated C files). E.g. a function to check for the native interface, and a function that given an array of signature strings and function pointers builds the ABI information (and computes the ID), and one that given an ID and signature string finds the right specialization. The project should also expose a simple type system for the types we care about, and be able to generate signature strings and IDs for signatures. An optimization for the common case would be to only look at the first entry in the ABI information directly and compare that for the non-overloaded case, and otherwise do a logarithmic lookup, with a final fallback to calling through the Python layer. From stefan_ml at behnel.de Fri Apr 13 14:48:05 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 14:48:05 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881BC4.1070004@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <4F881BC4.1070004@behnel.de> Message-ID: <4F882085.6070304@behnel.de> Stefan Behnel, 13.04.2012 14:27: > Dag Sverre Seljebotn, 13.04.2012 13:59: >> Requiring interning is somewhat less elegant in one way, but it makes a lot >> of other stuff much simpler. >> >> That gives us >> >> struct { >> void *pointer; >> PyBytesObject *signature; >> } record; >> >> and then you allocate a NULL-terminated arrays of these for all the overloads. 
> > However, the problem is the setup. These references will have to be created > at init time and discarded during runtime termination. Not a problem for > Cython generated code, but some overhead for hand written code. > > Since the size of these structs is not a problem, I'd prefer keeping Python > objects out of the game and using an ssize_t ID instead, inferred from a > char* signature at module init time by calling a C-API function. That > avoids the need for any cleanup. Actually, we could even use interned char* values. Nothing keeps that C-API setup function from reassigning the "char* signature" field to the char* buffer of an internally allocated byte string. Except that we'd have to *require* users to use literals or otherwise statically allocated C strings in that field. Hmm, maybe not the best idea ever... Stefan From markflorisson88 at gmail.com Fri Apr 13 15:01:30 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 13 Apr 2012 14:01:30 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F882085.6070304@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <4F881BC4.1070004@behnel.de> <4F882085.6070304@behnel.de> Message-ID: On 13 April 2012 13:48, Stefan Behnel wrote: > Stefan Behnel, 13.04.2012 14:27: >> Dag Sverre Seljebotn, 13.04.2012 13:59: >>> Requiring interning is somewhat less elegant in one way, but it makes a lot >>> of other stuff much simpler. >>> >>> That gives us >>> >>> struct { >>> ? ? void *pointer; >>> ? ? PyBytesObject *signature; >>> } record; >>> >>> and then you allocate a NULL-terminated arrays of these for all the overloads. >> >> However, the problem is the setup. These references will have to be created >> at init time and discarded during runtime termination. Not a problem for >> Cython generated code, but some overhead for hand written code. >> >> Since the size of these structs is not a problem, I'd prefer keeping Python >> objects out of the game and using an ssize_t ID instead, inferred from a >> char* signature at module init time by calling a C-API function. That >> avoids the need for any cleanup. > > Actually, we could even use interned char* values. Nothing keeps that C-API > setup function from reassigning the "char* signature" field to the char* > buffer of an internally allocated byte string. Except that we'd have to > *require* users to use literals or otherwise statically allocated C strings > in that field. Hmm, maybe not the best idea ever... > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel You could create a module shared by all versions and projects, which exposes a function 'get_signature', which given a char *signature returns the pointer that should be used in the ABI signature type information. You can then always compare by identity. 
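A minimal sketch of what such a shared 'get_signature' helper could look like, assuming an interned_db dict created once at module init time (the init code and error paths of callers are elided; all names are illustrative):

    #include <Python.h>

    static PyObject *interned_db = NULL;  /* module-level dict, set up at init */

    /* Returns a borrowed reference to the canonical bytes object for sig,
       so callers can compare signatures by pointer identity. */
    static PyObject *
    get_signature(const char *sig)
    {
        PyObject *key = PyBytes_FromString(sig);
        if (key == NULL)
            return NULL;
        PyObject *canonical = PyDict_GetItem(interned_db, key);  /* borrowed */
        if (canonical == NULL) {
            if (PyDict_SetItem(interned_db, key, key) < 0) {
                Py_DECREF(key);
                return NULL;
            }
            canonical = key;  /* the dict now holds two references to it */
        }
        Py_DECREF(key);
        return canonical;
    }

The returned pointer stays valid as long as the dict holds the entry, so two modules that both intern "d)d" get the same PyObject* and can dispatch on identity.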
From d.s.seljebotn at astro.uio.no Fri Apr 13 15:27:34 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 15:27:34 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <4F881BC4.1070004@behnel.de> <4F882085.6070304@behnel.de> Message-ID: <4F8829C6.2050903@astro.uio.no> On 04/13/2012 03:01 PM, mark florisson wrote: > On 13 April 2012 13:48, Stefan Behnel wrote: >> Stefan Behnel, 13.04.2012 14:27: >>> Dag Sverre Seljebotn, 13.04.2012 13:59: >>>> Requiring interning is somewhat less elegant in one way, but it makes a lot >>>> of other stuff much simpler. >>>> >>>> That gives us >>>> >>>> struct { >>>> void *pointer; >>>> PyBytesObject *signature; >>>> } record; >>>> >>>> and then you allocate a NULL-terminated arrays of these for all the overloads. >>> >>> However, the problem is the setup. These references will have to be created >>> at init time and discarded during runtime termination. Not a problem for >>> Cython generated code, but some overhead for hand written code. >>> >>> Since the size of these structs is not a problem, I'd prefer keeping Python >>> objects out of the game and using an ssize_t ID instead, inferred from a >>> char* signature at module init time by calling a C-API function. That >>> avoids the need for any cleanup. >> >> Actually, we could even use interned char* values. Nothing keeps that C-API >> setup function from reassigning the "char* signature" field to the char* >> buffer of an internally allocated byte string. Except that we'd have to >> *require* users to use literals or otherwise statically allocated C strings >> in that field. Hmm, maybe not the best idea ever... >> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > You could create a module shared by all versions and projects, which > exposes a function 'get_signature', which given a char *signature > returns the pointer that should be used in the ABI signature type > information. You can then always compare by identity. I fail to see how this is different from what I proposed, with interning bytes objects (which I still prefer; although the binary-search features of direct comparison makes that attractive too). BTW, any proposal that requires an actual project/library that both Cython and NumPy depends on will fail in the real world. Dag From stefan_ml at behnel.de Fri Apr 13 15:52:44 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 15:52:44 +0200 Subject: [Cython] pyregr test suite In-Reply-To: <4F87B57B.6000807@behnel.de> References: <4F87B57B.6000807@behnel.de> Message-ID: <4F882FAC.90008@behnel.de> Stefan Behnel, 13.04.2012 07:11: > Robert Bradshaw, 12.04.2012 22:21: >> On Thu, Apr 12, 2012 at 11:21 AM, mark florisson wrote: >>> Could we run the pyregr test suite manually instead of automatically? >>> It takes a lot of resources to build, and a single simple push to the >>> cython-devel branch results in the build slots being hogged for hours, >>> making the continuous development a lot less 'continuous'. 
We could >>> just decide to run the pyregr suite every so often, or whenever we >>> make an addition or change that could actually affect Python code (if >>> one updates a test then there is no use in running pyregr for >>> instance). >> >> +1 to manual + periodic for these tests. Alternatively we could make >> them depend on each other, so at most one core is consumed. > > Ok, I'll set it up. They are now triggered by the (nightly) CPython builds and the four configurations run sequentially (there's an option for that), starting with the C tests. I would recommend configuring your own pyregr test jobs (if you have any) for manual runs by disabling all of their triggers. Stefan From markflorisson88 at gmail.com Fri Apr 13 16:18:01 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 13 Apr 2012 15:18:01 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8829C6.2050903@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <4F881BC4.1070004@behnel.de> <4F882085.6070304@behnel.de> <4F8829C6.2050903@astro.uio.no> Message-ID: On 13 April 2012 14:27, Dag Sverre Seljebotn wrote: > On 04/13/2012 03:01 PM, mark florisson wrote: >> >> On 13 April 2012 13:48, Stefan Behnel ?wrote: >>> >>> Stefan Behnel, 13.04.2012 14:27: >>>> >>>> Dag Sverre Seljebotn, 13.04.2012 13:59: >>>>> >>>>> Requiring interning is somewhat less elegant in one way, but it makes a >>>>> lot >>>>> of other stuff much simpler. >>>>> >>>>> That gives us >>>>> >>>>> struct { >>>>> ? ? void *pointer; >>>>> ? ? PyBytesObject *signature; >>>>> } record; >>>>> >>>>> and then you allocate a NULL-terminated arrays of these for all the >>>>> overloads. >>>> >>>> >>>> However, the problem is the setup. These references will have to be >>>> created >>>> at init time and discarded during runtime termination. Not a problem for >>>> Cython generated code, but some overhead for hand written code. >>>> >>>> Since the size of these structs is not a problem, I'd prefer keeping >>>> Python >>>> objects out of the game and using an ssize_t ID instead, inferred from a >>>> char* signature at module init time by calling a C-API function. That >>>> avoids the need for any cleanup. >>> >>> >>> Actually, we could even use interned char* values. Nothing keeps that >>> C-API >>> setup function from reassigning the "char* signature" field to the char* >>> buffer of an internally allocated byte string. Except that we'd have to >>> *require* users to use literals or otherwise statically allocated C >>> strings >>> in that field. Hmm, maybe not the best idea ever... >>> >>> Stefan >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> >> You could create a module shared by all versions and projects, which >> exposes a function 'get_signature', which given a char *signature >> returns the pointer that should be used in the ABI signature type >> information. You can then always compare by identity. > > > I fail to see how this is different from what I proposed, with interning > bytes objects (which I still prefer; although the binary-search features of > direct comparison makes that attractive too). It's not really different, more a response to Stefan's comment. 
> BTW, any proposal that requires an actual project/library that both Cython > and NumPy depends on will fail in the real world. That's fine as long as they use the same way to expose ABI information. As a courtesy though, we could do it anyway, which makes it easier for those respective projects to understand what's involved, how to implement it, and they can then decide whether they want to ship that project as part of their own project. > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From robertwb at gmail.com Fri Apr 13 19:26:17 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 10:26:17 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F881531.4090406@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: On Fri, Apr 13, 2012 at 4:59 AM, Dag Sverre Seljebotn wrote: > On 04/13/2012 01:38 PM, Stefan Behnel wrote: >> >> Robert Bradshaw, 13.04.2012 12:17: >>> >>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>>> >>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>>> >>>>> Have you given any thought as to what happens if __call__ is >>>>> re-assigned for an object (or subclass of an object) supporting this >>>>> interface? Or is this out of scope? >>>> >>>> >>>> Out-of-scope, I'd say. Though you can always write an object that >>>> detects if >>>> you assign to __call__... >> >> >> +1 for out of scope. This is a pure C level feature. >> >> >>>>> Minor nit: I don't think should_dereference is worth branching on, if >>>>> one wants to save the allocation one can still use a variable-sized >>>>> type and point to oneself. Yes, that's an extra dereference, but the >>>>> memory is already likely close and it greatly simplifies the logic. >>>>> But I could be wrong here. >>>> >>>> >>>> >>>> Those minor nits are exactly what I seek; since Travis will have the >>>> first >>>> implementation in numba<->SciPy, I just want to make sure that what he >>>> does >>>> will work efficiently work Cython. >>> >>> >>> +1 >>> >>> I have to admit building/invoking these var-arg-sized __nativecall__ >>> records seems painful. Here's another suggestion: >>> >>> struct { >>> ? ? void* pointer; >>> ? ? size_t signature; // compressed binary representation, 95% coverage > > Once you start passing around functions that take memory view slices as > arguments, that 95% estimate will be off I think. We have (on the high-performance systems we care about) 64-bits here. If we limit ourselves to a 6-bit alphabet, that gives a trivial encoding for up to 10 chars. We could be more clever here (Huffman coding) but that might be overkill. More importantly though, the "complicated" signatures are likely to be so cheap that the strcmp overhead matters. >>> ? ? char* long_signature; // used if signature is not representable in >>> a size_t, as indicated by signature = 0 >>> } record; >>> >>> These char* could optionally be allocated at the end of the record* >>> for optimal locality. We could even dispense with the binary >>> signature, but having that option allows us to avoid strcmp for stuff >>> like d)d and ffi)f. >> >> >> Assuming we use literals and a const char* for the signature, the C >> compiler would cut down the number of signature strings automatically for >> us. 
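Robert's 6-bit-alphabet point just below can be made concrete: with 63 characters at six bits each, up to ten characters pack into one 64-bit word. This is only a sketch; the particular alphabet is an assumption, not anything agreed in the thread:

    #include <stdint.h>
    #include <string.h>

    /* 63 characters, so codes 1..63 fit in 6 bits */
    static const char SIGCHARS[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789)";

    /* Returns 0 if the signature is too long or uses an unknown
       character, i.e. the "fall back to the long signature" case. */
    static uint64_t
    pack_signature(const char *sig)
    {
        uint64_t code = 0;
        size_t i, n = strlen(sig);
        if (n == 0 || n > 10)
            return 0;
        for (i = 0; i < n; i++) {
            const char *p = strchr(SIGCHARS, sig[i]);
            if (p == NULL)
                return 0;
            code = (code << 6) | (uint64_t)(p - SIGCHARS + 1);  /* 1..63 */
        }
        return code;
    }

Under this scheme "d)d" and "ffi)f" get distinct nonzero codes and dispatch is a single integer comparison; anything longer or stranger maps to 0 and falls back to string comparison.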
And a pointer comparison is the same as a size_t comparison. > > > I'll go one further: Intern Python bytes objects. It's just a PyObject*, but > it's *required* (or just strongly encouraged) to have gone through > > sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig) > > Obviously in a PEP you'd have a C-API function for such interning > (completely standalone utility). Performance of interning operation itself > doesn't matter... > > Unless CPython has interning features itself, like in Java? Was that present > back in the day and then ripped out? > > Requiring interning is somewhat less elegant in one way, but it makes a lot > of other stuff much simpler. > > That gives us > > struct { > ? ?void *pointer; > ? ?PyBytesObject *signature; > } record; > > and then you allocate a NULL-terminated arrays of these for all the > overloads. Global interning is a nice idea. The one drawback I see is that it becomes much more expensive for dynamically calculated signatures. >> >> That would only apply at a per-module level, though, so it would require >> an >> indirection for the signature IDs. But it would avoid a global registry. >> >> Another idea would be to set the signature ID field to 0 at the beginning >> and call a C-API function to let the current runtime assign an ID> ?0, >> unique for the currently running application. Then every user would only >> have to parse the signature once to adapt to the respective ID and could >> otherwise branch based on it directly. >> >> For Cython, we could generate a static ID variable for each typed call >> that >> we found in the sources. When encountering a C signature on a callable, >> either a) the ID variable is still empty (initial case), then we parse the >> signature to see if it matches the expected signature. If it does, we >> assign the corresponding ID to the static ID variable and issue a direct >> call. If b) the ID field is already set (normal case), we compare the >> signature IDs directly and issue a C call it they match. If the IDs do not >> match, we issue a normal Python call. If I understand correctly, you're proposing struct { char* sig; long id; } sig_t; Where comparison would (sometimes?) compute id from sig by augmenting a global counter and dict? Might be expensive to bootstrap, but eventually all relevant ids would be filled in and it would be quick. Interesting. I wonder what the performance penalty would be over assuming id is statically computed lots of the time, and using that to compare against fixed values. And there's memory locality issues as well. >>>> Right... if we do some work to synchronize the types for Cython modules >>>> generated by the same version of Cython, we're left with 3-4 types for >>>> Cython, right? Then a couple for numba and one for f2py; so on the order >>>> of >>>> 10? >>> >>> >>> No, I think each closure is its own type. >> >> >> And that even applies to fused functions, right? They'd have one closure >> for each type combination. >> >> >>>> An alternative is do something funny in the type object to get across >>>> the >>>> offset-in-object information (abusing the docstring, or introduce our >>>> own >>>> flag which means that the type object has an additional non-standard >>>> field >>>> at the end). >>> >>> >>> It's a hack, but the flag + non-standard field idea might just work... >> >> >> Plus, it wouldn't have to stay a non-standard field. If it's accepted into >> CPython 3.4, we could safely use it in all existing versions of CPython. > > > Sounds good. 
Perhaps just find a single "extended", then add a new flag > field in our payload, in case we need to extend the types object yet again > later and run out of unused flag bits (TBD: figure out how many unused flag > bits there are). > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From robertwb at gmail.com Fri Apr 13 19:26:34 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 10:26:34 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: On Fri, Apr 13, 2012 at 5:48 AM, mark florisson wrote: > On 13 April 2012 12:59, Dag Sverre Seljebotn wrote: >> On 04/13/2012 01:38 PM, Stefan Behnel wrote: >>> >>> Robert Bradshaw, 13.04.2012 12:17: >>>> >>>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>>>> >>>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>>>> >>>>>> Have you given any thought as to what happens if __call__ is >>>>>> re-assigned for an object (or subclass of an object) supporting this >>>>>> interface? Or is this out of scope? >>>>> >>>>> >>>>> Out-of-scope, I'd say. Though you can always write an object that >>>>> detects if >>>>> you assign to __call__... >>> >>> >>> +1 for out of scope. This is a pure C level feature. >>> >>> >>>>>> Minor nit: I don't think should_dereference is worth branching on, if >>>>>> one wants to save the allocation one can still use a variable-sized >>>>>> type and point to oneself. Yes, that's an extra dereference, but the >>>>>> memory is already likely close and it greatly simplifies the logic. >>>>>> But I could be wrong here. >>>>> >>>>> >>>>> >>>>> Those minor nits are exactly what I seek; since Travis will have the >>>>> first >>>>> implementation in numba<->SciPy, I just want to make sure that what he >>>>> does >>>>> will work efficiently work Cython. >>>> >>>> >>>> +1 >>>> >>>> I have to admit building/invoking these var-arg-sized __nativecall__ >>>> records seems painful. Here's another suggestion: >>>> >>>> struct { >>>> ? ? void* pointer; >>>> ? ? size_t signature; // compressed binary representation, 95% coverage >> >> >> Once you start passing around functions that take memory view slices as >> arguments, that 95% estimate will be off I think. >> > > It kind of depends on which arguments types and how many arguments you > will allow, and whether or not collisions would be fine (which would > imply ID comparison + strcmp()). Interesting idea, though this has the drawback of doubling (at least) the overhead of the simple (important) case as well as memory requirements/locality issues. >>>> ? ? char* long_signature; // used if signature is not representable in >>>> a size_t, as indicated by signature = 0 >>>> } record; >>>> >>>> These char* could optionally be allocated at the end of the record* >>>> for optimal locality. We could even dispense with the binary >>>> signature, but having that option allows us to avoid strcmp for stuff >>>> like d)d and ffi)f. >>> >>> >>> Assuming we use literals and a const char* for the signature, the C >>> compiler would cut down the number of signature strings automatically for >>> us. And a pointer comparison is the same as a size_t comparison. >> >> >> I'll go one further: Intern Python bytes objects. 
It's just a PyObject*, but >> it's *required* (or just strongly encouraged) to have gone through >> >> sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig) >> >> Obviously in a PEP you'd have a C-API function for such interning >> (completely standalone utility). Performance of the interning operation itself >> doesn't matter... >> >> Unless CPython has interning features itself, like in Java? Was that present >> back in the day and then ripped out? >> >> Requiring interning is somewhat less elegant in one way, but it makes a lot >> of other stuff much simpler. >> >> That gives us
>>
>> struct {
>>     void *pointer;
>>     PyBytesObject *signature;
>> } record;
>>
>> and then you allocate a NULL-terminated array of these for all the >> overloads. >> > > Interesting. What I like about size_t is that it could define a > deterministic ordering, which means specializations could be stored in > a binary search tree in array form. I think the number of specializations would have to be quite large (>10, maybe 100) before a binary search wins out over a simple scan, but if we stored a count rather than did a null-terminated array, the lookup function could take this into account. (The header will already have plenty of room if we're storing a version number and want the records to be properly aligned.) Requiring them to be sorted would also allow us to abort on average half way through a scan. Of course prioritizing the "likely" signatures first may be more of a win. > Cython would precompute the size_t > for the specialization it needs (and maybe account for promotions as > well). Exactly. >>> That would only apply at a per-module level, though, so it would require >>> an >>> indirection for the signature IDs. But it would avoid a global registry. >>> >>> Another idea would be to set the signature ID field to 0 at the beginning >>> and call a C-API function to let the current runtime assign an ID > 0, >>> unique for the currently running application. Then every user would only >>> have to parse the signature once to adapt to the respective ID and could >>> otherwise branch based on it directly. >>> >>> For Cython, we could generate a static ID variable for each typed call >>> that >>> we found in the sources. When encountering a C signature on a callable, >>> either a) the ID variable is still empty (initial case), then we parse the >>> signature to see if it matches the expected signature. If it does, we >>> assign the corresponding ID to the static ID variable and issue a direct >>> call. If b) the ID field is already set (normal case), we compare the >>> signature IDs directly and issue a C call if they match. If the IDs do not >>> match, we issue a normal Python call. >>> >>> >>>>> Right... if we do some work to synchronize the types for Cython modules >>>>> generated by the same version of Cython, we're left with 3-4 types for >>>>> Cython, right? Then a couple for numba and one for f2py; so on the order >>>>> of >>>>> 10? >>>> >>>> >>>> No, I think each closure is its own type. >>> >>> >>> And that even applies to fused functions, right? They'd have one closure >>> for each type combination. >>> >>> >>>>> An alternative is to do something funny in the type object to get across >>>>> the >>>>> offset-in-object information (abusing the docstring, or introduce our >>>>> own >>>>> flag which means that the type object has an additional non-standard >>>>> field >>>>> at the end). >>>> >>>> >>>> It's a hack, but the flag + non-standard field idea might just work...
>>> >>> Plus, it wouldn't have to stay a non-standard field. If it's accepted into >>> CPython 3.4, we could safely use it in all existing versions of CPython. >> >> Sounds good. Perhaps just find a single "extended", then add a new flag >> field in our payload, in case we need to extend the types object yet again >> later and run out of unused flag bits (TBD: figure out how many unused flag >> bits there are). >> >> Dag >> >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Maybe it would be a good idea if there were a third project that > defined this functionality in header files which projects could > include (or in case of Cython directly inject into the generated C > files). E.g. a function to check for the native interface, and a > function that given an array of signature strings and function > pointers builds the ABI information (and computes the ID), and one > that given an ID and signature string finds the right specialization. > The project should also expose a simple type system for the types we > care about, and be able to generate signature strings and IDs for > signatures. > > An optimization for the common case would be to only look at the first > entry in the ABI information directly and compare that for the > non-overloaded case, and otherwise do a logarithmic lookup, with a > final fallback to calling through the Python layer. I think the ABI should be simple (and fully specified) enough to allow a trivial implementation, and we and others could ship our implementations as a tiny C library (or just a header file). - Robert From robertwb at gmail.com Fri Apr 13 20:21:28 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 11:21:28 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: On Fri, Apr 13, 2012 at 10:26 AM, Robert Bradshaw wrote: > On Fri, Apr 13, 2012 at 4:59 AM, Dag Sverre Seljebotn > wrote: >> On 04/13/2012 01:38 PM, Stefan Behnel wrote: >>> That would only apply at a per-module level, though, so it would require >>> an >>> indirection for the signature IDs. But it would avoid a global registry. >>> >>> Another idea would be to set the signature ID field to 0 at the beginning >>> and call a C-API function to let the current runtime assign an ID > 0, >>> unique for the currently running application. Then every user would only >>> have to parse the signature once to adapt to the respective ID and could >>> otherwise branch based on it directly. >>> >>> For Cython, we could generate a static ID variable for each typed call >>> that >>> we found in the sources. When encountering a C signature on a callable, >>> either a) the ID variable is still empty (initial case), then we parse the >>> signature to see if it matches the expected signature. If it does, we >>> assign the corresponding ID to the static ID variable and issue a direct >>> call. If b) the ID field is already set (normal case), we compare the >>> signature IDs directly and issue a C call if they match. If the IDs do not >>> match, we issue a normal Python call. > > If I understand correctly, you're proposing
>
> struct {
>     char* sig;
>     long id;
> } sig_t;
>
> Where comparison would (sometimes?)
> compute id from sig by augmenting > a global counter and dict? Might be expensive to bootstrap, but > eventually all relevant ids would be filled in and it would be quick. > Interesting. I wonder what the performance penalty would be over > assuming id is statically computed lots of the time, and using that to > compare against fixed values. And there are memory locality issues as > well. To clarify, I'd really like to have the following as fast as possible:

if (callable.sig.id == X) {
    // yep, that's what I thought
} else {
    // generic call
}

Alternatively, one can imagine wanting to do:

switch (callable.sig.id) {
    case X:
        // I can do this
    case Y:
        // this is common and fast as well
    ...
    default:
        // generic call
}

There is some question about how promotion should work (e.g. should this flexibility reside in the caller or the callee (or both, though that could result in a quadratic number of comparisons)?) - Robert From stefan_ml at behnel.de Fri Apr 13 21:15:22 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 21:15:22 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87B896.5050000@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87B896.5050000@behnel.de> Message-ID: <4F887B4A.6090504@behnel.de> Stefan Behnel, 13.04.2012 07:24: > Dag Sverre Seljebotn, 13.04.2012 00:34: >> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >> http://wiki.cython.org/enhancements/cep1000 > > I'm all for doing something in this direction and have been hinting at it > on the PyPy mailing list for a while, without reaction so far. I'll trigger > them again, with a pointer to this discussion and the CEP. PyPy should be > totally interested in a generic way to do fast calls into wrapped C code in > general and Cython implemented functions specifically. Their JIT would then > look at the function at runtime and unwrap it. I just learned that the support in PyPy would be rather straightforward. It already supports calling native code with a known signature through their "rlib/libffi.py" module, so all that remains to be done on their side is mapping the encoded signature to their own signature configuration. Stefan From robertwb at gmail.com Fri Apr 13 21:26:44 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 12:26:44 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F887B4A.6090504@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87B896.5050000@behnel.de> <4F887B4A.6090504@behnel.de> Message-ID: On Fri, Apr 13, 2012 at 12:15 PM, Stefan Behnel wrote: > Stefan Behnel, 13.04.2012 07:24: >> Dag Sverre Seljebotn, 13.04.2012 00:34: >>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>> http://wiki.cython.org/enhancements/cep1000 >> >> I'm all for doing something in this direction and have been hinting at it >> on the PyPy mailing list for a while, without reaction so far. I'll trigger >> them again, with a pointer to this discussion and the CEP. PyPy should be >> totally interested in a generic way to do fast calls into wrapped C code in >> general and Cython implemented functions specifically. Their JIT would then >> look at the function at runtime and unwrap it. > > I just learned that the support in PyPy would be rather straightforward. > It already supports calling native code with a known signature through > their "rlib/libffi.py" module, Cool.
> so all that remains to be done on their side > is mapping the encoded signature to their own signature configuration. Or looking into borrowing theirs? (We might want more extensibility, e.g. declaring buffer types and nogil/exception data. I assume ctypes has a signature declaration format as well, right?) - Robert From stefan_ml at behnel.de Fri Apr 13 21:50:15 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 21:50:15 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87B896.5050000@behnel.de> <4F887B4A.6090504@behnel.de> Message-ID: <4F888377.7020100@behnel.de> Robert Bradshaw, 13.04.2012 21:26: > On Fri, Apr 13, 2012 at 12:15 PM, Stefan Behnel wrote: >> Stefan Behnel, 13.04.2012 07:24: >>> Dag Sverre Seljebotn, 13.04.2012 00:34: >>>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>>> http://wiki.cython.org/enhancements/cep1000 >>> >>> I'm all for doing something in this direction and have been hinting at it >>> on the PyPy mailing list for a while, without reaction so far. I'll trigger >>> them again, with a pointer to this discussion and the CEP. PyPy should be >>> totally interested in a generic way to do fast calls into wrapped C code in >>> general and Cython implemented functions specifically. Their JIT would then >>> look at the function at runtime and unwrap it. >> >> I just learned that the support in PyPy would be rather straight forward. >> It already supports calling native code with a known signature through >> their "rlib/libffi.py" module, > > Cool. > >> so all that remains to be done on their side >> is mapping the encoded signature to their own signature configuration. > > Or looking into borrowing theirs? (We might want more extensibility, > e.g. declaring buffer types and nogil/exception data. I assume ctypes > has a signature declaration format as well, right?) PyPy's ctypes implementation is based on libffi. However, I think neither of the two has a declaration format (e.g. string based) other than the object based declaration notation. You basically pass them a sequence of type objects to declare the signature. That's not really easy to map to the C level - at least not efficiently... Stefan From stefan_ml at behnel.de Fri Apr 13 21:52:33 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Apr 2012 21:52:33 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: <4F888401.90309@behnel.de> Robert Bradshaw, 13.04.2012 20:21: > On Fri, Apr 13, 2012 at 10:26 AM, Robert Bradshaw wrote: >> On Fri, Apr 13, 2012 at 4:59 AM, Dag Sverre Seljebotn wrote: >>> On 04/13/2012 01:38 PM, Stefan Behnel wrote: >>>> That would only apply at a per-module level, though, so it would >>>> require an indirection for the signature IDs. But it would avoid a >>>> global registry. >>>> >>>> Another idea would be to set the signature ID field to 0 at the beginning >>>> and call a C-API function to let the current runtime assign an ID> 0, >>>> unique for the currently running application. Then every user would only >>>> have to parse the signature once to adapt to the respective ID and could >>>> otherwise branch based on it directly. 
>>>> >>>> For Cython, we could generate a static ID variable for each typed call >>>> that >>>> we found in the sources. When encountering a C signature on a callable, >>>> either a) the ID variable is still empty (initial case), then we parse the >>>> signature to see if it matches the expected signature. If it does, we >>>> assign the corresponding ID to the static ID variable and issue a direct >>>> call. If b) the ID field is already set (normal case), we compare the >>>> signature IDs directly and issue a C call if they match. If the IDs do not >>>> match, we issue a normal Python call. >> >> If I understand correctly, you're proposing
>>
>> struct {
>>     char* sig;
>>     long id;
>> } sig_t;
>>
>> Where comparison would (sometimes?) compute id from sig by augmenting >> a global counter and dict? Might be expensive to bootstrap, but >> eventually all relevant ids would be filled in and it would be quick. Yes. If a function is only called once, the overhead won't matter. And starting from the second call, it would either be fast if the function signature matches or slow anyway if it doesn't match. >> Interesting. I wonder what the performance penalty would be over >> assuming id is statically computed lots of the time, and using that to >> compare against fixed values. And there are memory locality issues as >> well. > > To clarify, I'd really like to have the following as fast as possible:
>
> if (callable.sig.id == X) {
>     // yep, that's what I thought
> } else {
>     // generic call
> }
>
> Alternatively, one can imagine wanting to do:
>
> switch (callable.sig.id) {
>     case X:
>         // I can do this
>     case Y:
>         // this is common and fast as well
>     ...
>     default:
>         // generic call
> }
Yes, that's the idea. > There is some question about how promotion should work (e.g. should > this flexibility reside in the caller or the callee (or both, though > that could result in a quadratic number of comparisons)?) Callees could expose multiple signatures (which would result in a direct call for each, without further comparisons), then the caller would have to choose between those. However, if none matches exactly, the caller might want to promote its arguments and try more signatures. In any case, it's the caller that does the work, never the callee. We could generate code like this:

    /* cdef int x = ...
     * cdef long y = ...
     * cdef int z       # interesting: what if z is not typed?
     * z = func(x, y)
     */

    if (func.sig.id == id("[int,long] -> int")) {
        z = ((cast)func.cfunc) (x,y);
    } else if (sizeof(long) > sizeof(int) &&
               (func.sig.id == id("[long,long] -> int"))) {
        z = ((cast)func.cfunc) ((long)x, y);
    } etc. ... else {
        /* pack and call as Python function */
    }

Meaning, the C compiler could reduce the amount of optimistic call code at compile time. Stefan From d.s.seljebotn at astro.uio.no Fri Apr 13 22:27:29 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 13 Apr 2012 22:27:29 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> Message-ID: <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> Ah, I didn't think about 6-bit or huffman. Certainly helps.
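For concreteness, packing a short signature into a single 64-bit key with a 6-bit alphabet could look like the following sketch. The alphabet and its ordering are invented for illustration (the CEP does not fix them), and code 0 is reserved so that shorter signatures cannot collide with longer ones:

#include <stdint.h>
#include <string.h>

/* Hypothetical 6-bit alphabet; a char's code is its index + 1,
 * so the value 0 is never a valid signature character. */
static const char SIG_ALPHABET[] = ")idlfObhcqsT";

/* Pack up to 10 signature chars (60 bits) into one 64-bit key;
 * returns 0 if the signature needs the string fallback. */
static uint64_t sig_to_key(const char *sig)
{
    uint64_t key = 0;
    size_t i, n = strlen(sig);
    if (n > 10)
        return 0;                      /* too long for a single word */
    for (i = 0; i < n; i++) {
        const char *p = strchr(SIG_ALPHABET, sig[i]);
        if (p == NULL)
            return 0;                  /* char outside the alphabet */
        key = (key << 6) | (uint64_t)(p - SIG_ALPHABET + 1);
    }
    return key;
}

A compiler that knows the signature at compile time would of course emit the resulting constant directly instead of calling sig_to_key() at runtime.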
I'm almost +1 on your proposal now, but a couple more ideas: 1) Let the key (the size_t) spill over to the next specialization entry if it is too large; and prepend that key with a continuation code (two size_ts could together say "iii)-d\0\0" on 32 bit systems with 8-bit encoding, using - as continuation). The key-based caller will expect a continuation if it knows about the specialization, and the prepended char will prevent spurious matches against the overspilled slot. We could even use the pointers for part of the continuation... 2) Separate the char* format strings from the keys, i.e. this memory layout: Version,nslots,nspecs,funcptr,key,funcptr,key,...,sigcharptr,sigcharptr... Where nslots is larger than nspecs if there are continuations. OK, this is getting close to my original proposal, but the difference is the continuation char, so that if you expect a short signature, you can safely scan every slot, with no branching and no null-checking necessary. Dag -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Robert Bradshaw wrote: On Fri, Apr 13, 2012 at 4:59 AM, Dag Sverre Seljebotn wrote: > On 04/13/2012 01:38 PM, Stefan Behnel wrote: >> >> Robert Bradshaw, 13.04.2012 12:17: >>> >>> On Fri, Apr 13, 2012 at 1:52 AM, Dag Sverre Seljebotn wrote: >>>> >>>> On 04/13/2012 01:38 AM, Robert Bradshaw wrote: >>>>> >>>>> Have you given any thought as to what happens if __call__ is >>>>> re-assigned for an object (or subclass of an object) supporting this >>>>> interface? Or is this out of scope? >>>> >>>> >>>> Out-of-scope, I'd say. Though you can always write an object that >>>> detects if >>>> you assign to __call__... >> >> >> +1 for out of scope. This is a pure C level feature. >> >> >>>>> Minor nit: I don't think should_dereference is worth branching on, if >>>>> one wants to save the allocation one can still use a variable-sized >>>>> type and point to oneself. Yes, that's an extra dereference, but the >>>>> memory is already likely close and it greatly simplifies the logic. >>>>> But I could be wrong here. >>>> >>>> >>>> >>>> Those minor nits are exactly what I seek; since Travis will have the >>>> first >>>> implementation in numba<->SciPy, I just want to make sure that what he >>>> does >>>> will work efficiently with Cython. >>> >>> >>> +1 >>> >>> I have to admit building/invoking these var-arg-sized __nativecall__ >>> records seems painful. Here's another suggestion:
>>>
>>> struct {
>>>     void* pointer;
>>>     size_t signature; // compressed binary representation, 95% coverage
> > Once you start passing around functions that take memory view slices as > arguments, that 95% estimate will be off I think. We have (on the high-performance systems we care about) 64 bits here. If we limit ourselves to a 6-bit alphabet, that gives a trivial encoding for up to 10 chars. We could be more clever here (Huffman coding), but that might be overkill. More importantly though, the calls behind "complicated" signatures are likely to be expensive enough that the strcmp overhead doesn't matter.
>>>     char* long_signature; // used if signature is not representable in
>>>     a size_t, as indicated by signature = 0
>>> } record;
>>>
>>> These char* could optionally be allocated at the end of the record* >>> for optimal locality. We could even dispense with the binary >>> signature, but having that option allows us to avoid strcmp for stuff >>> like d)d and ffi)f. >> >> >> Assuming we use literals and a const char* for the signature, the C >> compiler would cut down the number of signature strings automatically for >> us.
And a pointer comparison is the same as a size_t comparison. > > > I'll go one further: Intern Python bytes objects. It's just a PyObject*, but > it's *required* (or just strongly encouraged) to have gone through > > sig = sys.modules['_nativecall']['interned_db'].setdefault(sig, sig) > > Obviously in a PEP you'd have a C-API function for such interning > (completely standalone utility). Performance of interning operation itself > doesn't matter... > > Unless CPython has interning features itself, like in Java? Was that present > back in the day and then ripped out? > > Requiring interning is somewhat less elegant in one way, but it makes a lot > of other stuff much simpler. > > That gives us > > struct { > void *pointer; > PyBytesObject *signature; > } record; > > and then you allocate a NULL-terminated arrays of these for all the > overloads. Global interning is a nice idea. The one drawback I see is that it becomes much more expensive for dynamically calculated signatures. >> >> That would only apply at a per-module level, though, so it would require >> an >> indirection for the signature IDs. But it would avoid a global registry. >> >> Another idea would be to set the signature ID field to 0 at the beginning >> and call a C-API function to let the current runtime assign an ID> 0, >> unique for the currently running application. Then every user would only >> have to parse the signature once to adapt to the respective ID and could >> otherwise branch based on it directly. >> >> For Cython, we could generate a static ID variable for each typed call >> that >> we found in the sources. When encountering a C signature on a callable, >> either a) the ID variable is still empty (initial case), then we parse the >> signature to see if it matches the expected signature. If it does, we >> assign the corresponding ID to the static ID variable and issue a direct >> call. If b) the ID field is already set (normal case), we compare the >> signature IDs directly and issue a C call it they match. If the IDs do not >> match, we issue a normal Python call. If I understand correctly, you're proposing struct { char* sig; long id; } sig_t; Where comparison would (sometimes?) compute id from sig by augmenting a global counter and dict? Might be expensive to bootstrap, but eventually all relevant ids would be filled in and it would be quick. Interesting. I wonder what the performance penalty would be over assuming id is statically computed lots of the time, and using that to compare against fixed values. And there's memory locality issues as well. >>>> Right... if we do some work to synchronize the types for Cython modules >>>> generated by the same version of Cython, we're left with 3-4 types for >>>> Cython, right? Then a couple for numba and one for f2py; so on the order >>>> of >>>> 10? >>> >>> >>> No, I think each closure is its own type. >> >> >> And that even applies to fused functions, right? They'd have one closure >> for each type combination. >> >> >>>> An alternative is do something funny in the type object to get across >>>> the >>>> offset-in-object information (abusing the docstring, or introduce our >>>> own >>>> flag which means that the type object has an additional non-standard >>>> field >>>> at the end). >>> >>> >>> It's a hack, but the flag + non-standard field idea might just work... >> >> >> Plus, it wouldn't have to stay a non-standard field. If it's accepted into >> CPython 3.4, we could safely use it in all existing versions of CPython. > > > Sounds good. 
Perhaps just find a single "extended", then add a new flag > field in our payload, in case we need to extend the types object yet again > later and run out of unused flag bits (TBD: figure out how many unused flag > bits there are). > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel _______________________________________________ cython-devel mailing list cython-devel at python.org http://mail.python.org/mailman/listinfo/cython-devel From robertwb at gmail.com Fri Apr 13 22:54:12 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 13:54:12 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F888401.90309@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <4F888401.90309@behnel.de> Message-ID: On Fri, Apr 13, 2012 at 12:52 PM, Stefan Behnel wrote: > Robert Bradshaw, 13.04.2012 20:21: >> On Fri, Apr 13, 2012 at 10:26 AM, Robert Bradshaw wrote: >>> On Fri, Apr 13, 2012 at 4:59 AM, Dag Sverre Seljebotn wrote: >>>> On 04/13/2012 01:38 PM, Stefan Behnel wrote: >>>>> That would only apply at a per-module level, though, so it would >>>>> require an indirection for the signature IDs. But it would avoid a >>>>> global registry. >>>>> >>>>> Another idea would be to set the signature ID field to 0 at the beginning >>>>> and call a C-API function to let the current runtime assign an ID > 0, >>>>> unique for the currently running application. Then every user would only >>>>> have to parse the signature once to adapt to the respective ID and could >>>>> otherwise branch based on it directly. >>>>> >>>>> For Cython, we could generate a static ID variable for each typed call >>>>> that >>>>> we found in the sources. When encountering a C signature on a callable, >>>>> either a) the ID variable is still empty (initial case), then we parse the >>>>> signature to see if it matches the expected signature. If it does, we >>>>> assign the corresponding ID to the static ID variable and issue a direct >>>>> call. If b) the ID field is already set (normal case), we compare the >>>>> signature IDs directly and issue a C call if they match. If the IDs do not >>>>> match, we issue a normal Python call. >>> >>> If I understand correctly, you're proposing
>>>
>>> struct {
>>>     char* sig;
>>>     long id;
>>> } sig_t;
>>>
>>> Where comparison would (sometimes?) compute id from sig by augmenting >>> a global counter and dict? Might be expensive to bootstrap, but >>> eventually all relevant ids would be filled in and it would be quick. Yes. If a function is only called once, the overhead won't matter. And starting from the second call, it would either be fast if the function signature matches or slow anyway if it doesn't match. There are still data locality issues, including the cached id for the caller as well as the callee. >>> Interesting. I wonder what the performance penalty would be over >>> assuming id is statically computed lots of the time, and using that to >>> compare against fixed values. And there are memory locality issues as >>> well. >> >> To clarify, I'd really like to have the following as fast as possible:
>>
>> if (callable.sig.id == X) {
>>     // yep, that's what I thought
>> } else {
>>     // generic call
>> }
>>
>> Alternatively, one can imagine wanting to do:
>>
>> switch (callable.sig.id) {
>>     case X:
>>         // I can do this
>>     case Y:
>>         // this is common and fast as well
>>     ...
>>     default:
>>         // generic call
>> }
> > Yes, that's the idea. > > >> There is some question about how promotion should work (e.g. should >> this flexibility reside in the caller or the callee (or both, though >> that could result in a quadratic number of comparisons)?) > > Callees could expose multiple signatures (which would result in a direct > call for each, without further comparisons), then the caller would have to > choose between those. However, if none matches exactly, the caller might > want to promote its arguments and try more signatures. In any case, it's > the caller that does the work, never the callee. > > We could generate code like this:
>
>     /* cdef int x = ...
>      * cdef long y = ...
>      * cdef int z       # interesting: what if z is not typed?
>      * z = func(x, y)
>      */
>
>     if (func.sig.id == id("[int,long] -> int")) {
>         z = ((cast)func.cfunc) (x,y);
>     } else if (sizeof(long) > sizeof(int) &&
>                (func.sig.id == id("[long,long] -> int"))) {
>         z = ((cast)func.cfunc) ((long)x, y);
>     } etc. ... else {
>         /* pack and call as Python function */
>     }
>
> Meaning, the C compiler could reduce the amount of optimistic call code at > compile time. Interesting idea. Alternatively, I wonder if the signature could reflect exactly-sized types rather than int/long/etc. Perhaps that would make the code more complicated on both ends... I'm assuming your id(...) is computed at compile time in this example, right? Otherwise it would get a bit messier. - Robert From robertwb at gmail.com Fri Apr 13 23:18:30 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 14:18:30 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> Message-ID: On Fri, Apr 13, 2012 at 1:27 PM, Dag Sverre Seljebotn wrote: > Ah, I didn't think about 6-bit or huffman. Certainly helps. Yeah, we don't want to complicate the ABI too much, but I think something like 8 4-bit common chars and 32 6-bit other chars (or 128 8-bit other chars) wouldn't be outrageous. The fact that we only have to encode into a single word makes the algorithm very simple (though the majority of the time we'd spit out pre-encoded literals). We have a version number to play with this as well. > I'm almost +1 on your proposal now, but a couple more ideas: > > 1) Let the key (the size_t) spill over to the next specialization entry if > it is too large; and prepend that key with a continuation code (two size_ts > could together say "iii)-d\0\0" on 32 bit systems with 8-bit encoding, using > - as continuation). The key-based caller will expect a continuation if it > knows about the specialization, and the prepended char will prevent spurious > matches against the overspilled slot. > > We could even use the pointers for part of the continuation... > > 2) Separate the char* format strings from the keys, i.e. this memory layout: > > Version,nslots,nspecs,funcptr,key,funcptr,key,...,sigcharptr,sigcharptr...
> > Where nslots is larger than nspecs if there are continuations. > > OK, this is getting close to my original proposal, but the difference is the > continuation char, so that if you expect a short signature, you can safely > scan every slot, with no branching and no null-checking necessary. I don't think we need nslots (though it might be interesting). My thought is that once you start futzing with variable-length keys, you might as well just compare char*s. If one is concerned about memory, one could force the sigcharptr to be aligned, and then the "keys" could be either sigcharptr or key depending on whether the least significant bit was set. One could easily scan for/switch on a key and scanning for a char* would be almost as easy (just don't dereference if the lsb is set). I don't see us being memory constrained, so (version,nspecs,futureuse),(key,sigcharptr,funcptr)*,optionalsigchardata* seems fine to me even if only one of key/sigcharptr is ever used per spec. Null-terminating the specs would work fine as well (one less thing to keep track of during iteration). - Robert From njs at pobox.com Fri Apr 13 23:24:44 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 13 Apr 2012 22:24:44 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> Message-ID: On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn wrote: > Ah, I didn't think about 6-bit or huffman. Certainly helps. > > I'm almost +1 on your proposal now, but a couple more ideas: > > 1) Let the key (the size_t) spill over to the next specialization entry if > it is too large; and prepend that key with a continuation code (two size_ts > could together say "iii)-d\0\0" on 32 bit systems with 8-bit encoding, using > - as continuation). The key-based caller will expect a continuation if it > knows about the specialization, and the prepended char will prevent spurious > matches against the overspilled slot. > > We could even use the pointers for part of the continuation... I am really lost here. Why is any of this complicated encoding stuff better than interning? Interning takes one line of code, is incredibly cheap (one dict lookup per call site and function definition), and it lets you check any possible signature (even complicated ones involving memoryviews) by doing a single-word comparison. And best of all, you don't have to think hard to make sure you got the encoding right. ;-) On a 32-bit system, pointers are smaller than a size_t, but more expressive! You can still do binary search if you want, etc. Is the problem just that interning requires a runtime calculation? Because I feel like C users (like numpy) will want to compute these compressed codes at module-init anyway, and those of us with a fancy compiler capable of computing them ahead of time (like Cython) can instruct that fancy compiler to compute them at module-init time just as easily?
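For reference, a rough C-level sketch of the interning scheme described above; the "_nativecall"/"interned_db" names follow the earlier post and are illustrative only (a real PEP would expose a standalone C-API function for this):

#include <Python.h>

/* Canonical (interned) signature object, filled in once at module init;
 * every later signature check is a single pointer comparison. */
static PyObject *interned_sig_dd_d = NULL;

static int intern_my_sigs(void)
{
    PyObject *mod = NULL, *db = NULL, *sig = NULL, *canonical = NULL;
    int result = -1;

    mod = PyImport_ImportModule("_nativecall");
    if (mod == NULL) goto done;
    db = PyObject_GetAttrString(mod, "interned_db");
    if (db == NULL) goto done;
    sig = PyBytes_FromString("dd)d");      /* (double, double) -> double */
    if (sig == NULL) goto done;

    /* setdefault(sig, sig): reuse the canonical object if one exists */
    canonical = PyDict_GetItem(db, sig);   /* borrowed reference */
    if (canonical == NULL) {
        if (PyDict_SetItem(db, sig, sig) < 0) goto done;
        canonical = sig;
    }
    Py_INCREF(canonical);
    interned_sig_dd_d = canonical;
    result = 0;
done:
    Py_XDECREF(mod);
    Py_XDECREF(db);
    Py_XDECREF(sig);
    return result;
}

After this runs once at module init, a call-site check reduces to record->signature == (PyBytesObject *)interned_sig_dd_d.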
-- Nathaniel From robertwb at gmail.com Fri Apr 13 23:50:05 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 14:50:05 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> Message-ID: On Fri, Apr 13, 2012 at 2:24 PM, Nathaniel Smith wrote: > On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn > wrote: >> Ah, I didn't think about 6-bit or huffman. Certainly helps. >> >> I'm almost +1 on your proposal now, but a couple more ideas: >> >> 1) Let the key (the size_t) spill over to the next specialization entry if >> it is too large; and prepend that key with a continuation code (two size_ts >> could together say "iii)-d\0\0" on 32 bit systems with 8-bit encoding, using >> - as continuation). The key-based caller will expect a continuation if it >> knows about the specialization, and the prepended char will prevent spurious >> matches against the overspilled slot. >> >> We could even use the pointers for part of the continuation... > > I am really lost here. Why is any of this complicated encoding stuff > better than interning? Interning takes one line of code, is incredibly > cheap (one dict lookup per call site and function definition), and it > lets you check any possible signature (even complicated ones involving > memoryviews) by doing a single-word comparison. And best of all, you > don't have to think hard to make sure you got the encoding right. ;-) > > On a 32-bit system, pointers are smaller than a size_t, but more > expressive! You can still do binary search if you want, etc. Is the > problem just that interning requires a runtime calculation? Because I > feel like C users (like numpy) will want to compute these compressed > codes at module-init anyway, and those of us with a fancy compiler > capable of computing them ahead of time (like Cython) can instruct > that fancy compiler to compute them at module-init time just as > easily? Good question. The primary disadvantage of interning that I see is memory locality. I suppose if all the C-level caches of interned values were co-located, this may not be as big of an issue. Not being able to compare against compile-time constants may thwart some optimization opportunities, but that's less clear. It also requires coordination around a common repository, but I suppose one would just stick a set in some standard module (or leverage Python's interning). - Robert From d.s.seljebotn at astro.uio.no Sat Apr 14 00:06:47 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 14 Apr 2012 00:06:47 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> Message-ID: Robert Bradshaw wrote: >On Fri, Apr 13, 2012 at 1:27 PM, Dag Sverre Seljebotn > wrote: >> Ah, I didn't think about 6-bit or huffman. Certainly helps. > >Yeah, we don't want to complicate the ABI too much, but I think >something like 8 4-bit common chars and 32 6-bit other chars (or 128 >8-bit other chars) wouldn't be outrageous. The fact that we only have
The fact that we only have >to encode into a single word makes the algorithm very simple (though >the majority of the time we'd spit out pre-encoded literals). We have >a version number to play with this as well. > >> I'm almost +1 on your proposal now, but a couple of more ideas: >> >> 1) Let the key (the size_t) spill over to the next specialization >entry if >> it is too large; and prepend that key with a continuation code (two >size-ts >> could together say "iii)-d\0\0" on 32 bit systems with 8bit encoding, >using >> - as continuation). The key-based caller will expect a continuation >if it >> knows about the specialization, and the prepended char will prevent >spurios >> matches against the overspilled slot. >> >> We could even use the pointers for part of the continuation... >> >> 2) Separate the char* format strings from the keys, ie this memory >layout: >> >> >Version,nslots,nspecs,funcptr,key,funcptr,key,...,sigcharptr,sigcharptr... >> >> Where nslots is larger than nspecs if there are continuations. >> >> OK, this is getting close to my original proposal, but the difference >is the >> contiunation char, so that if you expect a short signature, you can >safely >> scan every slot and branching and no null-checking necesarry. > >I don't think we need nslots (though it might be interesting). My >thought is that once you start futzing with variable-length keys, you >might as well just compare char*s. This is where we disagree. If you are the caller you know at compile-time how much you want to match; I think comparing 2 or 3 size-t with no looping is a lot better (a fully-unrolled, 64-bit per instruction strcmp with one of the operands known to the compiler...). > >If one is concerned about memory, one could force the sigcharptr to be >aligned, and then the "keys" could be either sigcharptr or key >depending on whether the least significant bit was set. One could >easily scan for/switch on a key and scanning for a char* would be >almost as easy (just don't dereference if the lsb is set). > >I don't see us being memory constrained, so > >(version,nspecs,futureuse),(key,sigcharptr,funcptr)*,optionalsigchardata* > >seems fine to me even if only one of key/sigchrptr is ever used per >spec. Null-terminating the specs would work fine as well (one less >thing to keep track of during iteration). Well, can't one always use more L1 cache, or is that not a concern? If you have 5-6 different routines calling each other using this mechanism, each with multiple specializations, those unused slots translate to many cache lines wasted. I don't think it is that important, I just think that how pretty the C struct declaration ends up looking should not be a concern at all, when the whole point of this is speed anyway. You can always just use a throwaway struct declaration and a cast to get whatever layout you need. If the 'padding' leads to less branching then fine, but I don't see that it helps in any way. To refine my proposal a bit, we have a list of variable size entries, (keydata, keydata, ..., funcptr) where each keydata and the ptr is 64 bits on all platforms (see below); each entry must have a total length multiple of 128 bits (so that one can safely scan for a signature in 128 bit increments in the data *without* parsing or branching, you'll never hit a pointer), and each key but the first starts with a 'dash'. Signature strings are either kept separate, or even parsed/decoded from the keys. 
We really only care about speed when you have compiled or JITed code for the case, decoding should be fine otherwise. BTW, won't the Cython-generated C code be a horrible mess if we use size_t rather than insist on int64_t? (OK, those need some ifdefs for various compilers, but still seem cleaner than operating with 32bit and 64bit keys, and stdint.h is winning ground). Dag > >- Robert >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From d.s.seljebotn at astro.uio.no Sat Apr 14 00:22:15 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 14 Apr 2012 00:22:15 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> Message-ID: <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> Robert Bradshaw wrote: >On Fri, Apr 13, 2012 at 2:24 PM, Nathaniel Smith wrote: >> On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn >> wrote: >>> Ah, I didn't think about 6-bit or huffman. Certainly helps. >>> >>> I'm almost +1 on your proposal now, but a couple more ideas: >>> >>> 1) Let the key (the size_t) spill over to the next specialization >entry if >>> it is too large; and prepend that key with a continuation code (two >size_ts >>> could together say "iii)-d\0\0" on 32 bit systems with 8-bit >encoding, using >>> - as continuation). The key-based caller will expect a continuation >if it >>> knows about the specialization, and the prepended char will prevent >spurious >>> matches against the overspilled slot. >>> >>> We could even use the pointers for part of the continuation... >> >> I am really lost here. Why is any of this complicated encoding stuff >> better than interning? Interning takes one line of code, is >incredibly >> cheap (one dict lookup per call site and function definition), and it >> lets you check any possible signature (even complicated ones >involving >> memoryviews) by doing a single-word comparison. And best of all, you >> don't have to think hard to make sure you got the encoding right. ;-) >> >> On a 32-bit system, pointers are smaller than a size_t, but more >> expressive! You can still do binary search if you want, etc. Is the >> problem just that interning requires a runtime calculation? Because I >> feel like C users (like numpy) will want to compute these compressed >> codes at module-init anyway, and those of us with a fancy compiler >> capable of computing them ahead of time (like Cython) can instruct >> that fancy compiler to compute them at module-init time just as >> easily? > >Good question. > >The primary disadvantage of interning that I see is memory locality. I >suppose if all the C-level caches of interned values were co-located, >this may not be as big of an issue. Not being able to compare against >compile-time constants may thwart some optimization opportunities, but >that's less clear. > >It also requires coordination around a common repository, but I suppose one >would just stick a set in some standard module (or leverage Python's >interning). More problems: 1) It doesn't work well with multiple interpreter states.
Ok, nothing works with that at the moment, but it is on the roadmap for Python and we should not make it worse. You basically *need* a thread safe store separate from any python interpreter; though pythread.h does not rely on the interpreter state; which helps. 2) you end up with the known comparison values in read-write memory segments rather than readonly segments, which is probably worse on multicore systems? I really think that anything that we can do to make this near-c-speed should be done; none of the proposals are *that* complicated. Using keys, NumPy can in the C code choose to be slower but more readable; but using interned string forces cython to be slower, cython gets no way of choosing to go faster. (to the degree that it has an effect; none of these claims were checked) Dag > >- Robert >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From d.s.seljebotn at astro.uio.no Sat Apr 14 00:31:58 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 14 Apr 2012 00:31:58 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> Message-ID: <8effd9fb-f145-48a8-a4cb-a75abac57a89@email.android.com> Dag Sverre Seljebotn wrote: > > >Robert Bradshaw wrote: > >>On Fri, Apr 13, 2012 at 2:24 PM, Nathaniel Smith >wrote: >>> On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn >>> wrote: >>>> Ah, I didn't think about 6-bit or huffman. Certainly helps. >>>> >>>> I'm almost +1 on your proposal now, but a couple of more ideas: >>>> >>>> 1) Let the key (the size_t) spill over to the next specialization >>entry if >>>> it is too large; and prepend that key with a continuation code (two >>size-ts >>>> could together say "iii)-d\0\0" on 32 bit systems with 8bit >>encoding, using >>>> - as continuation). The key-based caller will expect a continuation >>if it >>>> knows about the specialization, and the prepended char will prevent >>spurios >>>> matches against the overspilled slot. >>>> >>>> We could even use the pointers for part of the continuation... >>> >>> I am really lost here. Why is any of this complicated encoding stuff >>> better than interning? Interning takes one line of code, is >>incredibly >>> cheap (one dict lookup per call site and function definition), and >it >>> lets you check any possible signature (even complicated ones >>involving >>> memoryviews) by doing a single-word comparison. And best of all, you >>> don't have to think hard to make sure you got the encoding right. >;-) >>> >>> On a 32-bit system, pointers are smaller than a size_t, but more >>> expressive! You can still do binary search if you want, etc. Is the >>> problem just that interning requires a runtime calculation? Because >I >>> feel like C users (like numpy) will want to compute these compressed >>> codes at module-init anyway, and those of us with a fancy compiler >>> capable of computing them ahead of time (like Cython) can instruct >>> that fancy compiler to compute them at module-init time just as >>> easily? >> >>Good question. 
>> >>The primary disadvantage of interning that I see is memory locality. I >>suppose if all the C-level caches of interned values were co-located, >>this may not be as big of an issue. Not being able to compare against >>compile-time constants may thwart some optimization opportunities, but >>that's less clear. >> >>It also requires coordination around a common repository, but I suppose one >>would just stick a set in some standard module (or leverage Python's >>interning). > >More problems: > >1) It doesn't work well with multiple interpreter states. Ok, nothing >works with that at the moment, but it is on the roadmap for Python and >we should not make it worse. > >You basically *need* a thread safe store separate from any python >interpreter; though pythread.h does not rely on the interpreter state; >which helps. No, it doesn't, unless we want to ship a single(!) .so-file that can be depended upon by all relevant projects. There's just no way for loaded modules to communicate and synchronize that they know about this CEP except through an interpreter... That's almost impossible to work around in any clean way? (I can think of several very ugly ones...) Unless the multiple interpreter state idea is entirely dead in CPython, interning must be done separately for each interpreter and the values stored in the module object. Ugh. Dag > >2) you end up with the known comparison values in read-write memory >segments rather than readonly segments, which is probably worse on >multicore systems? > >I really think that anything that we can do to make this near-c-speed >should be done; none of the proposals are *that* complicated. > >Using keys, NumPy can in the C code choose to be slower but more >readable; but using interned strings forces Cython to be slower, Cython >gets no way of choosing to go faster. (to the degree that it has an >effect; none of these claims were checked) > >Dag > > >> >>- Robert >>_______________________________________________ >>cython-devel mailing list >>cython-devel at python.org >>http://mail.python.org/mailman/listinfo/cython-devel > >-- >Sent from my Android phone with K-9 Mail. Please excuse my brevity. >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From robertwb at gmail.com Sat Apr 14 01:46:39 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 13 Apr 2012 16:46:39 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> Message-ID: On Fri, Apr 13, 2012 at 3:06 PM, Dag Sverre Seljebotn wrote: > > > Robert Bradshaw wrote: > >>On Fri, Apr 13, 2012 at 1:27 PM, Dag Sverre Seljebotn >> wrote: >>> Ah, I didn't think about 6-bit or huffman. Certainly helps. >> >>Yeah, we don't want to complicate the ABI too much, but I think >>something like 8 4-bit common chars and 32 6-bit other chars (or 128 >>8-bit other chars) wouldn't be outrageous. The fact that we only have >>to encode into a single word makes the algorithm very simple (though >>the majority of the time we'd spit out pre-encoded literals).
We have >>a version number to play with this as well. >> >>> I'm almost +1 on your proposal now, but a couple of more ideas: >>> >>> 1) Let the key (the size_t) spill over to the next specialization >>entry if >>> it is too large; and prepend that key with a continuation code (two >>size-ts >>> could together say "iii)-d\0\0" on 32 bit systems with 8bit encoding, >>using >>> - as continuation). The key-based caller will expect a continuation >>if it >>> knows about the specialization, and the prepended char will prevent >>spurios >>> matches against the overspilled slot. >>> >>> We could even use the pointers for part of the continuation... >>> >>> 2) Separate the char* format strings from the keys, ie this memory >>layout: >>> >>> >>Version,nslots,nspecs,funcptr,key,funcptr,key,...,sigcharptr,sigcharptr... >>> >>> Where nslots is larger than nspecs if there are continuations. >>> >>> OK, this is getting close to my original proposal, but the difference >>is the >>> contiunation char, so that if you expect a short signature, you can >>safely >>> scan every slot and branching and no null-checking necesarry. >> >>I don't think we need nslots (though it might be interesting). My >>thought is that once you start futzing with variable-length keys, you >>might as well just compare char*s. > > This is where we disagree. If you are the caller you know at compile-time how much you want to match; I think comparing 2 or 3 size-t with no looping is a lot better (a fully-unrolled, 64-bit per instruction strcmp with one of the operands known to the compiler...). Doesn't the compiler unroll strcmp much like this for a known operand? >>If one is concerned about memory, one could force the sigcharptr to be >>aligned, and then the "keys" could be either sigcharptr or key >>depending on whether the least significant bit was set. One could >>easily scan for/switch on a key and scanning for a char* would be >>almost as easy (just don't dereference if the lsb is set). >> >>I don't see us being memory constrained, so >> >>(version,nspecs,futureuse),(key,sigcharptr,funcptr)*,optionalsigchardata* >> >>seems fine to me even if only one of key/sigchrptr is ever used per >>spec. Null-terminating the specs would work fine as well (one less >>thing to keep track of during iteration). > > Well, can't one always use more L1 cache, or is that not a concern? If you have 5-6 different routines calling each other using this mechanism, each with multiple specializations, those unused slots translate to many cache lines wasted. > > I don't think it is that important, I just think that how pretty the C struct declaration ends up looking should not be a concern at all, when the whole point of this is speed anyway. You can always just use a throwaway struct declaration and a cast to get whatever layout you need. If the 'padding' leads to less branching then fine, but I don't see that it helps in any way. I was more concerned about guaranteeing each char* was aligned. > To refine my proposal a bit, we have a list of variable size entries, > > (keydata, keydata, ..., funcptr) > > where each keydata and the ptr is 64 bits on all platforms (see below); each entry must have a total length multiple of 128 bits (so that one can safely scan for a signature in 128 bit increments in the data *without* parsing or branching, you'll never hit a pointer), and each key but the first starts with a 'dash'. Ah, OK, similar to UTF-8. Yes, I like this idea. > Signature strings are either kept separate, or even parsed/decoded from the keys. 
We really only care about speed when you have compiled or JITed code for the case, decoding should be fine otherwise.

True.

> BTW, won't the Cython-generated C code be a horrible mess if we use size_t rather than insist on int64_t? (ok, those need some ifdefs for various compilers, but still seem cleaner than operating with 32bit and 64bit keys, and stdint.h is winning ground).

Sure, we could require 64-bit keys (and pointer slots).

On Fri, Apr 13, 2012 at 3:22 PM, Dag Sverre Seljebotn wrote:
>>> I am really lost here. Why is any of this complicated encoding stuff
>>> better than interning? Interning takes one line of code, is incredibly
>>> cheap (one dict lookup per call site and function definition), and it
>>> lets you check any possible signature (even complicated ones involving
>>> memoryviews) by doing a single-word comparison. And best of all, you
>>> don't have to think hard to make sure you got the encoding right. ;-)
>>>
>>> On a 32-bit system, pointers are smaller than a size_t, but more
>>> expressive! You can still do binary search if you want, etc. Is the
>>> problem just that interning requires a runtime calculation? Because I
>>> feel like C users (like numpy) will want to compute these compressed
>>> codes at module-init anyway, and those of us with a fancy compiler
>>> capable of computing them ahead of time (like Cython) can instruct
>>> that fancy compiler to compute them at module-init time just as
>>> easily?
>>
>>Good question.
>>
>>The primary disadvantage of interning that I see is memory locality. I
>>suppose if all the C-level caches of interned values were co-located,
>>this may not be as big of an issue. Not being able to compare against
>>compile-time constants may thwart some optimization opportunities, but
>>that's less clear.
>>
>>It also requires coordination on a common repository, but I suppose one
>>would just stick a set in some standard module (or leverage Python's
>>interning).
>
> More problems:
>
> 1) It doesn't work well with multiple interpreter states. Ok, nothing works with that at the moment, but it is on the roadmap for Python and we should not make it worse.
>
> You basically *need* a thread safe store separate from any python interpreter; though pythread.h does not rely on the interpreter state; which helps.

I didn't know about the push for multiple interpreter states, but yeah, that makes things much more painful.

> 2) you end up with the known comparison values in read-write memory segments rather than readonly segments, which is probably worse on multicore systems?

Yeah, this is the kind of stuff I was vaguely worried about when I wrote "Not being able to compare against compile-time constants may thwart some optimization opportunities." I don't know what the impact is, but it's worth trying to measure and take into account.

> I really think that anything that we can do to make this near-c-speed should be done; none of the proposals are *that* complicated.
>
> Using keys, NumPy can in the C code choose to be slower but more readable; but using interned string forces cython to be slower, cython gets no way of choosing to go faster. (to the degree that it has an effect; none of these claims were checked)

Yep, agreed.
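To pin down what's being converged on, here is a minimal C sketch of the table-plus-keys approach. Everything here is illustrative assumption, not CEP text: the struct name, field layout, the key constant, and the signature it encodes are all made up, and continuation entries (keys spilling into the next slot) are omitted for brevity.

/* One table per callable: all slots are 64 bits, and each
 * (key, funcptr) entry is padded to a multiple of 128 bits, so a
 * caller can scan two words at a time and never misread a function
 * pointer as a key. */
#include <stdint.h>

typedef struct {
    uint16_t version;    /* ABI version of the table format */
    uint16_t nspecs;     /* number of specializations */
    uint32_t reserved;   /* future use */
    uint64_t data[];     /* nspecs (key, funcptr) pairs */
} spec_table_t;

/* Hypothetical pre-encoded key for a double (double, double)
 * specialization; the real 4-bit/6-bit encoding is still open. */
#define KEY_DD_TO_D UINT64_C(0x0000006464233e64)

typedef double (*dd_to_d_t)(double, double);

static dd_to_d_t find_dd_to_d(const spec_table_t *tab)
{
    for (uint16_t i = 0; i < tab->nspecs; i++) {
        if (tab->data[2 * i] == KEY_DD_TO_D)  /* one word compare */
            return (dd_to_d_t)(uintptr_t)tab->data[2 * i + 1];
    }
    return NULL;  /* no match: fall back to the boxed Python call */
}

A caller that knows its signature at compile time then pays a handful of word compares against an immediate constant plus one indirect call, which is the near-C-speed path being argued for above.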
- Robert

From njs at pobox.com  Sat Apr 14 02:19:41 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 14 Apr 2012 01:19:41 +0100
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com>
References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com>
Message-ID: 

On Fri, Apr 13, 2012 at 11:22 PM, Dag Sverre Seljebotn wrote:
>
>
> Robert Bradshaw wrote:
>
>>On Fri, Apr 13, 2012 at 2:24 PM, Nathaniel Smith wrote:
>>> On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn
>>> wrote:
>>>> Ah, I didn't think about 6-bit or huffman. Certainly helps.
>>>>
>>>> I'm almost +1 on your proposal now, but a couple of more ideas:
>>>>
>>>> 1) Let the key (the size_t) spill over to the next specialization
>>>> entry if it is too large; and prepend that key with a continuation
>>>> code (two size_ts could together say "iii)-d\0\0" on 32 bit systems
>>>> with 8bit encoding, using - as continuation). The key-based caller
>>>> will expect a continuation if it knows about the specialization, and
>>>> the prepended char will prevent spurious matches against the
>>>> overspilled slot.
>>>>
>>>> We could even use the pointers for part of the continuation...
>>>
>>> I am really lost here. Why is any of this complicated encoding stuff
>>> better than interning? Interning takes one line of code, is incredibly
>>> cheap (one dict lookup per call site and function definition), and it
>>> lets you check any possible signature (even complicated ones involving
>>> memoryviews) by doing a single-word comparison. And best of all, you
>>> don't have to think hard to make sure you got the encoding right. ;-)
>>>
>>> On a 32-bit system, pointers are smaller than a size_t, but more
>>> expressive! You can still do binary search if you want, etc. Is the
>>> problem just that interning requires a runtime calculation? Because I
>>> feel like C users (like numpy) will want to compute these compressed
>>> codes at module-init anyway, and those of us with a fancy compiler
>>> capable of computing them ahead of time (like Cython) can instruct
>>> that fancy compiler to compute them at module-init time just as
>>> easily?
>>
>>Good question.
>>
>>The primary disadvantage of interning that I see is memory locality. I
>>suppose if all the C-level caches of interned values were co-located,
>>this may not be as big of an issue. Not being able to compare against
>>compile-time constants may thwart some optimization opportunities, but
>>that's less clear.

I would like to see some demonstration of this. E.g., you can run this:

echo -e '#include <string.h>\nint main(int argc, char ** argv) { return strcmp(argv[0], "a"); }' | gcc -S -x c - -o - -O2 | less

Looks to me like for a short, known-at-compile-time string, with optimization on, gcc implements it by basically sticking the string in a global variable and then using a pointer... (If I do argv[0] == (char *)0x1234, then it places the constant value directly into the instruction stream. Strangely enough, it does *not* inline the constant value even if I do memcmp(&argv[0], "\1\2\3\4", 4), which should be exactly equivalent...!)
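A slightly expanded, self-contained version of that experiment puts the three candidate checks side by side; the interned pointer, the "dd->d" signature spelling, and the 64-bit key value are of course made up for the purpose of reading the assembly from `gcc -S -O2 compare.c`:

/* compare.c -- what code does gcc generate for each dispatch check?
 * All values here are placeholders, not a real encoding. */
#include <stdint.h>
#include <string.h>

static void *interned_dd_to_d;             /* filled in at module init */

int by_strcmp(const char *sig) {
    return strcmp(sig, "dd->d") == 0;      /* compare against a literal */
}

int by_interned_ptr(const void *sig) {
    return sig == interned_dd_to_d;        /* load a global, then compare */
}

int by_key(uint64_t key) {
    return key == UINT64_C(0x6464233e64);  /* immediate in the code stream */
}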
I think gcc is just as likely to stick a bunch of static void * interned_dd_to_d; static void * interned_ll_to_l; next to each other in the memory image as it is to stick a bunch of equivalent manifest constants. If you're worried, make it static void * interned_signatures[NUM_SIGNATURES] -- then they'll definitely be next to each other. >>It also requires coordination common repository, but I suppose one >>would just stick a set in some standard module (or leverage Python's >>interning). > > More problems: > > 1) It doesn't work well with multiple interpreter states. Ok, nothing works with that at the moment, but it is on the roadmap for Python and we should not make it worse. This isn't a criticism, but I'd like to see a reference to the work in this direction! My impression was that it's been on the roadmap for maybe a decade, in a really desultory fashion: http://docs.python.org/faq/library.html#can-t-we-get-rid-of-the-global-interpreter-lock So if it's actually happening that's quite interesting. > You basically *need* a thread safe store separate from any python interpreter; though pythread.h does not rely on the interpreter state; which helps. Anyway, yes, if you can't rely on the interpreter than you'd need some place to store the intern table, but I'm not sure why this would be a problem (in Python 3.6 or whenever it becomes relevant). > 2) you end up with the known comparison values in read-write memory segments rather than readonly segments, which is probably worse on multicore systems? Is it? Can you elaborate? Cache ping-ponging is certainly bad, but that's when multiple cores are writing to the same cache line, I can't see how the TLB flags would matter. I guess the problem would be if you also have some other data in the global variable space that you write to constantly, and then it turned out they were placed next to these read-only comparison values in the same cache line? > I really think that anything that we can do to make this near-c-speed should be done; none of the proposals are *that* complicated. I agree, but I object to codifying the waving on dead chickens. :-) > Using keys, NumPy can in the C code choose to be slower but more readable; but using interned string forces cython to be slower, cython gets no way of choosing to go faster. (to the degree that it has an effect; none of these claims were checked) I think the only slowdown we know of is a few dict lookups at module load time. - N From greg.ewing at canterbury.ac.nz Sat Apr 14 02:24:58 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 14 Apr 2012 12:24:58 +1200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> Message-ID: <4F88C3DA.6000009@canterbury.ac.nz> Dag Sverre Seljebotn wrote: > 1) It doesn't work well with multiple interpreter states. Ok, nothing works > with that at the moment, but it is on the roadmap for Python Is it really? I got the impression that it's not considered feasible, since it would require massive changes to the entire implementation and totally break the existing C API. Has someone thought of a way around those problems? 
-- Greg From d.s.seljebotn at astro.uio.no Sat Apr 14 10:36:55 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 14 Apr 2012 10:36:55 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> Message-ID: <7b6da5f0-8248-42f5-8263-b4cf9ebedf95@email.android.com> Nathaniel Smith wrote: >On Fri, Apr 13, 2012 at 11:22 PM, Dag Sverre Seljebotn > wrote: >> >> >> Robert Bradshaw wrote: >> >>>On Fri, Apr 13, 2012 at 2:24 PM, Nathaniel Smith >wrote: >>>> On Fri, Apr 13, 2012 at 9:27 PM, Dag Sverre Seljebotn >>>> wrote: >>>>> Ah, I didn't think about 6-bit or huffman. Certainly helps. >>>>> >>>>> I'm almost +1 on your proposal now, but a couple of more ideas: >>>>> >>>>> 1) Let the key (the size_t) spill over to the next specialization >>>entry if >>>>> it is too large; and prepend that key with a continuation code >(two >>>size-ts >>>>> could together say "iii)-d\0\0" on 32 bit systems with 8bit >>>encoding, using >>>>> - as continuation). The key-based caller will expect a >continuation >>>if it >>>>> knows about the specialization, and the prepended char will >prevent >>>spurios >>>>> matches against the overspilled slot. >>>>> >>>>> We could even use the pointers for part of the continuation... >>>> >>>> I am really lost here. Why is any of this complicated encoding >stuff >>>> better than interning? Interning takes one line of code, is >>>incredibly >>>> cheap (one dict lookup per call site and function definition), and >it >>>> lets you check any possible signature (even complicated ones >>>involving >>>> memoryviews) by doing a single-word comparison. And best of all, >you >>>> don't have to think hard to make sure you got the encoding right. >;-) >>>> >>>> On a 32-bit system, pointers are smaller than a size_t, but more >>>> expressive! You can still do binary search if you want, etc. Is the >>>> problem just that interning requires a runtime calculation? Because >I >>>> feel like C users (like numpy) will want to compute these >compressed >>>> codes at module-init anyway, and those of us with a fancy compiler >>>> capable of computing them ahead of time (like Cython) can instruct >>>> that fancy compiler to compute them at module-init time just as >>>> easily? >>> >>>Good question. >>> >>>The primary disadvantage of interning that I see is memory locality. >I >>>suppose if all the C-level caches of interned values were co-located, >>>this may not be as big of an issue. Not being able to compare against >>>compile-time constants may thwart some optimization opportunities, >but >>>that's less clear. > >I would like to see some demonstration of this. E.g., you can run this: > >echo -e '#include \nint main(int argc, char ** argv) { >return strcmp(argv[0], "a"); }' | gcc -S -x c - -o - -O2 | less > >Looks to me like for a short, known-at-compile-time string, with >optimization on, gcc implements it by basically sticking the string in >a global variable and then using a pointer... (If I do argv[0] == >(char *)0x1234, then it places the constant value directly into the >instruction stream. Strangely enough, it does *not* inline the >constant value even if I do memcmp(&argv[0], "\1\2\3\4", 4), which >should be exactly equivalent...!) Right. 
So: - With keys you have the *option* of hardcoding them, and then they will be in the instruction stream (rather than the instruction stream containing, essentially, a pointer to the key). - With interned, you always have a pointer you must dereference in the instruction stream. > >I think gcc is just as likely to stick a bunch of > static void * interned_dd_to_d; > static void * interned_ll_to_l; >next to each other in the memory image as it is to stick a bunch of >equivalent manifest constants. If you're worried, make it static void >* interned_signatures[NUM_SIGNATURES] -- then they'll definitely be >next to each other. > >>>It also requires coordination common repository, but I suppose one >>>would just stick a set in some standard module (or leverage Python's >>>interning). >> >> More problems: >> >> 1) It doesn't work well with multiple interpreter states. Ok, nothing >works with that at the moment, but it is on the roadmap for Python and >we should not make it worse. > >This isn't a criticism, but I'd like to see a reference to the work in >this direction! My impression was that it's been on the roadmap for >maybe a decade, in a really desultory fashion: >http://docs.python.org/faq/library.html#can-t-we-get-rid-of-the-global-interpreter-lock >So if it's actually happening that's quite interesting. I wasn't referring to the GIL, but multiple interpreters (where objects from one cannot be used in another). PEP3121 mentions it as one of the things it prepares for. Perhaps that didn't go anywhere, I don't really know. > >> You basically *need* a thread safe store separate from any python >interpreter; though pythread.h does not rely on the interpreter state; >which helps. > >Anyway, yes, if you can't rely on the interpreter than you'd need some >place to store the intern table, but I'm not sure why this would be a >problem (in Python 3.6 or whenever it becomes relevant). > >> 2) you end up with the known comparison values in read-write memory >segments rather than readonly segments, which is probably worse on >multicore systems? > >Is it? Can you elaborate? Cache ping-ponging is certainly bad, but >that's when multiple cores are writing to the same cache line, I can't >see how the TLB flags would matter. > >I guess the problem would be if you also have some other data in the >global variable space that you write to constantly, and then it turned >out they were placed next to these read-only comparison values in the >same cache line? You may be right, my understanding of this is actually too vague. Anyway, if the constant ends up in the instruction stream it is at least one less register load from data cache with the key approach? Dag > >> I really think that anything that we can do to make this near-c-speed >should be done; none of the proposals are *that* complicated. > >I agree, but I object to codifying the waving on dead chickens. :-) > >> Using keys, NumPy can in the C code choose to be slower but more >readable; but using interned string forces cython to be slower, cython >gets no way of choosing to go faster. (to the degree that it has an >effect; none of these claims were checked) > >I think the only slowdown we know of is a few dict lookups at module >load time. > >- N >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. 
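For completeness, the module-load-time cost being weighed here is roughly the following sketch, using CPython's stock interning machinery (Python 3 spelling; the signature literal is again made up):

/* Interning-based variant: each extension module canonicalizes its
 * signature strings once at import time; every later dispatch check
 * is a single pointer comparison against this writable global. */
#include <Python.h>

static PyObject *sig_dd_to_d;   /* canonical signature object */

static int intern_signatures(void)
{
    sig_dd_to_d = PyUnicode_InternFromString("dd->d");
    return (sig_dd_to_d != NULL) ? 0 : -1;   /* -1: exception is set */
}

static int signature_matches(PyObject *sig)
{
    return sig == sig_dd_to_d;   /* one load from data, one compare */
}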
From d.s.seljebotn at astro.uio.no  Sat Apr 14 10:41:28 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sat, 14 Apr 2012 10:41:28 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F88C3DA.6000009@canterbury.ac.nz>
References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz>
Message-ID: <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com>

Greg Ewing wrote:
>Dag Sverre Seljebotn wrote:
>
>> 1) It doesn't work well with multiple interpreter states. Ok, nothing
>> works with that at the moment, but it is on the roadmap for Python
>
>Is it really? I got the impression that it's not considered feasible,
>since it would require massive changes to the entire implementation
>and totally break the existing C API. Has someone thought of a way
>around those problems?

I was just referring to the offhand comments in PEP3121, but I guess that PEP had multiple reasons, and perhaps this particular argument had no significance... You know this a lot better than me.

Dag

>
>--
>Greg
>_______________________________________________
>cython-devel mailing list
>cython-devel at python.org
>http://mail.python.org/mailman/listinfo/cython-devel

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

From stefan_ml at behnel.de  Sat Apr 14 10:56:44 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Apr 2012 10:56:44 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com>
References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz> <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com>
Message-ID: <4F893BCC.7040706@behnel.de>

Dag Sverre Seljebotn, 14.04.2012 10:41:
> Greg Ewing wrote:
>> Dag Sverre Seljebotn wrote:
>>
>>> 1) It doesn't work well with multiple interpreter states. Ok, nothing
>>> works with that at the moment, but it is on the roadmap for Python
>>
>> Is it really? I got the impression that it's not considered feasible,
>> since it would require massive changes to the entire implementation
>> and totally break the existing C API. Has someone thought of a way
>> around those problems?
>
> I was just referring to the offhand comments in PEP3121, but I guess that PEP had multiple reasons, and perhaps this particular argument had no significance...

IIRC, the last status was that even after this PEP, Py3 still has serious issues with keeping extension modules in separate interpreters. And this probably isn't worth doing anything about because it won't work without a major effort in all sorts of places. And I never heard that any extension module even tried to support this.

I don't think we should invest too much thought into this direction.
Stefan

From arfrever.fta at gmail.com  Sat Apr 14 12:16:04 2012
From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis)
Date: Sat, 14 Apr 2012 12:16:04 +0200
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: 
References: 
Message-ID: <201204141216.05543.Arfrever.FTA@gmail.com>

2012-04-12 16:38:37 mark florisson wrote:
> Yet another release candidate, this will hopefully be the last before
> the 0.16 release. You can grab it from here:
> http://wiki.cython.org/ReleaseNotes-0.16
>
> There were several fixes for the numpy attribute rewrite, memoryviews
> and fused types. Accessing the 'base' attribute of a typed ndarray now
> goes through the object layer, which means direct assignment is no
> longer supported.
>
> If there are any problems, please let us know.

4 tests still fail with Python 3.2 (currently 3.2.3).
All tests pass with Python 2.6.8, 2.7.3 and 3.1.5.

Failures with Python 3.2:

======================================================================
FAIL: NestedWith (withstat)
Doctest: withstat.NestedWith
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for withstat.NestedWith
  File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat.cpython-32.so", line unknown line number, in NestedWith

----------------------------------------------------------------------
File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat.cpython-32.so", line ?, in withstat.NestedWith
Failed example:
    NestedWith().runTest()
Exception raised:
    Traceback (most recent call last):
      File "/usr/lib64/python3.2/doctest.py", line 1288, in __run
        compileflags, 1), test.globs)
      File "", line 1, in
        NestedWith().runTest()
      File "withstat.pyx", line 183, in withstat.NestedWith.runTest (withstat.c:5574)
      File "withstat.pyx", line 222, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:8101)
      File "withstat.pyx", line 223, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:7989)
      File "withstat.pyx", line 224, in withstat.NestedWith.testEnterReturnsTuple (withstat.c:7838)
      File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func
        DeprecationWarning, 2)
      File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning
        file.write(formatwarning(message, category, filename, lineno, line))
      File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning
        line = linecache.getline(filename, lineno) if line is None else line
      File "/usr/lib64/python3.2/linecache.py", line 15, in getline
        lines = getlines(filename, module_globals)
      File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines
        return self.save_linecache_getlines(filename, module_globals)
      File "/usr/lib64/python3.2/linecache.py", line 41, in getlines
        return updatecache(filename, module_globals)
      File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache
        lines = fp.readlines()
      File "/usr/lib64/python3.2/codecs.py", line 300, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 40: invalid start byte

======================================================================
FAIL: NestedWith (withstat)
Doctest: withstat.NestedWith
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/cpp/withstat.cpython-32.so", line unknown line number, in NestedWith ---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/cpp/withstat.cpython-32.so", line ?, in withstat.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat.pyx", line 183, in withstat.NestedWith.runTest (withstat.cpp:5574) File "withstat.pyx", line 222, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:8101) File "withstat.pyx", line 223, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:7989) File "withstat.pyx", line 224, in withstat.NestedWith.testEnterReturnsTuple (withstat.cpp:7838) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 24: invalid continuation byte ====================================================================== FAIL: NestedWith (withstat_py) Doctest: withstat_py.NestedWith ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat_py.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat_py.cpython-32.so", line unknown line number, in NestedWith ---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/c/withstat_py.cpython-32.so", line ?, in withstat_py.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat_py.py", line 250, in withstat_py.NestedWith.runTest (withstat_py.c:7262) File "withstat_py.py", line 289, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.c:9789) File "withstat_py.py", line 290, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.c:9677) File "withstat_py.py", line 291, in 
withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.c:9526) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte ====================================================================== FAIL: NestedWith (withstat_py) Doctest: withstat_py.NestedWith ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 2153, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for withstat_py.NestedWith File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/cpp/withstat_py.cpython-32.so", line unknown line number, in NestedWith ---------------------------------------------------------------------- File "/var/tmp/portage/dev-python/cython-0.16_rc1/work/Cython-0.16rc1/tests-3.2/run/cpp/withstat_py.cpython-32.so", line ?, in withstat_py.NestedWith Failed example: NestedWith().runTest() Exception raised: Traceback (most recent call last): File "/usr/lib64/python3.2/doctest.py", line 1288, in __run compileflags, 1), test.globs) File "", line 1, in NestedWith().runTest() File "withstat_py.py", line 250, in withstat_py.NestedWith.runTest (withstat_py.cpp:7262) File "withstat_py.py", line 289, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.cpp:9789) File "withstat_py.py", line 290, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.cpp:9677) File "withstat_py.py", line 291, in withstat_py.NestedWith.testEnterReturnsTuple (withstat_py.cpp:9526) File "/usr/lib64/python3.2/unittest/case.py", line 1169, in deprecated_func DeprecationWarning, 2) File "/usr/lib64/python3.2/warnings.py", line 18, in showwarning file.write(formatwarning(message, category, filename, lineno, line)) File "/usr/lib64/python3.2/warnings.py", line 25, in formatwarning line = linecache.getline(filename, lineno) if line is None else line File "/usr/lib64/python3.2/linecache.py", line 15, in getline lines = getlines(filename, module_globals) File "/usr/lib64/python3.2/doctest.py", line 1372, in __patched_linecache_getlines return self.save_linecache_getlines(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 41, in getlines return updatecache(filename, module_globals) File "/usr/lib64/python3.2/linecache.py", line 127, in updatecache lines = fp.readlines() File "/usr/lib64/python3.2/codecs.py", line 300, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 24: 
invalid start byte

----------------------------------------------------------------------
Ran 6485 tests in 2413.255s

FAILED (failures=4)
ALL DONE

-- 
Arfrever Frehtes Taifersar Arahesis

From markflorisson88 at gmail.com  Sat Apr 14 12:46:23 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 11:46:23 +0100
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: 
References: 
Message-ID: 

On 12 April 2012 22:00, Wes McKinney wrote:
> On Thu, Apr 12, 2012 at 10:38 AM, mark florisson
> wrote:
>> Yet another release candidate, this will hopefully be the last before
>> the 0.16 release. You can grab it from here:
>> http://wiki.cython.org/ReleaseNotes-0.16
>>
>> There were several fixes for the numpy attribute rewrite, memoryviews
>> and fused types. Accessing the 'base' attribute of a typed ndarray now
>> goes through the object layer, which means direct assignment is no
>> longer supported.
>>
>> If there are any problems, please let us know.
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>
> I'm unable to build pandas using git master Cython. I just released
> pandas 0.7.3 today which has no issues at all with 0.15.1:
>
> http://pypi.python.org/pypi/pandas
>
> For example:
>
> 16:57 ~/code/pandas (master)$ python setup.py build_ext --inplace
> running build_ext
> cythoning pandas/src/tseries.pyx to pandas/src/tseries.c
>
> Error compiling Cython file:
> ------------------------------------------------------------
> ...
>        self.store = {}
>
>        ptr = <int32_t**> malloc(self.depth * sizeof(int32_t*))
>
>        for i in range(self.depth):
>            ptr[i] = <int32_t*> (<ndarray> label_arrays[i]).data
>                                                           ^
> ------------------------------------------------------------
>
> pandas/src/tseries.pyx:107:59: Compiler crash in AnalyseExpressionsTransform
>
> ModuleNode.body = StatListNode(tseries.pyx:1:0)
> StatListNode.stats[23] = StatListNode(tseries.pyx:86:5)
> StatListNode.stats[0] = CClassDefNode(tseries.pyx:86:5,
>    as_name = u'MultiMap',
>    class_name = u'MultiMap',
>    doc = u'\n    Need to come up with a better data structure for
> multi-level indexing\n    ',
>    module_name = u'',
>    visibility = u'private')
> CClassDefNode.body = StatListNode(tseries.pyx:91:4)
> StatListNode.stats[1] = StatListNode(tseries.pyx:95:4)
> StatListNode.stats[0] = DefNode(tseries.pyx:95:4,
>    modifiers = [...]/0,
>    name = u'__init__',
>    num_required_args = 2,
>    py_wrapper_required = True,
>    reqd_kw_flags_cname = '0',
>    used = True)
> File 'Nodes.py', line 342, in analyse_expressions:
> StatListNode(tseries.pyx:96:8)
> File 'Nodes.py', line 342, in analyse_expressions:
> StatListNode(tseries.pyx:106:8)
> File 'Nodes.py', line 5903, in analyse_expressions:
> ForInStatNode(tseries.pyx:106:8)
> File 'Nodes.py', line 342, in analyse_expressions:
> StatListNode(tseries.pyx:107:21)
> File 'Nodes.py', line 4767, in analyse_expressions:
> SingleAssignmentNode(tseries.pyx:107:21)
> File 'Nodes.py', line 4872, in analyse_types:
> SingleAssignmentNode(tseries.pyx:107:21)
> File 'ExprNodes.py', line 7082, in analyse_types:
> TypecastNode(tseries.pyx:107:21,
>    result_is_used = True,
>    use_managed_ref = True)
> File 'ExprNodes.py', line 4274, in analyse_types:
> AttributeNode(tseries.pyx:107:59,
>    attribute = u'data',
>    initialized_check = True,
>    is_attribute = 1,
>    member = u'data',
>    needs_none_check = True,
>    op = '->',
>    result_is_used = True,
>    use_managed_ref = True)
> File 'ExprNodes.py', line 4360, in analyse_as_ordinary_attribute:
> AttributeNode(tseries.pyx:107:59,
>    attribute = u'data',
>    initialized_check = True,
>    is_attribute = 1,
>    member = u'data',
>    needs_none_check = True,
>    op = '->',
>    result_is_used = True,
>    use_managed_ref = True)
> File 'ExprNodes.py', line 4436, in analyse_attribute:
> AttributeNode(tseries.pyx:107:59,
>    attribute = u'data',
>    initialized_check = True,
>    is_attribute = 1,
>    member = u'data',
>    needs_none_check = True,
>    op = '->',
>    result_is_used = True,
>    use_managed_ref = True)
>
> Compiler crash traceback from this point on:
>  File "/home/wesm/code/repos/cython/Cython/Compiler/ExprNodes.py",
> line 4436, in analyse_attribute
>    replacement_node = numpy_transform_attribute_node(self)
>  File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py",
> line 18, in numpy_transform_attribute_node
>    numpy_pxd_scope = node.obj.entry.type.scope.parent_scope
> AttributeError: 'TypecastNode' object has no attribute 'entry'
> building 'pandas._tseries' extension
> creating build
> creating build/temp.linux-x86_64-2.7
> creating build/temp.linux-x86_64-2.7/pandas
> creating build/temp.linux-x86_64-2.7/pandas/src
> gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -O2 -fPIC
> -I/home/wesm/epd/lib/python2.7/site-packages/numpy/core/include
> -I/home/wesm/epd/include/python2.7 -c pandas/src/tseries.c -o
> build/temp.linux-x86_64-2.7/pandas/src/tseries.o
> pandas/src/tseries.c:1:2: error: #error Do not use this file, it is
> the result of a failed Cython compilation.
> error: command 'gcc' failed with exit status 1
>
>
> -----
>
> I kludged this particular line in the pandas/timeseries branch so it
> will build on git master Cython, but I was treated to dozens of
> failures, errors, and finally a segfault in the middle of the test
> suite. Suffice to say I'm not sure I would advise you to release the
> library in its current state until all of this is resolved. Happy to
> help however I can but I'm back to 0.15.1 for now.
>
> - Wes
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

It seems that the numpy stopgap solution broke something in Pandas, I'm not sure what or how, but it leads to segfaults where code is trying to retrieve objects from a numpy array that are NULL. I tried disabling the numpy rewrites which unbreaks this with the cython release branch, so I think we should do another RC either with the attribute rewrite disabled or fixed.

Dag, do you know what could have been broken by this fix that could lead to these results?

From stefan_ml at behnel.de  Sat Apr 14 13:00:17 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Apr 2012 13:00:17 +0200
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: <201204141216.05543.Arfrever.FTA@gmail.com>
References: <201204141216.05543.Arfrever.FTA@gmail.com>
Message-ID: <4F8958C1.40404@behnel.de>

Arfrever Frehtes Taifersar Arahesis, 14.04.2012 12:16:
> 4 tests still fail with Python 3.2 (currently 3.2.3).
> All tests pass with Python 2.6.8, 2.7.3 and 3.1.5.

Thanks for the report.
> Failures with Python 3.2:
>
> ======================================================================
> FAIL: NestedWith (withstat)
> Doctest: withstat.NestedWith
> ----------------------------------------------------------------------
> [...]
>     UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 40: invalid start byte

This looks like it's trying to print a DeprecationWarning because of some unittest related problem and fails to format the message for it. Doesn't look Cython related, but I'll see if I can find out something about this.

Stefan

From markflorisson88 at gmail.com  Sat Apr 14 13:02:01 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 12:02:01 +0100
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: <4F8958C1.40404@behnel.de>
References: <201204141216.05543.Arfrever.FTA@gmail.com> <4F8958C1.40404@behnel.de>
Message-ID: 

On 14 April 2012 12:00, Stefan Behnel wrote:
> Arfrever Frehtes Taifersar Arahesis, 14.04.2012 12:16:
>> 4 tests still fail with Python 3.2 (currently 3.2.3).
>> All tests pass with Python 2.6.8, 2.7.3 and 3.1.5.
>
> Thanks for the report.
>

Indeed, I just pushed a fix here: https://github.com/markflorisson88/cython/tree/release

Arfrever, could you retry running these tests, i.e.

python runtests.py -vv 'run\.withstat'

Thanks for the help!

>> Failures with Python 3.2:
>> [...]
>
> This looks like it's trying to print a DeprecationWarning because of some
> unittest related problem and fails to format the message for it. Doesn't
> look Cython related, but I'll see if I can find out something about this.
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From stefan_ml at behnel.de  Sat Apr 14 15:06:37 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Apr 2012 15:06:37 +0200
Subject: [Cython] Fwd: sage.math cluster OFF until about 9:30am.
In-Reply-To: 
References: 
Message-ID: <4F89765D.8070603@behnel.de>

-------- Original Message --------
From: William Stein (wstein at gmail.com)

Hi,

As previously announced a few times, the sage.math cluster is OFF, due to electrical work that is being done in the building that houses the server room. I expect the machines to be off until about 9:30am.

Obviously, anything that runs on those machines -- including http://sagenb.org, http://sagemath.org, etc. -- is off.

I don't expect major havoc in getting things back up, since I had a chance to properly shut down all the machines.
In-Reply-To: References: Message-ID: <4F89765D.8070603@behnel.de> -------- Original-Message -------- From: William Stein (wstein a gmail.com) Hi, As previously announced a few times, the sage.math cluster is OFF, due to a electrical work that is being done in the building that houses the server room. I expect the machines to be off until about 9:30am. Obviously, anything that runs on those machines -- including http://sagenb.org, http://sagemath.org, etc. -- is off. I don't expect major havoc in getting things back up, since I had a chance to properly shut down all the machines. -- William From d.s.seljebotn at astro.uio.no Sat Apr 14 15:57:28 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 14 Apr 2012 15:57:28 +0200 Subject: [Cython] Cython 0.16 RC 1 In-Reply-To: References: Message-ID: <4F898248.5030105@astro.uio.no> On 04/14/2012 12:46 PM, mark florisson wrote: > On 12 April 2012 22:00, Wes McKinney wrote: >> On Thu, Apr 12, 2012 at 10:38 AM, mark florisson >> wrote: >>> Yet another release candidate, this will hopefully be the last before >>> the 0.16 release. You can grab it from here: >>> http://wiki.cython.org/ReleaseNotes-0.16 >>> >>> There were several fixes for the numpy attribute rewrite, memoryviews >>> and fused types. Accessing the 'base' attribute of a typed ndarray now >>> goes through the object layer, which means direct assignment is no >>> longer supported. >>> >>> If there are any problems, please let us know. >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> I'm unable to build pandas using git master Cython. I just released >> pandas 0.7.3 today which has no issues at all with 0.15.1: >> >> http://pypi.python.org/pypi/pandas >> >> For example: >> >> 16:57 ~/code/pandas (master)$ python setup.py build_ext --inplace >> running build_ext >> cythoning pandas/src/tseries.pyx to pandas/src/tseries.c >> >> Error compiling Cython file: >> ------------------------------------------------------------ >> ... 
>>        self.store = {}
>>
>>        ptr = <int32_t**> malloc(self.depth * sizeof(int32_t*))
>>
>>        for i in range(self.depth):
>>            ptr[i] = <int32_t*> (<ndarray> label_arrays[i]).data
>>                                                           ^
>> ------------------------------------------------------------
>>
>> pandas/src/tseries.pyx:107:59: Compiler crash in AnalyseExpressionsTransform
>>
>> [... same node dump and crash traceback as quoted above ...]
>>
>> AttributeError: 'TypecastNode' object has no attribute 'entry'
>> building 'pandas._tseries' extension
>> gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -O2 -fPIC
>> -I/home/wesm/epd/lib/python2.7/site-packages/numpy/core/include
>> -I/home/wesm/epd/include/python2.7 -c pandas/src/tseries.c -o
>> build/temp.linux-x86_64-2.7/pandas/src/tseries.o
>> pandas/src/tseries.c:1:2: error: #error Do not use this file, it is
>> the result of a failed Cython compilation.
>> error: command 'gcc' failed with exit status 1
>>
>> -----
>>
>> I kludged this particular line in the pandas/timeseries branch so it
>> will build on git master Cython, but I was treated to dozens of
>> failures, errors, and finally a segfault in the middle of the test
>> suite. Suffice it to say, I would not advise you to release the
>> library in its current state until all of this is resolved. Happy to
>> help however I can, but I'm back to 0.15.1 for now.
>>
>> - Wes
>
> It seems that the numpy stopgap solution broke something in pandas.
> I'm not sure what or how, but it leads to segfaults where code that
> tries to retrieve objects from a numpy array gets back NULL. I tried
> disabling the numpy rewrites, which unbreaks this with the cython
> release branch, so I think we should do another RC, either with the
> attribute rewrite disabled or fixed.
>
> Dag, do you know what could have been broken by this fix that could
> lead to these results?

I can't imagine what causes a change like you say... one thing that
could cause a segfault is that technically we should now call
import_array in every module using numpy.pxd, while we don't do that.
If a NumPy version is used where PyArray_DATA or similar is not a
macro, you would segfault... that should be fixed...

Dag

From stefan_ml at behnel.de Sat Apr 14 16:18:18 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Apr 2012 16:18:18 +0200
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: 
References: <201204141216.05543.Arfrever.FTA@gmail.com> <4F8958C1.40404@behnel.de>
Message-ID: <4F89872A.9040200@behnel.de>

mark florisson, 14.04.2012 13:02:
> I just pushed a fix here:
> https://github.com/markflorisson88/cython/tree/release

Note that I had already pushed a couple of other fixes into the release
branch of the main repo.

Stefan

From markflorisson88 at gmail.com Sat Apr 14 17:32:31 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 16:32:31 +0100
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: <4F898248.5030105@astro.uio.no>
References: <4F898248.5030105@astro.uio.no>
Message-ID: 

On 14 April 2012 14:57, Dag Sverre Seljebotn wrote:
> On 04/14/2012 12:46 PM, mark florisson wrote:
>> On 12 April 2012 22:00, Wes McKinney wrote:
>>> On Thu, Apr 12, 2012 at 10:38 AM, mark florisson wrote:
>>>> Yet another release candidate, this will hopefully be the last before
>>>> the 0.16 release. You can grab it from here:
>>>> http://wiki.cython.org/ReleaseNotes-0.16
>>>> [...]
>>>
>>> I'm unable to build pandas using git master Cython. I just released
>>> pandas 0.7.3 today which has no issues at all with 0.15.1:
>>>
>>> http://pypi.python.org/pypi/pandas
>>>
>>> [... full build log and compiler crash traceback quoted earlier in
>>> the thread, ending in:]
>>>
>>> pandas/src/tseries.pyx:107:59: Compiler crash in
>>> AnalyseExpressionsTransform
>>> [...]
>>>  File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py",
>>> line 18, in numpy_transform_attribute_node
>>>    numpy_pxd_scope = node.obj.entry.type.scope.parent_scope
>>> AttributeError: 'TypecastNode' object has no attribute 'entry'
>>> [...]
>>>
>>> I kludged this particular line in the pandas/timeseries branch so it
>>> will build on git master Cython, but I was treated to dozens of
>>> failures, errors, and finally a segfault in the middle of the test
>>> suite. [...] I'm back to 0.15.1 for now.
>>>
>>> - Wes
>>
>> It seems that the numpy stopgap solution broke something in pandas,
>> [... as quoted above ...]
>
> I can't imagine what causes a change like you say... one thing that
> could cause a segfault is that technically we should now call
> import_array in every module using numpy.pxd, while we don't do that.
> If a NumPy version is used where PyArray_DATA or similar is not a
> macro, you would segfault... that should be fixed...
>
> Dag

Yeah, that makes sense, but the thing is that pandas is already calling
import_array everywhere, and the function calls themselves work; it's
the result that's NULL. Now this could be a bug in pandas, but seeing
that pandas works fine without the stopgap solution (that is, it
doesn't pass all the tests, but at least it doesn't segfault), I think
it's something funky on our side.

So I suppose I'll disable the fix for 0.16, and we can try to fix it
for the next release.
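For concreteness, this is what "calling import_array in every module"
means at the C level. A minimal sketch in Python 2 style (matching this
thread's era); the module name and the empty method table are made up
here, but import_array() is the real NumPy C-API initialization macro:

{{{
#include <Python.h>
#include <numpy/arrayobject.h>

static PyMethodDef methods[] = {
    {NULL, NULL, 0, NULL}   /* no module-level functions in this sketch */
};

/* Toy module init. import_array() fills in the per-module PyArray_API
   table; API macros like PyArray_DATA go through that table, so a
   module that skips this call can crash in exactly the way discussed
   above. On failure the macro sets ImportError and returns from the
   init function. */
PyMODINIT_FUNC
initexample(void)
{
    PyObject *m = Py_InitModule("example", methods);
    if (m == NULL)
        return;
    import_array();
}
}}}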
From robertwb at gmail.com Sat Apr 14 20:10:08 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sat, 14 Apr 2012 11:10:08 -0700
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F893BCC.7040706@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz> <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com> <4F893BCC.7040706@behnel.de>
Message-ID: 

On Sat, Apr 14, 2012 at 1:56 AM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 14.04.2012 10:41:
>> Greg Ewing wrote:
>>> Dag Sverre Seljebotn wrote:
>>>
>>>> 1) It doesn't work well with multiple interpreter states. Ok, nothing
>>>> works with that at the moment, but it is on the roadmap for Python
>>>
>>> Is it really? I got the impression that it's not considered feasible,
>>> since it would require massive changes to the entire implementation
>>> and totally break the existing C API. Has someone thought of a way
>>> around those problems?
>>
>> I was just referring to the offhand comments in PEP3121, but I guess
>> that PEP had multiple reasons, and perhaps this particular argument
>> had no significance...
>
> IIRC, the last status was that even after this PEP, Py3 still has serious
> issues with keeping extension modules in separate interpreters. And this
> probably isn't worth doing anything about because it won't work without a
> major effort in all sorts of places. And I never heard that any extension
> module even tried to support this.
>
> I don't think we should invest too much thought into this direction.

I had never even heard of this PEP before this thread, but this
certainly seems reasonable to me. Aside from this, there is some value
in the inlined signature in that a pure C library can easily support
the ABI as well.

Has anyone done any experiments/timings to see if having constants vs.
globals even matters?

- Robert

From d.s.seljebotn at astro.uio.no Sat Apr 14 21:04:06 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sat, 14 Apr 2012 21:04:06 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: 
References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz> <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com> <4F893BCC.7040706@behnel.de>
Message-ID: <4F89CA26.2080906@astro.uio.no>

On 04/14/2012 08:10 PM, Robert Bradshaw wrote:
> On Sat, Apr 14, 2012 at 1:56 AM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 14.04.2012 10:41:
>>> Greg Ewing wrote:
>>>> Dag Sverre Seljebotn wrote:
>>>>
>>>>> 1) It doesn't work well with multiple interpreter states. Ok, nothing
>>>>> works with that at the moment, but it is on the roadmap for Python
>>>>
>>>> Is it really? I got the impression that it's not considered feasible,
>>>> since it would require massive changes to the entire implementation
>>>> and totally break the existing C API. Has someone thought of a way
>>>> around those problems?
>>>
>>> I was just referring to the offhand comments in PEP3121, but I guess
>>> that PEP had multiple reasons, and perhaps this particular argument
>>> had no significance...
>>
>> IIRC, the last status was that even after this PEP, Py3 still has serious
>> issues with keeping extension modules in separate interpreters. And this
>> probably isn't worth doing anything about because it won't work without a
>> major effort in all sorts of places. And I never heard that any extension
>> module even tried to support this.
>>
>> I don't think we should invest too much thought into this direction.

A shame; short of getting rid of the GIL, multiple interpreter states
would be my favourite shared-memory parallel computation approach, as
they could share NumPy buffers (and other C-level structures) without
worrying about allocating in process-shared memory (where most data
structures, like std::map, won't work portably and reliably anyway).
Multiple separate interpreter states would be a very nice way of
getting the benefits of multi-threading without the disadvantages.

> I had never even heard of this PEP before this thread, but this
> certainly seems reasonable to me. Aside from this, there is some value
> in the inlined signature in that a pure C library can easily support
> the ABI as well.

Yes -- I think both "sides" of this discussion prefer their approach
out of aesthetics more than performance :-) I'll post a revamped CEP
in a minute to at least try to sum them up.

> Has anyone done any experiments/timings to see if having constants vs.
> globals even matters?

It'd be interesting to see; won't have time myself until Monday
earliest.

Dag

From d.s.seljebotn at astro.uio.no Sat Apr 14 21:08:13 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sat, 14 Apr 2012 21:08:13 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F87530F.7050000@astro.uio.no>
References: <4F87530F.7050000@astro.uio.no>
Message-ID: <4F89CB1D.6000109@astro.uio.no>

On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote:
> Travis Oliphant recently raised the issue on the NumPy list of what
> mechanisms to use to box native functions produced by his Numba so that
> SciPy functions can call it, e.g. (I'm making the numba part up):

This thread is turning into one of those big ones...

But I think it is really worth it in the end; I'm getting excited about
the possibility down the road of importing functions using normal
Python mechanisms and still having fast calls.

Anyway, to organize discussion, I've tried to revamp the CEP and
describe both the intern way and the strcmp way.

The wiki appears to be down, so I'll post it below...

Dag

= CEP 1000: Convention for native dispatches through Python callables =

Many callable objects are simply wrappers around native code. This
holds for any Cython function, f2py functions, manually written
CPython extensions, Numba, etc.

Obviously, when native code calls other native code, it would be nice
to skip the significant cost of boxing and unboxing all the arguments.
Early binding at compile time is only possible between different
Cython modules, not between all the tools listed above.

[[enhancements/nativecall|CEP 523]] deals with Cython-specific aspects
(and is out of date w.r.t. this CEP); this CEP is intended to be about
a cross-project convention only. If successful, this CEP may be
proposed as a PEP in a modified form.
Motivating example (looking a year or two into the future):

{{{
@numba
def f(x): return 2 * x

@cython.inline
def g(x : cython.double): return 3 * x

from fortranmod import h

print f(3)
print g(3)
print h(3)
print scipy.integrate.quad(f, 0.2, 3) # fast callback!
print scipy.integrate.quad(g, 0.2, 3) # fast callback!
print scipy.integrate.quad(h, 0.2, 3) # fast callback!
}}}

== The native-call slot ==

We need ''fast'' access to probing whether a callable object supports
this CEP. Other mechanisms, such as an attribute in a dict, are too
slow for many purposes (quoting robertwb: "We're trying to get a 300ns
dispatch down to 10ns; you do not want a 50ns dict lookup").
(Obviously, if you call a callable in a loop you can fetch the pointer
outside of the loop. But in particular if this becomes a language
feature in Cython it will be used in all sorts of places.)

So we hack another type slot into existing and future CPython
implementations in the following way: This CEP provides a C header
that for all Python versions defines a macro
{{{Py_TPFLAGS_UNOFFICIAL_EXTRAS}}} for a free bit in {{{tp_flags}}}
in the {{{PyTypeObject}}}.

If present, then we extend {{{PyTypeObject}}} as follows:
{{{
typedef struct {
    PyTypeObject tp_main;
    size_t tp_unofficial_flags;
    size_t tp_nativecall_offset;
} PyUnofficialTypeObject;
}}}

{{{tp_unofficial_flags}}} is unused and should be all 0 for the time
being, but can be used later to indicate features beyond this CEP.

If {{{tp_nativecall_offset != 0}}}, this CEP is supported, and the
information for doing a native dispatch on a callable {{{obj}}} is
located at
{{{
(char*)obj + ((PyUnofficialTypeObject*)obj->ob_type)->tp_nativecall_offset;
}}}

=== GIL-less access ===

It is OK to access the native-call table without holding the GIL. This
should of course only be used to call functions that state in their
signature that they don't need the GIL.

This is important for JITted callables that would like to rewrite
their table as more specializations get added; if one needs to
reallocate the table, the old table must linger long enough that all
threads that are currently accessing it are done with it.

== Native dispatch descriptor ==

The final format for the descriptor is not agreed upon yet; this sums
up the major alternatives.

The descriptor should be a list of specializations/overloads, each
described by a function pointer and a signature specification string,
such as "id)i" for {{{int f(int, double)}}}.

The way it is stored must cater for two cases; first, when the caller
expects one or more hard-coded signatures:
{{{
if (obj has signature "id)i") {
    call;
} else if (obj has signature "if)i") {
    call with promoted second argument;
} else {
    box all arguments;
    PyObject_Call;
}
}}}

The second case is when a call stack is built dynamically while
parsing the string. Since this has higher overhead anyway, optimizing
for the first case makes sense.

=== Approach 1: Interning/run-time allocated IDs ===

1A: Let each overload have a struct
{{{
struct {
    size_t signature_id;
    char *signature;
    void *func_ptr;
};
}}}
Within each process run, there is a 1:1 mapping between
{{{signature}}} and {{{signature_id}}}. {{{signature_id}}} is
allocated by some central registry.

1B: Intern the string instead:
{{{
struct {
    char *signature; /* pointer must come from the central registry */
    void *func_ptr;
};
}}}
However, this is '''not''' trivial, since signature strings can be
allocated on the heap (e.g., a JIT would do this), so interned strings
must be memory managed and reference counted.
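To make 1B concrete, a minimal sketch of what the central registry
could look like. All names here are invented, the table is a dumb
linear scan, and the locking and the reference counting discussed next
are left out:

{{{
#include <stdlib.h>
#include <string.h>

#define NC_MAX_INTERNED 256

/* Toy interning registry: maps a signature string to one canonical
   pointer, so that signature equality becomes pointer equality.
   A real registry would need locking plus the refcounting below. */
static char *interned[NC_MAX_INTERNED];
static int n_interned = 0;

const char *nc_intern_signature(const char *sig)
{
    int i;
    for (i = 0; i < n_interned; i++)
        if (strcmp(interned[i], sig) == 0)
            return interned[i];          /* the canonical pointer */
    if (n_interned == NC_MAX_INTERNED)
        return NULL;                     /* table full; toy limitation */
    interned[n_interned] = strdup(sig);  /* lives forever in this sketch */
    return interned[n_interned++];
}
}}}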
That memory management could be done by each object passing in the
signature '''both''' when incref-ing and decref-ing the signature
string in the interning machinery. Using Python {{{bytes}}} objects is
another option.

==== Discussion ====

'''The cost of comparing a signature''': Comparing a global variable
(needle) to a value that is guaranteed to already be in cache
(candidate match).

'''Pros:'''

 * Conceptually simple struct format.

'''Cons:'''

 * Requires a registry for interning strings. This must be
   "handshaked" between the implementors of this CEP (probably by
   "first to get at {{{sys.modules["_nativecall"]}}} sticks it there"),
   as we can't ship a common dependency library for this CEP.

=== Approach 2: Efficient strcmp of verbatim signatures ===

The idea is to store the full signatures and the function pointers
together in the same memory area, but still have some structure to
allow for quick scanning through the list. (A sketch of such a scan
follows at the end of this CEP.)

Each entry has the structure {{{[signature_string, funcptr]}}} where:

 * The signature string has variable length, but the length is
   divisible by 8 bytes on all platforms. The {{{funcptr}}} is always
   8 bytes (it is padded on 32-bit systems).

 * The total size of the entry should be divisible by 16 bytes (= the
   signature data should be 8 bytes, or 24 bytes, or ...).

 * All but the first chunk of signature data should start with a
   continuation character "-", i.e. a really long signature string
   could be {{{"iiiidddd-iiidddd-iiidddd-)d"}}}. That is, a "-" is
   inserted at all positions in the string divisible by 8, except the
   first.

The point is that if you know a signature, you can quickly scan
through the binary blob for the signature in 128-bit increments,
without worrying about the variable-size nature of each entry. The
rules above protect against spurious matches.

==== Optional: Encoding ====

The strcmp approach can be made efficient for larger signatures by
using a more efficient encoding than ASCII. E.g., an encoding could
use 4 bits for the 12 most common symbols and 8 bits for 64 symbols
(for a total of 76 symbols), of which some could be letter
combinations ("Zd", "T{"). This should be reasonably simple to encode
and decode.

The CEP should provide C routines in a header file to work with the
signatures. Callers that wish to parse the format string and build a
call stack on the fly should probably work with the encoded
representation.

==== Discussion ====

'''The cost of comparing a signature''': For the vast majority of
functions, the cost is comparing a 64-bit number stored in the CPU
instruction stream (needle) to a value that is guaranteed to already
be in cache (candidate match).

'''Pros:'''

 * Readability-wise, one can use the C switch statement to dispatch.

 * "Stateless data"; for compiled code it does not require any
   run-time initialization like interning does.

 * One less pointer dereference in the common case of a short
   signature.

'''Cons:'''

 * Long signatures will require more than 8 bytes to store and could
   thus be more expensive than interned strings.

 * The format looks uglier in the form of literals in C source code.

== Signature strings ==

Example: The function
{{{
int f(double x, float y);
}}}
would have the signature string {{{"df)i"}}} (or, to save space,
{{{"idf"}}}).

Fields would follow the PEP 3118 extensions of the struct-module
format string, but with some modifications:

 * The format should be canonical and fit for {{{strcmp}}}-like
   comparison: no whitespace, no field names (TBD: what else?)

 * TBD: Information about GIL requirements (nogil, with gil?), how
   exceptions are reported.

 * TBD: Support for Cython-specific constructs like memoryview slices
   (so that arrays with strides and shape can be passed faster than
   passing an {{{"O"}}}).
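As promised above, a sketch of the Approach-2 scan. This is
illustrative only: the zero-chunk terminator and the padding of the
needle out to whole 8-byte chunks are assumptions made here, not part
of the CEP text.

{{{
#include <stdint.h>
#include <stddef.h>

/* Scan an Approach-2 table in 64-bit chunks.  'entry' points at the
   packed [signature, funcptr] entries; each entry starts on a 16-byte
   boundary.  'needle' is the wanted signature, padded to 'nchunks'
   8-byte chunks.  Because continuation chunks start with '-' and the
   funcptr always sits at an odd chunk index, a 16-byte step can never
   land on a position whose first chunk looks like a signature start,
   which is what protects against spurious matches. */
static void *
nc_find(const uint64_t *entry, const uint64_t *needle, size_t nchunks)
{
    while (*entry != 0) {                 /* assumed zero-chunk terminator */
        size_t i = 0;
        while (i < nchunks && entry[i] == needle[i])
            i++;
        if (i == nchunks)                 /* funcptr lives in the 8-byte
                                             slot right after the signature */
            return *(void *const *)(entry + nchunks);
        entry += 2;                       /* one 128-bit step */
    }
    return NULL;  /* no match: box the arguments and use PyObject_Call */
}
}}}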
From markflorisson88 at gmail.com Sat Apr 14 23:00:26 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 22:00:26 +0100
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F89CB1D.6000109@astro.uio.no>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
Message-ID: 

On 14 April 2012 20:08, Dag Sverre Seljebotn wrote:
> [... CEP 1000 quoted in full, as posted above ...]
>
> The way it is stored must cater for two cases; first, when the caller
> expects one or more hard-coded signatures:
> {{{
> if (obj has signature "id)i") {
>     call;
> } else if (obj has signature "if)i") {
>     call with promoted second argument;
> } else {
>     box all arguments;
>     PyObject_Call;
> }
> }}}

There may be a lot of promotion/demotion (you likely only want the
former) combinations, especially for multiple arguments, so perhaps it
makes sense to limit ourselves a bit. For instance, for numeric scalar
argument types we could limit to long (and the unsigned counterparts),
double and double complex. So char, short and int scalars will be
promoted to long, float to double, and float complex to double
complex. Anything bigger, like long long etc., will be matched
specifically. Promotions, and associated demotions if necessary in the
callee, should be fairly cheap compared to checking all combinations
or going through the Python layer.
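To make the promotion idea concrete, a rough caller-side sketch. The
probing helper {{{nc_find_func}}} is a placeholder, not anything
agreed on in this thread; the signature literals follow the "id)i"
notation above. The caller tries the exact signature of its arguments,
then one promoted combination, and finally falls back to boxing:

{{{
#include <Python.h>

/* Hypothetical: probe obj's native-call table for 'sig' and return
   the matching function pointer, or NULL if it isn't exported. */
void *nc_find_func(PyObject *obj, const char *sig);

static double
call_with_int_float(PyObject *obj, int a, float b)
{
    void *fp;
    if ((fp = nc_find_func(obj, "if)d")) != NULL)
        return ((double (*)(int, float))fp)(a, b);          /* exact match */
    if ((fp = nc_find_func(obj, "id)d")) != NULL)
        return ((double (*)(int, double))fp)(a, (double)b); /* promote float */
    {
        /* No native match: box everything and go through the Python
           layer.  Error handling is elided in this sketch. */
        PyObject *r = PyObject_CallFunction(obj, "(id)", a, (double)b);
        double result = r ? PyFloat_AsDouble(r) : -1.0;
        Py_XDECREF(r);
        return result;
    }
}
}}}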
> [...]
>
>  * The format should be canonical and fit for {{{strcmp}}}-like
>    comparison: no whitespace, no field names (TBD: what else?)

I think alignment is also a troublemaker. Maybe we should allow '@'
(which cannot appear in the character string but will be the default,
that is, native size, alignment and byte order) and '^', unaligned
native size and byte order (to be used for packed structs).

>  * TBD: Information about GIL requirements (nogil, with gil?), how
>    exceptions are reported.

Maybe that could be a separate list, to be consulted mostly for
explicit casts (I think PyErr_Occurred() would be the default for
non-object return types).

>  * TBD: Support for Cython-specific constructs like memoryview slices
>    (so that arrays with strides and shape can be passed faster than
>    passing an {{{"O"}}}).

Definitely, maybe something simple like M{1f}, for a 1D memoryview
slice of floats.

From stefan_ml at behnel.de Sat Apr 14 23:02:05 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Apr 2012 23:02:05 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F89CB1D.6000109@astro.uio.no>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
Message-ID: <4F89E5CD.5060301@behnel.de>

Hi,

thanks for writing this up. Comments inline as I read through it.

Dag Sverre Seljebotn, 14.04.2012 21:08:
> === GIL-less access ===
>
> [...] This is important for JITted callables that would like to
> rewrite their table as more specializations get added; if one needs
> to reallocate the table, the old table must linger long enough that
> all threads that are currently accessing it are done with it.

The problem here is that changing the table in the face of threaded
access is very likely to introduce race conditions, and the average
library out there won't know when all threads are done with it. I
don't think later modification is a good idea.

> == Native dispatch descriptor ==
>
> [...] The descriptor should be a list of specializations/overloads,

While overloaded signatures are great for the callee, they make things
much more complicated for the caller. It's no longer just one
signature that either matches or not. Especially when we allow more
than one expected signature, then each of them has to be compared
against all exported signatures.

We'll have to see what the runtime impact and the impact on the code
complexity is, I guess.

> each described by a function pointer and a signature specification
> string, such as "id)i" for {{{int f(int, double)}}}.

How do we deal with object argument types? Do we care on the caller
side? Functions might have alternative signatures that differ in the
type of their object parameters. Or should we handle this inside of
the caller and expect that it's something like a fused function with
internal dispatch in that case?

Personally, I think there is not enough to gain from object parameters
that we should handle them on the caller side. The callee can dispatch
those if necessary.

What about signatures that require an object when we have a C typed
value?

What about signatures that require a C typed argument when we have an
arbitrary object value in our call parameters?

We should also strip the "self" argument from the parameter list of
methods. That's handled by the attribute lookup before even getting at
the callable.
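For reference, and purely illustrative (these literals are not from
the CEP; 'O' for an object pointer follows the struct-module codes the
CEP builds on), a few signatures touching the object-parameter and
"self" questions above:

{{{
/* Illustrative signature literals in the notation used in this thread. */
static const char *sig_scale   = "dd)d"; /* double f(double, double)        */
static const char *sig_method  = "Oi)O"; /* PyObject *f(PyObject *, int),
                                            with "self" already stripped    */
static const char *sig_nullary = ")i";   /* int f(void), assuming ')' still
                                            separates arguments from return */
}}}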
> === Approach 1: Interning/run-time allocated IDs ===
>
> [...] However, this is '''not''' trivial, since signature strings can
> be allocated on the heap (e.g., a JIT would do this), so interned
> strings must be memory managed and reference counted.

Not necessarily, they are really short strings that could just live
forever, stored efficiently by the registry in a series of larger
memory blocks. It would take a while to fill up enough memory with
those to become problematic. Finding an efficient lookup scheme for
them might become interesting at some point, but that would also take
a while.

I don't expect real-world systems to have to deal with thousands of
different runtime(!) discovered signatures during one interpreter
lifetime.

> '''Cons:'''
>
>  * Requires a registry for interning strings. This must be
>    "handshaked" between the implementors of this CEP [...],
>    as we can't ship a common dependency library for this CEP.

... which would eventually end up in the stdlib, but could equally
well come from PyPI for now. I don't see a problem with that.

Using sys.modules (or another global store) instead of an explicit
import allows for dependency injection, that's good.

> === Approach 2: Efficient strcmp of verbatim signatures ===
>
> [...] The point is that if you know a signature, you can quickly scan
> through the binary blob for the signature in 128-bit increments,
> without worrying about the variable-size nature of each entry. The
> rules above protect against spurious matches.

Sounds pretty fast to me. Absolutely worth trying. And if we store the
signature we compare against in the same format, we won't have to
parse the signature string as such, we can really just compare the
numeric values. Assuming that's really fast, that would allow the
callee to optimistically export additional signatures, e.g. with
compatible subtypes or easily coercible types, ordered by the expected
overhead of processing the arguments (and the expected probability of
being called), so that the caller would automatically hit the fastest
call path first when traversing the list from start to end. The number
of possible signatures would obviously explode at some point...

Note that JITs could still be smart enough to avoid the traversal
after a few loop iterations.

One problem: if any of the call parameters is a plain object type,
identity matches may not work anymore because we won't know what
signature to expect.

> ==== Optional: Encoding ====
>
> [...]

Huffman codes can be processed bitwise from start to end, that would
work.

However, this would quickly die when we start adding arbitrary object
types. That would require a global registry for user types again. A
reason not to care about object types at the caller.

Also, how do we encode struct/union argument types?

> '''Cons:'''
>
>  * Long signatures will require more than 8 bytes to store and could
>    thus be more expensive than interned strings.

We could also ignore trailing arguments and only dispatch based on a
fixed number of first arguments. Callees with more arguments would
then simply not export native signatures.

>  * The format looks uglier in the form of literals in C source code.

They are not meant for reading, and we can always generate a comment
with a spelled-out readable signature next to it.

> == Signature strings ==
>
> [...]
>
>  * TBD: Information about GIL requirements (nogil, with gil?), how
>    exceptions are reported.

What about C++, including C++ exceptions?

>  * TBD: Support for Cython-specific constructs like memoryview slices
>    (so that arrays with strides and shape can be passed faster than
>    passing an {{{"O"}}}).

Is this really Cython specific or would a generic Py_buffer struct
work?

Stefan

From stefan_ml at behnel.de Sat Apr 14 23:06:13 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 14 Apr 2012 23:06:13 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: 
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
Message-ID: <4F89E6C5.5070007@behnel.de>

mark florisson, 14.04.2012 23:00:
> On 14 April 2012 20:08, Dag Sverre Seljebotn wrote:
>>  * TBD: Information about GIL requirements (nogil, with gil?), how
>>    exceptions are reported
>
> Maybe that could be a separate list, to be consulted mostly for
> explicit casts (I think PyErr_Occurred() would be the default for
> non-object return types).

Good idea. We could have an additional "flags" field for each
signature (or maybe just each callable?) that would contain orthogonal
information about exception handling and GIL requirements.

Stefan

From wesmckinn at gmail.com Sat Apr 14 23:13:45 2012
From: wesmckinn at gmail.com (Wes McKinney)
Date: Sat, 14 Apr 2012 17:13:45 -0400
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: 
References: <4F898248.5030105@astro.uio.no>
Message-ID: 

On Sat, Apr 14, 2012 at 11:32 AM, mark florisson wrote:
> [... the full thread quoted again: Wes's pandas build failure and
> compiler crash, mark's and Dag's replies, as above ...]
>
>> So I suppose I'll disable the fix for 0.16, and we can try to fix it
>> for the next release.

Where is the bug in pandas / bad memory access? Maybe something I can
work around?

From markflorisson88 at gmail.com Sat Apr 14 23:15:22 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 22:15:22 +0100
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F89E5CD.5060301@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de>
Message-ID: 

On 14 April 2012 22:02, Stefan Behnel wrote:
> Hi,
>
> thanks for writing this up. Comments inline as I read through it.
>
> Dag Sverre Seljebotn, 14.04.2012 21:08:
>> === GIL-less access ===
>>
>> [...] if one needs to reallocate the table, the old table must
>> linger long enough that all threads that are currently accessing it
>> are done with it.
>
> The problem here is that changing the table in the face of threaded
> access is very likely to introduce race conditions, and the average
> library out there won't know when all threads are done with it. I
> don't think later modification is a good idea.
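The lingering-table idea under discussion can be sketched as follows.
Purely illustrative: the entry type, the fixed-size retire list and
the volatile qualifier are stand-ins, and a real version would need
proper atomics plus some reclamation scheme instead of leaking:

{{{
#include <stdlib.h>
#include <string.h>

typedef struct { char data[32]; } nc_entry;  /* opaque for this sketch */

static nc_entry *volatile current_table;  /* what GIL-less readers load */
static nc_entry *retired_tables[64];      /* never freed in this sketch */
static int n_retired;

/* Grow by copy, publish the new table with a single pointer store,
   and keep the old table alive so that threads that already loaded
   the old pointer can finish scanning it. */
void nc_add_specialization(const nc_entry *e, size_t old_count)
{
    nc_entry *bigger = malloc((old_count + 1) * sizeof(nc_entry));
    if (bigger == NULL)
        return;  /* error handling elided */
    memcpy(bigger, current_table, old_count * sizeof(nc_entry));
    bigger[old_count] = *e;
    retired_tables[n_retired++] = current_table;  /* linger, don't free */
    current_table = bigger;                       /* single-store publish */
}
}}}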
> [... rest of Stefan's review quoted, as above ...]
>
>>  * TBD: Support for Cython-specific constructs like memoryview slices
>>    (so that arrays with strides and shape can be passed faster than
>>    passing an {{{"O"}}}).
>
> Is this really Cython specific or would a generic Py_buffer struct
> work?

That could work through simple unboxing wrapper functions, but it
would add some overhead, specifically because it would have to check
the buffer's object, and if it didn't exist or was not a memoryview
object, it would have to create one (checking whether something is a
memoryview object would also be a pain, as each module has a different
memoryview type). That could still be feasible for interaction with
Cython functions from non-Cython code.

From markflorisson88 at gmail.com Sat Apr 14 23:21:28 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 22:21:28 +0100
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: 
References: <4F898248.5030105@astro.uio.no>
Message-ID: 

On 14 April 2012 22:13, Wes McKinney wrote:
>>>>>> _______________________________________________
>>>>>> cython-devel mailing list
>>>>>> cython-devel at python.org
>>>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>>>
>>>>>
>>>>> I'm unable to build pandas using git master Cython. I just released
>>>>> pandas 0.7.3 today which has no issues at all with 0.15.1:
>>>>>
>>>>> http://pypi.python.org/pypi/pandas
>>>>>
>>>>> For example:
>>>>>
>>>>> 16:57 ~/code/pandas (master)$ python setup.py build_ext --inplace
>>>>> running build_ext
>>>>> cythoning pandas/src/tseries.pyx to pandas/src/tseries.c
>>>>>
>>>>> Error compiling Cython file:
>>>>> ------------------------------------------------------------
>>>>> ...
>>>>>         self.store = {}
>>>>>
>>>>>         ptr = <int32_t**> malloc(self.depth * sizeof(int32_t*))
>>>>>
>>>>>         for i in range(self.depth):
>>>>>             ptr[i] = <int32_t*> (<ndarray> label_arrays[i]).data
>>>>>                                                          ^
>>>>> ------------------------------------------------------------
>>>>>
>>>>> pandas/src/tseries.pyx:107:59: Compiler crash in
>>>>> AnalyseExpressionsTransform
>>>>>
>>>>> ModuleNode.body = StatListNode(tseries.pyx:1:0)
>>>>> StatListNode.stats[23] = StatListNode(tseries.pyx:86:5)
>>>>> StatListNode.stats[0] = CClassDefNode(tseries.pyx:86:5,
>>>>>     as_name = u'MultiMap',
>>>>>     class_name = u'MultiMap',
>>>>>     doc = u'\n    Need to come up with a better data structure for
>>>>> multi-level indexing\n    ',
>>>>>     module_name = u'',
>>>>>     visibility = u'private')
>>>>> CClassDefNode.body = StatListNode(tseries.pyx:91:4)
>>>>> StatListNode.stats[1] = StatListNode(tseries.pyx:95:4)
>>>>> StatListNode.stats[0] = DefNode(tseries.pyx:95:4,
>>>>>     modifiers = [...]/0,
>>>>>     name = u'__init__',
>>>>>     num_required_args = 2,
>>>>>     py_wrapper_required = True,
>>>>>     reqd_kw_flags_cname = '0',
>>>>>     used = True)
>>>>> File 'Nodes.py', line 342, in analyse_expressions:
>>>>> StatListNode(tseries.pyx:96:8)
>>>>> File 'Nodes.py', line 342, in analyse_expressions:
>>>>> StatListNode(tseries.pyx:106:8)
>>>>> File 'Nodes.py', line 5903, in analyse_expressions:
>>>>> ForInStatNode(tseries.pyx:106:8)
>>>>> File 'Nodes.py', line 342, in analyse_expressions:
>>>>> StatListNode(tseries.pyx:107:21)
>>>>> File 'Nodes.py', line 4767, in analyse_expressions:
>>>>> SingleAssignmentNode(tseries.pyx:107:21)
>>>>> File 'Nodes.py', line 4872, in analyse_types:
>>>>> SingleAssignmentNode(tseries.pyx:107:21)
>>>>> File 'ExprNodes.py', line 7082, in analyse_types:
>>>>> TypecastNode(tseries.pyx:107:21,
>>>>>     result_is_used = True,
>>>>>     use_managed_ref = True)
>>>>> File 'ExprNodes.py', line 4274, in analyse_types:
>>>>> AttributeNode(tseries.pyx:107:59,
>>>>>     attribute = u'data',
>>>>>     initialized_check = True,
>>>>>     is_attribute = 1,
>>>>>     member = u'data',
>>>>>     needs_none_check = True,
>>>>>     op = '->',
>>>>>     result_is_used = True,
>>>>>     use_managed_ref = True)
>>>>> File 'ExprNodes.py', line 4360, in analyse_as_ordinary_attribute:
>>>>> AttributeNode(tseries.pyx:107:59,
>>>>>     attribute = u'data',
>>>>>     initialized_check = True,
>>>>>     is_attribute = 1,
>>>>>     member = u'data',
>>>>>     needs_none_check = True,
>>>>>     op = '->',
>>>>>     result_is_used = True,
>>>>>     use_managed_ref = True)
>>>>> File 'ExprNodes.py', line 4436, in analyse_attribute:
>>>>> AttributeNode(tseries.pyx:107:59,
>>>>>     attribute = u'data',
>>>>>     initialized_check = True,
>>>>>     is_attribute = 1,
>>>>>     member = u'data',
>>>>>     needs_none_check = True,
>>>>>     op = '->',
>>>>>     result_is_used = True,
>>>>>     use_managed_ref = True)
>>>>>
>>>>> Compiler crash traceback from this point on:
>>>>>  File "/home/wesm/code/repos/cython/Cython/Compiler/ExprNodes.py",
>>>>> line 4436, in analyse_attribute
>>>>>     replacement_node = numpy_transform_attribute_node(self)
>>>>>  File "/home/wesm/code/repos/cython/Cython/Compiler/NumpySupport.py",
>>>>> line 18, in numpy_transform_attribute_node
>>>>>     numpy_pxd_scope = node.obj.entry.type.scope.parent_scope
>>>>> AttributeError: 'TypecastNode' object has no attribute 'entry'
>>>>> building 'pandas._tseries' extension
>>>>> creating build
>>>>> creating build/temp.linux-x86_64-2.7
>>>>> creating build/temp.linux-x86_64-2.7/pandas
>>>>> creating build/temp.linux-x86_64-2.7/pandas/src
>>>>> gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -O2 -fPIC
>>>>> -I/home/wesm/epd/lib/python2.7/site-packages/numpy/core/include
>>>>> -I/home/wesm/epd/include/python2.7 -c pandas/src/tseries.c -o
>>>>> build/temp.linux-x86_64-2.7/pandas/src/tseries.o
>>>>> pandas/src/tseries.c:1:2: error: #error Do not use this file, it is
>>>>> the result of a failed Cython compilation.
>>>>> error: command 'gcc' failed with exit status 1
>>>>>
>>>>>
>>>>> -----
>>>>>
>>>>> I kludged this particular line in the pandas/timeseries branch so it
>>>>> will build on git master Cython, but I was treated to dozens of
>>>>> failures, errors, and finally a segfault in the middle of the test
>>>>> suite. Suffice it to say I'm not sure I would advise you to release the
>>>>> library in its current state until all of this is resolved. Happy to
>>>>> help however I can, but I'm back to 0.15.1 for now.
>>>>>
>>>>> - Wes
>>>>> _______________________________________________
>>>>> cython-devel mailing list
>>>>> cython-devel at python.org
>>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>>
>>>>
>>>> It seems that the numpy stopgap solution broke something in pandas.
>>>> I'm not sure what or how, but it leads to segfaults where code is
>>>> trying to retrieve objects from a numpy array that are NULL. I tried
>>>> disabling the numpy rewrites, which unbreaks this with the cython
>>>> release branch, so I think we should do another RC, either with the
>>>> attribute rewrite disabled or fixed.
>>>>
>>>> Dag, do you know what could have been broken by this fix that could
>>>> lead to these results?
>>>
>>>
>>> I can't imagine what causes a change like you say... one thing that could
>>> cause a segfault is that technically we should now call import_array in
>>> every module using numpy.pxd, while we don't do that. If a NumPy version is
>>> used where PyArray_DATA or similar is not a macro, you would
>>> segfault... that should be fixed...
>>>
>>> Dag
>>>
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>> Yeah, that makes sense, but the thing is that pandas is already calling
>> import_array everywhere, and the function calls themselves work; it's
>> the result that's NULL. Now this could be a bug in pandas, but seeing
>> that pandas works fine without the stopgap solution (that is, it
>> doesn't pass all the tests, but at least it doesn't segfault), I think
>> it's something funky on our side.
>>
>> So I suppose I'll disable the fix for 0.16, and we can try to fix it
>> for the next release.
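(As an aside: the import_array() requirement Dag refers to above boils
down to something like this minimal, illustrative extension module, not
pandas code. The PyArray_* calls dispatch through a per-module
PyArray_API pointer table that import_array() fills in; without the call
the table stays NULL and the first call through it crashes:)

#include <Python.h>
#include <numpy/arrayobject.h>

static PyObject *
data_ptr(PyObject *self, PyObject *arg)
{
    /* PyArray_Check (and, in newer NumPy, PyArray_DATA) dispatches
       through the PyArray_API table; without import_array() below,
       this is where the segfault would happen. */
    if (!PyArray_Check(arg)) {
        PyErr_SetString(PyExc_TypeError, "expected an ndarray");
        return NULL;
    }
    return PyLong_FromVoidPtr(PyArray_DATA((PyArrayObject *)arg));
}

static PyMethodDef methods[] = {
    {"data_ptr", data_ptr, METH_O, "address of the array's data buffer"},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC
initexample(void)          /* Python 2 style module init, as in 2012 */
{
    Py_InitModule("example", methods);
    import_array();        /* fills in the PyArray_API table */
}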
>> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Where is the bug in pandas / bad memory access? Maybe something I can > work around? > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel It may have something to do with the Sliders, I'm not sure, but without looking carefully at them they look somewhat dangerous. Anyway, here is a traceback from the Cython debugger: #7 0x00000000080dd760 in () at /home/mark/apps/bin/nosetests:8 8 load_entry_point('nose==1.1.2', 'console_scripts', 'nosetests')() #18 0x00000000080dd760 in __init__() at /home/mark/apps/lib/python2.7/site-packages/nose/core.py:118 118 **extra_args) #25 0x00000000080dd760 in __init__() at /home/mark/apps/lib/python2.7/unittest/main.py:95 95 self.runTests() #28 0x00000000080dd760 in runTests() at /home/mark/apps/lib/python2.7/site-packages/nose/core.py:197 197 result = self.testRunner.run(self.test) #31 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/core.py:61 61 test(result) #41 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:176 176 return self.run(*arg, **kw) #46 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:223 223 test(orig) #56 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/unittest/suite.py:65 65 return self.run(*args, **kwds) #61 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:74 74 test(result) #71 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:176 176 return self.run(*arg, **kw) #76 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:223 223 test(orig) #86 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:176 176 return self.run(*arg, **kw) #91 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:223 223 test(orig) #101 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:176 176 return self.run(*arg, **kw) #106 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:223 223 test(orig) #116 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:176 176 return self.run(*arg, **kw) #121 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/suite.py:223 223 test(orig) #131 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/site-packages/nose/case.py:45 45 return self.run(*arg, **kwarg) #136 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/site-packages/nose/case.py:133 133 self.runTest(result) #139 0x00000000080dd760 in runTest() at /home/mark/apps/lib/python2.7/site-packages/nose/case.py:151 151 test(result) #149 0x00000000080dd760 in __call__() at /home/mark/apps/lib/python2.7/unittest/case.py:376 376 return self.run(*args, **kwds) #154 0x00000000080dd760 in run() at /home/mark/apps/lib/python2.7/unittest/case.py:318 318 testMethod() #157 0x00000000080dd760 in test_as_index_series_return_frame() at /home/mark/code/pandas/pandas/tests/test_groupby.py:710 710 expected = grouped.agg(np.sum).ix[:, ['A', 'C']] #161 0x00000000080dd760 in agg() at /home/mark/code/pandas/pandas/core/groupby.py:282 
282 return self.aggregate(func, *args, **kwargs)
#166 0x00000000080dd760 in aggregate() at
/home/mark/code/pandas/pandas/core/groupby.py:1050
      1050 result = self._aggregate_generic(arg, *args, **kwargs)
#171 0x00000000080dd760 in _aggregate_generic() at
/home/mark/code/pandas/pandas/core/groupby.py:1103
      1103 return self._aggregate_item_by_item(func, *args, **kwargs)
#176 0x00000000080dd760 in _aggregate_item_by_item() at
/home/mark/code/pandas/pandas/core/groupby.py:1137
      1137 result[item] = colg.agg(func, *args, **kwargs)
#181 0x00000000080dd760 in agg() at
/home/mark/code/pandas/pandas/core/groupby.py:282
      282 return self.aggregate(func, *args, **kwargs)
#186 0x00000000080dd760 in aggregate() at
/home/mark/code/pandas/pandas/core/groupby.py:795
      795 return self._python_agg_general(func_or_funcs, *args, **kwargs)
#191 0x00000000080dd760 in _python_agg_general() at
/home/mark/code/pandas/pandas/core/groupby.py:370
      370 comp_ids, max_group)
#194 0x00000000080dd760 in _aggregate_series() at
/home/mark/code/pandas/pandas/core/groupby.py:421
      421 return self._aggregate_series_fast(obj, func, group_index, ngroups)
#197 0x00000000080dd760 in _aggregate_series_fast() at
/home/mark/code/pandas/pandas/core/groupby.py:437
      437 result, counts = grouper.get_result()
#199 0x000000000091880e in get_result() at
/home/mark/code/pandas/pandas/src/tseries.pyx:127
      127 else:
#204 0x00000000080dd760 in () at
/home/mark/code/pandas/pandas/core/groupby.py:361
      361 agg_func = lambda x: func(x, *args, **kwargs)
#209 0x00000000080dd760 in sum() at
/home/mark/apps/lib/python2.7/site-packages/numpy/core/fromnumeric.py:1455
      1455 return sum(axis, dtype, out)
#213 0x00000000080dd760 in sum() at
/home/mark/code/pandas/pandas/core/series.py:862
      862 return nanops.nansum(self.values, skipna=skipna)
#217 0x00000000080dd760 in f() at
/home/mark/code/pandas/pandas/core/nanops.py:28
      28 result = alt(values, axis=axis, skipna=skipna, **kwargs)
#222 0x00000000080dd760 in _nansum() at
/home/mark/code/pandas/pandas/core/nanops.py:48
      48 mask = isnull(values)
#225 0x00000000080dd760 in isnull() at
/home/mark/code/pandas/pandas/core/common.py:60
      60 vec = lib.isnullobj(obj.ravel())
#227 0x000000000088efe0 in isnullobj() at
/home/mark/code/pandas/pandas/src/tseries.pyx:224
      224 cpdef checknull(object val):

Actually that last line is wrong, as the debugger is confused by
Cython's 'include' statement (that has to be fixed as well at some
point :). The error occurs on line 240 in isnullobj, on the statement
'val = arr[i]': arr[i] is a NULL PyObject *, so the incref fails.

If you have any idea why the stopgap solution results in different
behaviour, please let us know.

From markflorisson88 at gmail.com  Sat Apr 14 23:22:50 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sat, 14 Apr 2012 22:22:50 +0100
Subject: [Cython] Cython 0.16 RC 1
In-Reply-To: References: <4F898248.5030105@astro.uio.no>
Message-ID: 

On 14 April 2012 22:21, mark florisson wrote:
> [... full verbatim quote of the previous message snipped ...]
>
> Actually that last line is wrong, as the debugger is confused by
> Cython's 'include' statement (that has to be fixed as well at some
> point :). The error occurs on line 240 in isnullobj on the statement
> 'val = arr[i]', because arr[i] is a NULL PyObject *, so the incref
> fails.
>
> If you have any idea why the stopgap solution results in different
> behaviour, please let us know.

(The get_result() is actually from reduce.pyx, not from tseries.pyx, but
again the debugger is confused by the include of reduce.pyx).

From d.s.seljebotn at astro.uio.no  Sat Apr 14 23:30:04 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sat, 14 Apr 2012 23:30:04 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F89E6C5.5070007@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E6C5.5070007@behnel.de>
Message-ID: <911f5414-8247-44a7-bc14-68c1362139ed@email.android.com>

Stefan Behnel wrote:
>mark florisson, 14.04.2012 23:00:
>> On 14 April 2012 20:08, Dag Sverre Seljebotn wrote:
>>> * TBD: Information about GIL requirements (nogil, with gil?), how
>>> exceptions are reported
>>
>> Maybe that could be a separate list, to be consulted mostly for
>> explicit casts (I think PyErr_Occurred() would be the default for
>> non-object return types).
>
>Good idea. We could have an additional "flags" field for each signature
>(or maybe just each callable?) that would contain orthogonal information
>about exception handling and GIL requirements.

I don't think gil/nogil is orthogonal at all; I think you could export
both versions as two different overloads (so that one can jump past
gil-acquisition in with-gil-functions, etc.)

Dag

>
>Stefan
>_______________________________________________
>cython-devel mailing list
>cython-devel at python.org
>http://mail.python.org/mailman/listinfo/cython-devel

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

From greg.ewing at canterbury.ac.nz  Sun Apr 15 02:28:59 2012
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 15 Apr 2012 12:28:59 +1200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz> <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com> <4F893BCC.7040706@behnel.de>
Message-ID: <4F8A164B.1050507@canterbury.ac.nz>

Robert Bradshaw wrote:

> Has anyone done any experiments/timings to see if having constants vs.
> globals even matters?

My gut feeling is that one extra memory read is going to be
insignificant compared to the time taken by the call itself
and whatever it does.
But of course gut feelings are always
better when backed up (or refuted!) by measurements.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Sun Apr 15 03:07:43 2012
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 15 Apr 2012 13:07:43 +1200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F89CB1D.6000109@astro.uio.no>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
Message-ID: <4F8A1F5F.3030602@canterbury.ac.nz>

Dag Sverre Seljebotn wrote:

> if (obj has signature "id)i") {

This is an aside, but is it really necessary to define the
signature syntax in a way that involves unmatched parens?
Some editors (such as the one I like to use) get confused
by this, even when they're inside quotes.

The answer "get a better editor" would be entirely
appropriate if there were some advantage to this syntax
over a non-unbalanced one, but I can't see any.

-- 
Greg

From robertwb at gmail.com  Sun Apr 15 07:59:01 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sat, 14 Apr 2012 22:59:01 -0700
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
Message-ID: 

On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote:
> On 14 April 2012 20:08, Dag Sverre Seljebotn wrote:
>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote:
>>>
>>> Travis Oliphant recently raised the issue on the NumPy list of what
>>> mechanisms to use to box native functions produced by his Numba so that
>>> SciPy functions can call it, e.g. (I'm making the numba part up):
>>
>>
>>
>> This thread is turning into one of those big ones...
>>
>> But I think it is really worth it in the end; I'm getting excited about the
>> possibility down the road of importing functions using normal Python
>> mechanisms and still having fast calls.
>>
>> Anyway, to organize discussion I've tried to revamp the CEP and describe
>> both the intern-way and the strcmp-way.
>>
>> The wiki appears to be down, so I'll post it below...
>>
>> Dag
>>
>> = CEP 1000: Convention for native dispatches through Python callables =
>>
>> Many callable objects are simply wrappers around native code. This
>> holds for any Cython function, f2py functions, manually
>> written CPython extensions, Numba, etc.
>>
>> Obviously, when native code calls other native code, it would be
>> nice to skip the significant cost of boxing and unboxing all the arguments.
>> Early binding at compile-time is only possible
>> between different Cython modules, not between all the tools
>> listed above.
>>
>> [[enhancements/nativecall|CEP 523]] deals with Cython-specific aspects
>> (and is out-of-date w.r.t. this CEP); this CEP is intended to be about
>> a cross-project convention only. If successful, this CEP may be
>> proposed as a PEP in a modified form.
>>
>> Motivating example (looking a year or two into the future):
>>
>> {{{
>> @numba
>> def f(x): return 2 * x
>>
>> @cython.inline
>> def g(x : cython.double): return 3 * x
>>
>> from fortranmod import h
>>
>> print f(3)
>> print g(3)
>> print h(3)
>> print scipy.integrate.quad(f, 0.2, 3) # fast callback!
>> print scipy.integrate.quad(g, 0.2, 3) # fast callback!
>> print scipy.integrate.quad(h, 0.2, 3) # fast callback!
>>
>> }}}
>>
>> == The native-call slot ==
>>
>> We need ''fast'' access to probing whether a callable object supports
>> this CEP.
Other mechanisms, such as an attribute in a dict, are too
>> slow for many purposes (quoting robertwb: "We're trying to get a 300ns
>> dispatch down to 10ns; you do not want a 50ns dict lookup"). (Obviously,
>> if you call a callable in a loop you can fetch the pointer outside
>> of the loop. But in particular if this becomes a language feature
>> in Cython it will be used in all sorts of places.)
>>
>> So we hack another type slot into existing and future CPython
>> implementations in the following way: This CEP provides a C header
>> that for all Python versions defines a macro
>> {{{Py_TPFLAGS_UNOFFICIAL_EXTRAS}}} for a free bit in
>> {{{tp_flags}}} in the {{{PyTypeObject}}}.
>>
>> If present, then we extend {{{PyTypeObject}}}
>> as follows:
>> {{{
>> typedef struct {
>>     PyTypeObject tp_main;
>>     size_t tp_unofficial_flags;
>>     size_t tp_nativecall_offset;
>> } PyUnofficialTypeObject;
>> }}}
>>
>> {{{tp_unofficial_flags}}} is unused and should be all 0 for the time
>> being, but can be used later to indicate features beyond this CEP.
>>
>> If {{{tp_nativecall_offset != 0}}}, this CEP is supported, and
>> the information for doing a native dispatch on a callable {{{obj}}}
>> is located at
>> {{{
>> (char*)obj + ((PyUnofficialTypeObject*)obj->ob_type)->tp_nativecall_offset;
>> }}}
>>
>> === GIL-less access ===
>>
>> It is OK to access the native-call table without holding the GIL. This
>> should of course only be used to call functions that state in their
>> signature that they don't need the GIL.
>>
>> This is important for JITted callables who would like to rewrite their
>> table as more specializations get added; if one needs to reallocate
>> the table, the old table must linger along long enough that all
>> threads that are currently accessing it are done with it.
>>
>> == Native dispatch descriptor ==
>>
>> The final format for the descriptor is not agreed upon yet; this sums
>> up the major alternatives.
>>
>> The descriptor should be a list of specializations/overloads, each
>> described by a function pointer and a signature specification
>> string, such as "id)i" for {{{int f(int, double)}}}.
>>
>> The way it is stored must cater for two cases; first, when the caller
>> expects one or more hard-coded signatures:
>> {{{
>> if (obj has signature "id)i") {
>>     call;
>> } else if (obj has signature "if)i") {
>>     call with promoted second argument;
>> } else {
>>     box all arguments;
>>     PyObject_Call;
>> }
>> }}}
>
> There may be a lot of promotion/demotion (you likely only want the
> former) combinations, especially for multiple arguments, so perhaps it
> makes sense to limit ourselves a bit. For instance, for numeric scalar
> argument types we could limit to long (and the unsigned counterparts),
> double and double complex.
>
> So char, short and int scalars will be
> promoted to long, float to double and float complex to double complex.
> Anything bigger, like long long etc., will be matched specifically.
> Promotions and associated demotions, if necessary in the callee, should
> be fairly cheap compared to checking all combinations or going through
> the python layer.

True, though this could be a convention rather than a requirement of
the spec. Long vs. < long seems natural, but are there any systems
where (scalar) float still has an advantage over double?

Of course pointers like float* vs double* can't be promoted, so we
would still need this kind of type declaration.
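As a hypothetical sketch of what the promotion convention buys the
caller -- the "ld)d" literal and the find_signature_on() helper are
placeholders for whatever probing mechanism gets agreed on, not existing
API:

#include <Python.h>

/* Placeholder for the CEP's probing mechanism -- assumed, not real API. */
void *find_signature_on(PyObject *callable, const char *sig);

typedef double (*ld_d_fn)(long, double);

/* With the promotion convention, a caller holding (int, float) probes
   only for the promoted signature "ld)d" instead of every narrow
   variant; C's usual conversions handle int -> long and float -> double. */
static double
call_f(PyObject *callable, int x, float y)
{
    ld_d_fn fn = (ld_d_fn)find_signature_on(callable, "ld)d");
    if (fn != NULL)
        return fn(x, y);

    /* Slow path: box the arguments and go through the normal Python
       call (error handling elided in this sketch). */
    {
        PyObject *res = PyObject_CallFunction(callable, "id", x, (double)y);
        double result = (res != NULL) ? PyFloat_AsDouble(res) : -1.0;
        Py_XDECREF(res);
        return result;
    }
}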
Since this has higher overhead anyway, optimizing for the first >> case makes sense. >> >> === Approach 1: Interning/run-time allocated IDs === >> >> >> 1A: Let each overload have a struct >> {{{ >> struct { >> ? ?size_t signature_id; >> ? ?char *signature; >> ? ?void *func_ptr; >> }; >> }}} >> Within each process run, there is a 1:1 between {{{signature}}} and >> {{{signature_id}}}. {{{signature_id}}} is allocated by some central >> registry. >> >> 1B: Intern the string instead: >> {{{ >> struct { >> ? ?char *signature; /* pointer must come from the central registry */ >> ? ?void *func_ptr; >> }; >> }}} >> However this is '''not'' trivial, since signature strings can >> be allocated on the heap (e.g., a JIT would do this), so interned strings >> must be memory managed and reference counted. This could be done by >> each object passing in the signature '''both''' when incref-ing and >> decref-ing the signature string in the interning machinery. >> Using Python {{{bytes}}} objects is another option. >> >> ==== Discussion ==== >> >> '''The cost of comparing a signature''': Comparing a global variable >> (needle) >> to a value that is guaranteed to already be in cache (candidate match) >> >> '''Pros:''' >> >> ?* Conceptually simple struct format. >> >> '''Cons:''' >> >> ?* Requires a registry for interning strings. This must be >> ? "handshaked" between the implementors of this CEP (probably by >> ? "first to get at {{{sys.modules["_nativecall"}}} sticks it there), >> ? as we can't ship a common dependency library for this CEP. >> >> === Approach 2: Efficient strcmp of verbatim signatures === >> >> The idea is to store the full signatures and the function pointers together >> in the same memory area, but still have some structure to allow for quick >> scanning through the list. >> >> Each entry has the structure {{{[signature_string, funcptr]}}} >> where: >> >> ?* The signature string has variable length, but the length is >> ? divisible by 8 bytes on all platforms. The {{{funcptr}}} is always >> ? 8 bytes (it is padded on 32-bit systems). >> >> ?* The total size of the entry should be divisible by 16 bytes (= the >> ? signature data should be 8 bytes, or 24 bytes, or...) >> >> ?* All but the first chunk of signature data should start with a >> ? continuation character "-", i.e. a really long signature string >> ? could be {{{"iiiidddd-iiidddd-iiidddd-)d"}}}. That is, a "-" is >> ? inserted on all positions in the string divisible by 8, except the >> ? first. >> >> The point is that if you know a signature, you can quickly scan >> through the binary blob for the signature in 128 bit increments, >> without worrying about the variable size nature of each entry. ?The >> rules above protects against spurious matches. Note that these two approaches need not be mutually exclusive; a cutoff could be established giving the best (and worst) of both. >> ==== Optional: Encoding ==== >> >> The strcmp approach can be made efficient for larger signatures by >> using a more efficient encoding than ASCII. E.g., an encoding could >> use 4 bits for the 12 most common symbols and 8 bits >> for 64 symbols (for a total of 78 symbols), of which some could be >> letter combinations ("Zd", "T{"). This should be reasonably simple >> to encode and decode. >> >> The CEP should provide C routines in a header file to work with the >> signatures. Callers that wish to parse the format string and build a >> call stack on the fly should probably work with the encoded >> representation. 
>>
>> ==== Discussion ====
>>
>> '''The cost of comparing a signature''': For the vast majority of
>> functions, the cost is comparing a 64-bit number stored in the CPU
>> instruction stream (needle) to a value that is guaranteed to already
>> be in cache (candidate match).
>>
>> '''Pros:'''
>>
>>  * Readability-wise, one can use the C switch statement to dispatch
>>
>>  * "Stateless data", for compiled code it does not require any
>>    run-time initialization like interning does
>>
>>  * One less pointer-dereference in the common case of a short
>>    signature
>>
>> '''Cons:'''
>>
>>  * Long signatures will require more than 8 bytes to store and could
>>    thus be more expensive than interned strings
>>
>>  * Format looks uglier in the form of literals in C source code
>>
>>
>> == Signature strings ==
>>
>> Example: The function
>> {{{
>> int f(double x, float y);
>> }}}
>> would have the signature string {{{"df)i"}}} (or, to save space,
>> {{{"idf"}}}).
>>
>> Fields would follow the PEP 3118 extensions of the struct-module format
>> string, but with some modifications:
>>
>>  * The format should be canonical and fit for {{{strcmp}}}-like
>>    comparison: No whitespace, no field names (TBD: what else?)
>
> I think alignment is also a troublemaker. Maybe we should allow '@'
> (which cannot appear in the character string but will be the default,
> that is native size, alignment and byteorder) and '^', unaligned
> native size and byteorder (to be used for packed structs).
>
>>  * TBD: Information about GIL requirements (nogil, with gil?), how
>>    exceptions are reported
>
> Maybe that could be a separate list, to be consulted mostly for
> explicit casts (I think PyErr_Occurred() would be the default for
> non-object return types).
>
>>  * TBD: Support for Cython-specific constructs like memoryview slices
>>    (so that arrays with strides and shape can be passed faster than
>>    passing an {{{"O"}}}).
>
> Definitely, maybe something simple like M{1f}, for a 1D memoryview
> slice of floats.

It would certainly be useful to have special syntax for memory views
(after nailing down a well-defined ABI for them) and builtin types.
Being able to declare something as taking a "sage.rings.integer.Integer"
could also prove useful, but could result in long (and prefix-sharing)
signatures, favoring the runtime-allocated ids.

From robertwb at gmail.com  Sun Apr 15 08:07:13 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sat, 14 Apr 2012 23:07:13 -0700
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F8A164B.1050507@canterbury.ac.nz>
References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz> <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com> <4F893BCC.7040706@behnel.de> <4F8A164B.1050507@canterbury.ac.nz>
Message-ID: 

On Sat, Apr 14, 2012 at 5:28 PM, Greg Ewing wrote:
> Robert Bradshaw wrote:
>
>> Has anyone done any experiments/timings to see if having constants vs.
>> globals even matters?
>
> My gut feeling is that one extra memory read is going to be
> insignificant compared to the time taken by the call itself
> and whatever it does.

This is most valuable for really fast calls (e.g.
a user-defined double -> double), and compilers (and processors) have evolved to a point that they're often surprising and difficult to reason about. > But of course gut feelings are always > better when backed up (or refuted!) by measurements. I agree with your gut feeling (where insignificant to me is <3%) but can't rule it out, and data trumps consensus :). > This is an aside, but is it really necessary to define the > signature syntax in a way that involves unmatched parens? > Some editors (such as the one I like to use) get confused > by this, even when they're inside quotes. > > The answer "get a better editor" would be entirely > appropriate if there were some advantage to this syntax, > over a non-unbalanced one, but I can't see any. Brevity, especially if the signature is inlined. (Encoding could take care of this by, e.g. ignoring the redundant opening, or we could just write di=d.) - Robert From stefan_ml at behnel.de Sun Apr 15 08:16:47 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Apr 2012 08:16:47 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> Message-ID: <4F8A67CF.4010503@behnel.de> Robert Bradshaw, 15.04.2012 07:59: > On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote: >> There may be a lot of promotion/demotion (you likely only want the >> former) combinations, especially for multiple arguments, so perhaps it >> makes sense to limit ourselves a bit. For instance for numeric scalar >> argument types we could limit to long (and the unsigned counterparts), >> double and double complex. >> >> So char, short and int scalars will be >> promoted to long, float to double and float complex to double complex. >> Anything bigger, like long long etc will be matched specifically. >> Promotions and associated demotions if necessary in the callee should >> be fairly cheap compared to checking all combinations or going through >> the python layer. > > True, though this could be a convention rather than a requirement of > the spec. Long vs. < long seems natural, but are there any systems > where (scalar) float still has an advantage over double? > > Of course pointers like float* vs double* can't be promoted, so we > would still need this kind of type declaration. Yes, passing data sets as C arrays requires proper knowledge about their memory layout on both sides. OTOH, we are talking about functions that would otherwise be called through Python, so this could only apply for buffers anyway. So why not require a Py_buffer* as argument for them? Stefan From stefan_ml at behnel.de Sun Apr 15 08:26:12 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Apr 2012 08:26:12 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> Message-ID: <4F8A6A04.4000402@behnel.de> mark florisson, 14.04.2012 23:15: > On 14 April 2012 22:02, Stefan Behnel wrote: >> Dag Sverre Seljebotn, 14.04.2012 21:08: >>> * TBD: Support for Cython-specific constructs like memoryview slices >>> (so that arrays with strides and shape can be passed faster than >>> passing an {{{"O"}}}). >> >> Is this really Cython specific or would a generic Py_buffer struct work? 
> That could work through simple unboxing wrapper functions, but it
> would add some overhead, specifically because it would have to check
> the buffer's object, and if it didn't exist or was not a memoryview
> object, it would have to create one (checking whether something is a
> memoryview object would also be a pain, as each module has a different
> memoryview type). That could still be feasible for interaction with
> Cython functions from non-Cython code.

Hmm, I don't get it. Isn't the overhead always there when a memory view
is requested in the signature? You'd have to create one for each call,
and that seriously hurts the efficiency.

Is that a common use case? Why would you want to do more than passing
unboxed buffers?

Stefan

From robertwb at gmail.com  Sun Apr 15 08:27:11 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sat, 14 Apr 2012 23:27:11 -0700
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F89E5CD.5060301@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de>
Message-ID: 

On Sat, Apr 14, 2012 at 2:02 PM, Stefan Behnel wrote:
> Hi,
>
> thanks for writing this up. Comments inline as I read through it.
>
> Dag Sverre Seljebotn, 14.04.2012 21:08:
>> === GIL-less access ===
>>
>> It is OK to access the native-call table without holding the GIL. This
>> should of course only be used to call functions that state in their
>> signature that they don't need the GIL.
>>
>> This is important for JITted callables who would like to rewrite their
>> table as more specializations get added; if one needs to reallocate
>> the table, the old table must linger along long enough that all
>> threads that are currently accessing it are done with it.
>
> The problem here is that changing the table in the face of threaded access
> is very likely to introduce race conditions, and the average library out
> there won't know when all threads are done with it. I don't think later
> modification is a good idea.

I agree; a JIT that wants to do this can over-allocate.

>> == Native dispatch descriptor ==
>>
>> The final format for the descriptor is not agreed upon yet; this sums
>> up the major alternatives.
>>
>> The descriptor should be a list of specializations/overloads
>
> While overloaded signatures are great for the callee, they make things much
> more complicated for the caller. It's no longer just one signature that
> either matches or not. Especially when we allow more than one expected
> signature, then each of them has to be compared against all exported
> signatures.
>
> We'll have to see what the runtime impact and the impact on the code
> complexity is, I guess.

The caller could choose to only check the first signature to avoid
complexity. I think, however, that overloaded signatures are important,
and even checking a dozen is cheaper than going through the Python call.
Fused types naturally lead to overloads as well.

>> each described by a function pointer and a signature specification
>> string, such as "id)i" for {{{int f(int, double)}}}.
>
> How do we deal with object argument types? Do we care on the caller side?
> Functions might have alternative signatures that differ in the type of
> their object parameters. Or should we handle this inside of the caller and
> expect that it's something like a fused function with internal dispatch in
> that case?
>
> Personally, I think there is not enough to gain from object parameters that
> we should handle it on the caller side.
The callee can dispatch those if
> necessary.

I don't think we should prohibit the signature from being able to
declare arbitrary Cython types. Whether it proves useful is dependent
on the library, and is the library writer's choice.

> What about signatures that require an object when we have a C typed value?
>
> What about signatures that require a C typed argument when we have an
> arbitrary object value in our call parameters?

When considering conversion, one gets into the sticky question of
finding the "best" overload. I'd be inclined to not do any conversion
in the caller. One can (should) export the object version of the
signature as well to avoid the slow Python call.

> We should also strip the "self" argument from the parameter list of
> methods. That's handled by the attribute lookup before even getting at the
> callable.
>
>
>> === Approach 1: Interning/run-time allocated IDs ===
>>
>>
>> 1A: Let each overload have a struct
>> {{{
>> struct {
>>     size_t signature_id;
>>     char *signature;
>>     void *func_ptr;
>> };
>> }}}
>> Within each process run, there is a 1:1
>
> mapping/relation
>
>> between {{{signature}}} and
>> {{{signature_id}}}. {{{signature_id}}} is allocated by some central
>> registry.
>>
>> 1B: Intern the string instead:
>> {{{
>> struct {
>>     char *signature; /* pointer must come from the central registry */
>>     void *func_ptr;
>> };
>> }}}
>> However this is '''not''' trivial, since signature strings can
>> be allocated on the heap (e.g., a JIT would do this), so interned strings
>> must be memory managed and reference counted.
>
> Not necessarily, they are really short strings that could just live
> forever, stored efficiently by the registry in a series of larger memory
> blocks. It would take a while to fill up enough memory with those to become
> problematic. Finding an efficient lookup scheme for them might become
> interesting at some point, but that would also take a while.
>
> I don't expect real-world systems to have to deal with thousands of
> different runtime(!) discovered signatures during one interpreter lifetime.
>
>
>> ==== Discussion ====
>>
>> '''The cost of comparing a signature''': Comparing a global variable (needle)
>> to a value that is guaranteed to already be in cache (candidate match)
>>
>> '''Pros:'''
>>
>>  * Conceptually simple struct format.
>>
>> '''Cons:'''
>>
>>  * Requires a registry for interning strings. This must be
>>    "handshaked" between the implementors of this CEP (probably by
>>    "first to get at {{{sys.modules["_nativecall"]}}} sticks it there"),
>>    as we can't ship a common dependency library for this CEP.
>
> ... which would eventually end up in the stdlib, but could equally well
> come from PyPI for now. I don't see a problem with that.
>
> Using sys.modules (or another global store) instead of an explicit import
> allows for dependency injection, that's good.

It excludes non-Python libraries, or at least makes it difficult for
them to participate.
>>
>>  * The total size of the entry should be divisible by 16 bytes (= the
>>    signature data should be 8 bytes, or 24 bytes, or...)
>>
>>  * All but the first chunk of signature data should start with a
>>    continuation character "-", i.e. a really long signature string
>>    could be {{{"iiiidddd-iiidddd-iiidddd-)d"}}}. That is, a "-" is
>>    inserted at all positions in the string divisible by 8, except the
>>    first.
>>
>> The point is that if you know a signature, you can quickly scan
>> through the binary blob for the signature in 128-bit increments,
>> without worrying about the variable size of each entry. The
>> rules above protect against spurious matches.
>
> Sounds pretty fast to me. Absolutely worth trying. And if we store the
> signature we compare against in the same format, we won't have to parse the
> signature string as such, we can really just compare the numeric values.
> Assuming that's really fast, that would allow the callee to optimistically
> export additional signatures, e.g. with compatible subtypes or easily
> coercible types, ordered by the expected overhead of processing the
> arguments (and the expected probability of being called), so that the
> caller would automatically hit the fastest call path first when traversing
> the list from start to end. The number of possible signatures would
> obviously explode at some point...
>
> Note that JITs could still be smart enough to avoid the traversal after a
> few loop iterations.
>
> One problem: if any of the call parameters is a plain object type, identity
> matches may not work anymore because we won't know what signature to expect.
>
>
>> ==== Optional: Encoding ====
>>
>> The strcmp approach can be made efficient for larger signatures by
>> using a more efficient encoding than ASCII. E.g., an encoding could
>> use 4 bits for the 12 most common symbols and 8 bits
>> for 64 symbols (for a total of 76 symbols), of which some could be
>> letter combinations ("Zd", "T{"). This should be reasonably simple
>> to encode and decode.
>>
>> The CEP should provide C routines in a header file to work with the
>> signatures. Callers that wish to parse the format string and build a
>> call stack on the fly should probably work with the encoded
>> representation.
>
> Huffman codes can be processed bitwise from start to end, that would work.
>
> However, this would quickly die when we start adding arbitrary object
> types. That would require a global registry for user types again. One more
> reason not to care about object types at the caller.
>
> Also, how do we encode struct/union argument types?
>
>
>> ==== Discussion ====
>>
>> '''The cost of comparing a signature''': For the vast majority of
>> functions, the cost is comparing a 64-bit number stored in the CPU
>> instruction stream (needle) to a value that is guaranteed to already
>> be in cache (candidate match).
>>
>> '''Pros:'''
>>
>>  * Readability-wise, one can use the C switch statement to dispatch
>>
>>  * "Stateless data", for compiled code it does not require any
>>    run-time initialization like interning does
>>
>>  * One less pointer-dereference in the common case of a short
>>    signature
>>
>> '''Cons:'''
>>
>>  * Long signatures will require more than 8 bytes to store and could
>>    thus be more expensive than interned strings
>
> We could also ignore trailing arguments and only dispatch based on a fixed
> number of first arguments. Callees with more arguments would then simply
> not export native signatures.
>
>
>>  * Format looks uglier in the form of literals in C source code
>
> They are not meant for reading, and we can always generate a comment with a
> spelled-out readable signature next to it.
>
>
>> == Signature strings ==
>>
>> Example: The function
>> {{{
>> int f(double x, float y);
>> }}}
>> would have the signature string {{{"df)i"}}} (or, to save space, {{{"idf"}}}).
>>
>> Fields would follow the PEP3118 extensions of the struct-module format
>> string, but with some modifications:
>>
>>  * The format should be canonical and fit for {{{strcmp}}}-like
>>    comparison: No whitespace, no field names (TBD: what else?)
>>
>>  * TBD: Information about GIL requirements (nogil, with gil?), how
>>    exceptions are reported
>
> What about C++, including C++ exceptions?
>
>
>>  * TBD: Support for Cython-specific constructs like memoryview slices
>>    (so that arrays with strides and shape can be passed faster than
>>    passing an {{{"O"}}}).
>
> Is this really Cython specific or would a generic Py_buffer struct work?
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From stefan_ml at behnel.de Sun Apr 15 08:28:42 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 15 Apr 2012 08:28:42 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F8A1F5F.3030602@canterbury.ac.nz>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
	<4F8A1F5F.3030602@canterbury.ac.nz>
Message-ID: <4F8A6A9A.9000400@behnel.de>

Greg Ewing, 15.04.2012 03:07:
> Dag Sverre Seljebotn wrote:
>
>> if (obj has signature "id)i") {
>
> This is an aside, but is it really necessary to define the
> signature syntax in a way that involves unmatched parens?
> Some editors (such as the one I like to use) get confused
> by this, even when they're inside quotes.
>
> The answer "get a better editor" would be entirely
> appropriate if there were some advantage to this syntax,
> over a non-unbalanced one, but I can't see any.

It wasn't really a proposed syntax, I guess, more of a way to write down an
example. It should be easy to do without any special separator by moving
the return type first, for example. Also, it's not clear yet if we will
actually use such a character syntax at all.

Stefan

From robertwb at gmail.com Sun Apr 15 08:32:10 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sat, 14 Apr 2012 23:32:10 -0700
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F8A67CF.4010503@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
	<4F8A67CF.4010503@behnel.de>
Message-ID: 

On Sat, Apr 14, 2012 at 11:16 PM, Stefan Behnel wrote:
> Robert Bradshaw, 15.04.2012 07:59:
>> On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote:
>>> There may be a lot of promotion/demotion (you likely only want the
>>> former) combinations, especially for multiple arguments, so perhaps it
>>> makes sense to limit ourselves a bit. For instance for numeric scalar
>>> argument types we could limit to long (and the unsigned counterparts),
>>> double and double complex.
>>>
>>> So char, short and int scalars will be
>>> promoted to long, float to double and float complex to double complex.
>>> Anything bigger, like long long etc will be matched specifically.
>>> Promotions and associated demotions if necessary in the callee should
>>> be fairly cheap compared to checking all combinations or going through
>>> the python layer.
>>
>> True, though this could be a convention rather than a requirement of
>> the spec. Long vs. < long seems natural, but are there any systems
>> where (scalar) float still has an advantage over double?
>>
>> Of course pointers like float* vs double* can't be promoted, so we
>> would still need this kind of type declaration.
>
> Yes, passing data sets as C arrays requires proper knowledge about their
> memory layout on both sides.
>
> OTOH, we are talking about functions that would otherwise be called through
> Python, so this could only apply for buffers anyway. So why not require a
> Py_buffer* as argument for them?

That's certainly our (initial?) usecase, but there's no need to limit
the protocol to this.

- Robert

From d.s.seljebotn at astro.uio.no Sun Apr 15 08:33:24 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sun, 15 Apr 2012 08:33:24 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F8A67CF.4010503@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
	<4F8A67CF.4010503@behnel.de>
Message-ID: <4F8A6BB4.2070003@astro.uio.no>

On 04/15/2012 08:16 AM, Stefan Behnel wrote:
> Robert Bradshaw, 15.04.2012 07:59:
>> On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote:
>>> There may be a lot of promotion/demotion (you likely only want the
>>> former) combinations, especially for multiple arguments, so perhaps it
>>> makes sense to limit ourselves a bit. For instance for numeric scalar
>>> argument types we could limit to long (and the unsigned counterparts),
>>> double and double complex.
>>>
>>> So char, short and int scalars will be
>>> promoted to long, float to double and float complex to double complex.
>>> Anything bigger, like long long etc will be matched specifically.
>>> Promotions and associated demotions if necessary in the callee should
>>> be fairly cheap compared to checking all combinations or going through
>>> the python layer.
>>
>> True, though this could be a convention rather than a requirement of
>> the spec. Long vs. < long seems natural, but are there any systems
>> where (scalar) float still has an advantage over double?
>>
>> Of course pointers like float* vs double* can't be promoted, so we
>> would still need this kind of type declaration.
>
> Yes, passing data sets as C arrays requires proper knowledge about their
> memory layout on both sides.
>
> OTOH, we are talking about functions that would otherwise be called through
> Python, so this could only apply for buffers anyway. So why not require a
> Py_buffer* as argument for them?

Is the proposal to limit the range of types valid for arguments? I'm a
bit wary of throwing this into the mix. We know very little about the
callee, they could decide:

 a) To only export a C function and have an exception-raising __call__

 b) To accept ctypes pointers in their __call__, and C pointers in their
    native-call

 c) They can invent their own use for this!

I think agreeing on a CEP gets a lot simpler, and the result cleaner, if
we focus on "how to describe C functions for the purposes of calling
them" (for various usecases), and leave "conventions for recommended
signatures" for CEP 1001.

In Cython, we could always export a fully-promoted-scalar function first
in the list, and always try to call this first, which would work well
with Cython<->Cython.
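To make that concrete, the caller side could look roughly like this (a
sketch; {{{get_native_func}}} is a stand-in for whatever lookup primitive
the CEP ends up specifying, and "ll)l" is an invented encoding for
{{{long (*)(long, long)}}}):

{{{
#include <Python.h>

typedef long (*ll_l_func)(long, long);

/* Hypothetical lookup call, not a real API: returns the function
 * pointer exported for `signature`, or NULL if there is none. */
void *get_native_func(PyObject *callable, const char *signature);

static PyObject *call_add(PyObject *callable, long a, long b)
{
    ll_l_func f = (ll_l_func)get_native_func(callable, "ll)l");
    if (f != NULL)
        return PyLong_FromLong(f(a, b));     /* fast path: no boxing */
    /* generic fallback keeps the call correct in every case */
    return PyObject_CallFunction(callable, "ll", a, b);
}
}}}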
BTW, when Travis originally wanted a proposal on the NumPy list he just
wanted it for "a C function"; his idea was something like

    mycapsule = numbaize(f)
    scipy.integrate(mycapsule)

just saying that the fast-callable aspect isn't everything, passing the
function pointer around was how this started.

Dag

From stefan_ml at behnel.de Sun Apr 15 08:37:10 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 15 Apr 2012 08:37:10 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: 
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
	<4F89E5CD.5060301@behnel.de>
Message-ID: <4F8A6C96.5010106@behnel.de>

Robert Bradshaw, 15.04.2012 08:27:
> On Sat, Apr 14, 2012 at 2:02 PM, Stefan Behnel wrote:
>> While overloaded signatures are great for the callee, they make things much
>> more complicated for the caller. It's no longer just one signature that
>> either matches or not. Especially when we allow more than one expected
>> signature, then each of them has to be compared against all exported
>> signatures.
>>
>> We'll have to see what the runtime impact and the impact on the code
>> complexity is, I guess.
>
> The caller could choose to only check the first signature to avoid
> complexity. I think, however, that overloaded signatures are
> important, and even checking a dozen is cheaper than going through the
> Python call. Fused types naturally lead to overloads as well.

Hmm, maybe it wouldn't even be all that inefficient. If both sides sorted
their signatures by highest efficiency at compile time, the first hit would
cut down the number of signatures that need further comparison to the ones
before the match.

>>> each described by a function pointer and a signature specification
>>> string, such as "id)i" for {{{int f(int, double)}}}.
>>
>> How do we deal with object argument types? Do we care on the caller side?
>> Functions might have alternative signatures that differ in the type of
>> their object parameters. Or should we handle this inside of the caller and
>> expect that it's something like a fused function with internal dispatch in
>> that case?
>>
>> Personally, I think there is not enough to gain from object parameters that
>> we should handle it on the caller side. The callee can dispatch those if
>> necessary.
>
> I don't think we should prohibit the signature from being able to
> declare arbitrary Cython types. Whether it proves useful is dependent
> on the library, and is the library writer's choice.

It leads to very different requirements for the signature encoding/syntax,
though. I think we should only go that route when we think we have to.

>>> * Requires a registry for interning strings. This must be
>>> "handshaked" between the implementors of this CEP (probably by
>>> "first to get at {{{sys.modules["_nativecall"]}}} sticks it there"),
>>> as we can't ship a common dependency library for this CEP.
>>
>> ... which would eventually end up in the stdlib, but could equally well
>> come from PyPI for now. I don't see a problem with that.
>>
>> Using sys.modules (or another global store) instead of an explicit import
>> allows for dependency injection, that's good.
>
> It excludes non-Python libraries from participating (or at least makes
> it difficult for them).

True, but that can be helped by providing a library (or header file) that
provides simple C calls for the required setup.
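Something like the following would already be enough -- a hypothetical
sketch of such a helper, nothing that exists today. Equal signature
strings map to the same canonical pointer, so the fast-path comparison
degenerates to a pointer compare:

{{{
#include <stdlib.h>
#include <string.h>

/* Deliberately naive: linear search, no locking, process-lifetime
 * storage (as discussed earlier in this thread, the strings are short
 * and can just live forever). */
const char *nativecall_intern(const char *signature)
{
    static const char *table[1024];
    static int n = 0;
    int i;
    char *copy;

    for (i = 0; i < n; i++)
        if (strcmp(table[i], signature) == 0)
            return table[i];
    if (n >= 1024)
        return NULL;                 /* a real registry would grow */
    copy = malloc(strlen(signature) + 1);
    if (copy == NULL)
        return NULL;
    strcpy(copy, signature);
    table[n] = copy;
    return table[n++];
}
}}}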
Stefan From stefan_ml at behnel.de Sun Apr 15 08:39:46 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Apr 2012 08:39:46 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F8A67CF.4010503@behnel.de> Message-ID: <4F8A6D32.8030306@behnel.de> Robert Bradshaw, 15.04.2012 08:32: > On Sat, Apr 14, 2012 at 11:16 PM, Stefan Behnel wrote: >> Robert Bradshaw, 15.04.2012 07:59: >>> On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote: >>>> There may be a lot of promotion/demotion (you likely only want the >>>> former) combinations, especially for multiple arguments, so perhaps it >>>> makes sense to limit ourselves a bit. For instance for numeric scalar >>>> argument types we could limit to long (and the unsigned counterparts), >>>> double and double complex. >>>> >>>> So char, short and int scalars will be >>>> promoted to long, float to double and float complex to double complex. >>>> Anything bigger, like long long etc will be matched specifically. >>>> Promotions and associated demotions if necessary in the callee should >>>> be fairly cheap compared to checking all combinations or going through >>>> the python layer. >>> >>> True, though this could be a convention rather than a requirement of >>> the spec. Long vs. < long seems natural, but are there any systems >>> where (scalar) float still has an advantage over double? >>> >>> Of course pointers like float* vs double* can't be promoted, so we >>> would still need this kind of type declaration. >> >> Yes, passing data sets as C arrays requires proper knowledge about their >> memory layout on both sides. >> >> OTOH, we are talking about functions that would otherwise be called through >> Python, so this could only apply for buffers anyway. So why not require a >> Py_buffer* as argument for them? > > That's certainly our (initial?) usecase, but there's no need to limit > the protocol to this. I think the question here is: is this supposed to be a best effort protocol for bypassing Python calls, or would it be an error in some situations if no matching signature can be found? Stefan From d.s.seljebotn at astro.uio.no Sun Apr 15 08:58:03 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 15 Apr 2012 08:58:03 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F89E5CD.5060301@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> Message-ID: <4F8A717B.6000107@astro.uio.no> Ah, Cython objects. Didn't think of that. More below. On 04/14/2012 11:02 PM, Stefan Behnel wrote: > Hi, > > thanks for writing this up. Comments inline as I read through it. > > Dag Sverre Seljebotn, 14.04.2012 21:08: >> each described by a function pointer and a signature specification >> string, such as "id)i" for {{{int f(int, double)}}}. > > How do we deal with object argument types? Do we care on the caller side? > Functions might have alternative signatures that differ in the type of > their object parameters. Or should we handle this inside of the caller and > expect that it's something like a fused function with internal dispatch in > that case? > > Personally, I think there is not enough to gain from object parameters that > we should handle it on the caller side. The callee can dispatch those if > necessary. > > What about signatures that require an object when we have a C typed value? 
> > What about signatures that require a C typed argument when we have an > arbitrary object value in our call parameters? > > We should also strip the "self" argument from the parameter list of > methods. That's handled by the attribute lookup before even getting at the > callable. On 04/15/2012 07:59 AM, Robert Bradshaw wrote: > It would certainly be useful to have special syntax for memory views > (after nailing down a well-defined ABI for them) and builtin types. > Being able to declare something as taking a > "sage.rings.integer.Integer" could also prove useful, but could result > in long (and prefix-sharing) signatures, favoring the > runtime-allocated ids. I do think describing Cython objects in this cross-tool CEP would work nicely, this is for standardized ABIs only (we can't do memoryviews either until their ABI is standard). I think I prefer to a) exclude it now, and b) down the line we need another cross-tool ABI to communicate vtables, and then we could put that into this CEP now. I strongly believe we should go with the Go "duck-typing" approach for interfaces, i.e. it is not the declared name that should be compared but the method names and signatures. The only question that needs answering for CEP1000 is: Would this blow up the signature string enough that interning is the only viable option? Some strcmp solutions: a) Hash each vtable descriptor to 160-bits, and assume the hash is unique. Still, a couple of interfaces would blow up the signature string a lot. b) Modify approach B in CEP 1000 to this: If it is longer than 160 bits, take a full cryptographic hash, and just assume there won't be hash collisions (like git does). This still saves for short signature strings, and avoids interning at the cost of doing 160-bit comparisons. Both of these require other ways at getting at the actual string data. But I still like b) above better than interning. Dag From robertwb at gmail.com Sun Apr 15 09:00:34 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Sun, 15 Apr 2012 00:00:34 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A6D32.8030306@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F8A67CF.4010503@behnel.de> <4F8A6D32.8030306@behnel.de> Message-ID: On Sat, Apr 14, 2012 at 11:39 PM, Stefan Behnel wrote: > Robert Bradshaw, 15.04.2012 08:32: >> On Sat, Apr 14, 2012 at 11:16 PM, Stefan Behnel wrote: >>> Robert Bradshaw, 15.04.2012 07:59: >>>> On Sat, Apr 14, 2012 at 2:00 PM, mark florisson wrote: >>>>> There may be a lot of promotion/demotion (you likely only want the >>>>> former) combinations, especially for multiple arguments, so perhaps it >>>>> makes sense to limit ourselves a bit. For instance for numeric scalar >>>>> argument types we could limit to long (and the unsigned counterparts), >>>>> double and double complex. >>>>> >>>>> So char, short and int scalars will be >>>>> promoted to long, float to double and float complex to double complex. >>>>> Anything bigger, like long long etc will be matched specifically. >>>>> Promotions and associated demotions if necessary in the callee should >>>>> be fairly cheap compared to checking all combinations or going through >>>>> the python layer. >>>> >>>> True, though this could be a convention rather than a requirement of >>>> the spec. Long vs. < long seems natural, but are there any systems >>>> where (scalar) float still has an advantage over double? 
>>>> >>>> Of course pointers like float* vs double* can't be promoted, so we >>>> would still need this kind of type declaration. >>> >>> Yes, passing data sets as C arrays requires proper knowledge about their >>> memory layout on both sides. >>> >>> OTOH, we are talking about functions that would otherwise be called through >>> Python, so this could only apply for buffers anyway. So why not require a >>> Py_buffer* as argument for them? >> >> That's certainly our (initial?) usecase, but there's no need to limit >> the protocol to this. > > I think the question here is: is this supposed to be a best effort protocol > for bypassing Python calls, or would it be an error in some situations if > no matching signature can be found? It may be an error in some cases. This isn't just about avoiding Python calls; Dag just barely summed this up quite nicely. - Robert From stefan_ml at behnel.de Sun Apr 15 09:30:08 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Apr 2012 09:30:08 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A717B.6000107@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> Message-ID: <4F8A7900.2060200@behnel.de> Dag Sverre Seljebotn, 15.04.2012 08:58: > Ah, Cython objects. Didn't think of that. More below. > > On 04/14/2012 11:02 PM, Stefan Behnel wrote: >> thanks for writing this up. Comments inline as I read through it. >> >> Dag Sverre Seljebotn, 14.04.2012 21:08: >>> each described by a function pointer and a signature specification >>> string, such as "id)i" for {{{int f(int, double)}}}. >> >> How do we deal with object argument types? Do we care on the caller side? >> Functions might have alternative signatures that differ in the type of >> their object parameters. Or should we handle this inside of the caller and >> expect that it's something like a fused function with internal dispatch in >> that case? >> >> Personally, I think there is not enough to gain from object parameters that >> we should handle it on the caller side. The callee can dispatch those if >> necessary. >> >> What about signatures that require an object when we have a C typed value? >> >> What about signatures that require a C typed argument when we have an >> arbitrary object value in our call parameters? >> >> We should also strip the "self" argument from the parameter list of >> methods. That's handled by the attribute lookup before even getting at the >> callable. > > On 04/15/2012 07:59 AM, Robert Bradshaw wrote: >> It would certainly be useful to have special syntax for memory views >> (after nailing down a well-defined ABI for them) and builtin types. >> Being able to declare something as taking a >> "sage.rings.integer.Integer" could also prove useful, but could result >> in long (and prefix-sharing) signatures, favoring the >> runtime-allocated ids. > > I do think describing Cython objects in this cross-tool CEP would work > nicely, this is for standardized ABIs only (we can't do memoryviews either > until their ABI is standard). It just occurred to me that an object's type can safely be represented at runtime as a pointer, i.e. an integer. Even if the type is heap allocated and replaced by another one later, a signature that uses that pointer value in its encoding would only ever match if both sides talk about the same type at call time (because at least one of them would hold a life reference to the type in order to actually use it). 
That would mean that IDs for signatures with object arguments would have to be generated at setup time, e.g. during module init, after importing the respective type. But I think that's acceptable. > I think I prefer to a) exclude it now, and b) down the line we need another > cross-tool ABI to communicate vtables, and then we could put that into this > CEP now. > > I strongly believe we should go with the Go "duck-typing" approach for > interfaces, i.e. it is not the declared name that should be compared but > the method names and signatures. > > The only question that needs answering for CEP1000 is: Would this blow up > the signature string enough that interning is the only viable option? That sounds excessive to me. Why would you want to test interfaces of arguments as part of the signature matching? Isn't that something that the callee should do when it actually needs a specific interface internally? Is there an important use case for passing objects with different interfaces as the same argument into the same callable? At least, it doesn't sound like such a use case would be performance critical in terms of the call overhead. Stefan From robertwb at gmail.com Sun Apr 15 09:39:02 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Sun, 15 Apr 2012 00:39:02 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A717B.6000107@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> Message-ID: On Sat, Apr 14, 2012 at 11:58 PM, Dag Sverre Seljebotn wrote: > Ah, Cython objects. Didn't think of that. More below. > > > On 04/14/2012 11:02 PM, Stefan Behnel wrote: >> >> Hi, >> >> thanks for writing this up. Comments inline as I read through it. >> >> Dag Sverre Seljebotn, 14.04.2012 21:08: >>> >>> each described by a function pointer and a signature specification >>> >>> string, such as "id)i" for {{{int f(int, double)}}}. >> >> >> How do we deal with object argument types? Do we care on the caller side? >> Functions might have alternative signatures that differ in the type of >> their object parameters. Or should we handle this inside of the caller and >> expect that it's something like a fused function with internal dispatch in >> that case? > >> >> Personally, I think there is not enough to gain from object parameters >> that >> we should handle it on the caller side. The callee can dispatch those if >> necessary. >> >> What about signatures that require an object when we have a C typed value? >> >> What about signatures that require a C typed argument when we have an >> arbitrary object value in our call parameters? >> >> We should also strip the "self" argument from the parameter list of >> methods. That's handled by the attribute lookup before even getting at the >> callable. > > On 04/15/2012 07:59 AM, Robert Bradshaw wrote: >> It would certainly be useful to have special syntax for memory views >> (after nailing down a well-defined ABI for them) and builtin types. >> Being able to declare something as taking a >> "sage.rings.integer.Integer" could also prove useful, but could result >> in long (and prefix-sharing) signatures, favoring the >> runtime-allocated ids. > > > I do think describing Cython objects in this cross-tool CEP would work > nicely, this is for standardized ABIs only (we can't do memoryviews either > until their ABI is standard). 
>
> I think I prefer to a) exclude it now, and b) down the line we need another
> cross-tool ABI to communicate vtables, and then we could put that into this
> CEP now.
>
> I strongly believe we should go with the Go "duck-typing" approach for
> interfaces, i.e. it is not the declared name that should be compared but the
> method names and signatures.
>
> The only question that needs answering for CEP1000 is: Would this blow up
> the signature string enough that interning is the only viable option?

Exactly.

> Some strcmp solutions:
>
>  a) Hash each vtable descriptor to 160-bits, and assume the hash is unique.
> Still, a couple of interfaces would blow up the signature string a lot.
>
>  b) Modify approach B in CEP 1000 to this: If it is longer than 160 bits,
> take a full cryptographic hash, and just assume there won't be hash
> collisions (like git does). This still saves for short signature strings,
> and avoids interning at the cost of doing 160-bit comparisons.
>
> Both of these require other ways at getting at the actual string data. But I
> still like b) above better than interning.

Requiring an implementation of (or at least access to) a cryptographic
hash greatly complicates the spec. (On another note, even a simple
hash as a prefix might be useful to prevent a lot of false partial
matches, e.g. "sage.rings...") 160 * n bits starts to get large too
(and we'd have to twiddle them to insert/avoid a "dash" every 16
bytes).

Here's a crazy thought: we could assume signatures like this are
"application specific." We can partition up portions of the signature
space to individual projects to compute however they want. Cython can
do this via interning for those signatures containing Cython types
(which is not an undue burden for anyone attempting to interoperate
with Cython types). For (some superset of) the basic C types we agree
on a common encoding and inline it.

- Robert

From robertwb at gmail.com Sun Apr 15 09:43:44 2012
From: robertwb at gmail.com (Robert Bradshaw)
Date: Sun, 15 Apr 2012 00:43:44 -0700
Subject: [Cython] [cython-users] Cython 0.16 RC 1
In-Reply-To: 
References: 
Message-ID: 

On Sat, Apr 14, 2012 at 9:14 PM, Al Danial wrote:
> On Thu, Apr 12, 2012 at 7:38 AM, mark florisson
> wrote:
>>
>> Yet another release candidate, this will hopefully be the last before
>> the 0.16 release. You can grab it from here:
>> http://wiki.cython.org/ReleaseNotes-0.16
>
>> If there are any problems, please let us know.
>
> I'm having the same problem ("Cannot convert 'PyObject *' to Python object",
> ref my posts at
> http://groups.google.com/group/cython-users/browse_thread/thread/d1a727e9d61f93b6#)
> on my code as with the release candidate 0. The code builds and runs
> cleanly with 0.15.1. To duplicate:
>
>     svn co http://pynastran.googlecode.com/svn/trunk/pyNastran/op4
>     cd op4
>     make clean ; make

Including the problematic line would have been helpful.

    ndarray.base = array_wrapper_RS

This is due to the Numpy 1.7 fix.
I think we need to pull these commits out for now: https://github.com/cython/cython/pull/112 - Robert From d.s.seljebotn at astro.uio.no Sun Apr 15 10:07:02 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 15 Apr 2012 10:07:02 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A7900.2060200@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A7900.2060200@behnel.de> Message-ID: <4F8A81A6.8090007@astro.uio.no> On 04/15/2012 09:30 AM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 15.04.2012 08:58: >> Ah, Cython objects. Didn't think of that. More below. >> >> On 04/14/2012 11:02 PM, Stefan Behnel wrote: >>> thanks for writing this up. Comments inline as I read through it. >>> >>> Dag Sverre Seljebotn, 14.04.2012 21:08: >>>> each described by a function pointer and a signature specification >>>> string, such as "id)i" for {{{int f(int, double)}}}. >>> >>> How do we deal with object argument types? Do we care on the caller side? >>> Functions might have alternative signatures that differ in the type of >>> their object parameters. Or should we handle this inside of the caller and >>> expect that it's something like a fused function with internal dispatch in >>> that case? >>> >>> Personally, I think there is not enough to gain from object parameters that >>> we should handle it on the caller side. The callee can dispatch those if >>> necessary. >>> >>> What about signatures that require an object when we have a C typed value? >>> >>> What about signatures that require a C typed argument when we have an >>> arbitrary object value in our call parameters? >>> >>> We should also strip the "self" argument from the parameter list of >>> methods. That's handled by the attribute lookup before even getting at the >>> callable. >> >> On 04/15/2012 07:59 AM, Robert Bradshaw wrote: >>> It would certainly be useful to have special syntax for memory views >>> (after nailing down a well-defined ABI for them) and builtin types. >>> Being able to declare something as taking a >>> "sage.rings.integer.Integer" could also prove useful, but could result >>> in long (and prefix-sharing) signatures, favoring the >>> runtime-allocated ids. >> >> I do think describing Cython objects in this cross-tool CEP would work >> nicely, this is for standardized ABIs only (we can't do memoryviews either >> until their ABI is standard). > > It just occurred to me that an object's type can safely be represented at > runtime as a pointer, i.e. an integer. Even if the type is heap allocated > and replaced by another one later, a signature that uses that pointer value > in its encoding would only ever match if both sides talk about the same > type at call time (because at least one of them would hold a life reference > to the type in order to actually use it). The missing piece here is that both me and Robert are huge fans of Go-style polymorphism. If you haven't read up on that I highly recommend it, basic idea is if you agree on method names and their signatures, you don't have to have access to the same interface declaration (you don't have to call the interface the same thing). Guess we should let this rest for a few days and get back to it with some benchmarks; since all we need to solve in CEP1000 is interned vs. strcmp. I'll try to do that. 
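A rough harness for that could be as simple as timing the two
per-candidate comparison operations against each other (a sketch only; a
real benchmark would have to walk an actual signature table and control
for cache effects):

{{{
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
    const char *interned = "id)i";
    const char *needle_ptr = interned;   /* interned: same pointer     */
    char entry[8] = "id)i";              /* strcmp: inline 8-byte data */
    char needle[8] = "id)i";
    volatile long hits = 0;              /* keeps the loops alive */
    long i;
    clock_t t0;

    t0 = clock();
    for (i = 0; i < 100000000L; i++)
        hits += (needle_ptr == interned);
    printf("pointer compare: %.2fs\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    for (i = 0; i < 100000000L; i++)
        hits += (memcmp(needle, entry, 8) == 0);
    printf("8-byte memcmp:   %.2fs\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    return hits != 200000000L;           /* sanity check */
}
}}}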
Dag From d.s.seljebotn at astro.uio.no Sun Apr 15 10:15:53 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 15 Apr 2012 10:15:53 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> Message-ID: <4F8A83B9.904@astro.uio.no> On 04/15/2012 09:39 AM, Robert Bradshaw wrote: > On Sat, Apr 14, 2012 at 11:58 PM, Dag Sverre Seljebotn > wrote: >> Ah, Cython objects. Didn't think of that. More below. >> >> >> On 04/14/2012 11:02 PM, Stefan Behnel wrote: >>> >>> Hi, >>> >>> thanks for writing this up. Comments inline as I read through it. >>> >>> Dag Sverre Seljebotn, 14.04.2012 21:08: >>>> >>>> each described by a function pointer and a signature specification >>>> >>>> string, such as "id)i" for {{{int f(int, double)}}}. >>> >>> >>> How do we deal with object argument types? Do we care on the caller side? >>> Functions might have alternative signatures that differ in the type of >>> their object parameters. Or should we handle this inside of the caller and >>> expect that it's something like a fused function with internal dispatch in >>> that case? >> >>> >>> Personally, I think there is not enough to gain from object parameters >>> that >>> we should handle it on the caller side. The callee can dispatch those if >>> necessary. >>> >>> What about signatures that require an object when we have a C typed value? >>> >>> What about signatures that require a C typed argument when we have an >>> arbitrary object value in our call parameters? >>> >>> We should also strip the "self" argument from the parameter list of >>> methods. That's handled by the attribute lookup before even getting at the >>> callable. >> >> On 04/15/2012 07:59 AM, Robert Bradshaw wrote: >>> It would certainly be useful to have special syntax for memory views >>> (after nailing down a well-defined ABI for them) and builtin types. >>> Being able to declare something as taking a >>> "sage.rings.integer.Integer" could also prove useful, but could result >>> in long (and prefix-sharing) signatures, favoring the >>> runtime-allocated ids. >> >> >> I do think describing Cython objects in this cross-tool CEP would work >> nicely, this is for standardized ABIs only (we can't do memoryviews either >> until their ABI is standard). >> >> I think I prefer to a) exclude it now, and b) down the line we need another >> cross-tool ABI to communicate vtables, and then we could put that into this >> CEP now. >> >> I strongly believe we should go with the Go "duck-typing" approach for >> interfaces, i.e. it is not the declared name that should be compared but the >> method names and signatures. >> >> The only question that needs answering for CEP1000 is: Would this blow up >> the signature string enough that interning is the only viable option? > > Exactly. > >> Some strcmp solutions: >> >> a) Hash each vtable descriptor to 160-bits, and assume the hash is unique. >> Still, a couple of interfaces would blow up the signature string a lot. >> >> b) Modify approach B in CEP 1000 to this: If it is longer than 160 bits, >> take a full cryptographic hash, and just assume there won't be hash >> collisions (like git does). This still saves for short signature strings, >> and avoids interning at the cost of doing 160-bit comparisons. >> >> Both of these require other ways at getting at the actual string data. But I >> still like b) above better than interning. 
>
> Requiring an implementation of (or at least access to) a cryptographic
> hash greatly complicates the spec. (On another note, even a simple
> hash as a prefix might be useful to prevent a lot of false partial
> matches, e.g. "sage.rings...") 160 * n bits starts to get large too
> (and we'd have to twiddle them to insert/avoid a "dash" every 16
> bytes).

Do you really think it complicates the spec? SHA-1 is pretty standard,
and Python ships with hashlib (the hashing part isn't performance
critical).

I prefer hashing to string-interning as it can still be done
compile-time etc. 160 bits isn't worse than the second-to-best strcmp
case of a 256-bit function entry.

Shortening the hash to 120 bits (truncation) we could have a spec like this:

 - Short signature: [64 bit encoded signature. 64 bit funcptr]
 - Long signature: [64 bit hash, 64 bit pointer to full signature,
                    8 bit guard byte, 56 bits remaining hash,
                    64 bit funcptr]

Anyway: Looks like it's about time to do some benchmarks. I'll try to
get around to it next week.

Dag

From d.s.seljebotn at astro.uio.no Sun Apr 15 10:17:51 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Sun, 15 Apr 2012 10:17:51 +0200
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F8A81A6.8090007@astro.uio.no>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
	<4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no>
	<4F8A7900.2060200@behnel.de> <4F8A81A6.8090007@astro.uio.no>
Message-ID: <4F8A842F.5040104@astro.uio.no>

On 04/15/2012 10:07 AM, Dag Sverre Seljebotn wrote:
> On 04/15/2012 09:30 AM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 15.04.2012 08:58:
>>> Ah, Cython objects. Didn't think of that. More below.
>>>
>>> On 04/14/2012 11:02 PM, Stefan Behnel wrote:
>>>> thanks for writing this up. Comments inline as I read through it.
>>>>
>>>> Dag Sverre Seljebotn, 14.04.2012 21:08:
>>>>> each described by a function pointer and a signature specification
>>>>> string, such as "id)i" for {{{int f(int, double)}}}.
>>>>
>>>> How do we deal with object argument types? Do we care on the caller
>>>> side?
>>>> Functions might have alternative signatures that differ in the type of
>>>> their object parameters. Or should we handle this inside of the
>>>> caller and
>>>> expect that it's something like a fused function with internal
>>>> dispatch in
>>>> that case?
>>>>
>>>> Personally, I think there is not enough to gain from object
>>>> parameters that
>>>> we should handle it on the caller side. The callee can dispatch
>>>> those if
>>>> necessary.
>>>
>>> On 04/15/2012 07:59 AM, Robert Bradshaw wrote:
>>>> It would certainly be useful to have special syntax for memory views
>>>> (after nailing down a well-defined ABI for them) and builtin types.
>>>> Being able to declare something as taking a
>>>> "sage.rings.integer.Integer" could also prove useful, but could result
>>>> in long (and prefix-sharing) signatures, favoring the
>>>> runtime-allocated ids.
>>> >>> I do think describing Cython objects in this cross-tool CEP would work >>> nicely, this is for standardized ABIs only (we can't do memoryviews >>> either >>> until their ABI is standard). >> >> It just occurred to me that an object's type can safely be represented at >> runtime as a pointer, i.e. an integer. Even if the type is heap allocated >> and replaced by another one later, a signature that uses that pointer >> value >> in its encoding would only ever match if both sides talk about the same >> type at call time (because at least one of them would hold a life >> reference >> to the type in order to actually use it). > > The missing piece here is that both me and Robert are huge fans of > Go-style polymorphism. If you haven't read up on that I highly recommend > it, basic idea is if you agree on method names and their signatures, you > don't have to have access to the same interface declaration (you don't > have to call the interface the same thing). > > Guess we should let this rest for a few days and get back to it with > some benchmarks; since all we need to solve in CEP1000 is interned vs. > strcmp. I'll try to do that. Actually, Stefan's idea above is valid for Go-style interfaces too, just replace pointer with an interned string. Which is what Robert proposed too. Dag From njs at pobox.com Sun Apr 15 10:48:37 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 15 Apr 2012 09:48:37 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A83B9.904@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> Message-ID: On Sun, Apr 15, 2012 at 9:15 AM, Dag Sverre Seljebotn wrote: > Do you really think it complicates the spec? SHA-1 is pretty standard, and > Python ships with hashlib (the hashing part isn't performance critical). > > I prefer hashing to string-interning as it can still be done compile-time > etc. 160 bits isn't worse than the second-to-best strcmp case of a 256-bit > function entry. If you're *so* set on compile-time calculation, one could also accommodate these within the intern framework pretty easily. Any PyString/PyBytes * will be aligned, which means the low bit will not be set, which means there are at least 2**31 bit-patterns that will never be used by a run-time interned string. So we could write down a lookup table in the spec that assigns arbitrary, well-known numbers to every common signature. "dd->d" is 1, "ii->i" is 2, etc. If you have 15 standard types, then you can assign such an id to every 0, 1, 2, 3, 4, 5, and 6 argument function with space left over. And this could all be abstracted away inside the intern() function. The only thing is that if you wanted to look at the characters in the interned string, you'd have to call a disintern() function instead of just following the pointer. I still think all this stuff would be complexity for its own sake, though. > Shortening the hash to 120 bits (truncation) we could have a spec like this: > > ?- Short signature: [64 bit encoded signature. 64 bit funcptr] > ?- Long signature: [64 bit hash, 64 bit pointer to full signature, > ? ? ? ? ? ? ? ? ? ?8 bit guard byte, 56 bits remaining hash, > ? ? ? ? ? ? ? ? ? ?64 bit funcptr] This is a fixed length encoding, so why does it need a guard byte? 
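Still, for concreteness, the intern()/disintern() pair could look
roughly like this (a sketch -- the code table and the runtime_intern()
registry call are invented for illustration; the real table would have
to be written down in the spec):

{{{
#include <stdint.h>
#include <string.h>

/* Interned signatures are aligned pointers (low bit clear), so odd
 * values are free to encode well-known signatures directly. */
static const char *well_known[] = { NULL, "dd->d", "ii->i", "d->d" };
#define N_WELL_KNOWN (sizeof(well_known) / sizeof(well_known[0]))

const char *runtime_intern(const char *sig);  /* hypothetical registry */

static uintptr_t intern_sig(const char *sig)
{
    uintptr_t i;
    for (i = 1; i < N_WELL_KNOWN; i++)
        if (strcmp(well_known[i], sig) == 0)
            return (i << 1) | 1;            /* odd: a well-known code */
    return (uintptr_t)runtime_intern(sig);  /* even: aligned pointer  */
}

static const char *disintern_sig(uintptr_t id)
{
    if (id & 1)
        return well_known[id >> 1];  /* decode via the spec's table */
    return (const char *)id;         /* already points at the bytes */
}
}}}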
BTW, the guard byte design in the last version of the CEP looks buggy to me -- there's no guarantee that a valid pointer might not contain the guard byte by accident. A solution would be to move the to-be-continued byte (or bit) to the first word. This would also mean that if you're looking for a one-word signature via switch(), you won't hit signatures which have your signature as a prefix. In the variable-length encoding with the lookup rule you suggested you'd also want a second bit to mark the actual beginning of each structure, so you don't get hits on the middle of structures. > Anyway: Looks like it's about time to do some benchmarks. I'll try to get > around to it next week. Agreed :-). - N From njs at pobox.com Sun Apr 15 11:02:23 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 15 Apr 2012 10:02:23 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A81A6.8090007@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A7900.2060200@behnel.de> <4F8A81A6.8090007@astro.uio.no> Message-ID: On Sun, Apr 15, 2012 at 9:07 AM, Dag Sverre Seljebotn wrote: > On 04/15/2012 09:30 AM, Stefan Behnel wrote: >> >> Dag Sverre Seljebotn, 15.04.2012 08:58: >>> >>> Ah, Cython objects. Didn't think of that. More below. >>> >>> On 04/14/2012 11:02 PM, Stefan Behnel wrote: >>>> >>>> thanks for writing this up. Comments inline as I read through it. >>>> >>>> Dag Sverre Seljebotn, 14.04.2012 21:08: >>>>> >>>>> each described by a function pointer and a signature specification >>>>> string, such as "id)i" for {{{int f(int, double)}}}. >>>> >>>> >>>> How do we deal with object argument types? Do we care on the caller >>>> side? >>>> Functions might have alternative signatures that differ in the type of >>>> their object parameters. Or should we handle this inside of the caller >>>> and >>>> expect that it's something like a fused function with internal dispatch >>>> in >>>> that case? >>>> >>>> Personally, I think there is not enough to gain from object parameters >>>> that >>>> we should handle it on the caller side. The callee can dispatch those if >>>> necessary. >>>> >>>> What about signatures that require an object when we have a C typed >>>> value? >>>> >>>> What about signatures that require a C typed argument when we have an >>>> arbitrary object value in our call parameters? >>>> >>>> We should also strip the "self" argument from the parameter list of >>>> methods. That's handled by the attribute lookup before even getting at >>>> the >>>> callable. >>> >>> >>> On 04/15/2012 07:59 AM, Robert Bradshaw wrote: >>>> >>>> It would certainly be useful to have special syntax for memory views >>>> (after nailing down a well-defined ABI for them) and builtin types. >>>> Being able to declare something as taking a >>>> "sage.rings.integer.Integer" could also prove useful, but could result >>>> in long (and prefix-sharing) signatures, favoring the >>>> runtime-allocated ids. >>> >>> >>> I do think describing Cython objects in this cross-tool CEP would work >>> nicely, this is for standardized ABIs only (we can't do memoryviews >>> either >>> until their ABI is standard). >> >> >> It just occurred to me that an object's type can safely be represented at >> runtime as a pointer, i.e. an integer. 
Even if the type is heap allocated >> and replaced by another one later, a signature that uses that pointer >> value >> in its encoding would only ever match if both sides talk about the same >> type at call time (because at least one of them would hold a life >> reference >> to the type in order to actually use it). > > > The missing piece here is that both me and Robert are huge fans of Go-style > polymorphism. If you haven't read up on that I highly recommend it, basic > idea is if you agree on method names and their signatures, you don't have to > have access to the same interface declaration (you don't have to call the > interface the same thing). Go style polymorphism is certainly a neat idea, but two points: - You can't do this kind of matching via signature comparison. If I have a type with methods "foo", "bar" and "baz", then that should match the interface {"foo", "bar", "baz"}, but also {"foo", "bar"}, {"foo", "baz"}, {"bar"}, {}, etc. To find the right function for such a type, you need to decode each function signature and check them in some structured way. Unless your plan is to precompute the hash of all 2**n interfaces that each object fulfills. - Adding a whole new type system with polymorphic dispatch is a heck of a thing to do in a spec for boxing and unboxing pointers. Honestly at this level I'm even leery of describing Python objects via their type, as opposed to just "PyObject *". Just let the callee do the type checking if they need to, and if it later turns out that there are actually enough cases where Cython knows the exact type at compile time and is dispatching through a boxed pointer and the callee type checking is significant overhead, then extend the spec then. -- Nathaniel From d.s.seljebotn at astro.uio.no Sun Apr 15 11:08:43 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 15 Apr 2012 11:08:43 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> Message-ID: <2397ece4-b31a-4bdb-b946-0d7d9f2a32ae@email.android.com> Nathaniel Smith wrote: >On Sun, Apr 15, 2012 at 9:15 AM, Dag Sverre Seljebotn > wrote: >> Do you really think it complicates the spec? SHA-1 is pretty >standard, and >> Python ships with hashlib (the hashing part isn't performance >critical). >> >> I prefer hashing to string-interning as it can still be done >compile-time >> etc. 160 bits isn't worse than the second-to-best strcmp case of a >256-bit >> function entry. > >If you're *so* set on compile-time calculation, one could also >accommodate these within the intern framework pretty easily. Any >PyString/PyBytes * will be aligned, which means the low bit will not >be set, which means there are at least 2**31 bit-patterns that will >never be used by a run-time interned string. So we could write down a >lookup table in the spec that assigns arbitrary, well-known numbers to >every common signature. "dd->d" is 1, "ii->i" is 2, etc. If you have >15 standard types, then you can assign such an id to every 0, 1, 2, 3, >4, 5, and 6 argument function with space left over. > >And this could all be abstracted away inside the intern() function. >The only thing is that if you wanted to look at the characters in the >interned string, you'd have to call a disintern() function instead of >just following the pointer. > >I still think all this stuff would be complexity for its own sake, >though. 
> >> Shortening the hash to 120 bits (truncation) we could have a spec >like this: >> >> ?- Short signature: [64 bit encoded signature. 64 bit funcptr] >> ?- Long signature: [64 bit hash, 64 bit pointer to full signature, >> ? ? ? ? ? ? ? ? ? ?8 bit guard byte, 56 bits remaining hash, >> ? ? ? ? ? ? ? ? ? ?64 bit funcptr] > >This is a fixed length encoding, so why does it need a guard byte? No, there is two cases, one 128 bit and one 256 bit. > >BTW, the guard byte design in the last version of the CEP looks buggy >to me -- there's no guarantee that a valid pointer might not contain >the guard byte by accident. A solution would be to move the In the CEP text some posts ago? I am pretty sure I made sure that pointers would never be looked at -- you are supposed to scan in 128 bit jumps and will never look at the beginning of a pointer. Read it again and see if you can make a counterexample... That is the reason the above works, and why I split the hash in two segments. >to-be-continued byte (or bit) to the first word. This would also mean >that if you're looking for a one-word signature via switch(), you >won't hit signatures which have your signature as a prefix. In the You need 0-termination to be part of the signature (and if the 0 spills over, you spill over). I should have said that, good catch. Dag >variable-length encoding with the lookup rule you suggested you'd also >want a second bit to mark the actual beginning of each structure, so >you don't get hits on the middle of structures. > >> Anyway: Looks like it's about time to do some benchmarks. I'll try to >get >> around to it next week. > > Agreed :-). > >- N >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From d.s.seljebotn at astro.uio.no Sun Apr 15 11:13:51 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 15 Apr 2012 11:13:51 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A7900.2060200@behnel.de> <4F8A81A6.8090007@astro.uio.no> Message-ID: <06372908-d4aa-4ccc-9f2b-cee04793baf2@email.android.com> Nathaniel Smith wrote: >On Sun, Apr 15, 2012 at 9:07 AM, Dag Sverre Seljebotn > wrote: >> On 04/15/2012 09:30 AM, Stefan Behnel wrote: >>> >>> Dag Sverre Seljebotn, 15.04.2012 08:58: >>>> >>>> Ah, Cython objects. Didn't think of that. More below. >>>> >>>> On 04/14/2012 11:02 PM, Stefan Behnel wrote: >>>>> >>>>> thanks for writing this up. Comments inline as I read through it. >>>>> >>>>> Dag Sverre Seljebotn, 14.04.2012 21:08: >>>>>> >>>>>> each described by a function pointer and a signature >specification >>>>>> string, such as "id)i" for {{{int f(int, double)}}}. >>>>> >>>>> >>>>> How do we deal with object argument types? Do we care on the >caller >>>>> side? >>>>> Functions might have alternative signatures that differ in the >type of >>>>> their object parameters. Or should we handle this inside of the >caller >>>>> and >>>>> expect that it's something like a fused function with internal >dispatch >>>>> in >>>>> that case? >>>>> >>>>> Personally, I think there is not enough to gain from object >parameters >>>>> that >>>>> we should handle it on the caller side. The callee can dispatch >those if >>>>> necessary. 
>>>>> >>>>> What about signatures that require an object when we have a C >typed >>>>> value? >>>>> >>>>> What about signatures that require a C typed argument when we have >an >>>>> arbitrary object value in our call parameters? >>>>> >>>>> We should also strip the "self" argument from the parameter list >of >>>>> methods. That's handled by the attribute lookup before even >getting at >>>>> the >>>>> callable. >>>> >>>> >>>> On 04/15/2012 07:59 AM, Robert Bradshaw wrote: >>>>> >>>>> It would certainly be useful to have special syntax for memory >views >>>>> (after nailing down a well-defined ABI for them) and builtin >types. >>>>> Being able to declare something as taking a >>>>> "sage.rings.integer.Integer" could also prove useful, but could >result >>>>> in long (and prefix-sharing) signatures, favoring the >>>>> runtime-allocated ids. >>>> >>>> >>>> I do think describing Cython objects in this cross-tool CEP would >work >>>> nicely, this is for standardized ABIs only (we can't do memoryviews >>>> either >>>> until their ABI is standard). >>> >>> >>> It just occurred to me that an object's type can safely be >represented at >>> runtime as a pointer, i.e. an integer. Even if the type is heap >allocated >>> and replaced by another one later, a signature that uses that >pointer >>> value >>> in its encoding would only ever match if both sides talk about the >same >>> type at call time (because at least one of them would hold a life >>> reference >>> to the type in order to actually use it). >> >> >> The missing piece here is that both me and Robert are huge fans of >Go-style >> polymorphism. If you haven't read up on that I highly recommend it, >basic >> idea is if you agree on method names and their signatures, you don't >have to >> have access to the same interface declaration (you don't have to call >the >> interface the same thing). > >Go style polymorphism is certainly a neat idea, but two points: > >- You can't do this kind of matching via signature comparison. If I >have a type with methods "foo", "bar" and "baz", then that should >match the interface {"foo", "bar", "baz"}, but also {"foo", "bar"}, >{"foo", "baz"}, {"bar"}, {}, etc. To find the right function for such >a type, you need to decode each function signature and check them in >some structured way. Unless your plan is to precompute the hash of all >2**n interfaces that each object fulfills. You are of course right this needs a lot more thought. > >- Adding a whole new type system with polymorphic dispatch is a heck >of a thing to do in a spec for boxing and unboxing pointers. Honestly >at this level I'm even leery of describing Python objects via their >type, as opposed to just "PyObject *". Just let the callee do the type >checking if they need to, and if it later turns out that there are >actually enough cases where Cython knows the exact type at compile >time and is dispatching through a boxed pointer and the callee type >checking is significant overhead, then extend the spec then. We are not insane, it's been said several times this goes in a later spec. We're just trying to guess whether future developments would seriously impact intern vs. strcmp -- ie what a likely signature length is in the future. We make CEP1000 a simple spec, but spend some time to try to guess how it could be extended. Dag > >-- Nathaniel >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. 
Please excuse my brevity.

From markflorisson88 at gmail.com Sun Apr 15 13:30:27 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Sun, 15 Apr 2012 12:30:27 +0100
Subject: [Cython] CEP1000: Native dispatch through callables
In-Reply-To: <4F8A6A04.4000402@behnel.de>
References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no>
	<4F89E5CD.5060301@behnel.de> <4F8A6A04.4000402@behnel.de>
Message-ID: 

On 15 April 2012 07:26, Stefan Behnel wrote:
> mark florisson, 14.04.2012 23:15:
>> On 14 April 2012 22:02, Stefan Behnel wrote:
>>> Dag Sverre Seljebotn, 14.04.2012 21:08:
>>>>  * TBD: Support for Cython-specific constructs like memoryview slices
>>>>    (so that arrays with strides and shape can be passed faster than
>>>>    passing an {{{"O"}}}).
>>>
>>> Is this really Cython specific or would a generic Py_buffer struct work?
>>
>> That could work through simple unboxing wrapper functions, but it
>> would add some overhead, specifically because it would have to check
>> the buffer's object, and if it didn't exist or was not a memoryview
>> object, it would have to create one (checking whether something is a
>> memoryview object would also be a pain, as each module has a different
>> memoryview type). That could still be feasible for interaction with
>> Cython functions from non-Cython code.
>
> Hmm, I don't get it. Isn't the overhead always there when a memory view is
> requested in the signature? You'd have to create one for each call and that
> seriously hurts the efficiency. Is that a common use case? Why would you
> want to do more than passing unboxed buffers?
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

So, if you're going to accept Py_buffer *buf (which is useful in
itself), then to use memoryviews you have to copy over some
shape/strides/suboffsets and the data pointer, which is not a big deal.
But you also want a memoryview object associated with the memoryview
slice, that keeps things around like the format string, function
pointers to convert the dtype to and from Python objects and a
reference (acquisition) count or a lock in case atomics are not
supported by the compiler (or Cython doesn't know about the compiler).
So if buf->obj is not a memoryview object, it will have to create one
in the callee, and the caller will have to convert a slice to a new
Py_buffer struct. Arguably, the memoryview implementation is not
optimal, it should have a memoryview struct with that data, making it
somewhat less expensive.

Finally, what are the semantics for Py_buffer? Will the callee own the
buffer, or will it borrow it? If they will borrow, then the compiler
will have to figure out whether it will need to own it (or be slower
and always own it), and acquire the buffer through buf->obj. At least
it won't have to validate the buffer, which is the most expensive
part.

I think in many cases you want to borrow though, but if you want to
always own, the caller could do something more efficient if
releasebuffer is not implemented, like simply incref buf->obj and pass
in a pointer to a copy of the Py_buffer. I think borrowing is probably
the easiest and most sane way though.
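In code, the borrowing convention would look roughly like this (a sketch;
{{{consume}}} is a made-up example callee, and the buffer is assumed to
hold doubles):

{{{
#include <Python.h>

/* The callee receives a Py_buffer it does not own and may use it
 * freely for the duration of the call.  Only if it wants to keep the
 * data beyond the call does it re-request the buffer from buf->obj --
 * that second buffer it does own and must release. */
static double consume(Py_buffer *borrowed)
{
    Py_buffer owned;
    double first = ((double *)borrowed->buf)[0];  /* fine while borrowed */

    if (borrowed->obj != NULL &&
        PyObject_GetBuffer(borrowed->obj, &owned, PyBUF_FULL_RO) == 0) {
        /* ... stash `owned` somewhere for use after the call ... */
        PyBuffer_Release(&owned);  /* whoever stashed it releases it */
    }
    return first;
}
}}}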
From stefan_ml at behnel.de Sun Apr 15 13:40:31 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Apr 2012 13:40:31 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A6A04.4000402@behnel.de> Message-ID: <4F8AB3AF.3020807@behnel.de>

mark florisson, 15.04.2012 13:30:
> Finally, what are the semantics for Py_buffer? Will the callee own the buffer, or will it borrow it? If they will borrow, then the compiler will have to figure out whether it will need to own it (or be slower and always own it), and acquire the buffer through buf->obj. At least it won't have to validate the buffer, which is the most expensive part.
> I think in many cases you want to borrow though, but if you want to always own, the caller could do something more efficient if releasebuffer is not implemented, like simply incref buf->obj and pass in a pointer to a copy of the Py_buffer. I think borrowing is probably the easiest and most sane way though.

I think that's easy. If you request and unpack a buffer yourself, you own it. If you receive an unpacked buffer from someone else as a call argument, you borrow it, and you know that your caller (or the caller of your caller, etc.) owns it and keeps it alive until you return. If you receive it as the return value of a function call, it's less clear, but my intuition tells me that you'd normally either receive an owned Python object or a borrowed unpacked buffer.

In the case at hand, you'd always receive a borrowed buffer from the caller as argument.

Stefan

From markflorisson88 at gmail.com Sun Apr 15 13:49:31 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 15 Apr 2012 12:49:31 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8AB3AF.3020807@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A6A04.4000402@behnel.de> <4F8AB3AF.3020807@behnel.de> Message-ID:

On 15 April 2012 12:40, Stefan Behnel wrote:
> [...]
> In the case at hand, you'd always receive a borrowed buffer from the caller as argument.
That makes sense, but it means a lot of overhead for memoryview slices, which I think justifies syntax for custom types in general.

From markflorisson88 at gmail.com Sun Apr 15 20:59:29 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 15 Apr 2012 19:59:29 +0100 Subject: [Cython] Cython 0.16 RC 2 Message-ID:

What is hopefully the final release candidate for the 0.16 release can be found here: http://wiki.cython.org/ReleaseNotes-0.16 . This corresponds to the 'release' branch of the cython repository on github.

From greg.ewing at canterbury.ac.nz Mon Apr 16 00:01:14 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 16 Apr 2012 10:01:14 +1200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8A6A9A.9000400@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F8A1F5F.3030602@canterbury.ac.nz> <4F8A6A9A.9000400@behnel.de> Message-ID: <4F8B452A.6050908@canterbury.ac.nz>

Stefan Behnel wrote:
> It wasn't really a proposed syntax, I guess, more of a way to write down an example.

That's okay, although you might want to mention in the PEP that the actual syntax is yet to be determined. Being a PEP, anything it says tends to come across as being a specification otherwise.

-- Greg

From greg.ewing at canterbury.ac.nz Sun Apr 15 23:56:38 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 16 Apr 2012 09:56:38 +1200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F875867.3070401@astro.uio.no> <4F87E937.9050705@astro.uio.no> <4F881050.7000302@behnel.de> <4F881531.4090406@astro.uio.no> <1756339e-3afe-4e85-9f34-18a81a52ac8d@email.android.com> <66d71577-ea01-4a83-a89d-f3ff3e28e8a9@email.android.com> <4F88C3DA.6000009@canterbury.ac.nz> <62a760fa-c160-47ec-b23e-52df5e77728a@email.android.com> <4F893BCC.7040706@behnel.de> <4F8A164B.1050507@canterbury.ac.nz> Message-ID: <4F8B4416.7020008@canterbury.ac.nz>

Robert Bradshaw wrote:
> Brevity, especially if the signature is inlined. (Encoding could take care of this by, e.g. ignoring the redundant opening, or we could just write di=d.)

Yes, I was thinking in terms of replacing the paren with some other character, rather than inserting more parens.

-- Greg

From stefan_ml at behnel.de Mon Apr 16 10:05:37 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 16 Apr 2012 10:05:37 +0200 Subject: [Cython] Py3k builds broken due to CPython changes Message-ID: <4F8BD2D1.9020104@behnel.de>

Hi, just a quick heads-up that the Py3k builds have been completely broken since this weekend because the internal CPython import mechanism changed. The most visible effect for us is that Py2-style imports no longer work in Py3.3, which, I guess, impacts the majority of existing Py2-style Cython code. Only absolute and relative imports continue to work; import level "-1" raises a ValueError. I asked on python-dev to see what they think about this regression. http://thread.gmane.org/gmane.comp.python.devel/131858/focus=131909 If they end up considering it the intended behaviour, we may have to duplicate the original behaviour in one way or another to keep this working.
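For illustration, duplicating the old level "-1" semantics at the C-API level could look roughly like this (a hedged sketch of the idea only, not Cython's actual generated code): try the import as an implicitly relative one first, then fall back to an absolute one. It assumes the importing module's globals carry the usual package context.

    #include <Python.h>

    /* Emulate Py2-style import semantics (the removed level -1) on Py3.3+:
     * try a relative import first, then fall back to an absolute one. */
    static PyObject *import_py2_style(const char *name, PyObject *globals,
                                      PyObject *fromlist)
    {
        PyObject *module = PyImport_ImportModuleLevel(
            (char *)name, globals, NULL, fromlist, 1 /* relative */);
        if (module == NULL && PyErr_ExceptionMatches(PyExc_ImportError)) {
            PyErr_Clear();
            module = PyImport_ImportModuleLevel(
                (char *)name, globals, NULL, fromlist, 0 /* absolute */);
        }
        return module;
    }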
Stefan From stefan_ml at behnel.de Mon Apr 16 11:28:51 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 16 Apr 2012 11:28:51 +0200 Subject: [Cython] Py3k builds broken due to CPython changes In-Reply-To: <4F8BD2D1.9020104@behnel.de> References: <4F8BD2D1.9020104@behnel.de> Message-ID: <4F8BE653.2070809@behnel.de> Stefan Behnel, 16.04.2012 10:05: > just a quick heads-up that the Py3k builds are completely broken since this > week-end because the internal CPython import mechanism changed. The most > visible effect for us is that Py2-style imports no longer work in Py3.3, > which, I guess, impacts the majority of existing Py2-style Cython code. > Only absolute and relative imports continue to work, import level "-1" > raises a ValueError. > > I asked on python-dev to see what they think about this regression. > > http://thread.gmane.org/gmane.comp.python.devel/131858/focus=131909 > > If they end up considering it the intended behaviour, we may have to > duplicate the original behaviour in one way or another to keep this working. Apparently, the easiest work-around is to execute the whole import twice in Py3.3, first for a relative import, then for an absolute one. Likely somewhat slower than before, but at least it keeps working. https://github.com/cython/cython/commit/cb40a3e6264b794681492a01c223cc23d40d8350 Stefan From markflorisson88 at gmail.com Mon Apr 16 19:56:08 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 16 Apr 2012 18:56:08 +0100 Subject: [Cython] [cython-users] buffer access to ndarrays as cdef class attributes In-Reply-To: <6988002.374.1334590847872.JavaMail.geo-discussion-forums@vbuc18> References: <16283966.1925.1334136907099.JavaMail.geo-discussion-forums@ynbj3> <4F856946.4000002@behnel.de> <31698140.99.1334563801065.JavaMail.geo-discussion-forums@vbdn7> <6988002.374.1334590847872.JavaMail.geo-discussion-forums@vbuc18> Message-ID: On 16 April 2012 16:40, becker.nils wrote: > >> > 1. memoryview assignments inside inner loops are not a good idea. >> > although >> > no data is being copied, making a new slice involves quite some error >> > checking overhead >> >> Do you mean assignment to the data or to the slice itself? > > i meant something like > > cdef float[:] new_view = existing_numpy_array > > inside a loop. i guess the obvious solution is to do this outside of the > loop. Definitely, that is pretty slow :) >> >> > 2. memoryviews are more general than the ndarray buffer access but they >> > are >> > not a drop-in replacement, because one cannot return a memoryview object >> > as >> > a full ndarray without explicit conversion with np.asarray. so to have >> > both >> > fast access from the C side and a handle on the array on the python >> > side, >> > requires two local variables: one an ndarray and one a memoryview into >> > it. >> > (previously the ndarray with buffer access did both of these things) >> >> Yes, that's correct. That's because you can now slice the memoryviews, >> which does not invoke anything on the original buffer object, so when >> converting to an object it may be out of sync with the original, which >> means you'd have to convert it explicitly. > > that makes sense. > >> >> We could allow the user to register a conversion function to do this >> automatically - only invoked if the slice was re-sliced - (and cache >> the results), but it would mean that conversion back from the object >> to a memoryview slice would have to validate the buffer again, which >> would be more expensive. 
>> Maybe that could be mitigated by special-casing numpy arrays and some other tricks.
>
> so for the time being, it seems that the most efficient way of handling this is that cdef functions or any fast C-side manipulation uses only memoryviews, and allocation and communication with python then uses the underlying ndarrays.

Yes, it is best to minimize conversion to and from numpy (which is quite expensive either way).

>> > 3. one slow-down that i was not able to avoid is this:
>> >
>> > 143:         for i in range(x.shape[0]):
>> > 144:             self.out[i] *= dt * self.M[i]
>> >
>> > where all of x, self.out and self.M are memoryviews. in the for-loop, cython checks for un-initialized memoryviews like so (output from cython -a):
>> >
>> >     if (unlikely(!__pyx_v_self->M.memview)) {PyErr_SetString(PyExc_AttributeError,"Memoryview is not initialized");{__pyx_filename = __pyx_f[0]; __pyx_lineno = 144; __pyx_clineno = __LINE__; goto __pyx_L1_error;}}
>> >     __pyx_t_4 = __pyx_v_i;
>> >
>> > is there a way to tell cython that these views are in fact initialized (that's done in __init__ of the class)?
>>
>> You can avoid this by annotating your function with @cython.initializedcheck(False), or by using a module-global directive at the top of your file '#cython: initializedcheck=False' (the same goes for boundscheck and wraparound).
>
> ah! helpful! i did not see this on the annotation wiki page.
> (there is no official documentation on annotations it seems)

Indeed, this should be documented. Documentation for other directives can be found here: http://docs.cython.org/src/reference/compilation.html?highlight=boundscheck#compiler-directives . I'll add documentation for this directive too; currently it only works for memoryviews, but maybe it should also work for objects?

From markflorisson88 at gmail.com Mon Apr 16 20:04:24 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 16 Apr 2012 19:04:24 +0100 Subject: [Cython] [cython-users] buffer access to ndarrays as cdef class attributes In-Reply-To: References: <16283966.1925.1334136907099.JavaMail.geo-discussion-forums@ynbj3> <4F856946.4000002@behnel.de> <31698140.99.1334563801065.JavaMail.geo-discussion-forums@vbdn7> <6988002.374.1334590847872.JavaMail.geo-discussion-forums@vbuc18> Message-ID:

On 16 April 2012 18:56, mark florisson wrote:
> On 16 April 2012 16:40, becker.nils wrote:
> [...]
>
>> or a memoryview context manager which makes the memoryview non-rebindable?
>> "with old_array as cdef float_t[:] new_view:
>>     loop ...
>> "
>> (just fantasizing, probably nonsense). anyway, thanks!
>> nils
>
> We could support final fields and variables, but it would be kind of a pain to declare that everywhere.

Maybe final would not be too bad, as you'd only need it for globals (who uses them anyway) and attributes, but not for local variables.

From d.s.seljebotn at astro.uio.no Tue Apr 17 13:54:32 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 13:54:32 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> Message-ID: <4F8D59F8.4080809@astro.uio.no>

On 04/15/2012 10:48 AM, Nathaniel Smith wrote:
> On Sun, Apr 15, 2012 at 9:15 AM, Dag Sverre Seljebotn wrote:
>> Do you really think it complicates the spec? SHA-1 is pretty standard, and Python ships with hashlib (the hashing part isn't performance critical).
>>
>> I prefer hashing to string-interning as it can still be done compile-time etc. 160 bits isn't worse than the second-to-best strcmp case of a 256-bit function entry.
>
> If you're *so* set on compile-time calculation, one could also accommodate these within the intern framework pretty easily. Any PyString/PyBytes * will be aligned, which means the low bit will not be set, which means there are at least 2**31 bit-patterns that will never be used by a run-time interned string. So we could write down a lookup table in the spec that assigns arbitrary, well-known numbers to every common signature. "dd->d" is 1, "ii->i" is 2, etc. If you have 15 standard types, then you can assign such an id to every 0, 1, 2, 3, 4, 5, and 6 argument function with space left over.
>
> And this could all be abstracted away inside the intern() function. The only thing is that if you wanted to look at the characters in the interned string, you'd have to call a disintern() function instead of just following the pointer.

I should note that to me this is the worst of all worlds. The main point of avoiding interning was to avoid interning! There's just an intrinsic beauty to a stateless data-driven spec. IMO, anything requiring run-time state has to be explicitly justified.

I don't believe doing interning right without a common dependency .so is all that easy.
I'd love to see a concrete spec for it (e.g., if you use Python bytes in a dict in sys.modules['_nativecall'], the bytes objects could be deallocated before the callables containing the interned string -- unless you Py_INCREF once too many, but then valgrind complains -- and so on). And as Robert says, if you don't have an interning machinery, the spec becomes mostly programming language/platform neutral (you could use it in Perl or Ruby or Java too, if you drop "O" and so on.)

However, the prospect of rather long signatures for OO code is driving me to perhaps prefer the interned string approach -- i.e., the justification would be that signatures can be long (and that cryptographic hashes are too complex).

Benchmarks soon to follow...

Dag

> I still think all this stuff would be complexity for its own sake, though.
>
>> Shortening the hash to 120 bits (truncation) we could have a spec like this:
>>
>> - Short signature: [64 bit encoded signature, 64 bit funcptr]
>> - Long signature: [64 bit hash, 64 bit pointer to full signature, 8 bit guard byte, 56 bits remaining hash, 64 bit funcptr]
>
> This is a fixed-length encoding, so why does it need a guard byte?
>
> BTW, the guard byte design in the last version of the CEP looks buggy to me -- there's no guarantee that a valid pointer won't contain the guard byte by accident. A solution would be to move the to-be-continued byte (or bit) to the first word. This would also mean that if you're looking for a one-word signature via switch(), you won't hit signatures which have your signature as a prefix. In the variable-length encoding with the lookup rule you suggested you'd also want a second bit to mark the actual beginning of each structure, so you don't get hits on the middle of structures.
>
>> Anyway: Looks like it's about time to do some benchmarks. I'll try to get around to it next week.
>
> Agreed :-).
>
> - N
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From d.s.seljebotn at astro.uio.no Tue Apr 17 14:24:50 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 14:24:50 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F87530F.7050000@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> Message-ID: <4F8D6112.1000906@astro.uio.no>

On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote:
> Travis Oliphant recently raised the issue on the NumPy list of what mechanisms to use to box native functions produced by his Numba so that SciPy functions can call it, e.g. (I'm making the numba part up):
>
> @numba # Compiles function using LLVM
> def f(x): return 3 * x
>
> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively!
>
> Obviously, we want something standard, so that Cython functions can also be called in a fast way.

OK, here's the benchmark code I've written:

https://github.com/dagss/cep1000

Assumptions etc.:

- (Very) warm cache case is tested

- I compile and link libmycallable.so, libmycaller.so and ./bench; with -fPIC, to emulate the Python environment

- I use mostly pure C but use PyTypeObject in order to get the offsets to tp_flags etc. right (I emulate the checking that would happen on a PyObject* according to CEP1000).

- The test function is "double f(double x) { return x * x; }"

- The benchmark is run in a loop J=1000000 times (and time divided by J).
This is repeated K=10000 times and the minimum walltime of the K runs is used. This gave very stable readings on my system.

Fixing loop iterations:

In the initial results I just scanned the overload list until NULL-termination. It seemed to me that the code generated for this scanning was the most important factor. Therefore I fixed the number of overloads as a known compile-time macro N *in the caller*. This is somewhat optimistic; however, I didn't want to play with figuring out loop unrolling etc. at the same time, and hardcoding the length of the overload list sort of took that part out of the equation.

Table explanation:

- N: Number of overloads in the list. For N=10, there are 9 non-matching overloads in the list before the matching 10th (but the caller doesn't know this). For N=1, the caller knows this and optimizes for a hit in the first entry.

- MISMATCHES: If set, the caller tries 4 non-matching signatures before hitting the final one. If not set, only the correct signature is tried.

- LIKELY: If set, a GCC likely() macro is used to expect that the signature matches.

RESULTS:

A direct call to (and execution of!) the function in the benchmark loop took 4.8 ns. An indirect dispatch through a function pointer of known type took 5.4 ns.

Notation below is (intern, key), in ns:

N=1:
  MISMATCHES=False:
    LIKELY=True:    6.44   6.44
    LIKELY=False:   7.52   8.06
  MISMATCHES=True:  8.59   8.59
N=10:
  MISMATCHES=False: 17.19  19.20
  MISMATCHES=True:  36.52  37.59

To be clear, "intern" is an interned "char*" (comparison with a 64-bit global variable), while "key" is comparison of a size_t (comparison of a 64-bit immediate in the instruction stream).

PRELIMINARY BENCHMARK CONCLUSION:

Intern appears to be as fast or faster than strcmp. I don't know why (is the pointer offset to the global variable stored in less than 64 bits in the x86-64 instruction stream? What gdb (or other) commands would I use to figure that out?)

What happens in the assembly is:

movq (%rdi,%rax), %rax
movq interned_dd(%rip), %rdx
cmpq %rdx, (%rax)
jne .L3

vs.

movabsq $20017697242043, %rdx
movq (%rdi,%rax), %rax
cmpq %rdx, (%rax)
jne .L6

TODO:

The caller tried, for each entry in the overload list, to match all the signatures. Changing the order of these loops should also be tried.

Dag

From d.s.seljebotn at astro.uio.no Tue Apr 17 14:36:45 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 14:36:45 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D6112.1000906@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> Message-ID: <4F8D63DD.9020600@astro.uio.no>

On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote:
> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote:
>> Travis Oliphant recently raised the issue on the NumPy list of what mechanisms to use to box native functions produced by his Numba so that SciPy functions can call it, e.g. (I'm making the numba part up):
>>
>> @numba # Compiles function using LLVM
>> def f(x): return 3 * x
>>
>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively!
>>
>> Obviously, we want something standard, so that Cython functions can also be called in a fast way.
> OK, here's the benchmark code I've written:
>
> https://github.com/dagss/cep1000
>
> [...]
>
> RESULTS:
>
> A direct call to (and execution of!) the function in the benchmark loop took 4.8 ns. An indirect dispatch through a function pointer of known type took 5.4 ns.
>
> Notation below is (intern, key), in ns:
>
> N=1:
>   MISMATCHES=False:
>     LIKELY=True:    6.44   6.44
>     LIKELY=False:   7.52   8.06
>   MISMATCHES=True:  8.59   8.59
> N=10:
>   MISMATCHES=False: 17.19  19.20
>   MISMATCHES=True:  36.52  37.59
>
> [...]

One more data point: When comparing a 96-bit key directly, the fastest benchmark for keys (N=1, MISMATCHES=False, LIKELY=True) grows from 6.44 to 6.98 ns. (It should perform relatively better when N>1, unless prefixes match.)

A 448-bit key is 8.59 ns.
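The two candidate checks being timed, written out as a C sketch (my own illustration of the benchmark's description, not code from the cep1000 repo; the names and the key constant -- taken from the movabsq in the listing above -- are assumptions):

    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        const char *interned_sig;  /* interned signature: compare pointers */
        uint64_t    key;           /* encoded signature: compare immediates */
        void       *funcptr;
    } entry_t;

    /* Intern variant: one load of a global plus one pointer compare. */
    extern const char *interned_dd;   /* set up by the intern machinery */
    static void *match_intern(const entry_t *e)
    {
        return e->interned_sig == interned_dd ? e->funcptr : NULL;
    }

    /* Key variant: the 64-bit immediate lives in the instruction stream. */
    #define KEY_DD UINT64_C(20017697242043)
    static void *match_key(const entry_t *e)
    {
        return e->key == KEY_DD ? e->funcptr : NULL;
    }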
Dag

From njs at pobox.com Tue Apr 17 14:40:35 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Apr 2012 13:40:35 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D59F8.4080809@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> Message-ID:

On Tue, Apr 17, 2012 at 12:54 PM, Dag Sverre Seljebotn wrote:
> I don't believe doing interning right without a common dependency .so is all that easy. I'd love to see a concrete spec for it (e.g., if you use Python bytes in a dict in sys.modules['_nativecall'], the bytes objects could be deallocated before callables containing the interned string -- unless you Py_INCREF once too many, but then valgrind complains -- and so on).

I don't understand. A C-callable object would hold a reference to each interned string it contains, just like any other Python data structure does. When the C-callable object is deallocated, then it drops this reference, and if there are no other live C-callable objects that use the same signature, and there are no callers that are statically caching the signature, then it will drop out of the intern table. Otherwise it doesn't. Basically you do the memory management like you would for any other bytes object, and everything just works.

-- Nathaniel

From d.s.seljebotn at astro.uio.no Tue Apr 17 14:40:39 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 14:40:39 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D63DD.9020600@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8D63DD.9020600@astro.uio.no> Message-ID: <4F8D64C7.6050606@astro.uio.no>

On 04/17/2012 02:36 PM, Dag Sverre Seljebotn wrote:
> On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote:
>> [...]
>> What happens in the assembly is:
>>
>> movq (%rdi,%rax), %rax
>> movq interned_dd(%rip), %rdx
>> cmpq %rdx, (%rax)
>> jne .L3
>>
>> vs.
>>
>> movabsq $20017697242043, %rdx
>> movq (%rdi,%rax), %rax
>> cmpq %rdx, (%rax)
>> jne .L6

OK, silly to quote the assembly for the case where they perform the same (LIKELY=True). Changing to LIKELY=False, the jne changes to je in both places, and strcmp performance drops relative to intern.

Dag

>> TODO:
>>
>> The caller tried, for each entry in the overload list, to match all the signatures. Changing the order of these loops should also be tried.
>
> One more data point: When comparing a 96-bit key directly, the fastest benchmark for keys (N=1, MISMATCHES=False, LIKELY=True) grows from 6.44 to 6.98 ns. (It should perform relatively better when N>1, unless prefixes match.)
>
> A 448-bit key is 8.59 ns.
>
> Dag
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From d.s.seljebotn at astro.uio.no Tue Apr 17 14:53:06 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 14:53:06 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> Message-ID: <4F8D67B2.4060706@astro.uio.no>

On 04/17/2012 02:40 PM, Nathaniel Smith wrote:
> On Tue, Apr 17, 2012 at 12:54 PM, Dag Sverre Seljebotn wrote:
>> I don't believe doing interning right without a common dependency .so is all
>> that easy. I'd love to see a concrete spec for it (e.g., if you use Python bytes in a dict in sys.modules['_nativecall'], the bytes objects could be deallocated before callables containing the interned string -- unless you Py_INCREF once too many, but then valgrind complains -- and so on).
>
> I don't understand. A C-callable object would hold a reference to each interned string it contains, just like any other Python data structure does. When the C-callable object is deallocated, then it drops this reference, and if there are no other live C-callable objects that use the same signature, and there are no callers that are statically caching the signature, then it will drop out of the intern table. Otherwise it doesn't.

Thanks! I'm just being dense.

In fact I was the one who first proposed just storing interned PyBytesObject* way back at the start of the thread, but it met opposition in favour of an interned char* or an allocated "string id"; perhaps that's why I shut the possibility out of my mind.

Would we store just the PyBytesObject*, or a char* in addition? Is bytes a vararg object or does it wrap a char*? If the former, I think we should just store the PyBytesObject*.

DS

From stefan_ml at behnel.de Tue Apr 17 15:07:12 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 17 Apr 2012 15:07:12 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D67B2.4060706@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> <4F8D67B2.4060706@astro.uio.no> Message-ID: <4F8D6B00.7040208@behnel.de>

Dag Sverre Seljebotn, 17.04.2012 14:53:
> [...]
> Would we store just the PyBytesObject*, or a char* in addition? Is bytes a vararg object or does it wrap a char*? If the former, I think we should just store the PyBytesObject*.

I had originally thought that callables would include C-implemented functions, which would make this difficult for non-Cython code, because they don't normally have destructors in CPython. If we rule that out and restrict ourselves to callable extension types, binding the lifetime of yet another object to their lifetime would not hurt.
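Restricting the spec to callable extension types makes the lifetime question mechanical; a minimal C sketch (a hypothetical type of my own, not anything from the CEP):

    #include <Python.h>

    /* Hypothetical C-callable extension type that owns a reference to
     * its interned signature bytes for as long as it lives. */
    typedef struct {
        PyObject_HEAD
        PyObject *signature;   /* interned bytes object, owned reference */
        void     *funcptr;     /* the native entry point it describes */
    } NativeCallableObject;

    static void
    NativeCallable_dealloc(NativeCallableObject *self)
    {
        /* Dropping this reference is what lets the signature fall out
         * of the intern table once nobody else holds it. */
        Py_CLEAR(self->signature);
        Py_TYPE(self)->tp_free((PyObject *)self);
    }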
Stefan

From njs at pobox.com Tue Apr 17 15:10:23 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Apr 2012 14:10:23 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D67B2.4060706@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> <4F8D67B2.4060706@astro.uio.no> Message-ID:

On Tue, Apr 17, 2012 at 1:53 PM, Dag Sverre Seljebotn wrote:
> [...]
> Would we store just the PyBytesObject*, or a char* in addition? Is bytes a vararg object or does it wrap a char*? If the former, I think we should just store the PyBytesObject*.

In 2.7, PyBytesObject is #defined to PyStringObject, and stringobject.h says that PyStringObject is variable-size. So that's fine.

From stefan_ml at behnel.de Tue Apr 17 15:16:09 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 17 Apr 2012 15:16:09 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D67B2.4060706@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> <4F8D67B2.4060706@astro.uio.no> Message-ID: <4F8D6D19.3020204@behnel.de>

Dag Sverre Seljebotn, 17.04.2012 14:53:
> Is bytes a vararg object or does it wrap a char*?

The data is stored internally in all CPython versions. Note that access to it may not be efficient in other Python implementations, but at least PyPy would also have a problem with providing a non-reference counted char* value (unless they let it live forever, that is...).
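Sketching the interning idea floated earlier in the thread (a dict reachable through the hypothetical sys.modules['_nativecall']) in C terms -- my own illustration, not a proposed spec: map each signature bytes object to itself and always hand out the canonical instance, so callers can compare signatures by pointer.

    #include <Python.h>

    static PyObject *
    intern_signature(PyObject *table, const char *sig)
    {
        PyObject *key, *canonical;

        key = PyBytes_FromString(sig);
        if (key == NULL)
            return NULL;
        canonical = PyDict_GetItem(table, key);    /* borrowed reference */
        if (canonical != NULL) {
            Py_INCREF(canonical);
            Py_DECREF(key);
            return canonical;                      /* existing interned copy */
        }
        if (PyDict_SetItem(table, key, key) < 0) {
            Py_DECREF(key);
            return NULL;
        }
        return key;  /* 'key' is now the canonical, interned copy */
    }

Note that the dict holds its own references, so entries never drop out on their own -- exactly the lifetime subtlety raised above; a real spec would have to pin down who evicts dead entries.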
Stefan

From d.s.seljebotn at astro.uio.no Tue Apr 17 15:20:09 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 15:20:09 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D6D19.3020204@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> <4F8D67B2.4060706@astro.uio.no> <4F8D6D19.3020204@behnel.de> Message-ID: <4F8D6E09.9050107@astro.uio.no>

On 04/17/2012 03:16 PM, Stefan Behnel wrote:
> Dag Sverre Seljebotn, 17.04.2012 14:53:
>> Is bytes a vararg object or does it wrap a char*?
>
> The data is stored internally in all CPython versions. Note that access to it may not be efficient in other Python implementations, but at least PyPy would also have a problem with providing a non-reference counted char* value (unless they let it live forever, that is...).

Are there any implications for PyPy as the *caller* w.r.t. a bytes object or a char*? E.g. if it wants to parse the format string and JIT a dispatch.

Dag

From njs at pobox.com Tue Apr 17 16:20:14 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Apr 2012 15:20:14 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D6112.1000906@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> Message-ID:

On Tue, Apr 17, 2012 at 1:24 PM, Dag Sverre Seljebotn wrote:
> OK, here's the benchmark code I've written:
>
> https://github.com/dagss/cep1000

This is great!

> Assumptions etc.:
>
> [...]
>
> Therefore I fixed the number of overloads as a known compile-time macro N *in the caller*. This is somewhat optimistic; however I didn't want to play with figuring out loop unrolling etc. at the same time, and hardcoding the length of the overload list sort of took that part out of the equation.

Since you've set this up... I have a suggestion for something that may be worth trying, though I've hesitated to propose it seriously. And that is, an API where instead of scanning a table, the C-callable exposes a pointer-to-function, something like

    int get_funcptr(PyObject * self, PyBytesObject * signature, struct c_function_info * out)

The rationale is, if we want to support JITed functions where new function pointers may be generated on the fly, the array approach has a serious problem. You have to decide how many array slots to allocate ahead of time, and if you run out, then... too bad. I guess you get to throw away one of the existing pointers (i.e., leak memory) to make room.
Adding an indirection here for the actual lookup means that a JIT could use a more complex lookup structure if justified, while a simple C function pointer could just hardcode this to a signature-check + return, with no lookup step at all. It also would give us a lot of flexibility for future optimizations (e.g., is it worth sorting the lookup table in LRU order?). And it would allow for a JIT to generate a C function pointer on first use, rather than requiring the first use to go via the Python-level __call__ fallback. (Which is pretty important when the first use is to fetch the function pointer before entering an inner loop!)

OTOH the extra indirection will obviously have some overhead, so it'd be nice to know if it's actually a problem.

> Table explanation:
>
> - N: Number of overloads in the list. For N=10, there are 9 non-matching overloads in the list before the matching 10th (but the caller doesn't know this). For N=1, the caller knows this and optimizes for a hit in the first entry.
>
> - MISMATCHES: If set, the caller tries 4 non-matching signatures before hitting the final one. If not set, only the correct signature is tried.
>
> - LIKELY: If set, a GCC likely() macro is used to expect that the signature matches.
>
> RESULTS:
>
> A direct call to (and execution of!) the function in the benchmark loop took 4.8 ns. An indirect dispatch through a function pointer of known type took 5.4 ns.
>
> Notation below is (intern, key), in ns:
>
> N=1:
>   MISMATCHES=False:
>     LIKELY=True:    6.44   6.44
>     LIKELY=False:   7.52   8.06
>   MISMATCHES=True:  8.59   8.59
> N=10:
>   MISMATCHES=False: 17.19  19.20
>   MISMATCHES=True:  36.52  37.59
>
> To be clear, "intern" is an interned "char*" (comparison with a 64-bit global variable), while "key" is comparison of a size_t (comparison of a 64-bit immediate in the instruction stream).
>
> PRELIMINARY BENCHMARK CONCLUSION:
>
> Intern appears to be as fast or faster than strcmp.
>
> I don't know why (is the pointer offset to the global variable stored in less than 64 bits in the x86-64 instruction stream? What gdb (or other) commands would I use to figure that out?)

I don't know why. It's entirely possible that this is just an accident of alignment or something. You're probably using, what, a 2 GHz CPU or so? So we're talking about a difference on the order of 2-4 cycles. (Actually, I'm surprised that LIKELY made any difference. The CPU knows which branch you're going to take regardless; all the compiler can do is try to improve memory locality for the "expected" path. But your entire benchmark probably fits in L1, so why would memory locality matter? Unless you got unlucky with cache associativity or something...)

Generally, I think the conclusion I draw from these numbers is that in the hot-cache case, the lookup overhead is negligible regardless of how we do it. On my laptop (i7 L640 @ 2.13 GHz), a single L3 cache miss costs 150 ns, and even an L2 miss is 30 ns.

-- Nathaniel

From d.s.seljebotn at astro.uio.no Tue Apr 17 16:34:48 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 16:34:48 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> Message-ID: <4F8D7F88.7040509@astro.uio.no>

On 04/17/2012 04:20 PM, Nathaniel Smith wrote:
> On Tue, Apr 17, 2012 at 1:24 PM, Dag Sverre Seljebotn wrote:
>> OK, here's the benchmark code I've written:
>>
>> https://github.com/dagss/cep1000
>
> This is great!
>> Assumptions etc.:
>>
>> [...]
>>
>> Therefore I fixed the number of overloads as a known compile-time macro N *in the caller*. This is somewhat optimistic; however I didn't want to play with figuring out loop unrolling etc. at the same time, and hardcoding the length of the overload list sort of took that part out of the equation.
>
> Since you've set this up... I have a suggestion for something that may be worth trying, though I've hesitated to propose it seriously. And that is, an API where instead of scanning a table, the C-callable exposes a pointer-to-function, something like
>
>     int get_funcptr(PyObject * self, PyBytesObject * signature, struct c_function_info * out)

Hmm. There are many ways to implement that function though. It shifts the scanning logic from the caller to the callee; you would need to call it multiple times for different signatures... But if the overhead can be shown to be minuscule then it does perhaps make the API nicer, even if it feels like paying for nothing at the moment. But see below.

Will definitely not get around to this today; anyone else feel free...

> The rationale is, if we want to support JITed functions where new function pointers may be generated on the fly, the array approach has a serious problem. You have to decide how many array slots to allocate ahead of time, and if you run out, then... too bad. I guess you get to

Note that the table is jumped to by a pointer in the PyObject, i.e. the PyObject I've tested with is

    [object data, &table, table]

So a JIT could have the table in a separate location on the heap; then it can allocate a new table, copy over the contents, and when everything is ready, do an atomic pointer update (using the assembly instructions/gcc intrinsics, not pthreads or locking). The old table would need to linger for a bit, but could at latest be deallocated when the PyObject is deallocated.

> throw away one of the existing pointers (i.e., leak memory) to make room. Adding an indirection here for the actual lookup means that a JIT could use a more complex lookup structure if justified, while a simple C function pointer could just hardcode this to a signature-check + return, with no lookup step at all. It also would give us a lot of flexibility for future optimizations (e.g., is it worth sorting the lookup table in LRU order?). And it would allow for a JIT to generate a C function pointer on first use, rather than requiring the first use to go via the Python-level __call__ fallback. (Which is pretty important when the first use is to fetch the function pointer before entering an inner loop!)
What I was thinking about along these lines is to add another function pointer in the PyUnofficialTypeObject. A caller would then:

    if found in table:
        do dispatch
    else if object supports get_funcptr:
        call get_funcptr
    else:
        python dispatch

> OTOH the extra indirection will obviously have some overhead, so it'd be nice to know if it's actually a problem.
>
>> Table explanation:
>>
>> [...]
>>
>> PRELIMINARY BENCHMARK CONCLUSION:
>>
>> Intern appears to be as fast or faster than strcmp.
>>
>> I don't know why (is the pointer offset to the global variable stored in less than 64 bits in the x86-64 instruction stream? What gdb (or other) commands would I use to figure that out?)
>
> I don't know why. It's entirely possible that this is just an accident of alignment or something. You're probably using, what, a 2 GHz CPU or so? So we're talking about a difference on the order of 2-4 cycles. (Actually, I'm surprised that LIKELY made any difference. The CPU knows which branch you're going to take regardless; all the compiler can do is try to improve memory locality for the "expected" path. But your entire benchmark probably fits in L1, so why would memory locality matter? Unless you got unlucky with cache associativity or something...)

Oh, sorry:

    ~ $ cat /proc/cpuinfo
    processor   : 0
    vendor_id   : GenuineIntel
    cpu family  : 6
    model       : 30
    model name  : Intel(R) Core(TM) i7 CPU Q 840 @ 1.87GHz
    stepping    : 5
    cpu MHz     : 1866.000
    cache size  : 8192 KB
    ...

Dag

From njs at pobox.com Tue Apr 17 17:16:33 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Apr 2012 16:16:33 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D7F88.7040509@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8D7F88.7040509@astro.uio.no> Message-ID:

On Tue, Apr 17, 2012 at 3:34 PM, Dag Sverre Seljebotn wrote:
> On 04/17/2012 04:20 PM, Nathaniel Smith wrote:
>> Since you've set this up... I have a suggestion for something that may be worth trying, though I've hesitated to propose it seriously. And that is, an API where instead of scanning a table, the C-callable exposes a pointer-to-function, something like
>>
>>     int get_funcptr(PyObject * self, PyBytesObject * signature, struct c_function_info * out)
>
> Hmm. There are many ways to implement that function though. It shifts the scanning logic from the caller to the callee;

Yes, that's part of the point :-).
Or, well, I guess the point is more that it shifts the scanning logic from the ABI docs to the callee.

> you would need to call it
> multiple times for different signatures...

Yes, I'm not sure what I think about this -- there are arguments either way for who should handle promotion. E.g., imagine the following situation:

We have a JITable function. We have already JITed the int64 version of this function. Now we want to call it with an int32. Question: should we promote to int64, or should we JIT?

Later you write:

> if found in table:
>     do dispatch
> else if object supports get_funcptr:
>     call get_funcptr
> else:
>     python dispatch

If we do promotion during the table scanning, then we'll never call get_funcptr and we'll never JIT an int32 version. OTOH, if we call get_funcptr before doing promotion, then we'll end up calling get_funcptr multiple times for different signatures regardless.

OTOOH, there are a *lot* of possible coercions for, say, a 3-argument function with return, so just enumerating them is not necessarily a good strategy. Possibly if get_funcptr can't handle the initial signature, it should return a table of signatures that it *is* willing to handle... assuming that most callees will either be able to handle a fixed set of types (cython variants) or else handle pretty much anything (JIT), and only the former will reach this code path. Or we could write down the allowed promotions (stealing from the C99 spec), and require the callee to pick the best promotion if it can't handle the initial request. Or we could put this part off until version 2, once we see how eager callers are to actually implement a real promotion engine.

> But if the overhead can be shown to be minuscule then it does perhaps
> make the API nicer, even if it feels like paying for nothing at the
> moment. But see below.
>
> Will definitely not get around to this today; anyone else feel free...
>
>> The rationale is, if we want to support JITed functions where new
>> function pointers may be generated on the fly, the array approach has
>> a serious problem. You have to decide how many array slots to allocate
>> ahead of time, and if you run out, then... too bad. I guess you get to
>
> Note that the table is jumped to by a pointer in the PyObject, i.e. the
> PyObject I've tested with is
>
>     [object data, &table, table]

Oh, I see! I thought you were embedding it in the object, to avoid an extra indirection (and potential cache miss). That's probably necessary, for the reasons you say, but also makes the get_funcptr approach potentially more competitive.

> So a JIT could have the table in a separate location on the heap, then it
> can allocate a new table, copy over the contents, and when everything is
> ready, then do an atomic pointer update (using the assembly instructions/gcc
> intrinsics, not pthreads or locking).
>
> The old table would need to linger for a bit, but could at the latest be
> deallocated when the PyObject is deallocated.

IMHO we should just hold the GIL through lookups, which would simplify this, but that's mostly based on the naive intuition that we shouldn't be passing around Python boxes in no-GIL code. Maybe there are good reasons to.
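A caller-side sketch of that cascade in C; the helper functions and the info struct are placeholders for machinery the thread leaves unspecified:

    #include <Python.h>

    struct c_function_info { void *funcptr; const char *signature; };

    /* Hypothetical helpers: the overload-table scan and the callee hook. */
    extern void *find_in_table(PyObject *obj, const char *interned_sig);
    extern int   supports_get_funcptr(PyObject *obj);
    extern int   call_get_funcptr(PyObject *obj, const char *sig,
                                  struct c_function_info *out);

    /* Try native dispatch for a double -> double signature; fall back to
     * a boxed Python call when no native entry point is found. */
    static PyObject *
    dispatch_d_d(PyObject *obj, const char *interned_sig, double x)
    {
        void *fp = find_in_table(obj, interned_sig);
        if (fp == NULL && supports_get_funcptr(obj)) {
            struct c_function_info info;
            if (call_get_funcptr(obj, interned_sig, &info) == 0)
                fp = info.funcptr;
        }
        if (fp != NULL)                               /* native dispatch */
            return PyFloat_FromDouble(((double (*)(double))fp)(x));
        return PyObject_CallFunction(obj, "d", x);    /* python dispatch */
    }

Note how the promotion question above is really about where find_in_table and call_get_funcptr sit relative to any widening of the argument types.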
- N

From stefan_ml at behnel.de Tue Apr 17 17:55:44 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 17 Apr 2012 17:55:44 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D6E09.9050107@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> <4F8D67B2.4060706@astro.uio.no> <4F8D6D19.3020204@behnel.de> <4F8D6E09.9050107@astro.uio.no> Message-ID: <4F8D9280.8020303@behnel.de>

Dag Sverre Seljebotn, 17.04.2012 15:20:
> On 04/17/2012 03:16 PM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 17.04.2012 14:53:
>>> Is bytes a vararg object or does it wrap a char*?
>>
>> The data is stored internally in all CPython versions. Note that access to
>> it may not be efficient in other Python implementations, but at least PyPy
>> would also have a problem with providing a non-reference-counted char*
>> value (unless they let it live forever, that is...).
>
> Are there any implications for PyPy as the *caller* w.r.t. a bytes object
> or a char*? E.g. if it wants to parse the format string and JIT a dispatch.

I don't think so.

Stefan

From d.s.seljebotn at astro.uio.no Tue Apr 17 21:07:38 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 17 Apr 2012 21:07:38 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8D7F88.7040509@astro.uio.no> Message-ID: <88e09774-5be9-45c7-91eb-98f92490ddd2@email.android.com>

Nathaniel Smith wrote:
>On Tue, Apr 17, 2012 at 3:34 PM, Dag Sverre Seljebotn wrote:
>> On 04/17/2012 04:20 PM, Nathaniel Smith wrote:
>>> Since you've set this up... I have a suggestion for something that may
>>> be worth trying, though I've hesitated to propose it seriously. And
>>> that is, an API where instead of scanning a table, the C-callable
>>> exposes a pointer-to-function, something like
>>>
>>>     int get_funcptr(PyObject * self, PyBytesObject * signature, struct
>>>     c_function_info * out)
>>
>> Hmm. There are many ways to implement that function though. It shifts
>> the scanning logic from the caller to the callee;
>
>Yes, that's part of the point :-). Or, well, I guess the point is more
>that it shifts the scanning logic from the ABI docs to the callee.

Well, really it shifts the logic to the getfuncptr argument specification -- is the signature argument an interned string, an encoded string, a sha1 hash, ...

Part of the table storage format is shifted from the CEP, but that is so unimportant it has not even been discussed.

>
>> you would need to call it
>> multiple times for different signatures...
>
>Yes, I'm not sure what I think about this -- there are arguments
>either way for who should handle promotion. E.g., imagine the
>following situation:
>
>We have a JITable function
>We have already JITed the int64 version of this function
>Now we want to call it with an int32
>Question: should we promote to int64, or should we JIT?

I think we got close to a good solution to this dilemma earlier in this thread:

- Callers promote scalars to 64 bit if no exact match is found (and JITs only use 64-bit scalars)

- Arrays and pointers are the real issue. In this case the caller requests another signature (and the JIT kicks in)

The utility of re-JITing for scalars is very limited; it is vital for arrays and pointers.
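In caller code, one reading of that rule looks like the following sketch; the signature strings and the find_overload() helper are invented here, not CEP1000 syntax:

    #include <Python.h>

    extern void *find_overload(PyObject *obj, const char *interned_sig);

    /* Promote-on-miss lookup for a 32-bit integer argument. */
    static void *
    lookup_with_promotion(PyObject *obj, int *promoted)
    {
        void *fp = find_overload(obj, "d(i32)");   /* exact match first */
        *promoted = 0;
        if (fp == NULL) {
            fp = find_overload(obj, "d(i64)");     /* widen scalars to 64 bit */
            *promoted = (fp != NULL);
        }
        return fp;  /* NULL: request another signature, or Python dispatch */
    }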
>
>Later you write:
>
>> if found in table:
>>     do dispatch
>> else if object supports get_funcptr:
>>     call get_funcptr
>> else:
>>     python dispatch
>
>If we do promotion during the table scanning, then we'll never call
>get_funcptr and we'll never JIT an int32 version. OTOH, if we call
>get_funcptr before doing promotion, then we'll end up calling
>get_funcptr multiple times for different signatures regardless.
>
>OTOOH, there are a *lot* of possible coercions for, say, a 3-argument
>function with return, so just enumerating them is not necessarily a
>good strategy. Possibly if get_funcptr can't handle the initial
>signature, it should return a table of signatures that it *is* willing
>to handle... assuming that most callees will either be able to handle
>a fixed set of types (cython variants) or else handle pretty much
>anything (JIT), and only the former will reach this code path. Or we
>could write down the allowed promotions (stealing from the C99 spec),
>and require the callee to pick the best promotion if it can't handle
>the initial request. Or we could put this part off until version 2,
>once we see how eager callers are to actually implement a real
>promotion engine.

I wanted to leave getfuncptr for another CEP. There's all kinds of stuff -- how does the JIT determine that the argument arrays are large enough to justify JITing? Etc.

>
>> But if the overhead can be shown to be minuscule then it does perhaps
>> make the API nicer, even if it feels like paying for nothing at the
>> moment. But see below.
>>
>> Will definitely not get around to this today; anyone else feel free...
>>
>>> The rationale is, if we want to support JITed functions where new
>>> function pointers may be generated on the fly, the array approach
>>> has a serious problem. You have to decide how many array slots to
>>> allocate ahead of time, and if you run out, then... too bad. I guess
>>> you get to
>>
>> Note that the table is jumped to by a pointer in the PyObject, i.e.
>> the PyObject I've tested with is
>>
>>     [object data, &table, table]
>
>Oh, I see! I thought you were embedding it in the object, to avoid an
>extra indirection (and potential cache miss). That's probably

Note that in my benchmark the data was right next to the pointer; I think the cost was minor.

>necessary, for the reasons you say, but also makes the get_funcptr
>approach potentially more competitive.
>
>> So a JIT could have the table in a separate location on the heap, then
>> it can allocate a new table, copy over the contents, and when everything
>> is ready, then do an atomic pointer update (using the assembly
>> instructions/gcc intrinsics, not pthreads or locking).
>>
>> The old table would need to linger for a bit, but could at the latest
>> be deallocated when the PyObject is deallocated.
>
>IMHO we should just hold the GIL through lookups, which would simplify
>this, but that's mostly based on the naive intuition that we shouldn't
>be passing around Python boxes in no-GIL code. Maybe there are good
>reasons to.

Your intuition about the GIL is wrong as far as Cython is concerned: you are allowed to call cdef 'nogil' methods on refcounted Cython objects without the GIL.

Dag

>
>- N
>_______________________________________________
>cython-devel mailing list
>cython-devel at python.org
>http://mail.python.org/mailman/listinfo/cython-devel

-- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
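A sketch of the table replacement Dag describes, using the GCC __sync primitives current at the time; the entry layout and ownership comments are illustrative, not part of the CEP:

    #include <stdlib.h>
    #include <string.h>

    typedef struct { const char *sig; void *fp; } entry_t;

    /* Publish an extended overload table. Readers follow *table_slot
     * exactly once per lookup, so they see either the old or the new
     * table in full, never a mix. The old table must linger; at the
     * latest it can be freed when the owning PyObject is deallocated. */
    static entry_t *
    publish_new_table(entry_t * volatile *table_slot, entry_t *old,
                      size_t n_old, entry_t extra)
    {
        entry_t *fresh = malloc((n_old + 2) * sizeof(entry_t));
        if (fresh == NULL)
            return NULL;
        memcpy(fresh, old, n_old * sizeof(entry_t));
        fresh[n_old] = extra;
        fresh[n_old + 1].sig = NULL;   /* keep the NULL terminator */
        fresh[n_old + 1].fp = NULL;
        __sync_synchronize();          /* table contents visible before ptr */
        *table_slot = fresh;           /* aligned pointer store; atomic on
                                          the platforms discussed here */
        return fresh;
    }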
From njs at pobox.com Tue Apr 17 21:38:41 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 17 Apr 2012 20:38:41 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <88e09774-5be9-45c7-91eb-98f92490ddd2@email.android.com> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8D7F88.7040509@astro.uio.no> <88e09774-5be9-45c7-91eb-98f92490ddd2@email.android.com> Message-ID: On Tue, Apr 17, 2012 at 8:07 PM, Dag Sverre Seljebotn wrote: > > > Nathaniel Smith wrote: > >>On Tue, Apr 17, 2012 at 3:34 PM, Dag Sverre Seljebotn >> wrote: >>> On 04/17/2012 04:20 PM, Nathaniel Smith wrote: >>>> Since you've set this up... I have a suggestion for something that >>may >>>> be worth trying, though I've hesitated to propose it seriously. And >>>> that is, an API where instead of scanning a table, the C-callable >>>> exposes a pointer-to-function, something like >>>> ? int get_funcptr(PyObject * self, PyBytesObject * signature, struct >>>> c_function_info * out) >>> >>> >>> Hmm. There's many ways to implement that function though. It shifts >>the >>> scanning logic from the caller to the callee; >> >>Yes, that's part of the point :-). Or, well, I guess the point is more >>that it shifts the scanning logic from the ABI docs to the callee. > > Well, really it shifts the logic to the getfuncptr argument specification -- is the signature argument an interned string, encoded string, sha1 hash,... > > Part of the table storage format is shifted from the CEP but that is so unimportant it has not even been discussed. > >> >>> you would need to call it >>> multiple times for different signatures... >> >>Yes, I'm not sure what I think about this -- there are arguments >>either way for who should handle promotion. E.g., imagine the >>following situation: >> >>We have a JITable function >>We have already JITed the int64 version of this function >>Now we want to call it with an int32 >>Question: should we promote to int64, or should we JIT? > > I think we got close to a good solution to this dilemma earlier in this thread: > > ?- Callers promote scalars to 64 bit if no exact match is found (and JITs only use 64 bit scalars) > > ?- Arrays and pointers are the real issue. In this case the caller request another signature (and the JIT kicks in) > > The utility of re-jiting for scalars is very limited; it is vital for arrays and pointers. Nonetheless, I think these rules will run into some trouble (starting with, JITs only use long double?), and esp. if you want to convince python-dev of them. But again, I don't think it's so terrible if the caller just picks some different signatures that it's willing to deal with, for now. > >> >>Later you write: >>> if found in table: >>> ? do dispatch >>> else if object supports get_funcptr: >>> ? call get_funcptr >>> else: >>> ? python dispatch >> >>If we do promotion during the table scanning, then we'll never call >>get_funcptr and we'll never JIT an int32 version. OTOH, if we call >>get_funcptr before doing promotion, then we'll end up calling >>get_funcptr multiple times for different signatures regardless. >> >>OTOOH, there are a *lot* of possible coercions for, say, a 3-argument >>function with return, so just enumerating them is not necessarily a >>good strategy. Possibly if get_functpr can't handle the initial >>signature, it should return a table of signatures that it *is* willing >>to handle... 
assuming that most callees will either be able to handle >>a fixed set of types (cython variants) or else handle pretty much >>anything (JIT), and only the former will reach this code path. Or we >>could write down the allowed promotions (stealing from the C99 spec), >>and require the callee to pick the best promotion if it can't handle >>the initial request. Or we could put this part off until version 2, >>once we see how eager callers are to actually implement a real >>promotion engine. > > I wanted to leave getfuncptr for another CEP. > > There's all kind of stuff -- how does the JIT determine that the argument arrays are large enough to justify JITing? Etc. I'm sort of inclined to follow KISS here, and say that this isn't PyPy, we aren't trying to get optimal performance on large, arbitrary programs. If someone took the trouble to write a function in a special JIT-able Python subset/dialect and then passed it to a C code, it's because they know that JITing is worth it we and should just do it unconditionally. Maybe that'll have to be revised later, but it seems like a plausible way to get started... Anyway, getfuncptr alone is actually simpler spec-wise than the array lookup approach, and the flexibility is an added bonus; it's just a question of whether it will work. >> >>> But if the overhead can be shown to be miniscule then it does perhaps >>make >>> the API nicer, even if it feels like paying for nothing at the >>moment. But >>> see below. >>> >>> Will definitely not get around to this today; anyone else feel >>free... >>> >>> >>>> >>>> The rationale is, if we want to support JITed functions where new >>>> function pointers may be generated on the fly, the array approach >>has >>>> a serious problem. You have to decide how many array slots to >>allocate >>>> ahead of time, and if you run out, then... too bad. I guess you get >>to >>> >>> >>> Note that the table is jumped to by a pointer in the PyObject, i.e. >>the >>> PyObject I've tested with is >>> >>> [object data, &table, table] >> >>Oh, I see! I thought you were embedding it in the object, to avoid an >>extra indirection (and potential cache miss). > That's probably > > Note that in my benchmark the data was right next to the pointer, I think the cost was minor. Yeah, I'm not worried about your benchmark; the only case that seems to really matter is when the cache is cold. Two cache misses are worse than one. >>necessary, for the reasons you say, but also makes the get_funcptr >>approach potentially more competitive. >> >>> So a JIT could have the table in a separate location on the heap, >>then it >>> can allocate a new table, copy over the contents, and when everything >>is >>> ready, then do an atomic pointer update (using the assembly >>instructions/gcc >>> intrinsics, not pthreads or locking). >>> >>> The old table would need to linger for a bit, but could at latest be >>> deallocated when the PyObject is deallocated. >> >>IMHO we should just hold the GIL through lookups, which would simplify >>tihs, but that's mostly based on the naive intuition that we shouldn't >>be passing around Python boxes in no-GIL code. Maybe there are good >>reasons to. > > Your intuition about the GIL is wrong as far as Cython is concerned, you are allowed to call cdef 'nogil' methods on refcounted Cython objects without the GIL. But at least the docs claim that you can't pass a boxed C-callable to such a method: "If you are implementing such a function in Cython, it cannot have any Python arguments, ...". 
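On the GIL point: once the pointer has been fetched (with the GIL held), the call through it is pure C and can run with the GIL released. A sketch, reusing the hypothetical find_overload() helper from earlier:

    #include <Python.h>

    extern void *find_overload(PyObject *obj, const char *interned_sig);

    /* Fetch with the GIL held, call with it released. */
    static double
    call_nogil(PyObject *obj, const char *sig, double x)
    {
        double r = 0.0;
        double (*fp)(double) = (double (*)(double))find_overload(obj, sig);
        if (fp != NULL) {
            Py_BEGIN_ALLOW_THREADS
            r = fp(x);                /* pure C call; no GIL needed */
            Py_END_ALLOW_THREADS
        }
        return r;
    }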
-- Nathaniel From greg.ewing at canterbury.ac.nz Wed Apr 18 00:55:23 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 18 Apr 2012 10:55:23 +1200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D59F8.4080809@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F89CB1D.6000109@astro.uio.no> <4F89E5CD.5060301@behnel.de> <4F8A717B.6000107@astro.uio.no> <4F8A83B9.904@astro.uio.no> <4F8D59F8.4080809@astro.uio.no> Message-ID: <4F8DF4DB.7080506@canterbury.ac.nz> Dag Sverre Seljebotn wrote: > if you use > Python bytes in a dict in sys.modules['_nativecall'], the bytes objects > could be deallocated before callables containing the interned string -- > unless you Py_INCREF once too many, I don't understand that. Is there some reason that refcounting of the interned strings can't be done correctly? -- Greg From d.s.seljebotn at astro.uio.no Wed Apr 18 23:35:42 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 18 Apr 2012 23:35:42 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8D6112.1000906@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> Message-ID: <4F8F33AE.50401@astro.uio.no> On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote: > On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >> Travis Oliphant recently raised the issue on the NumPy list of what >> mechanisms to use to box native functions produced by his Numba so >> that SciPy functions can call it, e.g. (I'm making the numba part >> up): >> >> @numba # Compiles function using LLVM def f(x): return 3 * x >> >> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >> >> Obviously, we want something standard, so that Cython functions can >> also be called in a fast way. > > OK, here's the benchmark code I've written: > > https://github.com/dagss/cep1000 > > Assumptions etc.: > > - (Very) warm cache case is tested > > - I compile and link libmycallable.so, libmycaller.so and ./bench; with > -fPIC, to emulate the Python environment > > - I use mostly pure C but use PyTypeObject in order to get the offsets > to tp_flags etc right (I emulate the checking that would happen on a > PyObject* according to CEP1000). > > - The test function is "double f(double x) { return x * x; } > > - The benchmark is run in a loop J=1000000 times (and time divided by > J). This is repeated K=10000 times and the minimum walltime of the K run > is used. This gave very stable readings on my system. > > Fixing loop iterations: > > In the initial results I just scanned the overload list until > NULL-termination. It seemed to me that the code generated for this > scanning was the most important factor. > > Therefore I fixed the number of overloads as a known compile-time macro > N *in the caller*. This is somewhat optimistic; however I didn't want to > play with figuring out loop unrolling etc. at the same time, and > hardcoding the length of the overload list sort of took that part out of > the equation. > > > Table explanation: > > - N: Number of overloads in list. For N=10, there's 9 non-matching > overloads in the list before the matching 10 (but caller doesn't know > this). For N=1, the caller knows this and optimize for a hit in the > first entry. > > - MISMATCHES: If set, the caller tries 4 non-matching signatures before > hitting the final one. If not set, only the correct signature is tried. > > - LIKELY: If set, a GCC likely() macro is used to expect that the > signature matches. 
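(The likely() macro referred to here is the usual GCC idiom; for reference, this is how it sits in a caller-side scan over a NULL-terminated table of interned signatures -- the table layout is illustrative:)

    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    typedef struct { const char *sig; void *fp; } entry_t;

    /* With interned signatures the match is a pointer compare, not strcmp. */
    static void *
    find_overload(entry_t *table, const char *interned_sig)
    {
        for (; table->sig != NULL; table++) {
            if (likely(table->sig == interned_sig))
                return table->fp;
        }
        return NULL;
    }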
>
> RESULTS:
>
> Direct call (and execution of!) the function in the benchmark loop took 4.8 ns.
>
> An indirect dispatch through a function pointer of known type took 5.4 ns.
>
> Notation below is (intern key), in ns:
>
> N=1:
>   MISMATCHES=False:
>     LIKELY=True:  6.44 6.44
>     LIKELY=False: 7.52 8.06
>   MISMATCHES=True: 8.59 8.59
> N=10:
>   MISMATCHES=False: 17.19 19.20
>   MISMATCHES=True:  36.52 37.59
>
> To be clear, "intern" is an interned "char*" (comparison with a 64-bit
> global variable), while "key" is comparison of a size_t (comparison of a
> 64-bit immediate in the instruction stream).

First: My benchmarks today are a little inconsistent with earlier results. I think I have converged now in terms of the number of iterations (higher than last time), but that doesn't explain why indirect dispatch through a function pointer is now *higher*:

    Direct took 4.83 ns
    Dispatch took 5.91 ns

Anyway, even if crude, hopefully this will tell us something. The order of the benchmark numbers is:

    intern  key  get_func_intern  get_func_key

where the get_func_XX versions retrieve a function pointer taking either a single interned signature or a single key as argument (just see mycallable.c).

In the MISMATCHES case, the get_func_XX is called 4 times with a miss and then with the match.

    N=1
    - MISMATCHES=False:
    --- LIKELY=True:  5.91  6.44  8.59  9.13
    --- LIKELY=False: 7.52  7.52  9.13  9.13
    - MISMATCHES=True: 11.28 11.28 22.56 22.56

    N=10
    - MISMATCHES=False: 17.18 18.80 29.75  10.74(*)
    - MISMATCHES=True:  36.06 38.13 105.00 36.52

Benchmark comments:

The one marked (*) is implemented as a switch statement with keys known at compile time. I tried shifting around the case label values a bit but the result persists; it could just be that the compiler does a very good job of the switch as well.

Overall comments:

Picking a winner doesn't get easier. I'll try to (make myself) get some perspective. Thinking of extreme cases where performance matters, here's one (sqrt is apparently 0.5 ns on my machine; sin would be 40 ns):

    from numpy import sqrt, sin

    cdef double f(double x):
        return sqrt(x * x)  # or sin(x * x)

Of course, here one could get the pointer in the module at import time. However, here:

    from numpy import sqrt

    cdef double f(double x):
        return np.sqrt(x * x)  # or np.sin(x * x)

the __getattr__ on np sure is larger than any effect we discuss.

From the numbers above, I think I'm ready to accept the penalty of the "getfuncptr" approach (1.6 ns for a direct hit, larger when the caller accepts more signatures) as acceptable, given the added flexibility. When you care about the 1.6 ns, you're always going to want to do early binding anyway.

However, just as I'm convinced about interning, there appear to be two new arguments for keys:

- For a large number of overloads with getfuncptr, it can be much faster than interning. A 20 ns difference starts to get interesting.

- PSF GSoC proposals are not public yet, but I think I can say as much as that there's a PEP 3121 (multiple interpreter states) proposal, and that Martin von Lowis is favourable about it. If that goes anywhere it doesn't make interning impossible, but it requires a shared C component and changing the spec from PyBytesObject to char*. Perhaps that can be done in a PEP-ification revision though.

Just mentioning it. I'm not sure at this point myself; I think +1 on getfuncptr, but not sure about keys vs. interning.
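A C sketch of that import-time early binding; find_overload_on() and the "d(d)" signature string are stand-ins for whatever CEP1000 mechanism ends up chosen:

    #include <Python.h>

    extern void *find_overload_on(PyObject *callable, const char *interned_sig);

    static PyObject *sqrt_obj = NULL;        /* boxed fallback, owned ref */
    static double (*sqrt_d)(double) = NULL;  /* bound once at import time */

    static int
    bind_sqrt_at_import(PyObject *numpy_module)
    {
        sqrt_obj = PyObject_GetAttrString(numpy_module, "sqrt");
        if (sqrt_obj == NULL)
            return -1;
        /* stays NULL if no double -> double overload is exposed */
        sqrt_d = (double (*)(double))find_overload_on(sqrt_obj, "d(d)");
        return 0;
    }

    static double
    f(double x)
    {
        if (sqrt_d != NULL)          /* cheap NULL check; predicts well */
            return sqrt_d(x * x);
        /* boxed fallback; error handling elided for brevity */
        PyObject *r = PyObject_CallFunction(sqrt_obj, "d", x * x);
        double v = r ? PyFloat_AsDouble(r) : -1.0;
        Py_XDECREF(r);
        return v;
    }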
Dag From d.s.seljebotn at astro.uio.no Wed Apr 18 23:58:17 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 18 Apr 2012 23:58:17 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8F33AE.50401@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> Message-ID: <4F8F38F9.7020008@astro.uio.no> On 04/18/2012 11:35 PM, Dag Sverre Seljebotn wrote: > On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote: >> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>> Travis Oliphant recently raised the issue on the NumPy list of what >>> mechanisms to use to box native functions produced by his Numba so >>> that SciPy functions can call it, e.g. (I'm making the numba part >>> up): >>> >>> @numba # Compiles function using LLVM def f(x): return 3 * x >>> >>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >>> >>> Obviously, we want something standard, so that Cython functions can >>> also be called in a fast way. >> >> OK, here's the benchmark code I've written: >> >> https://github.com/dagss/cep1000 >> >> Assumptions etc.: >> >> - (Very) warm cache case is tested >> >> - I compile and link libmycallable.so, libmycaller.so and ./bench; with >> -fPIC, to emulate the Python environment >> >> - I use mostly pure C but use PyTypeObject in order to get the offsets >> to tp_flags etc right (I emulate the checking that would happen on a >> PyObject* according to CEP1000). >> >> - The test function is "double f(double x) { return x * x; } >> >> - The benchmark is run in a loop J=1000000 times (and time divided by >> J). This is repeated K=10000 times and the minimum walltime of the K run >> is used. This gave very stable readings on my system. >> >> Fixing loop iterations: >> >> In the initial results I just scanned the overload list until >> NULL-termination. It seemed to me that the code generated for this >> scanning was the most important factor. >> >> Therefore I fixed the number of overloads as a known compile-time macro >> N *in the caller*. This is somewhat optimistic; however I didn't want to >> play with figuring out loop unrolling etc. at the same time, and >> hardcoding the length of the overload list sort of took that part out of >> the equation. >> >> >> Table explanation: >> >> - N: Number of overloads in list. For N=10, there's 9 non-matching >> overloads in the list before the matching 10 (but caller doesn't know >> this). For N=1, the caller knows this and optimize for a hit in the >> first entry. >> >> - MISMATCHES: If set, the caller tries 4 non-matching signatures before >> hitting the final one. If not set, only the correct signature is tried. >> >> - LIKELY: If set, a GCC likely() macro is used to expect that the >> signature matches. >> >> >> RESULTS: >> >> Direct call (and execution of!) the function in benchmark loop took >> 4.8 ns. >> >> An indirect dispatch through a function pointer of known type took 5.4 ns >> >> Notation below is (intern key), in ns >> >> N=1: >> MISMATCHES=False: >> LIKELY=True: 6.44 6.44 >> LIKELY=False: 7.52 8.06 >> MISMATCHES=True: 8.59 8.59 >> N=10: >> MISMATCHES=False: 17.19 19.20 >> MISMATCHES=True: 36.52 37.59 >> >> To be clear, "intern" is an interned "char*" (comparison with a 64 bits >> global variable), while key is comparison of a size_t (comparison of a >> 64-bit immediate in the instruction stream). > > First: My benchmarks today are a little inconsistent with earlier > results. 
I think I have converged now in terms of the number of iterations
> (higher than last time), but that doesn't explain why indirect dispatch
> through a function pointer is now *higher*:
>
>     Direct took 4.83 ns
>     Dispatch took 5.91 ns
>
> Anyway, even if crude, hopefully this will tell us something. The order of
> the benchmark numbers is:
>
>     intern  key  get_func_intern  get_func_key
>
> where the get_func_XX versions retrieve a function pointer taking either
> a single interned signature or a single key as argument (just see
> mycallable.c).
>
> In the MISMATCHES case, the get_func_XX is called 4 times with a miss
> and then with the match.
>
> N=1
> - MISMATCHES=False:
> --- LIKELY=True:  5.91  6.44  8.59  9.13
> --- LIKELY=False: 7.52  7.52  9.13  9.13
> - MISMATCHES=True: 11.28 11.28 22.56 22.56
>
> N=10
> - MISMATCHES=False: 17.18 18.80 29.75  10.74(*)
> - MISMATCHES=True:  36.06 38.13 105.00 36.52
>
> Benchmark comments:
>
> The one marked (*) is implemented as a switch statement with keys known at
> compile time. I tried shifting around the case label values a bit but
> the result persists; it could just be that the compiler does a very good
> job of the switch as well.

I should make this clearer: The issue is that the compiler may have reordered the labels so that the hit came close to first; in the intern case the code is written so that the hit is always after 9 mismatches.

So I redid the (*) test using 10 cases with very different numeric values, and then tried each of the 10 as the matching case. Timings were stable for each choice of label (so this is not noise), with values:

    13.4 11.8 11.8 12.3 10.7 11.2 12.3

Guess this is the binary decision tree Mark talked about...

Dag

>
> Overall comments:
>
> Picking a winner doesn't get easier. I'll try to (make myself) get some
> perspective. Thinking of extreme cases where performance matters, here's
> one (sqrt is apparently 0.5 ns on my machine; sin would be 40 ns):
>
>     from numpy import sqrt, sin
>
>     cdef double f(double x):
>         return sqrt(x * x)  # or sin(x * x)
>
> Of course, here one could get the pointer in the module at import time.
> However, here:
>
>     from numpy import sqrt
>
>     cdef double f(double x):
>         return np.sqrt(x * x)  # or np.sin(x * x)
>
> the __getattr__ on np sure is larger than any effect we discuss.
>
> From the numbers above, I think I'm ready to accept the penalty of the
> "getfuncptr" approach (1.6 ns for a direct hit, larger when the caller
> accepts more signatures) as acceptable, given the added flexibility.
> When you care about the 1.6 ns, you're always going to want to do early
> binding anyway.
>
> However, just as I'm convinced about interning, there appear to be two
> new arguments for keys:
>
> - For a large number of overloads with getfuncptr, it can be much faster
> than interning. A 20 ns difference starts to get interesting.
>
> - PSF GSoC proposals are not public yet, but I think I can say as much
> as that there's a PEP 3121 (multiple interpreter states) proposal, and
> that Martin von Lowis is favourable about it. If that goes anywhere it
> doesn't make interning impossible, but it requires a shared C component
> and changing the spec from PyBytesObject to char*. Perhaps that can be
> done in a PEP-ification revision though.
>
> Just mentioning it. I'm not sure at this point myself; I think +1 on
> getfuncptr, but not sure about keys vs. interning.
> > Dag From stefan_ml at behnel.de Thu Apr 19 08:41:40 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 19 Apr 2012 08:41:40 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8F33AE.50401@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> Message-ID: <4F8FB3A4.7080700@behnel.de> Dag Sverre Seljebotn, 18.04.2012 23:35: > from numpy import sqrt, sin > > cdef double f(double x): > return sqrt(x * x) # or sin(x * x) > > Of course, here one could get the pointer in the module at import time. That optimisation would actually be very worthwhile all by itself. I mean, we know what signatures we need for globally imported functions throughout the module, so we can reduce the call to a single jump through a function pointer (although likely with a preceding NULL check, which the branch prediction would be happy to give us for free). At least as long as sqrt is not being reassigned, but that should hit the 99% case. > However, here: > > from numpy import sqrt > > cdef double f(double x): > return np.sqrt(x * x) # or np.sin(x * x) > > the __getattr__ on np sure is larger than any effect we discuss. Yes, that would have to stay a .pxd case, I guess. > From the numbers above, I think I'm ready to accept the "getfuncptr" > approach penalty (1.6 ns for a direct hit, larger when the caller accepts > more signatures) as acceptable, given the added flexibility. When you care > about the 1.6 ns, you're always going to want to do early binding anyway. > > However, just as I'm convinced about interning, there appears to be two new > arguments for keys: > > - For a large number of overloads with getfuncptr, it can be much faster > than interning. A 20ns difference starts to get interesting. I don't think any of the numbers you presented marks any of the solutions as "expensive" or "wrong". The advantage of a callback function for this is that it is the most flexible solution that will most easily hit all use cases. The only problem I see with getfuncptr() is that it shifts not only the runtime work to the callee but also the development work, debugging, optimisation, etc. We should provide a default implementation for non-JITs in that case, preferably one that fits into a header file rather than requiring a library. It could still become a set of C-API functions when (if?) CPython starts to adopt this (and exposes it also for its builtins). > - PSF GSoC proposals are not public yet, but I think I can say as much as > that there's a PEP 3121 (multiple interpreter states) proposal, and that > Martin von Lowis is favourable about it. If that goes anywhere it doesn't > make interning impossible but it requires a shared C component and changing > the spec from PyBytesObject to char*. Perhaps that can be done in a > PEP-ification revision though. I asked him what he thinks about the status of that PEP and he seems to be unhappy about the current massive lack of evaluation data regarding the general applicability and completeness of the infrastructure. One of the outcomes of the GSoC would be that we learn what problems actually exist and what needs to be done (if anything) to make this work for more code out there. IMHO, that would be a very valuable result, also for us. Note that the focus for the GSoC project is on the stdlib C modules. Without those, general support in Cython wouldn't be very helpful for any real-world code. We should move any PEP3121 related discussion to a separate (mammoth?) 
thread and a new CEP, though (the tracker tickets are already there). This is a large topic that is only loosely related to your CEP. Note that module global C variables would no longer exist with PEP3121 either. They would move into a module struct (basically a module closure). So we'd pay with an indirection already, for everything. Stefan From d.s.seljebotn at astro.uio.no Thu Apr 19 09:17:09 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 09:17:09 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F8FB3A4.7080700@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> Message-ID: <4F8FBBF5.4090709@astro.uio.no> On 04/19/2012 08:41 AM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 18.04.2012 23:35: >> from numpy import sqrt, sin >> >> cdef double f(double x): >> return sqrt(x * x) # or sin(x * x) >> >> Of course, here one could get the pointer in the module at import time. > > That optimisation would actually be very worthwhile all by itself. I mean, > we know what signatures we need for globally imported functions throughout > the module, so we can reduce the call to a single jump through a function > pointer (although likely with a preceding NULL check, which the branch > prediction would be happy to give us for free). At least as long as sqrt is > not being reassigned, but that should hit the 99% case. > > >> However, here: >> >> from numpy import sqrt Correction: "import numpy as np" >> >> cdef double f(double x): >> return np.sqrt(x * x) # or np.sin(x * x) >> >> the __getattr__ on np sure is larger than any effect we discuss. > > Yes, that would have to stay a .pxd case, I guess. How about this mini-CEP: Modules are allowed to specify __nomonkey__ (or __const__, or __notreassigned__), a list of strings naming module-level variables where "we don't hold you responsible if you assume no monkey-patching of these". When doing "import numpy as np", then (assuming "np" is never reassigned in the module), at import time we check all names looked up from it in __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'", i.e. the "np." is just a namespace mechanism. Needs a bit more work, it ignores the possibility that others could monkey-patch "np" in the Cython module. Problem with .pxd is that currently you need to pick one overload (np.sqrt works for n-dimensional arrays too, or takes a list and returns an array). And even after adding 3-4 language features to Cython to make this work, you're stuck with having to reimplement parts of NumPy in the pxd files just so that you can early bind from Cython. Dag From vitja.makarov at gmail.com Thu Apr 19 10:35:55 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Thu, 19 Apr 2012 12:35:55 +0400 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F8FBBF5.4090709@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> Message-ID: 2012/4/19 Dag Sverre Seljebotn : > On 04/19/2012 08:41 AM, Stefan Behnel wrote: >> >> Dag Sverre Seljebotn, 18.04.2012 23:35: >>> >>> from numpy import sqrt, sin >>> >>> cdef double f(double x): >>> ? ? return sqrt(x * x) # or sin(x * x) >>> >>> Of course, here one could get the pointer in the module at import time. >> >> >> That optimisation would actually be very worthwhile all by itself. 
I mean,
>> we know what signatures we need for globally imported functions throughout
>> the module, so we can reduce the call to a single jump through a function
>> pointer (although likely with a preceding NULL check, which the branch
>> prediction would be happy to give us for free). At least as long as sqrt is
>> not being reassigned, but that should hit the 99% case.
>>
>>> However, here:
>>>
>>> from numpy import sqrt
>
> Correction: "import numpy as np"
>
>>> cdef double f(double x):
>>>     return np.sqrt(x * x)  # or np.sin(x * x)
>>>
>>> the __getattr__ on np sure is larger than any effect we discuss.
>>
>> Yes, that would have to stay a .pxd case, I guess.
>
> How about this mini-CEP:
>
> Modules are allowed to specify __nomonkey__ (or __const__, or
> __notreassigned__), a list of strings naming module-level variables where
> "we don't hold you responsible if you assume no monkey-patching of these".
>
> When doing "import numpy as np", then (assuming "np" is never reassigned in
> the module), at import time we check all names looked up from it in
> __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'",
> i.e. the "np." is just a namespace mechanism.
>
> Needs a bit more work; it ignores the possibility that others could
> monkey-patch "np" in the Cython module.
>
> Problem with .pxd is that currently you need to pick one overload (np.sqrt
> works for n-dimensional arrays too, or takes a list and returns an array).
> And even after adding 3-4 language features to Cython to make this work,
> you're stuck with having to reimplement parts of NumPy in the pxd files just
> so that you can early bind from Cython.

Sorry, I'm a bit late.

When should __nomonkey__ be checked: at compile time or at import time? It seems to me that the compiler must guess the function signature at compile time and then check it at runtime.

What if an integer signature is guessed for sqrt() based on the argument type, as in sqrt(16)? Should this call fall back to PyObject_Call(), or cast the integer to a double at some point?

I've tried to implement a trivial approach for CyFunction. Trivial means that the function accepts PyObjects as arguments and returns a PyObject, so the trivial signature is only one integer: 1 + len(args). If the signature matches, the C function is called directly, and PyObject_Call() is used otherwise. I didn't succeed because of the argument cloning problems we discussed before.

About dict lookups: it's possible to speed up a dict lookup by a constant key if we have access to the dict's internal implementation. I've implemented it for module-level lookups here:

https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9

And it gave a 4x speedup for a dummy test:

    def foo():
        cdef int i, r = 0
        o = foo
        for i in range(10000000):
            if o is foo:
                r += 1

    %timeit foo()
    1 loops, best of 3: 229 ms per loop

    %timeit foo_optimized()
    10 loops, best of 3: 54.1 ms per loop

--
vitja.
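A C sketch of that trivial-signature fast path; the struct layout and names here are hypothetical, not what Vitja's branch actually does:

    #include <Python.h>

    /* Hypothetical boxed-function layout. */
    typedef struct {
        PyObject_HEAD
        size_t trivial_key;   /* 1 + number of PyObject* arguments */
        void  *entry;         /* C entry point taking only PyObject* args */
    } TrivialFuncObject;

    typedef PyObject *(*entry1_t)(PyObject *self, PyObject *arg);

    /* One-argument call: direct C call when the trivial signature
     * matches, generic PyObject_Call otherwise. */
    static PyObject *
    call_one_arg(PyObject *callable, PyObject *arg)
    {
        TrivialFuncObject *f = (TrivialFuncObject *)callable;
        if (/* type check elided */ f->trivial_key == 2)
            return ((entry1_t)f->entry)((PyObject *)f, arg);
        return PyObject_CallFunctionObjArgs(callable, arg, NULL);
    }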
From njs at pobox.com Thu Apr 19 11:07:20 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 19 Apr 2012 10:07:20 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8F38F9.7020008@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> Message-ID: On Wed, Apr 18, 2012 at 10:58 PM, Dag Sverre Seljebotn wrote: > On 04/18/2012 11:35 PM, Dag Sverre Seljebotn wrote: >> >> On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote: >>> >>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>>> >>>> Travis Oliphant recently raised the issue on the NumPy list of what >>>> mechanisms to use to box native functions produced by his Numba so >>>> that SciPy functions can call it, e.g. (I'm making the numba part >>>> up): >>>> >>>> @numba # Compiles function using LLVM def f(x): return 3 * x >>>> >>>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >>>> >>>> Obviously, we want something standard, so that Cython functions can >>>> also be called in a fast way. >>> >>> >>> OK, here's the benchmark code I've written: >>> >>> https://github.com/dagss/cep1000 >>> >>> Assumptions etc.: >>> >>> - (Very) warm cache case is tested >>> >>> - I compile and link libmycallable.so, libmycaller.so and ./bench; with >>> -fPIC, to emulate the Python environment >>> >>> - I use mostly pure C but use PyTypeObject in order to get the offsets >>> to tp_flags etc right (I emulate the checking that would happen on a >>> PyObject* according to CEP1000). >>> >>> - The test function is "double f(double x) { return x * x; } >>> >>> - The benchmark is run in a loop J=1000000 times (and time divided by >>> J). This is repeated K=10000 times and the minimum walltime of the K run >>> is used. This gave very stable readings on my system. >>> >>> Fixing loop iterations: >>> >>> In the initial results I just scanned the overload list until >>> NULL-termination. It seemed to me that the code generated for this >>> scanning was the most important factor. >>> >>> Therefore I fixed the number of overloads as a known compile-time macro >>> N *in the caller*. This is somewhat optimistic; however I didn't want to >>> play with figuring out loop unrolling etc. at the same time, and >>> hardcoding the length of the overload list sort of took that part out of >>> the equation. >>> >>> >>> Table explanation: >>> >>> - N: Number of overloads in list. For N=10, there's 9 non-matching >>> overloads in the list before the matching 10 (but caller doesn't know >>> this). For N=1, the caller knows this and optimize for a hit in the >>> first entry. >>> >>> - MISMATCHES: If set, the caller tries 4 non-matching signatures before >>> hitting the final one. If not set, only the correct signature is tried. >>> >>> - LIKELY: If set, a GCC likely() macro is used to expect that the >>> signature matches. >>> >>> >>> RESULTS: >>> >>> Direct call (and execution of!) the function in benchmark loop took >>> 4.8 ns. 
>>> >>> An indirect dispatch through a function pointer of known type took 5.4 ns >>> >>> Notation below is (intern key), in ns >>> >>> N=1: >>> MISMATCHES=False: >>> LIKELY=True: 6.44 6.44 >>> LIKELY=False: 7.52 8.06 >>> MISMATCHES=True: 8.59 8.59 >>> N=10: >>> MISMATCHES=False: 17.19 19.20 >>> MISMATCHES=True: 36.52 37.59 >>> >>> To be clear, "intern" is an interned "char*" (comparison with a 64 bits >>> global variable), while key is comparison of a size_t (comparison of a >>> 64-bit immediate in the instruction stream). >> >> >> First: My benchmarks today are a little inconsistent with earlier >> results. I think I have converged now in terms of number of iterations >> (higher than last time), but that doesn't explain why indirect dispatch >> through function pointer is now *higher*: >> >> Direct took 4.83 ns >> Dispatch took 5.91 ns >> >> Anyway, even if crude, hopefully this will tell us something. Order of >> benchmark numbers are: >> >> intern key get_func_intern get_func_key >> >> where the get_func_XX versions retrieve a function pointer taking either >> a single interned signature or a single key as argument (just see >> mycallable.c). >> >> In the MISMATCHES case, the get_func_XX is called 4 times with a miss >> and then with the match. >> >> N=1 >> - MISMATCHES=False: >> --- LIKELY=True: 5.91 6.44 8.59 9.13 >> --- LIKELY=False: 7.52 7.52 9.13 9.13 >> - MISMATCHES=True: 11.28 11.28 22.56 22.56 >> >> N=10 >> - MISMATCHES=False: 17.18 18.80 29.75 10.74(*) >> - MISMATCHES=True: 36.06 38.13 105.00 36.52 >> >> Benchmark comments: >> >> The one marked (*) is implemented as a switch statement with known keys >> compile-time. I tried shifting around the case label values a bit but >> the result persists; it could just be that the compiler does a very good >> job of the switch as well. > > > I should make this clearer: The issue is that the compiler may have > reordered the labels so that the hit came close to first; in the intern case > the code is written so that the hit is always after 9 mismatches. > > So I redid the (*) test using 10 cases with very different numeric values, > and then tried each 10 as the matching case. Timings were stable for each > choice of label (so this is not noise), with values: > > 13.4 11.8 11.8 12.3 10.7 11.2 12.3 > > Guess this is the binary decision tree Mark talked about... Yes, if you look at the ASM (this is easier to keep track of if you make the switch cases into round decimal numbers, like 1000, 2000, 3000...), then you can see that gcc is generating a fully unrolled binary search, as basically a set of nested if/else's, like: if (value < 5000) { if (value == 2000) { return &ptr_2000; } else if (value == 4000) { return &ptr_4000; } } else { if (value == 6000) { return &ptr_6000; } else if (value == 8000) { return &ptr_8000; } } I suppose if we're ambitious we could do the same with the intern table for Cython compile-time variants (we don't know the values ahead of time, but we know how many there will be, so we'd generate the list of intern values, sort it, and then replace the comparison values above with table[middle], etc.). 
-- Nathaniel From d.s.seljebotn at astro.uio.no Thu Apr 19 12:43:15 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 12:43:15 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> Message-ID: <4F8FEC43.2020801@astro.uio.no> On 04/19/2012 10:35 AM, Vitja Makarov wrote: > 2012/4/19 Dag Sverre Seljebotn: >> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>> >>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>> >>>> from numpy import sqrt, sin >>>> >>>> cdef double f(double x): >>>> return sqrt(x * x) # or sin(x * x) >>>> >>>> Of course, here one could get the pointer in the module at import time. >>> >>> >>> That optimisation would actually be very worthwhile all by itself. I mean, >>> we know what signatures we need for globally imported functions throughout >>> the module, so we can reduce the call to a single jump through a function >>> pointer (although likely with a preceding NULL check, which the branch >>> prediction would be happy to give us for free). At least as long as sqrt >>> is >>> not being reassigned, but that should hit the 99% case. >>> >>> >>>> However, here: >>>> >>>> from numpy import sqrt >> >> >> Correction: "import numpy as np" >> >>>> >>>> cdef double f(double x): >>>> return np.sqrt(x * x) # or np.sin(x * x) >>>> >>>> the __getattr__ on np sure is larger than any effect we discuss. >>> >>> >>> Yes, that would have to stay a .pxd case, I guess. >> >> >> How about this mini-CEP: >> >> Modules are allowed to specify __nomonkey__ (or __const__, or >> __notreassigned__), a list of strings naming module-level variables where >> "we don't hold you responsible if you assume no monkey-patching of these". >> >> When doing "import numpy as np", then (assuming "np" is never reassigned in >> the module), at import time we check all names looked up from it in >> __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'", >> i.e. the "np." is just a namespace mechanism. >> >> Needs a bit more work, it ignores the possibility that others could >> monkey-patch "np" in the Cython module. >> >> Problem with .pxd is that currently you need to pick one overload (np.sqrt >> works for n-dimensional arrays too, or takes a list and returns an array). >> And even after adding 3-4 language features to Cython to make this work, >> you're stuck with having to reimplement parts of NumPy in the pxd files just >> so that you can early bind from Cython. >> > > Sorry, I'm a bit late. > > When should __nomonkey__ be checked at compile time or at import time? At import time. At compile time we generate one (potential) function pointer per call-signature we might try. At import time we fill them in if they are in __nomonkey__ using CEP 1000. At call time we likely() around the pointer being non-empty, since the cost of a dict-lookup is so large anyway. > It seems to me that compiler must guess function signature at compile > time. And then check it at runtime. Yes. 
Just like Fortran 77, where you don't declare functions, just call them. At least with Cython it'll just go slower if you get them wrong; we won't get a crash :-)

If you want to help the compiler along explicitly, you would instead do something like

    cdef double (*sqrt_double)(double)
    from numpy import sqrt

> What if an integer signature is guessed for sqrt() based on the argument
> type, as in sqrt(16)? Should this call fall back to PyObject_Call() or
> cast the integer to a double at some point?

a) np.sqrt could export functions for all basic types (this is how NumPy currently works under the hood anyway)

b) It doesn't help here, but I also imagine Cython doing a 3-step or 4-step call down the line:

 - Direct call using the types given.
 - Promote all scalars to 64 bit, try again.
 [- (Optional if an FFI library or LLVM is available): Parse the signature string of the function and build the call dynamically using an FFI library]
 - Python call

Without an FFI library, I think giving the user a speedup if he/she writes sqrt(3.) rather than sqrt(3) is fine...

c) An optimize-for-host-libraries compilation flag could of course just probe for the signatures, similar to profile-guided optimization.

> I've tried to implement a trivial approach for CyFunction. Trivial means
> that the function accepts PyObjects as arguments and returns a PyObject,
> so the trivial signature is only one integer: 1 + len(args). If the
> signature matches, the C function is called directly, and PyObject_Call()
> is used otherwise. I didn't succeed because of the argument cloning
> problems we discussed before.
>
> About dict lookups: it's possible to speed up a dict lookup by a constant
> key if we have access to the dict's internal implementation. I've
> implemented it for module-level lookups here:
>
> https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9
>
> And it gave a 4x speedup for a dummy test:
>
>     def foo():
>         cdef int i, r = 0
>         o = foo
>         for i in range(10000000):
>             if o is foo:
>                 r += 1
>
>     %timeit foo()
>     1 loops, best of 3: 229 ms per loop
>
>     %timeit foo_optimized()
>     10 loops, best of 3: 54.1 ms per loop

Cool! Am I right that that translates to 5.4 ns? That's pretty good. (What CPU did you use?)

Still, a function pointer call done at import time appears to be roughly 1 ns, so a full sqrt bound the way I proposed would be 2-3 ns, so 5.4 ns on top of that is still relatively large. But it does mean that __nomonkey__, if not completely invalid, is perhaps not exactly high-priority. For a JIT that would consult it at compile time the gain would be higher though.

Dag

From markflorisson88 at gmail.com Thu Apr 19 12:53:55 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 19 Apr 2012 11:53:55 +0100 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F8FBBF5.4090709@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> Message-ID:

On 19 April 2012 08:17, Dag Sverre Seljebotn wrote:
> On 04/19/2012 08:41 AM, Stefan Behnel wrote:
>> Dag Sverre Seljebotn, 18.04.2012 23:35:
>>> from numpy import sqrt, sin
>>>
>>> cdef double f(double x):
>>>     return sqrt(x * x)  # or sin(x * x)
>>>
>>> Of course, here one could get the pointer in the module at import time.
>>
>> That optimisation would actually be very worthwhile all by itself.
I mean, >> we know what signatures we need for globally imported functions throughout >> the module, so we can reduce the call to a single jump through a function >> pointer (although likely with a preceding NULL check, which the branch >> prediction would be happy to give us for free). At least as long as sqrt >> is >> not being reassigned, but that should hit the 99% case. >> >> >>> However, here: >>> >>> from numpy import sqrt > > > Correction: "import numpy as np" > >>> >>> cdef double f(double x): >>> ? ? return np.sqrt(x * x) # or np.sin(x * x) >>> >>> the __getattr__ on np sure is larger than any effect we discuss. >> >> >> Yes, that would have to stay a .pxd case, I guess. > > > How about this mini-CEP: > > Modules are allowed to specify __nomonkey__ (or __const__, or > __notreassigned__), a list of strings naming module-level variables where > "we don't hold you responsible if you assume no monkey-patching of these". > > When doing "import numpy as np", then (assuming "np" is never reassigned in > the module), at import time we check all names looked up from it in > __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'", > i.e. the "np." is just a namespace mechanism. I like the idea. I think this could be generalized to a 'final' keyword, that could also enable optimizations for cdef class attributes. So you'd say cdef final object np import numpy as np For class attributes this would tell the compiler that it will not be rebound, which means you could check if attributes are initialized in the initializer, or just pull such checks (as wel as bounds checks), at least for memoryviews, out of loops, without worrying whether it will be reassigned in the meantime. > Needs a bit more work, it ignores the possibility that others could > monkey-patch "np" in the Cython module. > > Problem with .pxd is that currently you need to pick one overload (np.sqrt > works for n-dimensional arrays too, or takes a list and returns an array). > And even after adding 3-4 language features to Cython to make this work, > you're stuck with having to reimplement parts of NumPy in the pxd files just > so that you can early bind from Cython. > > > Dag > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Thu Apr 19 12:56:58 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 12:56:58 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> Message-ID: <4F8FEF7A.2090501@astro.uio.no> On 04/19/2012 11:07 AM, Nathaniel Smith wrote: > On Wed, Apr 18, 2012 at 10:58 PM, Dag Sverre Seljebotn > wrote: >> On 04/18/2012 11:35 PM, Dag Sverre Seljebotn wrote: >>> >>> On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote: >>>> >>>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>>>> >>>>> Travis Oliphant recently raised the issue on the NumPy list of what >>>>> mechanisms to use to box native functions produced by his Numba so >>>>> that SciPy functions can call it, e.g. (I'm making the numba part >>>>> up): >>>>> >>>>> @numba # Compiles function using LLVM def f(x): return 3 * x >>>>> >>>>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! 
>>>>> >>>>> Obviously, we want something standard, so that Cython functions can >>>>> also be called in a fast way. >>>> >>>> >>>> OK, here's the benchmark code I've written: >>>> >>>> https://github.com/dagss/cep1000 >>>> >>>> Assumptions etc.: >>>> >>>> - (Very) warm cache case is tested >>>> >>>> - I compile and link libmycallable.so, libmycaller.so and ./bench; with >>>> -fPIC, to emulate the Python environment >>>> >>>> - I use mostly pure C but use PyTypeObject in order to get the offsets >>>> to tp_flags etc right (I emulate the checking that would happen on a >>>> PyObject* according to CEP1000). >>>> >>>> - The test function is "double f(double x) { return x * x; } >>>> >>>> - The benchmark is run in a loop J=1000000 times (and time divided by >>>> J). This is repeated K=10000 times and the minimum walltime of the K run >>>> is used. This gave very stable readings on my system. >>>> >>>> Fixing loop iterations: >>>> >>>> In the initial results I just scanned the overload list until >>>> NULL-termination. It seemed to me that the code generated for this >>>> scanning was the most important factor. >>>> >>>> Therefore I fixed the number of overloads as a known compile-time macro >>>> N *in the caller*. This is somewhat optimistic; however I didn't want to >>>> play with figuring out loop unrolling etc. at the same time, and >>>> hardcoding the length of the overload list sort of took that part out of >>>> the equation. >>>> >>>> >>>> Table explanation: >>>> >>>> - N: Number of overloads in list. For N=10, there's 9 non-matching >>>> overloads in the list before the matching 10 (but caller doesn't know >>>> this). For N=1, the caller knows this and optimize for a hit in the >>>> first entry. >>>> >>>> - MISMATCHES: If set, the caller tries 4 non-matching signatures before >>>> hitting the final one. If not set, only the correct signature is tried. >>>> >>>> - LIKELY: If set, a GCC likely() macro is used to expect that the >>>> signature matches. >>>> >>>> >>>> RESULTS: >>>> >>>> Direct call (and execution of!) the function in benchmark loop took >>>> 4.8 ns. >>>> >>>> An indirect dispatch through a function pointer of known type took 5.4 ns >>>> >>>> Notation below is (intern key), in ns >>>> >>>> N=1: >>>> MISMATCHES=False: >>>> LIKELY=True: 6.44 6.44 >>>> LIKELY=False: 7.52 8.06 >>>> MISMATCHES=True: 8.59 8.59 >>>> N=10: >>>> MISMATCHES=False: 17.19 19.20 >>>> MISMATCHES=True: 36.52 37.59 >>>> >>>> To be clear, "intern" is an interned "char*" (comparison with a 64 bits >>>> global variable), while key is comparison of a size_t (comparison of a >>>> 64-bit immediate in the instruction stream). >>> >>> >>> First: My benchmarks today are a little inconsistent with earlier >>> results. I think I have converged now in terms of number of iterations >>> (higher than last time), but that doesn't explain why indirect dispatch >>> through function pointer is now *higher*: >>> >>> Direct took 4.83 ns >>> Dispatch took 5.91 ns >>> >>> Anyway, even if crude, hopefully this will tell us something. Order of >>> benchmark numbers are: >>> >>> intern key get_func_intern get_func_key >>> >>> where the get_func_XX versions retrieve a function pointer taking either >>> a single interned signature or a single key as argument (just see >>> mycallable.c). >>> >>> In the MISMATCHES case, the get_func_XX is called 4 times with a miss >>> and then with the match. 
>>> >>> N=1 >>> - MISMATCHES=False: >>> --- LIKELY=True: 5.91 6.44 8.59 9.13 >>> --- LIKELY=False: 7.52 7.52 9.13 9.13 >>> - MISMATCHES=True: 11.28 11.28 22.56 22.56 >>> >>> N=10 >>> - MISMATCHES=False: 17.18 18.80 29.75 10.74(*) >>> - MISMATCHES=True: 36.06 38.13 105.00 36.52 >>> >>> Benchmark comments: >>> >>> The one marked (*) is implemented as a switch statement with keys known >>> at compile time. I tried shifting around the case label values a bit but >>> the result persists; it could just be that the compiler does a very good >>> job of the switch as well. >> >> >> I should make this clearer: The issue is that the compiler may have >> reordered the labels so that the hit came close to first; in the intern case >> the code is written so that the hit is always after 9 mismatches. >> >> So I redid the (*) test using 10 cases with very different numeric values, >> and then tried each of the 10 as the matching case. Timings were stable for each >> choice of label (so this is not noise), with values: >> >> 13.4 11.8 11.8 12.3 10.7 11.2 12.3 >> >> Guess this is the binary decision tree Mark talked about... > > Yes, if you look at the ASM (this is easier to keep track of if you > make the switch cases into round decimal numbers, like 1000, 2000, > 3000...), then you can see that gcc is generating a fully unrolled > binary search, as basically a set of nested if/else's, like: > > if (value < 5000) { > if (value == 2000) { > return &ptr_2000; > } else if (value == 4000) { > return &ptr_4000; > } > } else { > if (value == 6000) { > return &ptr_6000; > } else if (value == 8000) { > return &ptr_8000; > } > } > > I suppose if we're ambitious we could do the same with the intern > table for Cython compile-time variants (we don't know the values ahead > of time, but we know how many there will be, so we'd generate the list > of intern values, sort it, and then replace the comparison values > above with table[middle], etc.). Right. With everything being essentially equal, this isn't getting easier. I thought of some drawbacks of getfuncptr: - Important: Doesn't allow you to actually inspect the supported signatures, which is needed (or at least convenient) if you want to use an FFI library or do some JIT-ing. So an iteration mechanism is still needed in addition, meaning the number of things for the object to implement grows a bit large. Default implementations help -- OTOH there really wasn't a major drawback with the table approach as long as JITs can just replace it? - Minor: I've read that on Intel's Sandy Bridge, the micro-ops are cached after instruction decoding, and that micro-ops cache is so precious (and decoding so expensive) that the recommendation is no loop unrolling at all! So essentially sticking the table in unrolled instructions may not continue to be a good idea. (Of course, getfuncptr doesn't ).
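For concreteness, the two candidate interfaces being weighed in this email look roughly like the following in C. This is only a sketch: CEP 1000 was still in flux at this point, so the struct and function names here are invented for illustration rather than taken from the spec.

typedef struct {
    const char *signature;   /* e.g. an interned "d->d" */
    void       *funcptr;     /* cast to the matching C function type */
} sig_entry_t;

/* Variant 1 ("table"): the callable exposes a NULL-terminated array of
 * entries that the caller scans itself. */
typedef struct {
    sig_entry_t *entries;
} nativecall_table_t;

/* Variant 2 ("getfuncptr"): the callable exposes a lookup callback and
 * keeps the scanning strategy private. */
typedef void *(*getfuncptr_t)(void *callable, const char *interned_sig);

/* Caller-side scan for the table variant; because 'sig' is interned, a
 * pointer comparison is enough to test for a match. */
static void *lookup_in_table(const sig_entry_t *entries, const char *sig)
{
    for (; entries->signature != NULL; entries++)
        if (entries->signature == sig)
            return entries->funcptr;
    return NULL;  /* caller falls back to the boxed Python call */
}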
Dag From d.s.seljebotn at astro.uio.no Thu Apr 19 13:05:51 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 13:05:51 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8FEF7A.2090501@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> Message-ID: <4F8FF18F.1020909@astro.uio.no> On 04/19/2012 12:56 PM, Dag Sverre Seljebotn wrote: > On 04/19/2012 11:07 AM, Nathaniel Smith wrote: >> On Wed, Apr 18, 2012 at 10:58 PM, Dag Sverre Seljebotn >> wrote: >>> On 04/18/2012 11:35 PM, Dag Sverre Seljebotn wrote: >>>> >>>> On 04/17/2012 02:24 PM, Dag Sverre Seljebotn wrote: >>>>> >>>>> On 04/13/2012 12:11 AM, Dag Sverre Seljebotn wrote: >>>>>> >>>>>> Travis Oliphant recently raised the issue on the NumPy list of what >>>>>> mechanisms to use to box native functions produced by his Numba so >>>>>> that SciPy functions can call it, e.g. (I'm making the numba part >>>>>> up): >>>>>> >>>>>> @numba # Compiles function using LLVM def f(x): return 3 * x >>>>>> >>>>>> print scipy.integrate.quad(f, 1, 2) # do many callbacks natively! >>>>>> >>>>>> Obviously, we want something standard, so that Cython functions can >>>>>> also be called in a fast way. >>>>> >>>>> >>>>> OK, here's the benchmark code I've written: >>>>> >>>>> https://github.com/dagss/cep1000 >>>>> >>>>> Assumptions etc.: >>>>> >>>>> - (Very) warm cache case is tested >>>>> >>>>> - I compile and link libmycallable.so, libmycaller.so and ./bench; >>>>> with >>>>> -fPIC, to emulate the Python environment >>>>> >>>>> - I use mostly pure C but use PyTypeObject in order to get the offsets >>>>> to tp_flags etc right (I emulate the checking that would happen on a >>>>> PyObject* according to CEP1000). >>>>> >>>>> - The test function is "double f(double x) { return x * x; } >>>>> >>>>> - The benchmark is run in a loop J=1000000 times (and time divided by >>>>> J). This is repeated K=10000 times and the minimum walltime of the >>>>> K run >>>>> is used. This gave very stable readings on my system. >>>>> >>>>> Fixing loop iterations: >>>>> >>>>> In the initial results I just scanned the overload list until >>>>> NULL-termination. It seemed to me that the code generated for this >>>>> scanning was the most important factor. >>>>> >>>>> Therefore I fixed the number of overloads as a known compile-time >>>>> macro >>>>> N *in the caller*. This is somewhat optimistic; however I didn't >>>>> want to >>>>> play with figuring out loop unrolling etc. at the same time, and >>>>> hardcoding the length of the overload list sort of took that part >>>>> out of >>>>> the equation. >>>>> >>>>> >>>>> Table explanation: >>>>> >>>>> - N: Number of overloads in list. For N=10, there's 9 non-matching >>>>> overloads in the list before the matching 10 (but caller doesn't know >>>>> this). For N=1, the caller knows this and optimize for a hit in the >>>>> first entry. >>>>> >>>>> - MISMATCHES: If set, the caller tries 4 non-matching signatures >>>>> before >>>>> hitting the final one. If not set, only the correct signature is >>>>> tried. >>>>> >>>>> - LIKELY: If set, a GCC likely() macro is used to expect that the >>>>> signature matches. >>>>> >>>>> >>>>> RESULTS: >>>>> >>>>> Direct call (and execution of!) the function in benchmark loop took >>>>> 4.8 ns. 
>>>>> >>>>> An indirect dispatch through a function pointer of known type took >>>>> 5.4 ns >>>>> >>>>> Notation below is (intern key), in ns >>>>> >>>>> N=1: >>>>> MISMATCHES=False: >>>>> LIKELY=True: 6.44 6.44 >>>>> LIKELY=False: 7.52 8.06 >>>>> MISMATCHES=True: 8.59 8.59 >>>>> N=10: >>>>> MISMATCHES=False: 17.19 19.20 >>>>> MISMATCHES=True: 36.52 37.59 >>>>> >>>>> To be clear, "intern" is an interned "char*" (comparison with a 64 >>>>> bits >>>>> global variable), while key is comparison of a size_t (comparison of a >>>>> 64-bit immediate in the instruction stream). >>>> >>>> >>>> First: My benchmarks today are a little inconsistent with earlier >>>> results. I think I have converged now in terms of number of iterations >>>> (higher than last time), but that doesn't explain why indirect dispatch >>>> through function pointer is now *higher*: >>>> >>>> Direct took 4.83 ns >>>> Dispatch took 5.91 ns >>>> >>>> Anyway, even if crude, hopefully this will tell us something. Order of >>>> benchmark numbers are: >>>> >>>> intern key get_func_intern get_func_key >>>> >>>> where the get_func_XX versions retrieve a function pointer taking >>>> either >>>> a single interned signature or a single key as argument (just see >>>> mycallable.c). >>>> >>>> In the MISMATCHES case, the get_func_XX is called 4 times with a miss >>>> and then with the match. >>>> >>>> N=1 >>>> - MISMATCHES=False: >>>> --- LIKELY=True: 5.91 6.44 8.59 9.13 >>>> --- LIKELY=False: 7.52 7.52 9.13 9.13 >>>> - MISMATCHES=True: 11.28 11.28 22.56 22.56 >>>> >>>> N=10 >>>> - MISMATCHES=False: 17.18 18.80 29.75 10.74(*) >>>> - MISMATCHES=True: 36.06 38.13 105.00 36.52 >>>> >>>> Benchmark comments: >>>> >>>> The one marked (*) is implemented as a switch statement with known keys >>>> compile-time. I tried shifting around the case label values a bit but >>>> the result persists; it could just be that the compiler does a very >>>> good >>>> job of the switch as well. >>> >>> >>> I should make this clearer: The issue is that the compiler may have >>> reordered the labels so that the hit came close to first; in the >>> intern case >>> the code is written so that the hit is always after 9 mismatches. >>> >>> So I redid the (*) test using 10 cases with very different numeric >>> values, >>> and then tried each 10 as the matching case. Timings were stable for >>> each >>> choice of label (so this is not noise), with values: >>> >>> 13.4 11.8 11.8 12.3 10.7 11.2 12.3 >>> >>> Guess this is the binary decision tree Mark talked about... >> >> Yes, if you look at the ASM (this is easier to keep track of if you >> make the switch cases into round decimal numbers, like 1000, 2000, >> 3000...), then you can see that gcc is generating a fully unrolled >> binary search, as basically a set of nested if/else's, like: >> >> if (value< 5000) { >> if (value == 2000) { >> return&ptr_2000; >> } else if (value == 4000) { >> return&ptr_4000; >> } >> } else { >> if (value == 6000) { >> return&ptr_6000; >> } else if (value == 8000) { >> return&ptr_8000; >> } >> } >> >> I suppose if we're ambitious we could do the same with the intern >> table for Cython compile-time variants (we don't know the values ahead >> of time, but we know how many there will be, so we'd generate the list >> of intern values, sort it, and then replace the comparison values >> above with table[middle], etc.). > > Right. With everything being essentially equal, this isn't getting easier. 
> > I thought of some drawbacks of getfuncptr: > > - Important: Doesn't allow you to actually inspect the supported > signatures, which is needed (or at least convenient) if you want to use > an FFI library or do some JIT-ing. So an iteration mechanism is still > needed in addition, meaning the number of things for the object to > implement grows a bit large. Default implementations help -- OTOH there > really wasn't a major drawback with the table approach as long as JIT's > can just replace it? > > - Minor: I've read that on Intel's Sandy Bridge, the micro-ops are > cached after instruction decoding, and that micro-ops cache is so > precious (and decoding so expensive) that the recommendation is no loop > unrolling at all! So essentially sticking the table in unrolled > instructions may not continue to be a good idea. (Of course, getfuncptr > doesn't ). ... (Of course, getfuncptr doesn't force you to do that, you could keep traversing data, but without getfuncptr you are forced to take a table-driven approach which may be better for the micro-op cache. Pure speculation though.). Dag From stefan_ml at behnel.de Thu Apr 19 13:11:00 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 19 Apr 2012 13:11:00 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8FF18F.1020909@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F8FF18F.1020909@astro.uio.no> Message-ID: <4F8FF2C4.4040601@behnel.de> Dag Sverre Seljebotn, 19.04.2012 13:05: > Pure speculation though I think we should leave speculation to the CPUs. They are quite good at it these days. Stefan From vitja.makarov at gmail.com Thu Apr 19 13:20:38 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Thu, 19 Apr 2012 15:20:38 +0400 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F8FEC43.2020801@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> <4F8FEC43.2020801@astro.uio.no> Message-ID: 2012/4/19 Dag Sverre Seljebotn : > On 04/19/2012 10:35 AM, Vitja Makarov wrote: >> >> 2012/4/19 Dag Sverre Seljebotn: >>> >>> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>>> >>>> >>>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>>> >>>>> >>>>> from numpy import sqrt, sin >>>>> >>>>> cdef double f(double x): >>>>> ? ? return sqrt(x * x) # or sin(x * x) >>>>> >>>>> Of course, here one could get the pointer in the module at import time. >>>> >>>> >>>> >>>> That optimisation would actually be very worthwhile all by itself. I >>>> mean, >>>> we know what signatures we need for globally imported functions >>>> throughout >>>> the module, so we can reduce the call to a single jump through a >>>> function >>>> pointer (although likely with a preceding NULL check, which the branch >>>> prediction would be happy to give us for free). At least as long as sqrt >>>> is >>>> not being reassigned, but that should hit the 99% case. >>>> >>>> >>>>> However, here: >>>>> >>>>> from numpy import sqrt >>> >>> >>> >>> Correction: "import numpy as np" >>> >>>>> >>>>> cdef double f(double x): >>>>> ? ? return np.sqrt(x * x) # or np.sin(x * x) >>>>> >>>>> the __getattr__ on np sure is larger than any effect we discuss. >>>> >>>> >>>> >>>> Yes, that would have to stay a .pxd case, I guess. 
>>> >>> >>> >>> How about this mini-CEP: >>> >>> Modules are allowed to specify __nomonkey__ (or __const__, or >>> __notreassigned__), a list of strings naming module-level variables where >>> "we don't hold you responsible if you assume no monkey-patching of >>> these". >>> >>> When doing "import numpy as np", then (assuming "np" is never reassigned >>> in >>> the module), at import time we check all names looked up from it in >>> __nomonkey__, and if so treat them as "from numpy import sqrt as >>> 'np.sqrt'", >>> i.e. the "np." is just a namespace mechanism. >>> >>> Needs a bit more work, it ignores the possibility that others could >>> monkey-patch "np" in the Cython module. >>> >>> Problem with .pxd is that currently you need to pick one overload >>> (np.sqrt >>> works for n-dimensional arrays too, or takes a list and returns an >>> array). >>> And even after adding 3-4 language features to Cython to make this work, >>> you're stuck with having to reimplement parts of NumPy in the pxd files >>> just >>> so that you can early bind from Cython. >>> >> >> Sorry, I'm a bit late. >> >> When should __nomonkey__ be checked at compile time or at import time? > > > At import time. At compile time we generate one (potential) function pointer > per call-signature we might try. At import time we fill them in if they are > in __nomonkey__ using CEP 1000. At call time we likely() around the pointer > being non-empty, since the cost of a dict-lookup is so large anyway. > > >> It seems to me that compiler must guess function signature at compile >> time. And then check it at runtime. > > > Yes. Just like Fortran 77, where you don't declare functions, just call > them.At least with Cython it'll just go slower if you get them wrong, we > won't get a crash :-) > > If you want to help the compiler along explicitly, you would instead do > something like > > cdef double (*sqrt_double)(double) > from numpy import sqrt > > >> What if integer signature is guessed for sqrt() based on the argument >> type sqrt(16) should this call fallback to PyObject_Call() or cast an >> integer to a double at some point? > > > a) np.sqrt could export functions for all basic types (this is how NumPy > currently works under the hood anyway) > > b) It doesn't help here, but I also imagine Cython doing a 3-step or 4-step > call down the line: > > - Direct call using the types given. > - Promote all scalars to 64 bit, try again. > [- (Optional if an FFI library or LLVM is available): Parse signature string > of function and build call dynamically using a FFI library)] > - Python call > > Without an FFI library, I think giving the user a speedup if he/she writes > sqrt(3.) rather than sqrt(3) is fine... > > c) An optimize-for-host-libraries compilation flag could of course just > probe for the signatures, similar to profile-guided optimization. > Ok, np.sqrt() supports different signatures how would cython know which C-function to use? > >> I've tried to implement trivial approach for CyFunction. Trivial means >> that function accepts PyObjects as arguments and returns an PyObject, >> so trivial signature is only one integer: 1 + len(args). If signature >> match occurs dirct C-function is called and PyObject_Call() is used >> otherwise. I didn't succeed because of argument cloning problems, we >> discussed before. >> >> About dict lookups: it's possible to speedup dict lookup by a constant >> key if we have access to dict's internal implementation. 
I've >> implemented it for module-level lookups here: >> >> >> https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9 >> >> And it gave a 4 times speedup for a dummy test: >> >> def foo(): >> cdef int i, r = 0 >> o = foo >> for i in range(10000000): >> if o is foo: >> r += 1 >> >> %timeit foo() >> 1 loops, best of 3: 229 ms per loop >> >> %timeit foo_optimized() >> 10 loops, best of 3: 54.1 ms per loop >> > > Cool! Am I right that that translates to 5.4 ns per iteration? That's pretty good. (What > CPU did you use?) > Yes, the test was run on an Intel i3 at 3.2 GHz > Still, a function pointer call done at import time appears to be roughly 1 > ns, so a full sqrt bound the way I proposed would be 2-3 ns, so 5.4 ns on top > of that is still relatively large. > > But it does mean that __nomonkey__, if not completely invalid, is perhaps > not exactly high-priority. For a JIT that would consult it at compile time > the gain would be higher, though. > > -- vitja. From njs at pobox.com Thu Apr 19 13:20:47 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 19 Apr 2012 12:20:47 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8FEF7A.2090501@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> Message-ID: On Thu, Apr 19, 2012 at 11:56 AM, Dag Sverre Seljebotn wrote: > I thought of some drawbacks of getfuncptr: > > - Important: Doesn't allow you to actually inspect the supported > signatures, which is needed (or at least convenient) if you want to use an > FFI library or do some JIT-ing. So an iteration mechanism is still needed in > addition, meaning the number of things for the object to implement grows a > bit large. Default implementations help -- OTOH there really wasn't a major > drawback with the table approach as long as JITs can just replace it? But this is orthogonal to the table vs. getfuncptr discussion. We're assuming that the table might be extended at runtime, which means you can't use it to determine which signatures are supported. So we need some sort of extra interface for the caller and callee to negotiate a type anyway. (I'm intentionally agnostic about whether it makes more sense for the caller or the callee to be doing the iterating... in general type negotiation could be quite complicated, and I don't think we know enough to get that interface right yet.) The other other option would be to go to the far other end of simplicity, and just forget for now about allowing multiple signatures in the same object. Do signature selection by having the user select one explicitly: @cython.inline def square(x): return x * x # .specialize is an un-standardized Cython interface # square_double is an object implementing the standardized C-callable interface square_double = square.specialize("d->d") scipy.integrate.quad(square_double) That'd be enough to get started, and doesn't rule out later extensions that do automatic type selection, once we have more experience.
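To make the single-signature idea above concrete, here is a hedged C sketch of the consumer side: an integrator that asks a C-callable object for its one advertised "d->d" pointer up front and then loops natively. nativecallable_get() and the "d->d" tag are hypothetical stand-ins for whatever the CEP would eventually standardize.

/* Hypothetical lookup provided by the C-callable protocol. */
extern void *nativecallable_get(void *callable, const char *interned_sig);

typedef double (*dd_func_t)(double);

/* Midpoint-rule integrator that binds the function pointer once,
 * outside the loop, so each evaluation is a plain C call. */
static double integrate_midpoint(void *callable, double a, double b, int n)
{
    dd_func_t f = (dd_func_t)nativecallable_get(callable, "d->d");
    double h = (b - a) / n;
    double sum = 0.0;
    int i;
    if (f == NULL)
        return 0.0;  /* real code would fall back to PyObject_Call */
    for (i = 0; i < n; i++)
        sum += f(a + (i + 0.5) * h);
    return sum * h;
}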
-- Nathaniel From stefan_ml at behnel.de Thu Apr 19 13:25:58 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 19 Apr 2012 13:25:58 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> <4F8FEC43.2020801@astro.uio.no> Message-ID: <4F8FF646.2080000@behnel.de> Vitja Makarov, 19.04.2012 13:20: > Ok, np.sqrt() supports different signatures -- how would Cython know > which C function to use? You might want to read through this mailing list thread and the CEP. Stefan From stefan_ml at behnel.de Thu Apr 19 13:51:16 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 19 Apr 2012 13:51:16 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> Message-ID: <4F8FFC34.9000205@behnel.de> Vitja Makarov, 19.04.2012 10:35: > I've tried to implement a trivial approach for CyFunction. Trivial means > that the function accepts PyObjects as arguments and returns a PyObject, > so the trivial signature is only one integer: 1 + len(args). If a signature > match occurs the direct C function is called and PyObject_Call() is used > otherwise. I didn't succeed because of the argument cloning problems we > discussed before. > > About dict lookups: it's possible to speed up dict lookup by a constant > key if we have access to the dict's internal implementation. I've > implemented it for module-level lookups here: > > https://github.com/vitek/cython/commit/1d134fe54a74e6fc6d39d09973db499680b2a8d9 My gut feeling tells me that this optimisation is seriously pushing it. Stefan From d.s.seljebotn at astro.uio.no Thu Apr 19 14:53:11 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 14:53:11 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F8FF2C4.4040601@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F8FF18F.1020909@astro.uio.no> <4F8FF2C4.4040601@behnel.de> Message-ID: <4F900AB7.7050906@astro.uio.no> On 04/19/2012 01:11 PM, Stefan Behnel wrote: > Dag Sverre Seljebotn, 19.04.2012 13:05: >> Pure speculation though > > I think we should leave speculation to the CPUs. They are quite good at it > these days. Yes, I agree, given these benchmarks, we should focus on a) Usability in C b) Simplicity c) Extensibility / probability of backwards incompatibility from future CEPs But even for these criteria, I don't really have an opinion about table vs. getfuncptr, or intern vs. strcmp. Where do others put their +1's?
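Since the thread keeps polling intern vs. key, this is what the two compare operations being voted on reduce to at a call site, in a rough C sketch. The key packing and variable names are invented for illustration only.

#include <stdint.h>

/* (a) Interned signature: the expected value is a global pointer filled
 * in once (e.g. at import), so a match is one load plus one compare. */
extern const char *interned_sig_dd;   /* set up by the interning machinery */

static int matches_intern(const char *entry_sig)
{
    return entry_sig == interned_sig_dd;
}

/* (b) Key: the signature is packed into an integer, so the expected
 * value can sit as a 64-bit immediate in the instruction stream. */
typedef uint64_t sigkey_t;
#define SIGKEY_DD ((sigkey_t)0x64643e64ULL)  /* made-up packing of "d->d" */

static int matches_key(sigkey_t entry_key)
{
    return entry_key == SIGKEY_DD;
}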
Dag From d.s.seljebotn at astro.uio.no Thu Apr 19 15:18:40 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 15:18:40 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> Message-ID: <4F9010B0.5080100@astro.uio.no> On 04/19/2012 01:20 PM, Nathaniel Smith wrote: > On Thu, Apr 19, 2012 at 11:56 AM, Dag Sverre Seljebotn > wrote: >> I thought of some drawbacks of getfuncptr: >> >> - Important: Doesn't allow you to actually inspect the supported >> signatures, which is needed (or at least convenient) if you want to use an >> FFI library or do some JIT-ing. So an iteration mechanism is still needed in >> addition, meaning the number of things for the object to implement grows a >> bit large. Default implementations help -- OTOH there really wasn't a major >> drawback with the table approach as long as JITs can just replace it? > > But this is orthogonal to the table vs. getfuncptr discussion. We're > assuming that the table might be extended at runtime, which means you > can't use it to determine which signatures are supported. So we need > some sort of extra interface for the caller and callee to negotiate a > type anyway. (I'm intentionally agnostic about whether it makes more > sense for the caller or the callee to be doing the iterating... in > general type negotiation could be quite complicated, and I don't think > we know enough to get that interface right yet.) Hmm. Right. Let's define an explicit goal for the CEP then. What I care about is getting the spec right enough such that, e.g., NumPy and SciPy, and other (mostly manually written) C extensions with a slow development pace, can be forward-compatible with whatever crazy things Cython or Numba does. There are 4 cases: 1) JIT calls JIT (ruled out straight away) 2) JIT calls static: Say that Numba wants to optimize calls to np.sin etc. without special-casing; this seems to require reading a table of static signatures 3) Static calls JIT: This is the case when scipy.integrate routines call a Numba callback and Numba generates a specialization for the dtype they explicitly need. This calls for getfuncptr (but perhaps in a form which we can't quite determine yet?). 4) Static calls static: Either table or getfuncptr works. My gut feeling is go for 2) and 4) in this round => table. The fact that the table can be extended at runtime is then not really relevant -- perhaps there will be an API to trigger that in the future, but it can't really be made use of today. > The other other option would be to go to the far other end of > simplicity, and just forget for now about allowing multiple signatures > in the same object. Do signature selection by having the user select > one explicitly: > > @cython.inline > def square(x): > return x * x > > # .specialize is an un-standardized Cython interface > # square_double is an object implementing the standardized C-callable interface > square_double = square.specialize("d->d") > scipy.integrate.quad(square_double) > > That'd be enough to get started, and doesn't rule out later extensions > that do automatic type selection, once we have more experience. Well, I want np.sin to replace "cdef extern from 'math.h'", and then this seems to be needed... at least the possibility to have both "d->d" and "O->O".
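What "both 'd->d' and 'O->O'" could look like from the callee's side: one callable advertising a boxed and an unboxed specialization of sin in the same table. A sketch under stated assumptions -- the signature strings and table layout are placeholders, not the CEP's, and error handling is elided.

#include <Python.h>
#include <math.h>

/* Unboxed "d->d" specialization. */
static double sin_dd(double x) { return sin(x); }

/* Boxed "O->O" specialization; PyFloat_AsDouble error checks elided. */
static PyObject *sin_OO(PyObject *x)
{
    return PyFloat_FromDouble(sin(PyFloat_AsDouble(x)));
}

typedef struct { const char *sig; void *ptr; } sig_entry_t;

static sig_entry_t sin_entries[] = {
    { "d->d", (void *)sin_dd },   /* fast path for typed callers */
    { "O->O", (void *)sin_OO },   /* generic object fallback */
    { NULL,   NULL }              /* terminator */
};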
Dag From njs at pobox.com Thu Apr 19 16:28:54 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 19 Apr 2012 15:28:54 +0100 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F9010B0.5080100@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F9010B0.5080100@astro.uio.no> Message-ID: On Thu, Apr 19, 2012 at 2:18 PM, Dag Sverre Seljebotn wrote: > On 04/19/2012 01:20 PM, Nathaniel Smith wrote: >> @cython.inline >> def square(x): >> ? ? return x * x >> >> # .specialize is an un-standardized Cython interface >> # square_double is an object implementing the standardized C-callable >> interface >> square_double = square.specialize("d->d") >> scipy.integrate.quad(square_double) >> >> That'd be enough to get started, and doesn't rule out later extensions >> that do automatic type selection, once we have more experience. > > > Well, I want np.sin to replace "cdef extern from 'math.h'", and then this > seems to be needed... at least the possibility to have both "d->d" and > "O->O". Except, the C function implementing np.sin on doubles actually has a signature that's something like "&&t&i&i&t->" (PyUFuncGenericFunction), not "d->d"... so maybe this is a good example to work through! It isn't at all obvious to me how this should be made to work in any of these proposals. (Isn't "O->O" just obj->ob_type->tp_call in any case?) -- Nathaniel From d.s.seljebotn at astro.uio.no Thu Apr 19 19:01:07 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 19 Apr 2012 19:01:07 +0200 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F9010B0.5080100@astro.uio.no> Message-ID: Nathaniel Smith wrote: >On Thu, Apr 19, 2012 at 2:18 PM, Dag Sverre Seljebotn > wrote: >> On 04/19/2012 01:20 PM, Nathaniel Smith wrote: >>> @cython.inline >>> def square(x): >>> ? ? return x * x >>> >>> # .specialize is an un-standardized Cython interface >>> # square_double is an object implementing the standardized >C-callable >>> interface >>> square_double = square.specialize("d->d") >>> scipy.integrate.quad(square_double) >>> >>> That'd be enough to get started, and doesn't rule out later >extensions >>> that do automatic type selection, once we have more experience. >> >> >> Well, I want np.sin to replace "cdef extern from 'math.h'", and then >this >> seems to be needed... at least the possibility to have both "d->d" >and >> "O->O". > >Except, the C function implementing np.sin on doubles actually has a >signature that's something like "&&t&i&i&t->" >(PyUFuncGenericFunction), not "d->d"... so maybe this is a good >example to work through! It isn't at all obvious to me how this should >be made to work in any of these proposals. > >(Isn't "O->O" just obj->ob_type->tp_call in any case?) I just touched on this on the NumPy list -- the NumPy ufunc object could support the CEP in an optional way (each instance may, but doesn't have to), and then one could plug in fast scalar versions on a case by case basis, mainly using them as a namespace mechanism rather than reusing any implementation. So the C signature of the current implementation is irrelevant. I think you are right about the object case. 
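As a side note on the mismatch Nathaniel raises: the signature he quotes is the ufunc inner-loop shape, not a scalar one. A scalar "d->d" kernel relates to it roughly as below -- a sketch against the NumPy 1.x PyUFuncGenericFunction layout, with casting logic and error handling elided.

#include <Python.h>
#include <numpy/arrayobject.h>   /* npy_intp */
#include <math.h>

static double sin_dd(double x) { return sin(x); }

/* PyUFuncGenericFunction-shaped wrapper around the scalar kernel:
 * strided in/out arrays instead of one double in, one double out. */
static void sin_loop(char **args, npy_intp *dimensions,
                     npy_intp *steps, void *data)
{
    npy_intp i, n = dimensions[0];
    char *in = args[0], *out = args[1];
    (void)data;
    for (i = 0; i < n; i++, in += steps[0], out += steps[1])
        *(double *)out = sin_dd(*(double *)in);
}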
But there is still float/double/longdouble...with ufuncs a possible target it seems a good idea in general to support multiple specializations. Though I admit that with scalars, double-only gets one pretty far. np.add(1, 2) is a rather contrived example... Dag > >-- Nathaniel >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From robertwb at gmail.com Fri Apr 20 02:52:53 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 19 Apr 2012 17:52:53 -0700 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> Message-ID: On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote: > On 19 April 2012 08:17, Dag Sverre Seljebotn wrote: >> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>> >>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>> >>>> from numpy import sqrt, sin >>>> >>>> cdef double f(double x): >>>> ? ? return sqrt(x * x) # or sin(x * x) >>>> >>>> Of course, here one could get the pointer in the module at import time. >>> >>> >>> That optimisation would actually be very worthwhile all by itself. I mean, >>> we know what signatures we need for globally imported functions throughout >>> the module, so we can reduce the call to a single jump through a function >>> pointer (although likely with a preceding NULL check, which the branch >>> prediction would be happy to give us for free). At least as long as sqrt >>> is >>> not being reassigned, but that should hit the 99% case. >>> >>> >>>> However, here: >>>> >>>> from numpy import sqrt >> >> >> Correction: "import numpy as np" >> >>>> >>>> cdef double f(double x): >>>> ? ? return np.sqrt(x * x) # or np.sin(x * x) >>>> >>>> the __getattr__ on np sure is larger than any effect we discuss. >>> >>> >>> Yes, that would have to stay a .pxd case, I guess. >> >> >> How about this mini-CEP: >> >> Modules are allowed to specify __nomonkey__ (or __const__, or >> __notreassigned__), a list of strings naming module-level variables where >> "we don't hold you responsible if you assume no monkey-patching of these". >> >> When doing "import numpy as np", then (assuming "np" is never reassigned in >> the module), at import time we check all names looked up from it in >> __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'", >> i.e. the "np." is just a namespace mechanism. > > I like the idea. I think this could be generalized to a 'final' > keyword, that could also enable optimizations for cdef class > attributes. So you'd say > > cdef final object np > import numpy as np > > For class attributes this would tell the compiler that it will not be > rebound, which means you could check if attributes are initialized in > the initializer, or just pull such checks (as wel as bounds checks), > at least for memoryviews, out of loops, without worrying whether it > will be reassigned in the meantime. final is a nice way to describe this. If we were to introduce a new keyword, static might do as well. It seems more natural to do this in the numpy.pxd file (perhaps it could just be declared as a final object) and that would allow us to not worry about re-assignment. Cython could then try to keep that contract for any modules it compiles. 
(This is, however, a bit more restrictive, though one can always cimport and import modules under different names.) - Robert From stefan_ml at behnel.de Fri Apr 20 08:21:41 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 20 Apr 2012 08:21:41 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> Message-ID: <4F910075.6090904@behnel.de> Robert Bradshaw, 20.04.2012 02:52: > On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote: >> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote: >>> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>>> >>>>> from numpy import sqrt, sin >>>>> >>>>> cdef double f(double x): >>>>> return sqrt(x * x) # or sin(x * x) >>>>> >>>>> Of course, here one could get the pointer in the module at import time. >>>> >>>> That optimisation would actually be very worthwhile all by itself. I mean, >>>> we know what signatures we need for globally imported functions throughout >>>> the module, so we can reduce the call to a single jump through a function >>>> pointer (although likely with a preceding NULL check, which the branch >>>> prediction would be happy to give us for free). At least as long as sqrt >>>> is not being reassigned, but that should hit the 99% case. >>>> >>>>> However, here: >>>>> >>>>> from numpy import sqrt >>> Correction: "import numpy as np" >>>>> >>>>> cdef double f(double x): >>>>> return np.sqrt(x * x) # or np.sin(x * x) >>>>> >>>>> the __getattr__ on np sure is larger than any effect we discuss. >>>> >>>> Yes, that would have to stay a .pxd case, I guess. >>> >>> How about this mini-CEP: >>> >>> Modules are allowed to specify __nomonkey__ (or __const__, or >>> __notreassigned__), a list of strings naming module-level variables where >>> "we don't hold you responsible if you assume no monkey-patching of these". >>> >>> When doing "import numpy as np", then (assuming "np" is never reassigned in >>> the module), at import time we check all names looked up from it in >>> __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'", >>> i.e. the "np." is just a namespace mechanism. >> >> I like the idea. I think this could be generalized to a 'final' >> keyword, that could also enable optimizations for cdef class >> attributes. So you'd say >> >> cdef final object np >> import numpy as np >> >> For class attributes this would tell the compiler that it will not be >> rebound, which means you could check if attributes are initialized in >> the initializer, or just pull such checks (as wel as bounds checks), >> at least for memoryviews, out of loops, without worrying whether it >> will be reassigned in the meantime. > > final is a nice way to describe this. If we were to introduce a new > keyword, static might do as well. > > It seems more natural to do this in the numpy.pxd file (perhaps it > could just be declared as a final object) and that would allow us to > not worry about re-assignment. Cython could then try to keep that > contract for any modules it compiles. (This is, however, a bit more > restrictive, though one can always cimport and import modules under > different names.) However, it's actually not the module that's "final" in this regard but the functions it exports - *they* do not change and neither do their C signatures. 
So the "final" modifier should stick to the functions (possibly declared at the "cdef extern" line), which would then allow us to resolve and cache the C function pointers at import time. That mimics the case of the current "final" classes and methods, where we take off the method pointers at compile time. And inside of numpy.pxd is the perfect place to declare this, not as part of the import. Stefan From d.s.seljebotn at astro.uio.no Fri Apr 20 08:49:32 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 20 Apr 2012 08:49:32 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F910075.6090904@behnel.de> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> <4F910075.6090904@behnel.de> Message-ID: <4F9106FC.8030603@astro.uio.no> On 04/20/2012 08:21 AM, Stefan Behnel wrote: > Robert Bradshaw, 20.04.2012 02:52: >> On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote: >>> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote: >>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>>>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>>>> >>>>>> from numpy import sqrt, sin >>>>>> >>>>>> cdef double f(double x): >>>>>> return sqrt(x * x) # or sin(x * x) >>>>>> >>>>>> Of course, here one could get the pointer in the module at import time. >>>>> >>>>> That optimisation would actually be very worthwhile all by itself. I mean, >>>>> we know what signatures we need for globally imported functions throughout >>>>> the module, so we can reduce the call to a single jump through a function >>>>> pointer (although likely with a preceding NULL check, which the branch >>>>> prediction would be happy to give us for free). At least as long as sqrt >>>>> is not being reassigned, but that should hit the 99% case. >>>>> >>>>>> However, here: >>>>>> >>>>>> from numpy import sqrt >>>> Correction: "import numpy as np" >>>>>> >>>>>> cdef double f(double x): >>>>>> return np.sqrt(x * x) # or np.sin(x * x) >>>>>> >>>>>> the __getattr__ on np sure is larger than any effect we discuss. >>>>> >>>>> Yes, that would have to stay a .pxd case, I guess. >>>> >>>> How about this mini-CEP: >>>> >>>> Modules are allowed to specify __nomonkey__ (or __const__, or >>>> __notreassigned__), a list of strings naming module-level variables where >>>> "we don't hold you responsible if you assume no monkey-patching of these". >>>> >>>> When doing "import numpy as np", then (assuming "np" is never reassigned in >>>> the module), at import time we check all names looked up from it in >>>> __nomonkey__, and if so treat them as "from numpy import sqrt as 'np.sqrt'", >>>> i.e. the "np." is just a namespace mechanism. >>> >>> I like the idea. I think this could be generalized to a 'final' >>> keyword, that could also enable optimizations for cdef class >>> attributes. So you'd say >>> >>> cdef final object np >>> import numpy as np >>> >>> For class attributes this would tell the compiler that it will not be >>> rebound, which means you could check if attributes are initialized in >>> the initializer, or just pull such checks (as wel as bounds checks), >>> at least for memoryviews, out of loops, without worrying whether it >>> will be reassigned in the meantime. >> >> final is a nice way to describe this. If we were to introduce a new >> keyword, static might do as well. 
>> >> It seems more natural to do this in the numpy.pxd file (perhaps it >> could just be declared as a final object) and that would allow us to >> not worry about re-assignment. Cython could then try to keep that >> contract for any modules it compiles. (This is, however, a bit more >> restrictive, though one can always cimport and import modules under >> different names.) > > However, it's actually not the module that's "final" in this regard but the > functions it exports - *they* do not change and neither do their C > signatures. So the "final" modifier should stick to the functions (possibly > declared at the "cdef extern" line), which would then allow us to resolve > and cache the C function pointers at import time. Are there any advantages at getting this information at compile time rather than import time? If you got the full signature it would be a different matter (for type inference etc.); you could essentially do something like cdef final double sin(double) cdef final float sin(float) cdef final double cos(double) ...and you would know types at compile-time, and get pointers for those at import time. > > That mimics the case of the current "final" classes and methods, where we > take off the method pointers at compile time. And inside of numpy.pxd is > the perfect place to declare this, not as part of the import. However, a) a __finals__ in the NumPy Python module is something the NumPy project can maintain, and which can be different on different releases etc. (OK, NumPy is special because it is so high profile, but any other library) b) a __finals__ is something PyPy, Numba, etc. could benefit from as well Of course, one doesn't exclude the other. And if a library implements CEP1000 + provides __finals__, it would be trivial to run a pxd generator on it. Dag From d.s.seljebotn at astro.uio.no Fri Apr 20 08:55:40 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 20 Apr 2012 08:55:40 +0200 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F9106FC.8030603@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> <4F910075.6090904@behnel.de> <4F9106FC.8030603@astro.uio.no> Message-ID: <4F91086C.4080406@astro.uio.no> On 04/20/2012 08:49 AM, Dag Sverre Seljebotn wrote: > On 04/20/2012 08:21 AM, Stefan Behnel wrote: >> Robert Bradshaw, 20.04.2012 02:52: >>> On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote: >>>> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote: >>>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>>>>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>>>>> >>>>>>> from numpy import sqrt, sin >>>>>>> >>>>>>> cdef double f(double x): >>>>>>> return sqrt(x * x) # or sin(x * x) >>>>>>> >>>>>>> Of course, here one could get the pointer in the module at import >>>>>>> time. >>>>>> >>>>>> That optimisation would actually be very worthwhile all by itself. >>>>>> I mean, >>>>>> we know what signatures we need for globally imported functions >>>>>> throughout >>>>>> the module, so we can reduce the call to a single jump through a >>>>>> function >>>>>> pointer (although likely with a preceding NULL check, which the >>>>>> branch >>>>>> prediction would be happy to give us for free). At least as long >>>>>> as sqrt >>>>>> is not being reassigned, but that should hit the 99% case. 
>>>>>> >>>>>>> However, here: >>>>>>> >>>>>>> from numpy import sqrt >>>>> Correction: "import numpy as np" >>>>>>> >>>>>>> cdef double f(double x): >>>>>>> return np.sqrt(x * x) # or np.sin(x * x) >>>>>>> >>>>>>> the __getattr__ on np sure is larger than any effect we discuss. >>>>>> >>>>>> Yes, that would have to stay a .pxd case, I guess. >>>>> >>>>> How about this mini-CEP: >>>>> >>>>> Modules are allowed to specify __nomonkey__ (or __const__, or >>>>> __notreassigned__), a list of strings naming module-level variables >>>>> where >>>>> "we don't hold you responsible if you assume no monkey-patching of >>>>> these". >>>>> >>>>> When doing "import numpy as np", then (assuming "np" is never >>>>> reassigned in >>>>> the module), at import time we check all names looked up from it in >>>>> __nomonkey__, and if so treat them as "from numpy import sqrt as >>>>> 'np.sqrt'", >>>>> i.e. the "np." is just a namespace mechanism. >>>> >>>> I like the idea. I think this could be generalized to a 'final' >>>> keyword, that could also enable optimizations for cdef class >>>> attributes. So you'd say >>>> >>>> cdef final object np >>>> import numpy as np >>>> >>>> For class attributes this would tell the compiler that it will not be >>>> rebound, which means you could check if attributes are initialized in >>>> the initializer, or just pull such checks (as wel as bounds checks), >>>> at least for memoryviews, out of loops, without worrying whether it >>>> will be reassigned in the meantime. >>> >>> final is a nice way to describe this. If we were to introduce a new >>> keyword, static might do as well. >>> >>> It seems more natural to do this in the numpy.pxd file (perhaps it >>> could just be declared as a final object) and that would allow us to >>> not worry about re-assignment. Cython could then try to keep that >>> contract for any modules it compiles. (This is, however, a bit more >>> restrictive, though one can always cimport and import modules under >>> different names.) >> >> However, it's actually not the module that's "final" in this regard >> but the >> functions it exports - *they* do not change and neither do their C >> signatures. So the "final" modifier should stick to the functions >> (possibly >> declared at the "cdef extern" line), which would then allow us to resolve >> and cache the C function pointers at import time. > > Are there any advantages at getting this information at compile time > rather than import time? > > If you got the full signature it would be a different matter (for type > inference etc.); you could essentially do something like > > cdef final double sin(double) > cdef final float sin(float) > cdef final double cos(double) In fact, "final" is sort of implied whenever a pxd is implied. The mere act of providing a pxd means you expect early binding to happen. So I think this boils down to simply allowing to resolve ABIs declared in pxd files through CEP 1000 instead of assuming it is a Cython module: cdef double sin(double) cdef double cos(double) We could first look for the Cython ABI at import time, and if that isn't there, fall back to CEP 1000. And in time, deprecate the Cython ABI in favour of CEP 1000 (and follow-up CEPs to make it complete enough). The __nomonkey__ was something else, a proposal about a pxd-less approach. We can do both. Dag > > ...and you would know types at compile-time, and get pointers for those > at import time. 
> >> >> That mimics the case of the current "final" classes and methods, where we >> take off the method pointers at compile time. And inside of numpy.pxd is >> the perfect place to declare this, not as part of the import. > > However, > > a) a __finals__ in the NumPy Python module is something the NumPy > project can maintain, and which can be different on different releases > etc. (OK, NumPy is special because it is so high profile, but any other > library) > > b) a __finals__ is something PyPy, Numba, etc. could benefit from as well > > Of course, one doesn't exclude the other. And if a library implements > CEP1000 + provides __finals__, it would be trivial to run a pxd > generator on it. > > Dag From robertwb at gmail.com Fri Apr 20 08:58:18 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Thu, 19 Apr 2012 23:58:18 -0700 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F9106FC.8030603@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> <4F910075.6090904@behnel.de> <4F9106FC.8030603@astro.uio.no> Message-ID: On Thu, Apr 19, 2012 at 11:49 PM, Dag Sverre Seljebotn wrote: > On 04/20/2012 08:21 AM, Stefan Behnel wrote: >> >> Robert Bradshaw, 20.04.2012 02:52: >>> >>> On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote: >>>> >>>> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote: >>>>> >>>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>>>>> >>>>>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>>>>> >>>>>>> >>>>>>> from numpy import sqrt, sin >>>>>>> >>>>>>> cdef double f(double x): >>>>>>> ? ? return sqrt(x * x) # or sin(x * x) >>>>>>> >>>>>>> Of course, here one could get the pointer in the module at import >>>>>>> time. >>>>>> >>>>>> >>>>>> That optimisation would actually be very worthwhile all by itself. I >>>>>> mean, >>>>>> we know what signatures we need for globally imported functions >>>>>> throughout >>>>>> the module, so we can reduce the call to a single jump through a >>>>>> function >>>>>> pointer (although likely with a preceding NULL check, which the branch >>>>>> prediction would be happy to give us for free). At least as long as >>>>>> sqrt >>>>>> is not being reassigned, but that should hit the 99% case. >>>>>> >>>>>>> However, here: >>>>>>> >>>>>>> from numpy import sqrt >>>>> >>>>> Correction: "import numpy as np" >>>>>>> >>>>>>> >>>>>>> cdef double f(double x): >>>>>>> ? ? return np.sqrt(x * x) # or np.sin(x * x) >>>>>>> >>>>>>> the __getattr__ on np sure is larger than any effect we discuss. >>>>>> >>>>>> >>>>>> Yes, that would have to stay a .pxd case, I guess. >>>>> >>>>> >>>>> How about this mini-CEP: >>>>> >>>>> Modules are allowed to specify __nomonkey__ (or __const__, or >>>>> __notreassigned__), a list of strings naming module-level variables >>>>> where >>>>> "we don't hold you responsible if you assume no monkey-patching of >>>>> these". >>>>> >>>>> When doing "import numpy as np", then (assuming "np" is never >>>>> reassigned in >>>>> the module), at import time we check all names looked up from it in >>>>> __nomonkey__, and if so treat them as "from numpy import sqrt as >>>>> 'np.sqrt'", >>>>> i.e. the "np." is just a namespace mechanism. >>>> >>>> >>>> I like the idea. I think this could be generalized to a 'final' >>>> keyword, that could also enable optimizations for cdef class >>>> attributes. 
So you'd say >>>> >>>> cdef final object np >>>> import numpy as np >>>> >>>> For class attributes this would tell the compiler that it will not be >>>> rebound, which means you could check if attributes are initialized in >>>> the initializer, or just pull such checks (as wel as bounds checks), >>>> at least for memoryviews, out of loops, without worrying whether it >>>> will be reassigned in the meantime. >>> >>> >>> final is a nice way to describe this. If we were to introduce a new >>> keyword, static might do as well. >>> >>> It seems more natural to do this in the numpy.pxd file (perhaps it >>> could just be declared as a final object) and that would allow us to >>> not worry about re-assignment. Cython could then try to keep that >>> contract for any modules it compiles. (This is, however, a bit more >>> restrictive, though one can always cimport and import modules under >>> different names.) >> >> >> However, it's actually not the module that's "final" in this regard but >> the >> functions it exports - *they* do not change and neither do their C >> signatures. So the "final" modifier should stick to the functions >> (possibly >> declared at the "cdef extern" line), which would then allow us to resolve >> and cache the C function pointers at import time. Yes, I was thinking about decorating the functions, not the module. > Are there any advantages at getting this information at compile time rather > than import time? > > If you got the full signature it would be a different matter (for type > inference etc.); you could essentially do something like > > cdef final double sin(double) > cdef final float sin(float) > cdef final double cos(double) > > ...and you would know types at compile-time, and get pointers for those at > import time. > > >> >> That mimics the case of the current "final" classes and methods, where we >> take off the method pointers at compile time. And inside of numpy.pxd is >> the perfect place to declare this, not as part of the import. > > > However, > > a) a __finals__ in the NumPy Python module is something the NumPy project > can maintain, and which can be different on different releases etc. (OK, > NumPy is special because it is so high profile, but any other library) > > b) a __finals__ is something PyPy, Numba, etc. could benefit from as well > > Of course, one doesn't exclude the other. And if a library implements > CEP1000 + provides __finals__, it would be trivial to run a pxd generator on > it. This seems rather orthogonal to the CEP 1000 proposal; there are lots of optimizations that could be done by knowing a member of an object will not be re-assigned. One can currently write cdef np_sin from numpy import sin as np_sin which would accomplish the same thing, right? 
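In generated C, the "bind at import time, NULL-check at the call site" pattern that keeps coming up in this subthread might look roughly like this. __pyx_get_native_funcptr is an invented stand-in for a CEP-1000-style lookup, and error handling is abbreviated.

#include <Python.h>

/* Hypothetical CEP-1000-style lookup. */
extern void *__pyx_get_native_funcptr(PyObject *obj, const char *sig);

static double (*bound_np_sqrt)(double);  /* filled once at import */

static int bind_at_import(PyObject *numpy_module)
{
    PyObject *sqrt_obj = PyObject_GetAttrString(numpy_module, "sqrt");
    if (sqrt_obj == NULL)
        return -1;
    /* May legitimately stay NULL if no "d->d" specialization exists. */
    bound_np_sqrt = (double (*)(double))
        __pyx_get_native_funcptr(sqrt_obj, "d->d");
    Py_DECREF(sqrt_obj);
    return 0;
}

static double call_sqrt(PyObject *sqrt_obj, double x)
{
    if (bound_np_sqrt != NULL)        /* the likely() branch */
        return bound_np_sqrt(x);
    /* Boxed fallback; error checks elided. */
    {
        PyObject *r = PyObject_CallFunction(sqrt_obj, "d", x);
        double v = r ? PyFloat_AsDouble(r) : -1.0;
        Py_XDECREF(r);
        return v;
    }
}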
- Robert From robertwb at gmail.com Fri Apr 20 09:02:59 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 20 Apr 2012 00:02:59 -0700 Subject: [Cython] New early-binding concept [was: CEP1000] In-Reply-To: <4F91086C.4080406@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8FB3A4.7080700@behnel.de> <4F8FBBF5.4090709@astro.uio.no> <4F910075.6090904@behnel.de> <4F9106FC.8030603@astro.uio.no> <4F91086C.4080406@astro.uio.no> Message-ID: On Thu, Apr 19, 2012 at 11:55 PM, Dag Sverre Seljebotn wrote: > On 04/20/2012 08:49 AM, Dag Sverre Seljebotn wrote: >> >> On 04/20/2012 08:21 AM, Stefan Behnel wrote: >>> >>> Robert Bradshaw, 20.04.2012 02:52: >>>> >>>> On Thu, Apr 19, 2012 at 3:53 AM, mark florisson wrote: >>>>> >>>>> On 19 April 2012 08:17, Dag Sverre Seljebotn wrote: >>>>>> >>>>>> On 04/19/2012 08:41 AM, Stefan Behnel wrote: >>>>>>> >>>>>>> Dag Sverre Seljebotn, 18.04.2012 23:35: >>>>>>>> >>>>>>>> >>>>>>>> from numpy import sqrt, sin >>>>>>>> >>>>>>>> cdef double f(double x): >>>>>>>> return sqrt(x * x) # or sin(x * x) >>>>>>>> >>>>>>>> Of course, here one could get the pointer in the module at import >>>>>>>> time. >>>>>>> >>>>>>> >>>>>>> That optimisation would actually be very worthwhile all by itself. >>>>>>> I mean, >>>>>>> we know what signatures we need for globally imported functions >>>>>>> throughout >>>>>>> the module, so we can reduce the call to a single jump through a >>>>>>> function >>>>>>> pointer (although likely with a preceding NULL check, which the >>>>>>> branch >>>>>>> prediction would be happy to give us for free). At least as long >>>>>>> as sqrt >>>>>>> is not being reassigned, but that should hit the 99% case. >>>>>>> >>>>>>>> However, here: >>>>>>>> >>>>>>>> from numpy import sqrt >>>>>> >>>>>> Correction: "import numpy as np" >>>>>>>> >>>>>>>> >>>>>>>> cdef double f(double x): >>>>>>>> return np.sqrt(x * x) # or np.sin(x * x) >>>>>>>> >>>>>>>> the __getattr__ on np sure is larger than any effect we discuss. >>>>>>> >>>>>>> >>>>>>> Yes, that would have to stay a .pxd case, I guess. >>>>>> >>>>>> >>>>>> How about this mini-CEP: >>>>>> >>>>>> Modules are allowed to specify __nomonkey__ (or __const__, or >>>>>> __notreassigned__), a list of strings naming module-level variables >>>>>> where >>>>>> "we don't hold you responsible if you assume no monkey-patching of >>>>>> these". >>>>>> >>>>>> When doing "import numpy as np", then (assuming "np" is never >>>>>> reassigned in >>>>>> the module), at import time we check all names looked up from it in >>>>>> __nomonkey__, and if so treat them as "from numpy import sqrt as >>>>>> 'np.sqrt'", >>>>>> i.e. the "np." is just a namespace mechanism. >>>>> >>>>> >>>>> I like the idea. I think this could be generalized to a 'final' >>>>> keyword, that could also enable optimizations for cdef class >>>>> attributes. So you'd say >>>>> >>>>> cdef final object np >>>>> import numpy as np >>>>> >>>>> For class attributes this would tell the compiler that it will not be >>>>> rebound, which means you could check if attributes are initialized in >>>>> the initializer, or just pull such checks (as wel as bounds checks), >>>>> at least for memoryviews, out of loops, without worrying whether it >>>>> will be reassigned in the meantime. >>>> >>>> >>>> final is a nice way to describe this. If we were to introduce a new >>>> keyword, static might do as well. 
>>>> >>>> It seems more natural to do this in the numpy.pxd file (perhaps it >>>> could just be declared as a final object) and that would allow us to >>>> not worry about re-assignment. Cython could then try to keep that >>>> contract for any modules it compiles. (This is, however, a bit more >>>> restrictive, though one can always cimport and import modules under >>>> different names.) >>> >>> >>> However, it's actually not the module that's "final" in this regard >>> but the >>> functions it exports - *they* do not change and neither do their C >>> signatures. So the "final" modifier should stick to the functions >>> (possibly >>> declared at the "cdef extern" line), which would then allow us to resolve >>> and cache the C function pointers at import time. >> >> >> Are there any advantages to getting this information at compile time >> rather than import time? >> >> If you got the full signature it would be a different matter (for type >> inference etc.); you could essentially do something like >> >> cdef final double sin(double) >> cdef final float sin(float) >> cdef final double cos(double) > > > In fact, "final" is sort of implied whenever a pxd is provided. The mere act > of providing a pxd means you expect early binding to happen. So I think this > boils down to simply allowing us to resolve ABIs declared in pxd files through > CEP 1000 instead of assuming it is a Cython module: > > cdef double sin(double) > cdef double cos(double) > > We could first look for the Cython ABI at import time, and if that isn't > there, fall back to CEP 1000. And in time, deprecate the Cython ABI in > favour of CEP 1000 (and follow-up CEPs to make it complete enough). Makes sense. > The __nomonkey__ was something else, a proposal about a pxd-less approach. > We can do both. If __nomonkey__ is inspected at runtime, then the calling module would have to opportunistically guess what might be in that list at compile time, and still generate the lookup code just in case. I guess this idea doesn't seem very fleshed out yet; its advantages, caveats, and semantics are still quite fuzzy. From robertwb at gmail.com Fri Apr 20 09:30:54 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Fri, 20 Apr 2012 00:30:54 -0700 Subject: [Cython] CEP1000: Native dispatch through callables In-Reply-To: <4F9010B0.5080100@astro.uio.no> References: <4F87530F.7050000@astro.uio.no> <4F8D6112.1000906@astro.uio.no> <4F8F33AE.50401@astro.uio.no> <4F8F38F9.7020008@astro.uio.no> <4F8FEF7A.2090501@astro.uio.no> <4F9010B0.5080100@astro.uio.no> Message-ID: On Thu, Apr 19, 2012 at 6:18 AM, Dag Sverre Seljebotn wrote: > On 04/19/2012 01:20 PM, Nathaniel Smith wrote: >> >> On Thu, Apr 19, 2012 at 11:56 AM, Dag Sverre Seljebotn >> wrote: >>> >>> I thought of some drawbacks of getfuncptr: >>> >>> - Important: Doesn't allow you to actually inspect the supported >>> signatures, which is needed (or at least convenient) if you want to use >>> an >>> FFI library or do some JIT-ing. So an iteration mechanism is still needed >>> in >>> addition, meaning the number of things for the object to implement grows >>> a >>> bit large. Default implementations help -- OTOH there really wasn't a >>> major >>> drawback with the table approach as long as JITs can just replace it? >> >> >> But this is orthogonal to the table vs. getfuncptr discussion. We're >> assuming that the table might be extended at runtime, which means you >> can't use it to determine which signatures are supported.
So we need >> some sort of extra interface for the caller and callee to negotiate a >> type anyway. (I'm intentionally agnostic about whether it makes more >> sense for the caller or the callee to be doing the iterating... in >> general type negotiation could be quite complicated, and I don't think >> we know enough to get that interface right yet.) > > > Hmm. Right. Let's define an explicit goal for the CEP then. > > What I care about is getting the spec right enough such that, e.g., NumPy > and SciPy, and other (mostly manually written) C extensions with slow > development pace, can be forward-compatible with whatever crazy things > Cython or Numba does. > > There are 4 cases: > > 1) JIT calls JIT (ruled out straight away) > > 2) JIT calls static: Say that Numba wants to optimize calls to np.sin etc. > without special-casing; this seems to require reading a table of static > signatures > > 3) Static calls JIT: This is the case when scipy.integrate routines call a > Numba callback and Numba generates a specialization for the dtype they > explicitly need. This calls for getfuncptr (but perhaps in a form which we > can't quite determine yet?). > > 4) Static calls static: Either table or getfuncptr works. > > My gut feeling is go for 2) and 4) in this round => table. getfuncptr is really simple and flexible, but I'm with you on both of these two points, and the overhead was not trivial. Of course we could offer both, i.e. look at the table first, if it's not there call getfuncptr if it's non-null, then fall back to "slow" call or error. These are all opt-in depending on how hard you want to try to optimize things. As far as keys vs. interning, I'm also tempted to try to have my cake and eat it too. Define a space-friendly encoding for signatures and require interning for anything that doesn't fit into a single sizeof(void*). The fact that this cutoff would vary for 32 vs 64-bit would require some care, but could be done with macros in C. If the signatures produce non-aligned "pointer" values there won't be any collisions, and this way libraries only have to share in the global (Python-level?) interning scheme iff they want to expose/use "large" signatures. > The fact that the table can be extended at runtime is then not really > relevant -- perhaps there will be an API to trigger that in the future, but > it can't really be made use of today. > > >> The other other option would be to go to the far other end of >> simplicity, and just forget for now about allowing multiple signatures >> in the same object. Do signature selection by having the user select >> one explicitly: >> >> @cython.inline >> def square(x): >> return x * x >> >> # .specialize is an un-standardized Cython interface >> # square_double is an object implementing the standardized C-callable >> interface >> square_double = square.specialize("d->d") >> scipy.integrate.quad(square_double) >> >> That'd be enough to get started, and doesn't rule out later extensions >> that do automatic type selection, once we have more experience. > > > Well, I want np.sin to replace "cdef extern from 'math.h'", and then this > seems to be needed... at least the possibility to have both "d->d" and > "O->O". +1, not supporting this would be a huge defect.
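To make that cake-and-eat-it keying scheme concrete, here is a toy Python model of the idea; this is only an illustration, not part of any CEP draft, and the low-bit tag and the module-local intern table are assumptions of the sketch:

    import struct

    POINTER_SIZE = struct.calcsize('P')   # sizeof(void*): 4 or 8 bytes
    _interned = {}   # stand-in for the shared (Python-level?) intern table

    def signature_key(sig):
        # Signatures short enough to fit in a pointer-sized word are
        # packed directly, with the low bit set as a tag; interned-string
        # pointers are aligned (even), so the two key kinds never collide.
        data = sig.encode('ascii')
        if len(data) < POINTER_SIZE:
            return (int.from_bytes(data, 'little') << 1) | 1
        # "Large" signatures fall back to interning: equal strings share
        # one object, so key comparison stays a single pointer compare.
        return id(_interned.setdefault(sig, sig))

    # "d->d" (double -> double) fits inline on a 64-bit build, so equal
    # signatures always map to equal keys without any interning:
    assert signature_key("d->d") == signature_key("d->d")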
- Robert From stefan_ml at behnel.de Fri Apr 20 21:04:20 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 20 Apr 2012 21:04:20 +0200 Subject: [Cython] Cython 0.16 RC 2 In-Reply-To: References: Message-ID: <4F91B334.8020307@behnel.de> mark florisson, 15.04.2012 20:59: > Hopefully a final release candidate for the 0.16 release can be found > here: http://wiki.cython.org/ReleaseNotes-0.16 . This corresponds to > the 'release' branch of the cython repository on github. I pushed another couple of fixes related to the recent importlib changes in Py3k which also apply to older Py3 releases, as well as Robert's fix for C keywords. Jenkins is very happy with them. Is there anything left that would block the release now? Stefan From d.s.seljebotn at astro.uio.no Sat Apr 21 07:27:09 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 21 Apr 2012 07:27:09 +0200 Subject: [Cython] Julialang Message-ID: <4F92452D.7030507@astro.uio.no> Just heard about the Julia language and wanted to make sure it's on everybody's radar: http://julialang.org It's the first really decent language designed for scientists. Seems impressive to me, there's a few Cython features: - Dynamic typing with optional static types - Call C directly And then comes: - JIT - Templates - "Green" threading/coroutines - Multiple dispatch (yay!) - Lisp-like macros and other metaprogramming facilities - Designed for parallelism and distributed computation Dag From d.s.seljebotn at astro.uio.no Sat Apr 21 07:27:42 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 21 Apr 2012 07:27:42 +0200 Subject: [Cython] Julialang In-Reply-To: <4F92452D.7030507@astro.uio.no> References: <4F92452D.7030507@astro.uio.no> Message-ID: <4F92454E.3040000@astro.uio.no> On 04/21/2012 07:27 AM, Dag Sverre Seljebotn wrote: > Just heard about the Julia language and wanted to make sure it's on > everybody's radar: > > http://julialang.org > > It's the first really decent language designed for scientists. Seems ...that I've heard of, that is. Dag > impressive to me, there's a few Cython features: > > - Dynamic typing with optional static types > - Call C directly > > And then comes: > > - JIT > - Templates > - "Green" threading/coroutines > - Multiple dispatch (yay!) > - Lisp-like macros and other metaprogramming facilities > - Designed for parallelism and distributed computation > > Dag From stefan_ml at behnel.de Sat Apr 21 08:20:00 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 21 Apr 2012 08:20:00 +0200 Subject: [Cython] Julialang In-Reply-To: <4F92452D.7030507@astro.uio.no> References: <4F92452D.7030507@astro.uio.no> Message-ID: <4F925190.9020609@behnel.de> Dag Sverre Seljebotn, 21.04.2012 07:27: > Just heard about the Julia language and wanted to make sure it's on > everybody's radar: > > http://julialang.org > > It's the first really decent language designed for scientists. Seems > impressive to me, there's a few Cython features: > > - Dynamic typing with optional static types > - Call C directly > > And then comes: > > - JIT > - Templates > - "Green" threading/coroutines > - Multiple dispatch (yay!) > - Lisp-like macros and other metaprogramming facilities > - Designed for parallelism and distributed computation They say that it comes as a shared library, so it might work to wrap it in Cython. However, it's not clear to me how you would call into Julia code from C code.
They only emphasise the other direction, as seems to be usual for language implementations that try to advertise the beauties of their JIT compiler and their cool "we can call C" features. So it's hard to tell how much work it would be or how efficient it could become to call a wrapped Julia function based on a CEP1000 signature ID. At least the signature of the Julia function can easily be analysed, it seems. That's pretty cool already. Stefan From markflorisson88 at gmail.com Sat Apr 21 14:04:25 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 21 Apr 2012 13:04:25 +0100 Subject: [Cython] Cython 0.16 RC 2 In-Reply-To: <4F91B334.8020307@behnel.de> References: <4F91B334.8020307@behnel.de> Message-ID: On 20 April 2012 20:04, Stefan Behnel wrote: > mark florisson, 15.04.2012 20:59: >> Hopefully a final release candidate for the 0.16 release can be found >> here: http://wiki.cython.org/ReleaseNotes-0.16 . This corresponds to >> the 'release' branch of the cython repository on github. > > I pushed another couple of fixes related to the recent importlib changes in > Py3k which also apply to older Py3 releases, as well as Robert's fix for C > keywords. Jenkins is very happy with them. > > Is there anything left that would block the release now? > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel We're all set, but I can't upload the sdist to pypi: Upload failed (403): You are not allowed to edit 'Cython' package information If someone can give me those permissions or upload the sdist, then we can send updates to the mailing lists. The website is already updated. From markflorisson88 at gmail.com Sat Apr 21 14:05:57 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 21 Apr 2012 13:05:57 +0100 Subject: [Cython] Cython 0.16 RC 2 In-Reply-To: <4F91B334.8020307@behnel.de> References: <4F91B334.8020307@behnel.de> Message-ID: On 20 April 2012 20:04, Stefan Behnel wrote: > mark florisson, 15.04.2012 20:59: >> Hopefully a final release candidate for the 0.16 release can be found >> here: http://wiki.cython.org/ReleaseNotes-0.16 . This corresponds to >> the 'release' branch of the cython repository on github. > > I pushed another couple of fixes related to the recent importlib changes in > Py3k which also apply to older Py3 releases, as well as Robert's fix for C > keywords. Jenkins is very happy with them. If you want, you can update the release notes accordingly (I also wouldn't mind a double check to see if I haven't missed any important release information). > Is there anything left that would block the release now? > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Sat Apr 21 15:02:33 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 21 Apr 2012 14:02:33 +0100 Subject: [Cython] Cython 0.16 Message-ID: We are pleased to announce a new version of Cython, 0.16 (http://cython.org/release/Cython-0.16.tar.gz). It comes with new features, improvements and bug fixes, including - super() without arguments - fused types - memoryviews - more Python-like functions Many thanks to the many contributors of this release and to all bug reporters and supporting users!
A more comprehensive list of features and contributors can be found here: http://wiki.cython.org/ReleaseNotes-0.16 , and an overview of bug fixes can be found here: http://trac.cython.org/cython_trac/query?status=closed&group=component&order=id&col=id&col=summary&col=milestone&col=status&col=type&col=priority&col=component&milestone=0.16&desc=1 Enjoy! From stefan_ml at behnel.de Sat Apr 21 15:50:16 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 21 Apr 2012 15:50:16 +0200 Subject: [Cython] Cython 0.16 In-Reply-To: References: Message-ID: <4F92BB18.60003@behnel.de> mark florisson, 21.04.2012 15:02: > We are pleased to announce a new version of Cython, 0.16 > Many thanks to the many contributors of this release and to all bug > reporters and supporting users! > Enjoy! Thanks, Mark! Stefan From vitja.makarov at gmail.com Sat Apr 21 17:50:53 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sat, 21 Apr 2012 19:50:53 +0400 Subject: [Cython] Cython 0.16 In-Reply-To: <4F92BB18.60003@behnel.de> References: <4F92BB18.60003@behnel.de> Message-ID: 2012/4/21 Stefan Behnel : > mark florisson, 21.04.2012 15:02: >> We are pleased to announce a new version of Cython, 0.16 >> Many thanks to the many contributors of this release and to all bug >> reporters and supporting users! >> Enjoy! > > Thanks, Mark! > Cool, thanks! -- vitja. From dtcaciuc at gmail.com Sat Apr 21 21:17:52 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Sat, 21 Apr 2012 12:17:52 -0700 Subject: [Cython] `cdef inline` and typed memory views Message-ID: Hey everyone, Congratulations on shipping 0.16! I think I found a problem which seems pretty straightforward. Say I want to factor out inner part of some N^2 loops over a flow array, I write something like cdef inline float _inner(size_t i, size_t j, float[:] x): cdef float d = x[i] - x[j] return sqrtf(d * d) In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and function is declared as inline, which is great. However, the memoryview structure is passed by value: static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i, size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) { ... This seems to hinder compiler's (in my case, GCC 4.3.4) ability to perform efficient inlining (although function does in fact get inlined). If I manually inline that distance calculation, I get 3x speedup. (in my case 0.324020147324 vs 1.43209195137 seconds for 10k elements). When I manually modified generated .c file to pass memory view slice by pointer, slowdown was eliminated completely. On a somewhat relevant note, have you considered enabling Issues page on Github? Thanks! Dimitri. From stefan_ml at behnel.de Sat Apr 21 23:48:02 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 21 Apr 2012 23:48:02 +0200 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: References: Message-ID: <4F932B12.5020507@behnel.de> Dimitri Tcaciuc, 21.04.2012 21:17: > On a somewhat relevant note, have you considered enabling Issues page on Github? It was discussed, but the drawback of having two separate bug trackers is non-negligible.
Stefan From nmd at illinois.edu Sun Apr 22 05:21:55 2012 From: nmd at illinois.edu (Nathan Dunfield) Date: Sat, 21 Apr 2012 22:21:55 -0500 Subject: [Cython] Cython 0.16: problems with "easy_install" Message-ID: Dear all, On OS X Snow Leopard with XCode 3.2.*, I encountered the following issues when using "easy_install" to install the new Cython 0.16: (a) With Python 2.7 where Cython 0.15.1 had previously been installed, "easy_install" failed with the below error message; looks like it's somehow using the existing Cython for part of the compilation and then failing. After I deleted the existing egg in site-packages it easy_installed fine. (b) With Python 3.2 and no Cython installed in site-packages, it chokes with the following error: haken ~ (8) py3 -m easy_install -U cython Searching for cython Reading http://pypi.python.org/simple/cython/ Couldn't find index page for 'cython' (maybe misspelled?) Scanning index of all packages (this may take a while) Reading http://pypi.python.org/simple/ Reading http://pypi.python.org/simple/Cython/ Reading http://www.cython.org Reading http://cython.org Best match: Cython 0.16 Downloading http://www.cython.org/release/Cython-0.16.zip Processing Cython-0.16.zip Running Cython-0.16/setup.py -q bdist_egg --dist-dir /tmp/easy_install-ers4by/Cython-0.16/egg-dist-tmp-a6gz6a warning: no files found matching '*.pyx' under directory 'Cython/Debugger/Tests' warning: no files found matching '*.pxd' under directory 'Cython/Debugger/Tests' warning: no files found matching '*.h' under directory 'Cython/Debugger/Tests' warning: no files found matching '*.pxd' under directory 'Cython/Utility' warning: no files found matching '*.h' under directory 'Cython/Utility' warning: no files found matching '.cpp' under directory 'Cython/Utility' i686-apple-darwin10-gcc-4.2.1: /tmp/easy_install-ers4by/Cython-0.16/Cython/Runtime/refnanny.c: No such file or directory i686-apple-darwin10-gcc-4.2.1: no input files i686-apple-darwin10-gcc-4.2.1: /tmp/easy_install-ers4by/Cython-0.16/Cython/Runtime/refnanny.c: No such file or directory i686-apple-darwin10-gcc-4.2.1: no input files lipo: can't figure out the architecture type of: /var/tmp//ccvgdiS6.out error: Setup script exited with error: command 'gcc' failed with exit status 1 If I download the .zip file and run setup.py by hand it installs fine. Best, Nathan Error when easy_installing with Python 2.7: haken ~ (1) py -m easy_install -U cython Searching for cython Reading http://pypi.python.org/simple/cython/ Reading http://www.cython.org Reading http://cython.org Best match: Cython 0.16 Downloading http://www.cython.org/release/Cython-0.16.zip Processing Cython-0.16.zip Running Cython-0.16/setup.py -q bdist_egg --dist-dir /tmp/easy_install-1x8vwP/Cython-0.16/egg-dist-tmp-dx99kU Compiling module Cython.Plex.Scanners ... Compiling module Cython.Plex.Actions ... Compiling module Cython.Compiler.Lexicon ... Compiling module Cython.Compiler.Scanning ... Compiling module Cython.Compiler.Parsing ... Compiling module Cython.Compiler.Visitor ... Error compiling Cython file: ------------------------------------------------------------ ... 
raise Errors.CompilerCrash( getattr(last_node, 'pos', None), self.__class__.__name__, u'\n'.join(trace), e, stacktrace) @cython.final def find_handler(self, obj): ^ ------------------------------------------------------------ /tmp/easy_install-1x8vwP/Cython-0.16/Cython/Compiler/Visitor.py:138:4: The final compiler directive is not allowed in function scope Error compiling Cython file: ------------------------------------------------------------ ... def visit(self, obj): return self._visit(obj) @cython.final def _visit(self, obj): ^ ------------------------------------------------------------ /tmp/easy_install-1x8vwP/Cython-0.16/Cython/Compiler/Visitor.py:159:4: The final compiler directive is not allowed in function scope Error compiling Cython file: ------------------------------------------------------------ ... handler_method = self.find_handler(obj) self.dispatch_table[type(obj)] = handler_method return handler_method(obj) @cython.final def _visitchild(self, child, parent, attrname, idx): ^ ------------------------------------------------------------ /tmp/easy_install-1x8vwP/Cython-0.16/Cython/Compiler/Visitor.py:168:4: The final compiler directive is not allowed in function scope Error compiling Cython file: ------------------------------------------------------------ ... def visitchildren(self, parent, attrs=None): return self._visitchildren(parent, attrs) @cython.final def _visitchildren(self, parent, attrs): ^ ------------------------------------------------------------ /tmp/easy_install-1x8vwP/Cython-0.16/Cython/Compiler/Visitor.py:192:4: The final compiler directive is not allowed in function scope Compilation failed From robertwb at gmail.com Sun Apr 22 07:10:13 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Sat, 21 Apr 2012 22:10:13 -0700 Subject: [Cython] Julialang In-Reply-To: <4F92452D.7030507@astro.uio.no> References: <4F92452D.7030507@astro.uio.no> Message-ID: Yes, Julia looks really cool. It's been on my radar for a while, but I haven't had a chance to really try it out for anything yet. But I hadn't thought about low-level Python/Cython <-> Julia integration. That sounds very interesting. I wonder if Jython could give any insight into the tight interaction between two languages that are usually used in isolation but have been made to call each other (though there are a lot of differences too, e.g. we're not targeting replacing the CPython interpreter (on first pass at least...)). - Robert On Fri, Apr 20, 2012 at 10:27 PM, Dag Sverre Seljebotn wrote: > Just heard about the Julia language and wanted to make sure it's on > everybody's radar: > > http://julialang.org > > It's the first really decent language designed for scientists. Seems > impressive to me, there's a few Cython features: > > - Dynamic typing with optional static types > - Call C directly > > And then comes: > > - JIT > - Templates > - "Green" threading/coroutines > - Multiple dispatch (yay!) > - Lisp-like macros and other metaprogramming facilities > - Designed for parallelism and distributed computation > > Dag > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel
It's been on my radar for a while, but I > haven't had a chance to really try it out for anything yet. But I > hadn't thought about low-level Python/Cython <-> Julia integration. > That sounds very interesting. I wonder if Jython could give any > insight into the tight interaction between two languages that are > usually used in isolation but have been made to call each other > (though there are a lot of differences too, e.g. we're not targeting > replacing the CPython interpreter (on first pass at least...)). > Are you all aware that "calling C" actually means a ctypes-like functionality based on dlopen()/dlsym() ? http://julialang.org/manual/calling-c-and-fortran-code/. -- Lisandro Dalcin --------------- CIMEC (INTEC/CONICET-UNL) Predio CONICET-Santa Fe Colectora RN 168 Km 472, Paraje El Pozo 3000 Santa Fe, Argentina Tel: +54-342-4511594 (ext 1011) Tel/Fax: +54-342-4511169 From stefan_ml at behnel.de Sun Apr 22 12:57:17 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 22 Apr 2012 12:57:17 +0200 Subject: [Cython] test crash in Py3.2 (NumPy/memoryview/refcounting related?) Message-ID: <4F93E40D.3030703@behnel.de> Hi, I keep seeing test crashes in Py3.2 debug builds on Jenkins with the latest master, referring to ref-counting problems. Here, for example: https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py32-ext/328/console and likewise in the series of builds starting at #312.
> > I'm not sure what changed in that corner, it looks like one of those > > problems that was there for a while and suddenly starts showing when you > > touch that innocent looking brick at the other end of the house that was > > holding the balance, until now. In this case, that brick was this commit: > > > > https://github.com/cython/cython/commit/c8f61a668d0ce8af1020f520253e1b5f623cf349 > > > > I reverted it here and that fixed the problem at least temporarily: > > > > https://github.com/cython/cython/commit/5aac8caf1ba21933cc85a51af3319c78fc08d675 > > > > but it seems to be back now (after my refnanny optimisations). Before > > reverting my changes, I was able to reproduce it somewhat reliably on > > sage.math by running these three tests together (non-forking): > > > > memslice numpy_bufacc numpy_memoryview > > > > None of the tests shows the problem when run by itself. I can't tell if > > it's also in the latest py3k because I don't have a NumPy lying around that > > works there. So 3.2 is basically the latest Python version this can be > > tested with, and it doesn't occur in the 3.1 tests. The tests use NumPy > > 1.6.1, i.e. the latest official release. > > > > Mark, Dag, could any of you take a look to see if it appears in any way > > more obvious to you than to me? > > > > Stefan > > _______________________________________________ > > cython-devel mailing list > > cython-devel at python.org > > http://mail.python.org/mailman/listinfo/cython-devel > > Hm, I think you can try to disable the numpy_memoryview test and just > continue development as normal. numpy_memoryview has a testcase > function which inserts the test into __test__, so you could just > comment out that line. The test seems to fail before it runs though? Is it possible to obtain a backtrace? From stefan_ml at behnel.de Sun Apr 22 14:34:20 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 22 Apr 2012 14:34:20 +0200 Subject: [Cython] test crash in Py3.2 (NumPy/memoryview/refcounting related?) In-Reply-To: References: <4F93E40D.3030703@behnel.de> Message-ID: <4F93FACC.8040506@behnel.de> mark florisson, 22.04.2012 13:54: > On 22 April 2012 11:57, Stefan Behnel wrote: >> I keep seeing test crashes in Py3.2 debug builds on Jenkins with the latest >> master, referring to ref-counting problems. Here, for example: >> >> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py32-ext/328/console >> >> and likewise in the series of builds starting at #312. >> >> I'm not sure what changed in that corner, it looks like one of those >> problems that was there for a while and suddenly starts showing when you >> touch that innocent looking brick at the other end of the house that was >> holding the balance, until now. In this case, that brick was this commit: >> >> https://github.com/cython/cython/commit/c8f61a668d0ce8af1020f520253e1b5f623cf349 >> >> I reverted it here and that fixed the problem at least temporarily: >> >> https://github.com/cython/cython/commit/5aac8caf1ba21933cc85a51af3319c78fc08d675 >> >> but it seems to be back now (after my refnanny optimisations). Before >> reverting my changes, I was able to reproduce it somewhat reliably on >> sage.math by running these three tests together (non-forking): >> >> memslice numpy_bufacc numpy_memoryview >> >> None of the tests shows the problem when run by itself. I can't tell if >> it's also in the latest py3k because I don't have a NumPy lying around that >> works there. So 3.2 is basically the latest Python version this can be >> tested with, and it doesn't occur in the 3.1 tests.
The tests use NumPy >> 1.6.1, i.e. the latest official release. >> >> Mark, Dag, could any of you take a look to see if it appears in any way >> more obvious to you than to me? > > Hm, I think you can try to disable the numpy_memoryview test and just > continue development as normal. numpy_memoryview has a testcase > function which inserts the test into __test__, so you could just > comment out that line. ... as long as we remember to put it back in ;) > The test seems to fail before it runs though? Is it possible to obtain > a backtrace? When I reproduced it in the Jenkins workspace, it crashed while trying to clean up objects to free memory, specifically in the deallocation visitor function of one of the memory views of numpy_memoryview (IIRC). That didn't really tell me where that object was created or what happened to it to get it to crash. Needs some more investigation. Stefan From markflorisson88 at gmail.com Sun Apr 22 16:31:38 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 22 Apr 2012 15:31:38 +0100 Subject: [Cython] test crash in Py3.2 (NumPy/memoryview/refcounting related?) In-Reply-To: <4F93FACC.8040506@behnel.de> References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> Message-ID: On 22 April 2012 13:34, Stefan Behnel wrote: > mark florisson, 22.04.2012 13:54: >> On 22 April 2012 11:57, Stefan Behnel wrote: >>> I keep seeing test crashes in Py3.2 debug builds on Jenkins with the latest >>> master, referring to ref-counting problems. Here, for example: >>> >>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py32-ext/328/console >>> >>> and likewise in the series of builds starting at #312. >>> >>> I'm not sure what changed in that corner, it looks like one of those >>> problems that was there for a while and suddenly starts showing when you >>> touch that innocent looking brick at the other end of the house that was >>> holding the balance, until now. In this case, that brick was this commit: >>> >>> https://github.com/cython/cython/commit/c8f61a668d0ce8af1020f520253e1b5f623cf349 >>> >>> I reverted it here and that fixed the problem at least temporarily: >>> >>> https://github.com/cython/cython/commit/5aac8caf1ba21933cc85a51af3319c78fc08d675 >>> >>> but it seems to be back now (after my refnanny optimisations). Before >>> reverting my changes, I was able to reproduce it somewhat reliably on >>> sage.math by running these three tests together (non-forking): >>> >>> memslice numpy_bufacc numpy_memoryview >>> >>> None of the tests shows the problem when run by itself. I can't tell if >>> it's also in the latest py3k because I don't have a NumPy lying around that >>> works there. So 3.2 is basically the latest Python version this can be >>> tested with, and it doesn't occur in the 3.1 tests.
> > When I reproduced it in the Jenkins workspace, it crashed while trying to > clean up objects to free memory, specifically in the deallocation visitor > function of one of the memory views of numpy_memoryview (IIRC). That didn't > really tell me where that object was created or what happened to it to get > it to crash. Needs some more investigation. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Hm, I can't reproduce the issue, but that assertion triggers only for update_refs and subtract_refs (when the gc is trying to determine which object may be potentially unreachable). I think the problem here is that a single memoryview object is traversed multiple times through different traverse functions, and that the refcount doesn't match the number of traverses. Indeed, the refcount is only one, as the actual count is the acquisition count. So we shouldn't traverse the memoryview objects in memoryview slices, i.e. not _memoryviewslice.from_slice.memview. I'll come up with a commit shortly, would you be willing to test it? From markflorisson88 at gmail.com Sun Apr 22 16:41:07 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 22 Apr 2012 15:41:07 +0100 Subject: [Cython] test crash in Py3.2 (NumPy/memoryview/refcounting related?) In-Reply-To: References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> Message-ID: On 22 April 2012 15:31, mark florisson wrote: > On 22 April 2012 13:34, Stefan Behnel wrote: >> mark florisson, 22.04.2012 13:54: >>> On 22 April 2012 11:57, Stefan Behnel wrote: >>>> I keep seeing test crashes in Py3.2 debug builds on Jenkins with the latest >>>> master, referring to ref-counting problems. Here, for example: >>>> >>>> https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests/BACKEND=c,PYVERSION=py32-ext/328/console >>>> >>>> and likewise in the series of builds starting at #312. >>>> >>>> I'm not sure what changed in that corner, it looks like one of those >>>> problems that was there for a while and suddenly starts showing when you >>>> touch that innocent looking brick at the other end of the house that was >>>> holding the balance, until now. In this case, that brick was this commit: >>>> >>>> https://github.com/cython/cython/commit/c8f61a668d0ce8af1020f520253e1b5f623cf349 >>>> >>>> I reverted it here and that fixed the problem at least temporarily: >>>> >>>> https://github.com/cython/cython/commit/5aac8caf1ba21933cc85a51af3319c78fc08d675 >>>> >>>> but it seems to be back now (after my refnanny optimisations). Before >>>> reverting my changes, I was able to reproduce it somewhat reliably on >>>> sage.math by running these three tests together (non-forking): >>>> >>>> memslice numpy_bufacc numpy_memoryview >>>> >>>> None of the tests shows the problem when run by itself. I can't tell if >>>> it's also in the latest py3k because I don't have a NumPy lying around that >>>> works there. So 3.2 is basically the latest Python version this can be >>>> tested with, and it doesn't occur in the 3.1 tests. The tests use NumPy >>>> 1.6.1, i.e. the latest official release. >>>> >>>> Mark, Dag, could any of you take a look to see if it appears in any way >>>> more obvious to you than to me? >>> >>> Hm, I think you can try to disable the numpy_memoryview test and just
numpy_memoryview has a testcase >>> function which inserts the test into __test__, so you could just >>> comment out that line. >> >> ... as long as we remember to put it back in ;) >> >> >>> The test seems to fail before it runs though? Is it possible to obtain >>> a backtrace? >> >> When I reproduced it in the Jenkins workspace, it crashed while trying to >> clean up objects to free memory, specifically in the deallocation visitor >> function of one of the memory views of numpy_memoryview (IIRC). That didn't >> really tell me where that object was created or what happened to it to get >> it to crash. Needs some more investigation. >> >> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Hm, I can't reproduce the issue, but that assertion triggers only for > update_refs and subtract_refs (when the gc is trying to determine > which object may be potentially unreachable). I think the problem here > is that a single memoryview object is traversed multiple times through > different traverse functions, and that the refcount doesn't match the > number of traverses. Indeed, the refcount is only one, as the actual > count is the acquisition count. So we shouldn't traverse the > memoryview objects in memoryview slices, i.e. not > _memoryviewslice.from_slice.memview. I'll come up with a commit > shortly, would you be willing to test it? A fix is here: https://github.com/markflorisson88/cython/commit/cd32184f3f782b6d7275cf430694b59801ce642a Let's see what jenkins has to say :) BTW, tp_clear calls Py_CLEAR on Py_buffer.obj, shouldn't it call releasebuffer instead? From nmd at illinois.edu Sun Apr 22 19:59:09 2012 From: nmd at illinois.edu (Nathan Dunfield) Date: Sun, 22 Apr 2012 12:59:09 -0500 Subject: [Cython] Cython 0.16: "eval" problem Message-ID: <83F1345A-FE56-451C-B17B-B41042E02038@illinois.edu> With Cython 0.15, the following works with Python 2.7: ### start file: prob.pyx def f(x): cdef int* p return eval(x) ### end file >>> import pyximport; pyximport.install() >>> import prob >>> prob.f("5") 5 but with Cython 0.16 it doesn't even compile: >>> import prob Error compiling Cython file: ------------------------------------------------------------ ... def f(x): cdef int* p return eval(x) ^ ------------------------------------------------------------ prob.pyx:3:15: Cannot convert 'int *' to Python object If I comment out the (unused) line "cdef int* p" then it works with Cython 0.16. The issue is the pointer declaration; something like: def f(x): cdef int p p = eval(x) return p*p works fine with Cython 0.16. Thanks, Nathan From dtcaciuc at gmail.com Sun Apr 22 20:14:28 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Sun, 22 Apr 2012 11:14:28 -0700 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: <4F932B12.5020507@behnel.de> References: <4F932B12.5020507@behnel.de> Message-ID: On Sat, Apr 21, 2012 at 2:48 PM, Stefan Behnel wrote: > Dimitri Tcaciuc, 21.04.2012 21:17: >> On a somewhat relevant note, have you considered enabling Issues page on Github? > > It was discussed, but the drawback of having two separate bug trackers is > non-negligible. Ok. I was wondering since it would make it much easier to connect issue/patch/discussion together without, say, me needlessly adding to the development mailing list and/or manually registering for trac and sending htpasswd digest over the mail.
Here's something to consider if you ever want to migrate over from trac: https://github.com/adamcik/github-trac-ticket-import Cheers, Dimitri. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From vitja.makarov at gmail.com Sun Apr 22 20:22:44 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sun, 22 Apr 2012 22:22:44 +0400 Subject: [Cython] Cython 0.16: "eval" problem In-Reply-To: <83F1345A-FE56-451C-B17B-B41042E02038@illinois.edu> References: <83F1345A-FE56-451C-B17B-B41042E02038@illinois.edu> Message-ID: 2012/4/22 Nathan Dunfield : > With Cython 0.15, the following works with Python 2.7: > > ### start file: prob.pyx > > def f(x): > cdef int* p > return eval(x) > > ### end file > >>>> import pyximport; pyximport.install() >>>> import prob >>>> prob.f("5") > 5 > > but with Cython 0.16 it doesn't even compile: > >>>> import prob > > Error compiling Cython file: > ------------------------------------------------------------ > ... > def f(x): > cdef int* p > return eval(x) > ^ > ------------------------------------------------------------ > > prob.pyx:3:15: Cannot convert 'int *' to Python object > > If I comment out the (unused) line "cdef int* p" then it works with Cython 0.16. The issue is the pointer declaration; something like: > > def f(x): > cdef int p > p = eval(x) > return p*p > > works fine with Cython 0.16. > > Thanks, > > Nathan > Oops, it seems to be a problem with locals() dict creation. Perhaps it should ignore variables that can't be converted to PyObject. -- vitja. From nmd at illinois.edu Sun Apr 22 20:33:27 2012 From: nmd at illinois.edu (Nathan Dunfield) Date: Sun, 22 Apr 2012 13:33:27 -0500 Subject: [Cython] Cython 0.16: "eval" problem In-Reply-To: <67296eb070e34ffeb88cfe65fd3b9471@CITESHT4.ad.uillinois.edu> References: <83F1345A-FE56-451C-B17B-B41042E02038@illinois.edu> <67296eb070e34ffeb88cfe65fd3b9471@CITESHT4.ad.uillinois.edu> Message-ID: <0E437224-7481-4033-A0CA-EB2D8CED1BBF@illinois.edu> On Apr 22, 2012, at 1:22 PM, Vitja Makarov wrote: > Oops, it seems to be a problem with locals() dict creation. Yes it does. The following variants of my original example both work: ## prob.pyx version 1 def cy_eval(s): return eval(s) def f(x): cdef int* p return cy_eval(x) ## prob.pyx version 2 def f(x): cdef int* p return eval(x, {}) Best, Nathan From vitja.makarov at gmail.com Sun Apr 22 20:50:23 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Sun, 22 Apr 2012 22:50:23 +0400 Subject: [Cython] Cython 0.16: "eval" problem In-Reply-To: <0E437224-7481-4033-A0CA-EB2D8CED1BBF@illinois.edu> References: <83F1345A-FE56-451C-B17B-B41042E02038@illinois.edu> <67296eb070e34ffeb88cfe65fd3b9471@CITESHT4.ad.uillinois.edu> <0E437224-7481-4033-A0CA-EB2D8CED1BBF@illinois.edu> Message-ID: 2012/4/22 Nathan Dunfield : > On Apr 22, 2012, at 1:22 PM, Vitja Makarov wrote: >> Oops, it seems to be a problem with locals() dict creation. > > Yes it does. The following variants of my original example both work: > > ## prob.pyx version 1 > > def cy_eval(s): > return eval(s) > > def f(x): > cdef int* p > return cy_eval(x) > > ## prob.pyx version 2 > > def f(x): > cdef int* p > return eval(x, {}) > > Best, > > Nathan > I've fixed it here: https://github.com/vitek/cython/commit/6dc132731b8f3f7eaabf55e51d89bcbc7b8f4eb7 Now waiting for jenkins, then I'll push it into upstream.
As a workaround you can manually pass a locals dictionary, e.g.: eval(x, None, {'a': a}) -- vitja. From markflorisson88 at gmail.com Sun Apr 22 22:20:00 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 22 Apr 2012 21:20:00 +0100 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: References: Message-ID: On 21 April 2012 20:17, Dimitri Tcaciuc wrote: > Hey everyone, > > Congratulations on shipping 0.16! I think I found a problem which > seems pretty straightforward. Say I want to factor out inner part of > some N^2 loops over a flow array, I write something like > > cdef inline float _inner(size_t i, size_t j, float[:] x): > cdef float d = x[i] - x[j] > return sqrtf(d * d) > > In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and > function is declared as inline, which is great. However, the > memoryview structure is passed by value: > > static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i, > size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) { > ... > > This seems to hinder compiler's (in my case, GCC 4.3.4) ability to > perform efficient inlining (although function does in fact get > inlined). If I manually inline that distance calculation, I get 3x > speedup. (in my case 0.324020147324 vs 1.43209195137 seconds for 10k > elements). When I manually modified generated .c file to pass memory > view slice by pointer, slowdown was eliminated completely. > > On a somewhat relevant note, have you considered enabling Issues page on Github? > > > Thanks! > > > Dimitri. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Although it is neither documented nor tested, it works if you just take the address of the memoryview. You can then index it using memoryview_pointer[0][i]. One should be careful, as taking the pointer and passing that around means that pointer is not acquisition counted, and will point to invalid memory if the memoryview goes out of scope (e.g. if it's a local variable, when you return). Cython could manually inline functions though, which could greatly reduce argument passing and unpacking overhead in some situations (like buffers). From markflorisson88 at gmail.com Sun Apr 22 22:21:16 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 22 Apr 2012 21:21:16 +0100 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: References: <4F932B12.5020507@behnel.de> Message-ID: On 22 April 2012 19:14, Dimitri Tcaciuc wrote: > On Sat, Apr 21, 2012 at 2:48 PM, Stefan Behnel wrote: >> Dimitri Tcaciuc, 21.04.2012 21:17: >>> On a somewhat relevant note, have you considered enabling Issues page on Github? >> >> It was discussed, but the drawback of having two separate bug trackers is >> non-negligible. > > Ok. I was wondering since it would make it much easier to connect > issue/patch/discussion together without, say, me needlessly adding to > the development mailing list and/or manually registering for trac and > sending htpasswd digest over the mail. Here's something to consider if > you ever want to migrate over from trac: > https://github.com/adamcik/github-trac-ticket-import > > Cheers, > > Dimitri. I haven't heard very good things about github issues, but I like to have everything in one place, and I'm not too fond of trac in any regard. It's also quite a barrier to get trac access, so I'd be in favour of moving tickets.
>> Stefan >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From dtcaciuc at gmail.com Sun Apr 22 23:45:52 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Sun, 22 Apr 2012 14:45:52 -0700 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: References: Message-ID: On Sun, Apr 22, 2012 at 1:20 PM, mark florisson wrote: > On 21 April 2012 20:17, Dimitri Tcaciuc wrote: >> Hey everyone, >> >> Congratulations on shipping 0.16! I think I found a problem which >> seems pretty straightforward. Say I want to factor out inner part of >> some N^2 loops over a flow array, I write something like >> >> cdef inline float _inner(size_t i, size_t j, float[:] x): >> cdef float d = x[i] - x[j] >> return sqrtf(d * d) >> >> In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and >> function is declared as inline, which is great. However, the >> memoryview structure is passed by value: >> >> static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i, >> size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) { >> ... >> >> This seems to hinder compiler's (in my case, GCC 4.3.4) ability to >> perform efficient inlining (although function does in fact get >> inlined). If I manually inline that distance calculation, I get 3x >> speedup. (in my case 0.324020147324 vs 1.43209195137 seconds for 10k >> elements). When I manually modified generated .c file to pass memory >> view slice by pointer, slowdown was eliminated completely. >> >> On a somewhat relevant note, have you considered enabling Issues page on Github? >> >> >> Thanks! >> >> >> Dimitri. >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Although it is neither documented nor tested, it works if you just > take the address of the memoryview. You can then index it using > memoryview_pointer[0][i]. One should be careful, as taking the pointer > and passing that around means that pointer is not acquisition counted, > and will point to invalid memory if the memoryview goes out of scope > (e.g. if it's a local variable, when you return). Nice, passing by pointer did the trick! As an observation, I tried using `cython.operator.dereference(x)` and in this case it's way less efficient than `x[0]`. Dereferencing actually allocates an empty memory view slice and copies the contents of `x`, even if the `dereference(x)` result is never assigned anywhere and is only a temporary value in the expression. Dimitri. > Cython could manually inline functions though, which could greatly > reduce argument passing and unpacking overhead in some situations > (like buffers).
> _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Mon Apr 23 08:19:02 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2012 08:19:02 +0200 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: References: <4F932B12.5020507@behnel.de> Message-ID: <4F94F456.2010602@behnel.de> mark florisson, 22.04.2012 22:21: > On 22 April 2012 19:14, Dimitri Tcaciuc wrote: >> On Sat, Apr 21, 2012 at 2:48 PM, Stefan Behnel wrote: >>> Dimitri Tcaciuc, 21.04.2012 21:17: >>>> On a somewhat relevant note, have you considered enabling Issues page on Github? >>> >>> It was discussed, but the drawback of having two separate bug trackers is >>> non-negligible. >> >> Ok. I was wondering since it would make it much easier to connect >> issue/patch/discussion together without, say, me needlessly adding to >> the development mailing list and/or manually registering for trac and >> sending htpasswd digest over the mail. Here's something to consider if >> you ever want to migrate over from trac: >> https://github.com/adamcik/github-trac-ticket-import > > I haven't heard very good things about github issues I find them nicely accessible from the user side, but hardly usable for the developers. All you get is basically a blog-style comment system with bare tag support. Sure, you can build many of the necessary features on top of tags, but trac (or any other real issue tracker) already provides a lot more. Pull request tracking works well in github, but I consider their general issue tracker a last resort if you don't have anything else. Stefan From stefan_ml at behnel.de Mon Apr 23 08:24:32 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2012 08:24:32 +0200 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: References: Message-ID: <4F94F5A0.5000806@behnel.de> mark florisson, 22.04.2012 22:20: > On 21 April 2012 20:17, Dimitri Tcaciuc wrote: >> Say I want to factor out inner part of >> some N^2 loops over a flow array, I write something like >> >> cdef inline float _inner(size_t i, size_t j, float[:] x): >> cdef float d = x[i] - x[j] >> return sqrtf(d * d) >> >> In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and >> function is declared as inline, which is great. However, the >> memoryview structure is passed by value: >> >> static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i, >> size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) { >> ... >> >> This seems to hinder compiler's (in my case, GCC 4.3.4) ability to >> perform efficient inlining (although function does in fact get >> inlined). If I manually inline that distance calculation, I get 3x >> speedup. (in my case 0.324020147324 vs 1.43209195137 seconds for 10k >> elements). When I manually modified generated .c file to pass memory >> view slice by pointer, slowdown was eliminated completely. > > Although it is neither documented nor tested, it works if you just > take the address of the memoryview. You can then index it using > memoryview_pointer[0][i]. Are you advertising this as an actual feature here? I'm just asking because supporting hacks can be nasty in the long run. What if we ever want to make a change to the internal way memoryviews work that would break this?
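For reference, the trick in question would look roughly like this in user code. This is a sketch only: the thread confirms that &x and memoryview_pointer[0][i] work in 0.16, but the float[:] * parameter spelling and the explicit sqrtf declaration are assumptions here, and since the pointer is not acquisition counted, the view must outlive every call:

    cdef extern from "math.h":
        float sqrtf(float)

    cdef inline float _inner(size_t i, size_t j, float[:] *x):
        # x is a plain C pointer to the slice struct, so nothing is
        # copied per call; x[0] dereferences back to the typed view.
        cdef float d = x[0][i] - x[0][j]
        return sqrtf(d * d)

    def pairwise(float[:] x):
        cdef size_t i, j
        cdef float total = 0
        for i in range(<size_t>x.shape[0]):
            for j in range(<size_t>x.shape[0]):
                total += _inner(i, j, &x)   # x outlives the call
        return total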
Stefan From vitja.makarov at gmail.com Mon Apr 23 09:26:11 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Mon, 23 Apr 2012 11:26:11 +0400 Subject: [Cython] Cython 0.16: "eval" problem In-Reply-To: References: <83F1345A-FE56-451C-B17B-B41042E02038@illinois.edu> <67296eb070e34ffeb88cfe65fd3b9471@CITESHT4.ad.uillinois.edu> <0E437224-7481-4033-A0CA-EB2D8CED1BBF@illinois.edu> Message-ID: 2012/4/22 Vitja Makarov : > 2012/4/22 Nathan Dunfield : >> On Apr 22, 2012, at 1:22 PM, Vitja Makarov wrote: >>> Oops, it seems to be a problem with locals() dict creation. >> >> Yes it does. The following variants of my original example both work: >> >> ## prob.pyx version 1 >> >> def cy_eval(s): >> return eval(s) >> >> def f(x): >> cdef int* p >> return cy_eval(x) >> >> ## prob.pyx version 2 >> >> def f(x): >> cdef int* p >> return eval(x, {}) >> >> Best, >> >> Nathan >> > > I've fixed it here: > https://github.com/vitek/cython/commit/6dc132731b8f3f7eaabf55e51d89bcbc7b8f4eb7 > > Now waiting for jenkins, then I'll push it into upstream. As a > workaround you can manually pass a locals dictionary, e.g.: > > eval(x, None, {'a': a}) > Btw before 0.16 locals() weren't passed to eval. I've tried the following code and it fails with 0.15: def foo(): cdef int *a return locals() I've pushed the fix to upstream/master https://github.com/cython/cython/commit/0b133e00a7bc3c53ea60d3cf4ae8eb3e20ef49ec -- vitja. From stefan_ml at behnel.de Mon Apr 23 10:23:21 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2012 10:23:21 +0200 Subject: [Cython] test crash in Py3.2 (NumPy/memoryview/refcounting related?) In-Reply-To: References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> Message-ID: <4F951179.8000506@behnel.de> mark florisson, 22.04.2012 16:41: > On 22 April 2012 15:31, mark florisson wrote: >> I think the problem here >> is that a single memoryview object is traversed multiple times through >> different traverse functions, and that the refcount doesn't match the >> number of traverses. Indeed, the refcount is only one, as the actual >> count is the acquisition count. So we shouldn't traverse the >> memoryview objects in memoryview slices, i.e. not >> _memoryviewslice.from_slice.memview. I'll come up with a commit >> shortly, would you be willing to test it? > > A fix is here: https://github.com/markflorisson88/cython/commit/cd32184f3f782b6d7275cf430694b59801ce642a > > Lets see what jenkins has to say :) Seems to like it. > BTW, tp_clear calls Py_CLEAR on Py_buffer.obj, shouldn't it call > releasebuffer instead? Where is that? The memoryview class calls __Pyx_ReleaseBuffer() here. BTW, why are some of the self arguments in MemoryView.pyx explicitly typed as "memoryview"? Stefan From markflorisson88 at gmail.com Mon Apr 23 10:32:46 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 23 Apr 2012 09:32:46 +0100 Subject: [Cython] test crash in Py3.2 (NumPy/memoryview/refcounting related?) In-Reply-To: <4F951179.8000506@behnel.de> References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> <4F951179.8000506@behnel.de> Message-ID: On 23 April 2012 09:23, Stefan Behnel wrote: > mark florisson, 22.04.2012 16:41: >> On 22 April 2012 15:31, mark florisson wrote: >>> I think the problem here >>> is that a single memoryview object is traversed multiple times through >>> different traverse functions, and that the refcount doesn't match the >>> number of traverses. Indeed, the refcount is only one, as the actual >>> count is the acquisition count.
So we shouldn't traverse the >>> memoryview objects in memoryview slices, i.e. not >>> _memoryviewslice.from_slice.memview. I'll come up with a commit >>> shortly, would you be willing to test it? >> >> A fix is here: https://github.com/markflorisson88/cython/commit/cd32184f3f782b6d7275cf430694b59801ce642a >> >> Let's see what jenkins has to say :) > > Seems to like it. > > >> BTW, tp_clear calls Py_CLEAR on Py_buffer.obj, shouldn't it call >> releasebuffer instead? > > Where is that? The memoryview class calls __Pyx_ReleaseBuffer() here. Yes, but ModuleNode generates a tp_clear for Py_buffer cdef class attributes that clears Py_buffer.obj, which means a subsequent __dealloc__ calling release buffer cannot call the release buffer function on the original object. > BTW, why are some of the self arguments in MemoryView.pyx explicitly typed > as "memoryview"? I suppose because the original code had that before I started, and then I continued with it a bit for consistency. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Mon Apr 23 10:39:53 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 23 Apr 2012 09:39:53 +0100 Subject: [Cython] `cdef inline` and typed memory views In-Reply-To: <4F94F5A0.5000806@behnel.de> References: <4F94F5A0.5000806@behnel.de> Message-ID: On 23 April 2012 07:24, Stefan Behnel wrote: > mark florisson, 22.04.2012 22:20: >> On 21 April 2012 20:17, Dimitri Tcaciuc wrote: >>> Say I want to factor out the inner part of >>> some N^2 loops over a flow array, I write something like >>> >>> cdef inline float _inner(size_t i, size_t j, float[:] x): >>> cdef float d = x[i] - x[j] >>> return sqrtf(d * d) >>> >>> In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and the >>> function is declared as inline, which is great. However, the >>> memoryview structure is passed by value: >>> >>> static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i, >>> size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) { >>> ... >>> >>> This seems to hinder the compiler's (in my case, GCC 4.3.4) ability to >>> perform efficient inlining (although the function does in fact get >>> inlined). If I manually inline that distance calculation, I get a 3x >>> speedup (in my case, 0.324020147324 vs 1.43209195137 seconds for 10k >>> elements). When I manually modified the generated .c file to pass the memory >>> view slice by pointer, the slowdown was eliminated completely. >> >> Although it is neither documented nor tested, it works if you just >> take the address of the memoryview. You can then index it using >> memoryview_pointer[0][i]. > > Are you advertising this as an actual feature here? I'm just asking because > supporting hacks can be nasty in the long run. What if we ever want to make > a change to the internal way memoryviews work that would break this? > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Yeah, I'm not entirely sure if this is a hack or a feature. It doesn't really matter how memoryviews are represented, or where they are stored, as dereferencing the pointer gives you the same situation as before. The only difference is when a) memoryviews would be relocated or b) go out of scope.
If we're ever planning to support garbage collection (and I doubt we are) or if we're ever going to allocate them on the heap and have a variable-sized representation, a) could be a case. As for b), it's really the same as automatic C variables. So I suppose I wouldn't be opposed to officially supporting this. From stefan_ml at behnel.de Mon Apr 23 10:42:15 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2012 10:42:15 +0200 Subject: [Cython] tp_clear() of buffer client objects (was: Re: test crash in Py3.2 (NumPy/memoryview/refcounting related?)) In-Reply-To: <4F951179.8000506@behnel.de> References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> <4F951179.8000506@behnel.de> Message-ID: <4F9515E7.4050208@behnel.de> Stefan Behnel, 23.04.2012 10:23: > mark florisson, 22.04.2012 16:41: >> On 22 April 2012 15:31, mark florisson wrote: >>> I think the problem here >>> is that a single memoryview object is traversed multiple times through >>> different traverse functions, and that the refcount doesn't match the >>> number of traverses. Indeed, the refcount is only one, as the actual >>> count is the acquisition count. So we shouldn't traverse the >>> memoryview objects in memoryview slices, i.e. not >>> _memoryviewslice.from_slice.memview. I'll come up with a commit >>> shortly, would you be willing to test it? >> >> BTW, tp_clear calls Py_CLEAR on Py_buffer.obj, shouldn't it call >> releasebuffer instead? > > Where is that? The memoryview class calls __Pyx_ReleaseBuffer() here. Ah, found it. Yes, tp_clear() would be called before __dealloc__() in reference cycles, so that's a problem. I'm not sure tp_clear should do something as (potentially) involved as freeing the buffer, but if the Py_buffer is owned by the object, then I guess it just has to do that. Otherwise, it would leak the buffer. The problem is that this also impacts user code, though, so a change might break code, e.g. when it needs to do some cleanup on its own before freeing the buffer. It would make the code more correct, sure, but it would still break it. Guess we have to take that route, though... Stefan From markflorisson88 at gmail.com Mon Apr 23 11:10:02 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 23 Apr 2012 10:10:02 +0100 Subject: [Cython] tp_clear() of buffer client objects (was: Re: test crash in Py3.2 (NumPy/memoryview/refcounting related?)) In-Reply-To: <4F9515E7.4050208@behnel.de> References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> <4F951179.8000506@behnel.de> <4F9515E7.4050208@behnel.de> Message-ID: On 23 April 2012 09:42, Stefan Behnel wrote: > Stefan Behnel, 23.04.2012 10:23: >> mark florisson, 22.04.2012 16:41: >>> On 22 April 2012 15:31, mark florisson wrote: >>>> I think the problem here >>>> is that a single memoryview object is traversed multiple times through >>>> different traverse functions, and that the refcount doesn't match the >>>> number of traverses. Indeed, the refcount is only one, as the actual >>>> count is the acquisition count. So we shouldn't traverse the >>>> memoryview objects in memoryview slices, i.e. not >>>> _memoryviewslice.from_slice.memview. I'll come up with a commit >>>> shortly, would you be willing to test it? >>> >>> BTW, tp_clear calls Py_CLEAR on Py_buffer.obj, shouldn't it call >>> releasebuffer instead? >> >> Where is that? The memoryview class calls __Pyx_ReleaseBuffer() here. > > Ah, found it. Yes, tp_clear() would be called before __dealloc__() in > reference cycles, so that's a problem. 
I'm not sure tp_clear should do > something as (potentially) involved as freeing the buffer, but if the > Py_buffer is owned by the object, then I guess it just has to do that. > Otherwise, it would leak the buffer. > > The problem is that this also impacts user code, though, so a change might > break code, e.g. when it needs to do some cleanup on its own before freeing > the buffer. It would make the code more correct, sure, but it would still > break it. Guess we have to take that route, though... > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel It also seems that tp_clear is only generated for object attributes, and then it includes freeing the object of Py_buffers (so only a Py_buffer attribute doesn't result in a tp_clear function). Finally, tp_dealloc doesn't deallocate buffers either, which it should. So both tp_clear and tp_dealloc should call release buffer (it can be called multiple times on the same buffer). From stefan_ml at behnel.de Mon Apr 23 11:32:37 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2012 11:32:37 +0200 Subject: [Cython] tp_clear() of buffer client objects In-Reply-To: References: <4F93E40D.3030703@behnel.de> <4F93FACC.8040506@behnel.de> <4F951179.8000506@behnel.de> <4F9515E7.4050208@behnel.de> Message-ID: <4F9521B5.3040405@behnel.de> mark florisson, 23.04.2012 11:10: > On 23 April 2012 09:42, Stefan Behnel wrote: >> Stefan Behnel, 23.04.2012 10:23: >>> mark florisson, 22.04.2012 16:41: >>>> On 22 April 2012 15:31, mark florisson wrote: >>>>> I think the problem here >>>>> is that a single memoryview object is traversed multiple times through >>>>> different traverse functions, and that the refcount doesn't match the >>>>> number of traverses. Indeed, the refcount is only one, as the actual >>>>> count is the acquisition count. So we shouldn't traverse the >>>>> memoryview objects in memoryview slices, i.e. not >>>>> _memoryviewslice.from_slice.memview. I'll come up with a commit >>>>> shortly, would you be willing to test it? >>>> >>>> BTW, tp_clear calls Py_CLEAR on Py_buffer.obj, shouldn't it call >>>> releasebuffer instead? >>> >>> Where is that? The memoryview class calls __Pyx_ReleaseBuffer() here. >> >> Ah, found it. Yes, tp_clear() would be called before __dealloc__() in >> reference cycles, so that's a problem. I'm not sure tp_clear should do >> something as (potentially) involved as freeing the buffer, but if the >> Py_buffer is owned by the object, then I guess it just has to do that. >> Otherwise, it would leak the buffer. >> >> The problem is that this also impacts user code, though, so a change might >> break code, e.g. when it needs to do some cleanup on its own before freeing >> the buffer. It would make the code more correct, sure, but it would still >> break it. Guess we have to take that route, though... > > It also seems that tp_clear is only generated for object attributes, > and then it includes freeing the object of Py_buffers (so only a > Py_buffer attribute doesn't result in a tp_clear function). Finally, > tp_dealloc doesn't deallocate buffers either, which it should. So both > tp_clear and tp_dealloc should call release buffer Good call. > (it can be called > multiple times on the same buffer). I guess calling it the first time would set "obj" to NULL and the second time would recognise it and do nothing, right? That's fine. 
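For reference, the situation under discussion reduces to a buffer-owning cdef class like the following minimal sketch (Holder is a hypothetical example, not code from the Cython sources). The point is that PyBuffer_Release() is idempotent: the first call sets view.obj to NULL and later calls are no-ops, so both the generated tp_clear and tp_dealloc can safely release the buffer.

    from cpython.buffer cimport PyObject_GetBuffer, PyBuffer_Release, PyBUF_SIMPLE

    cdef class Holder:
        # Cython generates buffer-aware tp_traverse/tp_clear code for this
        cdef Py_buffer view

        def __cinit__(self, obj):
            # acquire a buffer from any exporter (bytes, array.array, NumPy, ...)
            PyObject_GetBuffer(obj, &self.view, PyBUF_SIMPLE)

        def __dealloc__(self):
            # safe even if the buffer was already released earlier:
            # once view.obj is NULL, this call does nothing
            PyBuffer_Release(&self.view)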
Stefan From nmd at illinois.edu Mon Apr 23 17:58:00 2012 From: nmd at illinois.edu (Nathan Dunfield) Date: Mon, 23 Apr 2012 10:58:00 -0500 Subject: [Cython] Cython 0.16 issue Windows with Mingw32 Message-ID: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> I've encountered the following issue with Cython 0.16 on Windows using the Mingw32 compiler (I'm using Python 3.2 here, but I don't think that's the issue): $ python3 setup.py build -c mingw32 running build running build_py running build_ext skipping 'SnapPy.c' Cython extension (up-to-date) building 'snappy.SnapPy' extension c:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -Iheaders -Iunix_kit -Iaddl_code -I. -Ipari/pari-2.3.4/include/ -Ipari/pari-2.3.4/include/pari -Ic:\Python32\include -Ic:\Python32\PC -c SnapPy.c -o build\temp.win32-3.2\Release\snappy.o SnapPy.c: In function `__pyx_f_6snappy_6SnapPy_13Triangulation_build_rep_into_Sn': SnapPy.c:25187: warning: implicit declaration of function `fg_get_num_orig_gens' SnapPy.c:25423: warning: implicit declaration of function `candidateSn_is_valid' SnapPy.c:25435: warning: implicit declaration of function `candidateSn_is_transitive' SnapPy.c: At top level: SnapPy.c:76434: error: initializer element is not constant SnapPy.c:76434: error: (near initialization for `__pyx_CyFunctionType_type.tp_call') error: command 'gcc' failed with exit status 1 The problem seems to be in the code that's pulled in from CythonFunction.c. I apologize for not providing a more minimal example (the above code is available at "hg clone static-http://math.uic.edu/t3m/hg/SnapPy") but the small module I tried didn't pull in the CythonFunction.c code. Thanks, Nathan From robertwb at gmail.com Mon Apr 23 18:30:41 2012 From: robertwb at gmail.com (Robert Bradshaw) Date: Mon, 23 Apr 2012 09:30:41 -0700 Subject: [Cython] Julialang In-Reply-To: References: <4F92452D.7030507@astro.uio.no> Message-ID: On Sun, Apr 22, 2012 at 1:59 AM, Lisandro Dalcin wrote: > On 22 April 2012 08:10, Robert Bradshaw wrote: >> Yes, Julia looks really cool. It's been on my radar for a while, but I >> haven't had a chance to really try it out for anything yet. But I >> hadn't thought about low-level Python/Cython <-> Julia integration. >> That sounds very interesting. I wonder if Jython could give any >> insight into to the tight interaction between two languages that are >> usually used in isolation but have been made to call each other >> (though there are a lot of differences too, e.g. we're not targeting >> replacing the CPython interpreter (on first pass at least...)). >> > > Are you all aware that "calling C" actually means a ctypes-like > functionality based in dlopen()/dlsym() ? > http://julialang.org/manual/calling-c-and-fortran-code/. Yes, with all its drawbacks, but the fact that it's JIT'ed at least cuts into the overhead issues. - Robert From dtcaciuc at gmail.com Mon Apr 23 19:09:08 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Mon, 23 Apr 2012 10:09:08 -0700 Subject: [Cython] Julialang In-Reply-To: References: <4F92452D.7030507@astro.uio.no> Message-ID: I may be misunderstanding the intent here, but here it goes. If the main idea is to be able to call functions that are written in Julia or other languages, I think an effort to create an LLVM backend for Cython would go a long way towards inter-language connections such as the one discussed here. It should be possible to take Cython- and Julia- produced LLVM bytecode and assemble it all together, applying whatever bytecode optimizers that are available (eg.
SSE vectorization). A big advantage of that approach is that there's no need for one language to know syntax conventions of the other one (or at least not to full extent). Continuing the effort, it should be possible to eliminate the need for writing an intermediate .c/.cpp file if Clang compiler is used, which is also LLVM based. Dimitri. On Mon, Apr 23, 2012 at 9:30 AM, Robert Bradshaw wrote: > On Sun, Apr 22, 2012 at 1:59 AM, Lisandro Dalcin wrote: >> On 22 April 2012 08:10, Robert Bradshaw wrote: >>> Yes, Julia looks really cool. It's been on my radar for a while, but I >>> haven't had a chance to really try it out for anything yet. But I >>> hadn't thought about low-level Python/Cython <-> Julia integration. >>> That sounds very interesting. I wonder if Jython could give any >>> insight into to the tight interaction between two languages that are >>> usually used in isolation but have been made to call each other >>> (though there are a lot of differences too, e.g. we're not targeting >>> replacing the CPython interpreter (on first pass at least...)). >>> >> >> Are you all aware that "calling C" actually means a ctypes-like >> functionality based in dlopen()/dlsym() ? >> http://julialang.org/manual/calling-c-and-fortran-code/. > > Yes, with all its drawbacks, but the fact that it's JIT'ed at least > cuts into the overhead issues. > > - Robert > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From dtcaciuc at gmail.com Mon Apr 23 19:12:10 2012 From: dtcaciuc at gmail.com (Dimitri Tcaciuc) Date: Mon, 23 Apr 2012 10:12:10 -0700 Subject: [Cython] Julialang In-Reply-To: References: <4F92452D.7030507@astro.uio.no> Message-ID: On Mon, Apr 23, 2012 at 10:09 AM, Dimitri Tcaciuc wrote: > I may be misuderstanding the intent here, but here it goes. > > If the main idea is to be able to call functions that are written in > Julia or other languages, I think an effort to create an LLVM backend > for Cython would go a long way towards inter-language connections as > the one discussed here. It should be possible to take Cython- and > Julia- produced LLVM bytecode and assemble it all together, applying > whatever bytecode optimizers that are available (eg. SSE > vectorization). A big advantage of that approach is that there's no > need for one language to know syntax conventions of the other one (or > at least not to full extent). Continuing the effort, it should be > possible to eliminate the need for writing an intermediate .c/.cpp > file if Clang compiler is used, which is also LLVM based. I should clarify myself to avoid a gross mistake; a .c module would still be necessary for CPython integration. > Dimitri. > > On Mon, Apr 23, 2012 at 9:30 AM, Robert Bradshaw wrote: >> On Sun, Apr 22, 2012 at 1:59 AM, Lisandro Dalcin wrote: >>> On 22 April 2012 08:10, Robert Bradshaw wrote: >>>> Yes, Julia looks really cool. It's been on my radar for a while, but I >>>> haven't had a chance to really try it out for anything yet. But I >>>> hadn't thought about low-level Python/Cython <-> Julia integration. >>>> That sounds very interesting. I wonder if Jython could give any >>>> insight into to the tight interaction between two languages that are >>>> usually used in isolation but have been made to call each other >>>> (though there are a lot of differences too, e.g. we're not targeting >>>> replacing the CPython interpreter (on first pass at least...)). 
>>>> >>> >>> Are you all aware that "calling C" actually means a ctypes-like >>> functionality based in dlopen()/dlsym() ? >>> http://julialang.org/manual/calling-c-and-fortran-code/. >> >> Yes, with all its drawbacks, but the fact that it's JIT'ed at least >> cuts into the overhead issues. >> >> - Robert >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel From njs at pobox.com Mon Apr 23 20:17:50 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 23 Apr 2012 19:17:50 +0100 Subject: [Cython] Julialang In-Reply-To: References: <4F92452D.7030507@astro.uio.no> Message-ID: On Mon, Apr 23, 2012 at 6:09 PM, Dimitri Tcaciuc wrote: > I may be misuderstanding the intent here, but here it goes. > > If the main idea is to be able to call functions that are written in > Julia or other languages, I think an effort to create an LLVM backend > for Cython would go a long way towards inter-language connections as > the one discussed here. It should be possible to take Cython- and > Julia- produced LLVM bytecode and assemble it all together, applying > whatever bytecode optimizers that are available (eg. SSE > vectorization). A big advantage of that approach is that there's no > need for one language to know syntax conventions of the other one (or > at least not to full extent). Continuing the effort, it should be > possible to eliminate the need for writing an intermediate .c/.cpp > file if Clang compiler is used, which is also LLVM based. You'd still need some way to translate between the Cython and Julia calling conventions, runtimes, error handling, garbage collection regimes, etc. IIUC, LLVM IR isn't like the CLR -- it doesn't force languages into a common system for these things. Which might be great and worth the effort, I don't know, and don't want to discourage anyone. But there are literally hundreds of new languages designed every year, and a new *successful* language comes along maybe twice in a decade? And one of those recent ones was PHP, which shows you how important pure technical quality is in determining which ones survive (i.e., not much). Building a self-sustaining ecosystem requires a ton of work and a ton of luck. And here I'm still trying to *reduce* the number of languages I need in each analysis pipeline... so even though there are a number of really exciting things about Julia, and its author seems to know what he's doing, I'm still in wait-and-see mode. -- Nathaniel From stefan_ml at behnel.de Mon Apr 23 20:28:47 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 23 Apr 2012 20:28:47 +0200 Subject: [Cython] Cython 0.16 issue Windows with Mingw32 In-Reply-To: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> References: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> Message-ID: <4F959F5F.3020907@behnel.de> Nathan Dunfield, 23.04.2012 17:58: > I've encountered the following issue with Cython 0.16 on Windows with > using the Mingw32 compiler (I'm using Python 3.2 here, but I don't think > that's the issue): > > $ python3 setup.py build -c mingw32 > running build > running build_py > running build_ext > skipping 'SnapPy.c' Cython extension (up-to-date) > building 'snappy.SnapPy' extension > c:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -Iheaders -Iunix_kit -Iaddl_code -I. 
-Ipari/pari-2.3.4/include/ -Ipari/pari-2.3.4/include/pari -Ic:\Python32\include -Ic:\Python32\PC -c SnapPy.c -o build\temp.win32-3.2\Release\snappy.o > SnapPy.c: In function `__pyx_f_6snappy_6SnapPy_13Triangulation_build_rep_into_Sn': > SnapPy.c:25187: warning: implicit declaration of function `fg_get_num_orig_gens' > SnapPy.c:25423: warning: implicit declaration of function `candidateSn_is_valid' > SnapPy.c:25435: warning: implicit declaration of function `candidateSn_is_transitive' > SnapPy.c: At top level: > SnapPy.c:76434: error: initializer element is not constant > SnapPy.c:76434: error: (near initialization for `__pyx_CyFunctionType_type.tp_call') > error: command 'gcc' failed with exit status 1 Hmm, that line basically just says "PyCFunction_Call", which is a function exported by CPython. I wonder why gcc would consider this "not a constant". Could you check if the preprocessor (gcc -E, with all the above includes) also sees that on your side? Stefan From d.s.seljebotn at astro.uio.no Mon Apr 23 20:55:35 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 23 Apr 2012 20:55:35 +0200 Subject: [Cython] Julialang In-Reply-To: References: <4F92452D.7030507@astro.uio.no> Message-ID: <4F95A5A7.9030205@astro.uio.no> On 04/23/2012 08:17 PM, Nathaniel Smith wrote: > On Mon, Apr 23, 2012 at 6:09 PM, Dimitri Tcaciuc wrote: >> I may be misuderstanding the intent here, but here it goes. >> >> If the main idea is to be able to call functions that are written in >> Julia or other languages, I think an effort to create an LLVM backend >> for Cython would go a long way towards inter-language connections as >> the one discussed here. It should be possible to take Cython- and >> Julia- produced LLVM bytecode and assemble it all together, applying >> whatever bytecode optimizers that are available (eg. SSE >> vectorization). A big advantage of that approach is that there's no >> need for one language to know syntax conventions of the other one (or >> at least not to full extent). Continuing the effort, it should be >> possible to eliminate the need for writing an intermediate .c/.cpp >> file if Clang compiler is used, which is also LLVM based. > > You'd still need some way to translate between the Cython and Julia > calling conventions, runtimes, error handling, garbage collection > regimes, etc. IIUC, LLVM IR isn't like the CLR -- it doesn't force > languages into a common system for these things. > > Which might be great and worth the effort, I don't know, and don't > want to discourage anyone. But there are literally hundreds of new > languages designed every year, and a new *successful* language comes > along maybe twice in a decade? And one of those recent ones was PHP, > which shows you how important pure technical quality is in determining > which ones survive (i.e., not much). Building a self-sustaining > ecosystem requires a ton of work and a ton of luck. And here I'm still > trying to *reduce* the number of languages I need in each analysis > pipeline... so even though there are a number of really exciting > things about Julia, and its author seems to know what he's doing, I'm > still in wait-and-see mode. I'm excited about Julia because it's basically what I'd *like* to program in. My current mode of development for much stuff is Jinja2 or Tempita used for generating C code; Julia would be a real step forward. 
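(For readers who have not seen that workflow: template-driven C generation amounts to something like the following pure-Python sketch. The template and file names are invented for illustration; only Jinja2's basic Template/render API is assumed.)

    from jinja2 import Template

    # Hypothetical template: emit one specialized C function per element type.
    tmpl = Template("""
    static {{ ctype }} sum_{{ ctype }}(const {{ ctype }} *data, size_t n) {
        {{ ctype }} total = 0;
        for (size_t i = 0; i < n; i++)
            total += data[i];
        return total;
    }
    """)

    with open("generated_sums.c", "w") as f:
        for ctype in ("float", "double"):
            f.write(tmpl.render(ctype=ctype))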
I recently started a thread on julia-dev, primarily to encourage them to focus on binding to Python and use Python libraries rather than focusing on creating their own libraries (though I wasn't that blunt). The response is positive and I'm hopeful. The thing is, I really hope we've moved beyond CPython in 10 years -- in fact I'd go as far as saying that the reliance on CPython (specifically the lack of a decent JIT) is a real danger for the survival of the scientific Python ecosystem long-term! And I have my doubts about PyPy too (though I'm really happy for Stefan's efforts to bring some sanity with fixing cpyext). If Julia gets into a mode where they bootstrap by piggy-backing on Python's libraries, and gets that working transparently and builds a userbase around that, the next natural step is to implement Python in Julia, with CPython C-API compatibility. Which would be great. A very, very, very long shot, of course. Dag From greg.ewing at canterbury.ac.nz Tue Apr 24 00:32:51 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 24 Apr 2012 10:32:51 +1200 Subject: [Cython] Julialang In-Reply-To: <4F95A5A7.9030205@astro.uio.no> References: <4F92452D.7030507@astro.uio.no> <4F95A5A7.9030205@astro.uio.no> Message-ID: <4F95D893.7050709@canterbury.ac.nz> Dag Sverre Seljebotn wrote: > I'm excited about Julia because it's basically what I'd *like* to > program in. My current mode of development for much stuff is Jinja2 or > Tempita used for generating C code; Julia would be a real step forward. It looks interesting, but I have a few reservations about it as it stands: * No modules, just one big global namespace. This makes it unsuitable for large projects, IMO. * Multiple dispatch... I have mixed feelings about it. When methods belong to classes, the class serves as a namespace, and as we all know, namespaces are a honking great idea. Putting methods outside of classes throws away one kind of namespace. * One-based indexing? Yuck. I suppose it's what Fortran and Matlab users are familiar with, but it's not the best technical decision, IMO. On the plus side, it does seem to have a very nice and unobtrusive type system. > the next natural step is to implement Python in > Julia, with CPython C-API compatibility. Which would be great. That would indeed be an interesting thing to explore. -- Greg From d.s.seljebotn at astro.uio.no Tue Apr 24 07:15:08 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 24 Apr 2012 07:15:08 +0200 Subject: [Cython] Julialang In-Reply-To: <4F95D893.7050709@canterbury.ac.nz> References: <4F92452D.7030507@astro.uio.no> <4F95A5A7.9030205@astro.uio.no> <4F95D893.7050709@canterbury.ac.nz> Message-ID: <4F9636DC.2020704@astro.uio.no> On 04/24/2012 12:32 AM, Greg Ewing wrote: > Dag Sverre Seljebotn wrote: > >> I'm excited about Julia because it's basically what I'd *like* to >> program in. My current mode of development for much stuff is Jinja2 or >> Tempita used for generating C code; Julia would be a real step forward. > > It looks interesting, but I have a few reservations about > it as it stands: > > * No modules, just one big global namespace. This makes it > unsuitable for large projects, IMO. As far as I know it seems like namespaces are on their TODO-list. But of course, that also means it's undecided. > * Multiple dispatch... I have mixed feelings about it. When > methods belong to classes, the class serves as a namespace, > and as we all know, namespaces are a honking great idea.
> Putting methods outside of classes throws away one kind of > namespace. Well, there's still the namespace of the argument type. I think it is really a syntactic rewrite of obj->foo(bar) to foo(obj, bar) If Julia gets namespace support then the version ("method") of "foo" to use is determined by the namespace of obj and bar. And in Python there's all sorts of problems with who wins the battle over __add__ and __radd__ and so on (though it's a rather minor point and not something that by itself merits a new language IMO...). > * One-based indexing? Yuck. I suppose it's what Fortran and > Matlab users are familiar with, but it's not the best > technical decision, IMO. > > On the plus side, it does seem to have a very nice and > unobtrusive type system. > >> the next natural step is to implement Python in Julia, with CPython >> C-API compatability. Which would be great. > > That would indeed be an interesting thing to explore. Dag From stefan_ml at behnel.de Tue Apr 24 07:44:34 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 24 Apr 2012 07:44:34 +0200 Subject: [Cython] Julialang In-Reply-To: <4F95D893.7050709@canterbury.ac.nz> References: <4F92452D.7030507@astro.uio.no> <4F95A5A7.9030205@astro.uio.no> <4F95D893.7050709@canterbury.ac.nz> Message-ID: <4F963DC2.7030805@behnel.de> Greg Ewing, 24.04.2012 00:32: > Dag Sverre Seljebotn wrote: >> I'm excited about Julia because it's basically what I'd *like* to program >> in. My current mode of development for much stuff is Jinja2 or Tempita >> used for generating C code; Julia would be a real step forward. > > It looks interesting, but I have a few reservations about > it as it stands: > > * No modules, just one big global namespace. This makes it > unsuitable for large projects, IMO. > > * Multiple dispatch... I have mixed feelings about it. When > methods belong to classes, the class serves as a namespace, > and as we all know, namespaces are a honking great idea. > Putting methods outside of classes throws away one kind of > namespace. > > * One-based indexing? Yuck. I suppose it's what Fortran and > Matlab users are familiar with, but it's not the best > technical decision, IMO. > > On the plus side, it does seem to have a very nice and > unobtrusive type system. I totally agree. They might have been inspired by Lua and tried to make the type system more usable. There are/were many languages that started off with the design goal of being simple, beautiful and avoiding "all that overhead", before they got to the point of becoming usable and consequently quite complex. Even if Julia stays a niche language (and there is nothing that indicates that it won't be so), I think Dag is right in that it is an interesting niche for a certain user group (whether that is enough for it to prevail, well...). It could certainly make a nice addition to CPython. Whether reimplementing Python in it is a good idea and worth the effort - well, there are lots of incomplete special-purpose Python-like language implementations already, so why not have one more. Stefan From greg.ewing at canterbury.ac.nz Tue Apr 24 08:29:18 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 24 Apr 2012 18:29:18 +1200 Subject: [Cython] Julialang In-Reply-To: <4F9636DC.2020704@astro.uio.no> References: <4F92452D.7030507@astro.uio.no> <4F95A5A7.9030205@astro.uio.no> <4F95D893.7050709@canterbury.ac.nz> <4F9636DC.2020704@astro.uio.no> Message-ID: <4F96483E.30302@canterbury.ac.nz> Dag Sverre Seljebotn wrote: > Well, there's still the namespace of the argument type. 
I think it is > really a syntactic rewrite of > > obj->foo(bar) > > to > > foo(obj, bar) This is where I disagree. It's *not* just a syntactic rewrite, it's a lot more than that. With a Python method, I have a fairly good idea of where to start looking for a definition, documentation, etc. Or if it's a stand-alone function, I can follow the imports and find out which module it's defined in. But with generic functions, the implementation for the particular combination of argument types concerned could be *anywhere*, even in a file that I haven't explicitly imported myself. It's monkeypatching on steroids. I acknowledge that multiple dispatch is very powerful and lets you do all sorts of wonderful things. But that power comes at the expense of some features that I particularly value about Python's namespace system. > And in Python there's all sorts of problems with who wins the battle > over __add__ and __radd__ I think Python's solution to this is rather good, actually. It seems to work quite well in practice. And multiple dispatch systems have all the same problems when the "best" match to the argument types is ambiguous, so it's not as if multimethods are a magic solution to this. -- Greg From stefan_ml at behnel.de Tue Apr 24 09:34:16 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 24 Apr 2012 09:34:16 +0200 Subject: [Cython] Cython 0.16 issue Windows with Mingw32 In-Reply-To: <4F959F5F.3020907@behnel.de> References: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> <4F959F5F.3020907@behnel.de> Message-ID: <4F965778.3090707@behnel.de> Stefan Behnel, 23.04.2012 20:28: > Nathan Dunfield, 23.04.2012 17:58: >> I've encountered the following issue with Cython 0.16 on Windows with >> using the Mingw32 compiler (I'm using Python 3.2 here, but I don't think >> that's the issue): >> >> $ python3 setup.py build -c mingw32 >> running build >> running build_py >> running build_ext >> skipping 'SnapPy.c' Cython extension (up-to-date) >> building 'snappy.SnapPy' extension >> c:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -Iheaders -Iunix_kit -Iaddl_code -I. -Ipari/pari-2.3.4/include/ -Ipari/pari-2.3.4/include/pari -Ic:\Python32\include -Ic:\Python32\PC -c SnapPy.c -o build\temp.win32-3.2\Release\snappy.o >> SnapPy.c: In function `__pyx_f_6snappy_6SnapPy_13Triangulation_build_rep_into_Sn': >> SnapPy.c:25187: warning: implicit declaration of function `fg_get_num_orig_gens' >> SnapPy.c:25423: warning: implicit declaration of function `candidateSn_is_valid' >> SnapPy.c:25435: warning: implicit declaration of function `candidateSn_is_transitive' >> SnapPy.c: At top level: >> SnapPy.c:76434: error: initializer element is not constant >> SnapPy.c:76434: error: (near initialization for `__pyx_CyFunctionType_type.tp_call') >> error: command 'gcc' failed with exit status 1 > > Hmm, that line basically just says "PyCFunction_Call", which is a function > exported by CPython. I wonder why gcc would consider this "not a constant". My guess is that it's a Windows-DLL issue. Maybe symbols exported by Windows-DLLs simply aren't "constant". We should be able to fix this by indirecting the slot through a static function. 
Stefan From vitja.makarov at gmail.com Tue Apr 24 09:45:14 2012 From: vitja.makarov at gmail.com (Vitja Makarov) Date: Tue, 24 Apr 2012 11:45:14 +0400 Subject: [Cython] Cython 0.16 issue Windows with Mingw32 In-Reply-To: <4F965778.3090707@behnel.de> References: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> <4F959F5F.3020907@behnel.de> <4F965778.3090707@behnel.de> Message-ID: 2012/4/24 Stefan Behnel : > Stefan Behnel, 23.04.2012 20:28: >> Nathan Dunfield, 23.04.2012 17:58: >>> I've encountered the following issue with Cython 0.16 on Windows with >>> using the Mingw32 compiler (I'm using Python 3.2 here, but I don't think >>> that's the issue): >>> >>> $ python3 setup.py build -c mingw32 >>> running build >>> running build_py >>> running build_ext >>> skipping 'SnapPy.c' Cython extension (up-to-date) >>> building 'snappy.SnapPy' extension >>> c:\MinGW\bin\gcc.exe -mno-cygwin -mdll -O -Wall -Iheaders -Iunix_kit -Iaddl_code -I. -Ipari/pari-2.3.4/include/ -Ipari/pari-2.3.4/include/pari -Ic:\Python32\include -Ic:\Python32\PC -c SnapPy.c -o build\temp.win32-3.2\Release\snappy.o >>> SnapPy.c: In function `__pyx_f_6snappy_6SnapPy_13Triangulation_build_rep_into_Sn': >>> SnapPy.c:25187: warning: implicit declaration of function `fg_get_num_orig_gens' >>> SnapPy.c:25423: warning: implicit declaration of function `candidateSn_is_valid' >>> SnapPy.c:25435: warning: implicit declaration of function `candidateSn_is_transitive' >>> SnapPy.c: At top level: >>> SnapPy.c:76434: error: initializer element is not constant >>> SnapPy.c:76434: error: (near initialization for `__pyx_CyFunctionType_type.tp_call') >>> error: command 'gcc' failed with exit status 1 >> >> Hmm, that line basically just says "PyCFunction_Call", which is a function >> exported by CPython. I wonder why gcc would consider this "not a constant". > > My guess is that it's a Windows-DLL issue. Maybe symbols exported by > Windows-DLLs simply aren't "constant". We should be able to fix this by > indirecting the slot through a static function. > We can also fill it later at type initialization. But since we already have #define __Pyx_CyFunction_Call PyCFunction_Call and PyPy version of __Pyx_CyFunction_Call I think it's better to replace define with static function you mentioned above. -- vitja. From nmd at illinois.edu Tue Apr 24 14:22:33 2012 From: nmd at illinois.edu (Nathan Dunfield) Date: Tue, 24 Apr 2012 07:22:33 -0500 Subject: [Cython] Cython 0.16 issue Windows with Mingw32 In-Reply-To: <041194cede2147d18e21b07936093d8e@CITESHT4.ad.uillinois.edu> References: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> <041194cede2147d18e21b07936093d8e@CITESHT4.ad.uillinois.edu> Message-ID: <29C698CD-797C-4A8B-93B0-B123FF198582@illinois.edu> On Apr 23, 2012, at 1:28 PM, Stefan Behnel wrote: > Hmm, that line basically just says "PyCFunction_Call", which is a function > exported by CPython. I wonder why gcc would consider this "not a constant". > > Could you check if the preprocessor (gcc -E, with all the above includes) > also sees that on your side? It turns out I was running a version of MinGW that, while only a few years old, had a quite elderly version of gcc, namely 3.4 (yes, that's 3.4 and not 4.3). Once I updated to the most recent MinGW, which has gcc 4.6, and worked around a known bug in distutils ( http://bugs.python.org/issue12641 ), the issue with PyCFunction_Call went away and the entire 76k line Cython generated C file compiled without a hitch. 
So probably this is only a problem on very old compilers, and so perhaps not worth investigating further. Best, Nathan From stefan_ml at behnel.de Tue Apr 24 14:43:43 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 24 Apr 2012 14:43:43 +0200 Subject: [Cython] Cython 0.16 issue Windows with Mingw32 In-Reply-To: <29C698CD-797C-4A8B-93B0-B123FF198582@illinois.edu> References: <8E38000F-1A95-4436-B078-8C86616B96EE@illinois.edu> <041194cede2147d18e21b07936093d8e@CITESHT4.ad.uillinois.edu> <29C698CD-797C-4A8B-93B0-B123FF198582@illinois.edu> Message-ID: <4F969FFF.8090702@behnel.de> Nathan Dunfield, 24.04.2012 14:22: > On Apr 23, 2012, at 1:28 PM, Stefan Behnel wrote: >> Hmm, that line basically just says "PyCFunction_Call", which is a >> function exported by CPython. I wonder why gcc would consider this >> "not a constant". >> >> Could you check if the preprocessor (gcc -E, with all the above >> includes) also sees that on your side? > > It turns out I was running a version of MinGW that, while only a few > years old, had a quite elderly version of gcc, namely 3.4 (yes, that's > 3.4 and not 4.3). Once I updated to the most recent MinGW, which has > gcc 4.6, and worked around a known bug in distutils ( > http://bugs.python.org/issue12641 ), the issue with PyCFunction_Call > went away and the entire 76k line Cython generated C file compiled > without a hitch. So probably this is only a problem on very old > compilers, and so perhaps not worth investigating further. Thanks for reporting this back. Given that this is "only" MinGW, which requires some work on user side anyway to get it properly installed and used by Python builds, I agree that we can expect those who use it to also be able to install a somewhat recent version if they run into this. After all, gcc4.x is very likely to produce faster code for recent processors than gcc 3.4, so upgrading is a way better alternative than just "making it work" on our side. Stefan From markflorisson88 at gmail.com Fri Apr 27 21:16:25 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 27 Apr 2012 20:16:25 +0100 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: <4F6B868A.1000802@behnel.de> References: <4F6B868A.1000802@behnel.de> Message-ID: On 22 March 2012 20:07, Stefan Behnel wrote: > mark florisson, 22.03.2012 19:50: >> For the fused type runtime dispatch I found it very convenient to use >> the with statement, but that is not supported in Python 2.4. However, >> the compiler could dynamically compile compiler code with the compiler >> itself and import it (pyximport), if it is not needed to compile >> Cython itself. I gave it a try and it seems to work like a charm (but >> probably needs more testing :): >> https://github.com/markflorisson88/cython/commit/0c2983056919f7f4d30a809724d7db0ace99d89b#diff-2 > > The advantages are limited, so I'm leaning towards seeing the drawbacks, of > which there are at least some. For one, *running* Cython (as opposed to > installing it) becomes more complex and involves a (much) higher first time > overhead. We'd also start putting shared libraries into user directories > without asking them first. Might be a problem on shared installations with > many users. The overhead would only be for certain python versions that try to use certain functionality, in this case, python2.4 and fused types. To be honest, the overhead isn't very large. 
As for compiling shared libraries, I don't think people will complain about shared libraries, that's the only way in which Python and Cython can be used. > Note also that Cython no longer compiles itself when installing in PyPy (at > all), but that would be easy to special case here (and PyPy obviously has > features like the "with" statement). > > Next, I think it would tempt us to split source files into separate modules > just because that way we can use a specific feature in one of them because > it'll get compiled (and the other half is needed for bootstrapping). That > would be bad design. Possibly. In the case of fused types, the code of the fused node is nearly 800 lines, which is probably good to separate from the other, typically smaller nodes, especially considering it's kind of a specific feature. In my case, and I wouldn't mind limiting the functionality until further discussion to that case only, using the with statement really helps keeping track of blocks, and the resulting code is much more readable than it would otherwise be. > OTOH, it might be worth taking a general look at the > code base to see what's really required for bootstrapping, or maybe for > compiling pure Python code in general. Factoring that out, and thus > separating the Python compiler from the Cython specific language features > might (might!) be an actual improvement. (Then again, there's .pxd > overriding and the "cython" module, which add Cython features back in, and > those two make Cython much more attactive...) > > I also started to dislike pyximport because it's way outdated, fragile, > complicated and its features are overly hacked together (and I'm not > entirely innocent in that regard). I would love to see a rewrite that also > supports compiling packages properly. Not a GSoC by itself, but it's > certainly a worthy project. What about a project that aims for separating > out a Python compiler and rewriting pyximport as a jitty frontend for it? > Maybe not the greatest use case ever, but a fun project, I'd say. Yeah, the code is pretty terrible, but it seems to work. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Fri Apr 27 22:16:38 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 27 Apr 2012 22:16:38 +0200 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: References: <4F6B868A.1000802@behnel.de> Message-ID: <4F9AFEA6.8030003@behnel.de> mark florisson, 27.04.2012 21:16: > On 22 March 2012 20:07, Stefan Behnel wrote: >> mark florisson, 22.03.2012 19:50: >>> For the fused type runtime dispatch I found it very convenient to use >>> the with statement, but that is not supported in Python 2.4. However, >>> the compiler could dynamically compile compiler code with the compiler >>> itself and import it (pyximport), if it is not needed to compile >>> Cython itself. I gave it a try and it seems to work like a charm (but >>> probably needs more testing :): >>> https://github.com/markflorisson88/cython/commit/0c2983056919f7f4d30a809724d7db0ace99d89b#diff-2 >> >> The advantages are limited, so I'm leaning towards seeing the drawbacks, of >> which there are at least some. For one, *running* Cython (as opposed to >> installing it) becomes more complex and involves a (much) higher first time >> overhead. We'd also start putting shared libraries into user directories >> without asking them first. 
Might be a problem on shared installations with >> many users. > > The overhead would only be for certain python versions that try to use > certain functionality, in this case, python2.4 and fused types. To be > honest, the overhead isn't very large. As for compiling shared > libraries, I don't think people will complain about shared libraries, > that's the only way in which Python and Cython can be used. > >> Note also that Cython no longer compiles itself when installing in PyPy (at >> all), but that would be easy to special case here (and PyPy obviously has >> features like the "with" statement). >> >> Next, I think it would tempt us to split source files into separate modules >> just because that way we can use a specific feature in one of them because >> it'll get compiled (and the other half is needed for bootstrapping). That >> would be bad design. > > Possibly. In the case of fused types, the code of the fused node is > nearly 800 lines, which is probably good to separate from the other, > typically smaller nodes, especially considering it's kind of a > specific feature. In my case, and I wouldn't mind limiting the > functionality until further discussion to that case only, using the > with statement really helps keeping track of blocks, and the resulting > code is much more readable than it would otherwise be. What about this deal: we remove the hard bootstrap dependency on the fused types code (and maybe other Cython specific features) and require its compilation at install time in Py2.4 (and maybe even 2.5). That would allow us to use newer Python syntax (and even Cython supported syntax) there (except for fused types, obviously). Failure to compile the module in Python 2.4/5 at install time would then abort the installation. Bad luck for the user, but easy to fix by installing a newer Python version. That would give us the advantage of not needing to pollute user home directories with shared libraries at runtime (which I would consider a very annoying property). Making the dependency optional can be as simple as using a try-except conditional import at the module level and setting the module name to None in the failure case. If we want to prevent weird error messages, we can just ignore the import failure during Cython's own installation (by setting a flag somewhere) and during a normal run, we use a guard that checks that the import worked (i.e. the module is non-None) before starting the compilation and raises an error otherwise. We should then clearly document in the module comment that this source file can use newer syntax while all other files cannot. Oh, and we should still take care *not* to split our source base by the "can be compiled or not" predicate. What do you think? Stefan From markflorisson88 at gmail.com Fri Apr 27 22:38:41 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 27 Apr 2012 21:38:41 +0100 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: <4F9AFEA6.8030003@behnel.de> References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> Message-ID: On 27 April 2012 21:16, Stefan Behnel wrote: > mark florisson, 27.04.2012 21:16: >> On 22 March 2012 20:07, Stefan Behnel wrote: >>> mark florisson, 22.03.2012 19:50: >>>> For the fused type runtime dispatch I found it very convenient to use >>>> the with statement, but that is not supported in Python 2.4. 
However, >>>> the compiler could dynamically compile compiler code with the compiler >>>> itself and import it (pyximport), if it is not needed to compile >>>> Cython itself. I gave it a try and it seems to work like a charm (but >>>> probably needs more testing :): >>>> https://github.com/markflorisson88/cython/commit/0c2983056919f7f4d30a809724d7db0ace99d89b#diff-2 >>> >>> The advantages are limited, so I'm leaning towards seeing the drawbacks, of >>> which there are at least some. For one, *running* Cython (as opposed to >>> installing it) becomes more complex and involves a (much) higher first time >>> overhead. We'd also start putting shared libraries into user directories >>> without asking them first. Might be a problem on shared installations with >>> many users. >> >> The overhead would only be for certain python versions that try to use >> certain functionality, in this case, python2.4 and fused types. To be >> honest, the overhead isn't very large. As for compiling shared >> libraries, I don't think people will complain about shared libraries, >> that's the only way in which Python and Cython can be used. >> >>> Note also that Cython no longer compiles itself when installing in PyPy (at >>> all), but that would be easy to special case here (and PyPy obviously has >>> features like the "with" statement). >>> >>> Next, I think it would tempt us to split source files into separate modules >>> just because that way we can use a specific feature in one of them because >>> it'll get compiled (and the other half is needed for bootstrapping). That >>> would be bad design. >> >> Possibly. In the case of fused types, the code of the fused node is >> nearly 800 lines, which is probably good to separate from the other, >> typically smaller nodes, especially considering it's kind of a >> specific feature. In my case, and I wouldn't mind limiting the >> functionality until further discussion to that case only, using the >> with statement really helps keeping track of blocks, and the resulting >> code is much more readable than it would otherwise be. > > What about this deal: we remove the hard bootstrap dependency on the fused > types code (and maybe other Cython specific features) and require its > compilation at install time in Py2.4 (and maybe even 2.5). That would allow > us to use newer Python syntax (and even Cython supported syntax) there > (except for fused types, obviously). Failure to compile the module in > Python 2.4/5 at install time would then abort the installation. Bad luck > for the user, but easy to fix by installing a newer Python version. > > That would give us the advantage of not needing to pollute user home > directories with shared libraries at runtime (which I would consider a very > annoying property). I think it's fine to require compiling in the installed case (or will that be a problem for some package managers?). In the non-installed case with python versions smaller than needed, would you prefer a pyximport or an error message telling you to install Cython? Because for development you really don't want to install every time. > Making the dependency optional can be as simple as using a try-except > conditional import at the module level and setting the module name to None > in the failure case. If we want to prevent weird error messages, we can > just ignore the import failure during Cython's own installation (by setting > a flag somewhere) and during a normal run, we use a guard that checks that > the import worked (i.e. 
the module is non-None) before starting the > compilation and raises an error otherwise. > > We should then clearly document in the module comment that this source file > can use newer syntax while all other files cannot. Oh, and we should still > take care *not* to split our source base by the "can be compiled or not" > predicate. Right. > What do you think? > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From stefan_ml at behnel.de Sat Apr 28 19:55:33 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 28 Apr 2012 19:55:33 +0200 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> Message-ID: <4F9C2F15.3010309@behnel.de> mark florisson, 27.04.2012 22:38: > On 27 April 2012 21:16, Stefan Behnel wrote: >> What about this deal: we remove the hard bootstrap dependency on the fused >> types code (and maybe other Cython specific features) and require its >> compilation at install time in Py2.4 (and maybe even 2.5). That would allow >> us to use newer Python syntax (and even Cython supported syntax) there >> (except for fused types, obviously). Failure to compile the module in >> Python 2.4/5 at install time would then abort the installation. Bad luck >> for the user, but easy to fix by installing a newer Python version. >> >> That would give us the advantage of not needing to pollute user home >> directories with shared libraries at runtime (which I would consider a very >> annoying property). > > I think it's fine to require compiling in the installed case (or will > that be a problem for some package managers?). In the non-installed > case with python versions smaller than needed, would you prefer a > pyximport or an error message telling you to install Cython? Because > for development you really don't want to install every time. I think it's fine to require at least Python 2.6 for Cython core development. Just the installation (basically, what we test in Jenkins anyway) should work in Py2.4 and shouldn't require any rebuilds at runtime. Stefan From markflorisson88 at gmail.com Sat Apr 28 20:18:09 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 28 Apr 2012 19:18:09 +0100 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: <4F9C2F15.3010309@behnel.de> References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> <4F9C2F15.3010309@behnel.de> Message-ID: On 28 April 2012 18:55, Stefan Behnel wrote: > mark florisson, 27.04.2012 22:38: >> On 27 April 2012 21:16, Stefan Behnel wrote: >>> What about this deal: we remove the hard bootstrap dependency on the fused >>> types code (and maybe other Cython specific features) and require its >>> compilation at install time in Py2.4 (and maybe even 2.5). That would allow >>> us to use newer Python syntax (and even Cython supported syntax) there >>> (except for fused types, obviously). Failure to compile the module in >>> Python 2.4/5 at install time would then abort the installation. Bad luck >>> for the user, but easy to fix by installing a newer Python version. >>> >>> That would give us the advantage of not needing to pollute user home >>> directories with shared libraries at runtime (which I would consider a very >>> annoying property). >> >> I think it's fine to require compiling in the installed case (or will >> that be a problem for some package managers?). 
In the non-installed >> case with python versions smaller than needed, would you prefer a >> pyximport or an error message telling you to install Cython? Because >> for development you really don't want to install every time. > > I think it's fine to require at least Python 2.6 for Cython core > development. Just the installation (basically, what we test in Jenkins > anyway) should work in Py2.4 and shouldn't require any rebuilds at runtime. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Well, sometimes stuff works in say, 2.7 but fails in 2.4. In that case you really have to test with the failing python versions, which means you'd have to reinstall every time you want to try the tests again (this is handled automatically for py3k, which runs the 2to3 tool). I'm also sure many users just clone from git, add the directory to PYTHONPATH and work from there. So I guess what I'm saying is, it's fine to mandate compilation at compile time (don't allow flags to disable compilation), and (for me), pyximport is totally fine, but Cython must be workable (all functionality), without needing to install or build, in all versions. That means either not using the with statement, or compiling with pyximport in certain versions in certain situations only (i.e., only in incompatible python version in case the user neither built nor installed Cython). I don't think that's a problem, if people don't like to have those shared library modules (will they even notice?) in their user directories, they can install or build Cython. Finally, my attachment to the with statement here is mostly to the aesthetics of the resulting code, rewriting my pull request is not so much work, so we can leave that out of consideration here. From stefan_ml at behnel.de Sat Apr 28 20:39:28 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 28 Apr 2012 20:39:28 +0200 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> <4F9C2F15.3010309@behnel.de> Message-ID: <4F9C3960.2080004@behnel.de> mark florisson, 28.04.2012 20:18: > On 28 April 2012 18:55, Stefan Behnel wrote: >> mark florisson, 27.04.2012 22:38: >>> On 27 April 2012 21:16, Stefan Behnel wrote: >>>> What about this deal: we remove the hard bootstrap dependency on the fused >>>> types code (and maybe other Cython specific features) and require its >>>> compilation at install time in Py2.4 (and maybe even 2.5). That would allow >>>> us to use newer Python syntax (and even Cython supported syntax) there >>>> (except for fused types, obviously). Failure to compile the module in >>>> Python 2.4/5 at install time would then abort the installation. Bad luck >>>> for the user, but easy to fix by installing a newer Python version. >>>> >>>> That would give us the advantage of not needing to pollute user home >>>> directories with shared libraries at runtime (which I would consider a very >>>> annoying property). >>> >>> I think it's fine to require compiling in the installed case (or will >>> that be a problem for some package managers?). In the non-installed >>> case with python versions smaller than needed, would you prefer a >>> pyximport or an error message telling you to install Cython? Because >>> for development you really don't want to install every time. >> >> I think it's fine to require at least Python 2.6 for Cython core >> development. 
Just the installation (basically, what we test in Jenkins >> anyway) should work in Py2.4 and shouldn't require any rebuilds at runtime. > > Well, sometimes stuff works in say, 2.7 but fails in 2.4. In that case > you really have to test with the failing python versions, which means > you'd have to reinstall every time you want to try the tests again > (this is handled automatically for py3k, which runs the 2to3 tool). The number of times I recently ran tests in Py2.4 myself is really not worth mentioning. Most of the time, when something fails there, the error I get in Jenkins is so obvious that I just commit an untested fix for it. I think it's really acceptable to require a run of "setup.py build_ext -i" for local developer testing in Py2.4. > I'm also sure many users just clone from git, add the directory to > PYTHONPATH and work from there. I'm sure there are close to no users who try to do that with Py2.4 these days. Maybe there are some who do it with Py2.5, but we are not currently considering to break a plain Python run there, AFAICT. I think the normal way users employ Cython is after a proper installation. > So I guess what I'm saying is, it's fine to mandate compilation at > compile time (don't allow flags to disable compilation), and (for me), > pyximport is totally fine, but Cython must be workable (all > functionality), without needing to install or build, in all versions. Workable, ok. But if fused types are only available in a compiled installed version under Python 2.4, that's maybe not optimal but certainly acceptable. Users of Py2.4 should be used to suffering anyway. > That means either not using the with statement, or compiling with > pyximport in certain versions in certain situations only (i.e., only > in incompatible python version in case the user neither built nor > installed Cython). I don't think that's a problem, if people don't > like to have those shared library modules (will they even notice?) in > their user directories, they can install or build Cython. Why require the use of pyximport at runtime when we can do everything during installation? I really don't see an advantage. > Finally, my attachment to the with statement here is mostly to the > aesthetics of the resulting code, rewriting my pull request is not so > much work, so we can leave that out of consideration here. If it's not too much work, that would obviously make things go a lot smoother. Stefan From markflorisson88 at gmail.com Sat Apr 28 21:55:33 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 28 Apr 2012 20:55:33 +0100 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: <4F9C3960.2080004@behnel.de> References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> <4F9C2F15.3010309@behnel.de> <4F9C3960.2080004@behnel.de> Message-ID: On 28 April 2012 19:39, Stefan Behnel wrote: > mark florisson, 28.04.2012 20:18: >> On 28 April 2012 18:55, Stefan Behnel wrote: >>> mark florisson, 27.04.2012 22:38: >>>> On 27 April 2012 21:16, Stefan Behnel wrote: >>>>> What about this deal: we remove the hard bootstrap dependency on the fused >>>>> types code (and maybe other Cython specific features) and require its >>>>> compilation at install time in Py2.4 (and maybe even 2.5). That would allow >>>>> us to use newer Python syntax (and even Cython supported syntax) there >>>>> (except for fused types, obviously). Failure to compile the module in >>>>> Python 2.4/5 at install time would then abort the installation. 
Bad luck >>>>> for the user, but easy to fix by installing a newer Python version. >>>>> >>>>> That would give us the advantage of not needing to pollute user home >>>>> directories with shared libraries at runtime (which I would consider a very >>>>> annoying property). >>>> >>>> I think it's fine to require compiling in the installed case (or will >>>> that be a problem for some package managers?). In the non-installed >>>> case with python versions smaller than needed, would you prefer a >>>> pyximport or an error message telling you to install Cython? Because >>>> for development you really don't want to install every time. >>> >>> I think it's fine to require at least Python 2.6 for Cython core >>> development. Just the installation (basically, what we test in Jenkins >>> anyway) should work in Py2.4 and shouldn't require any rebuilds at runtime. >> >> Well, sometimes stuff works in say, 2.7 but fails in 2.4. In that case >> you really have to test with the failing python versions, which means >> you'd have to reinstall every time you want to try the tests again >> (this is handled automatically for py3k, which runs the 2to3 tool). > > The number of times I recently ran tests in Py2.4 myself is really not > worth mentioning. Most of the time, when something fails there, the error I > get in Jenkins is so obvious that I just commit an untested fix for it. In my experience that still fails quite often, you may still forget some test or accidentally add some whitespace, and then you're going to build everything on Jenkins only to realize one hour later that something is still broken. Maybe it's because of the buffer differences and because 2.4 is the first thing that runs on Jenkins, but I test quite often in 2.4. > I think it's really acceptable to require a run of "setup.py build_ext -i" > for local developer testing in Py2.4. > > >> I'm also sure many users just clone from git, add the directory to >> PYTHONPATH and work from there. > > I'm sure there are close to no users who try to do that with Py2.4 these > days. Maybe there are some who do it with Py2.5, but we are not currently > considering to break a plain Python run there, AFAICT. > > I think the normal way users employ Cython is after a proper installation. > > >> So I guess what I'm saying is, it's fine to mandate compilation at >> compile time (don't allow flags to disable compilation), and (for me), >> pyximport is totally fine, but Cython must be workable (all >> functionality), without needing to install or build, in all versions. > > Workable, ok. But if fused types are only available in a compiled installed > version under Python 2.4, that's maybe not optimal but certainly > acceptable. Users of Py2.4 should be used to suffering anyway. > That's a really confusing statement. If they are used to suffering, then why can't they bear a runtime-compiled module if they didn't install? :) If a user installs she will have at least one compiled module in the system, but if she doesn't install, she will also have one compiled module. I'm kind of wondering though, if this is really such a big problem, then it means we can also never cache any user JIT-compiled code (like e.g. OpenCL)? >> That means either not using the with statement, or compiling with >> pyximport in certain versions in certain situations only (i.e., only >> in incompatible python version in case the user neither built nor >> installed Cython). I don't think that's a problem, if people don't >> like to have those shared library modules (will they even notice?)
in >> their user directories, they can install or build Cython. > > Why require the use of pyximport at runtime when we can do everything > during installation? I really don't see an advantage. > We will do everything during installation, but not mandate an installation. >> Finally, my attachment to the with statement here is mostly to the >> aesthetics of the resulting code, rewriting my pull request is not so >> much work, so we can leave that out of consideration here. > > If it's not too much work, that would obviously make things go a lot smoother. > > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I'm not sure why we're always making such a fuss over the little things, but I suppose I'd prefer mandating a compile in 2.4 over not having the with statement. Unless you want to mandate installing in every version, this still means we can't write any parts of the compiler in actual Cython code. From njs at pobox.com Sat Apr 28 23:04:26 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 28 Apr 2012 22:04:26 +0100 Subject: [Cython] Wacky idea: proper macros Message-ID: Was chatting with Wes today about the usual problem many of us have encountered with needing to use some sort of templating system to generate code handling multiple types, operations, etc., and a wacky idea occurred to me. So I thought I'd throw it out here. What if we added a simple macro facility to Cython, that worked at the AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) Basically some way to write arbitrary Python code into a .pyx file that gets executed at compile time and can transform the AST, plus some nice convenience APIs for simple transformations. E.g., if we steal the illegal token sequence @@ as our marker, we could have something like:

@@ # alone on a line, starts a block of Python code
from Cython.MacroUtil import replace_ctype
def expand_types(placeholder, typelist):
  def my_decorator(function_name, ast):
    functions = {}
    for typename in typelist:
      new_name = "%s_%s" % (function_name, typename)
      functions[new_name] = replace_ctype(ast, placeholder, typename)
    return functions
  return my_decorator
@@ # this token sequence cannot occur in Python, so it's a safe end-marker

# Compile-time function decorator
# Results in two cdef functions named sum_double and sum_int
@@expand_types("T", ["double", "int"])
cdef T sum(np.ndarray[T] arr):
  cdef T start = 0
  for i in range(arr.size):
    start += arr[i]
  return start

I don't know if this is a good idea, but it seems like it'd be very easy to do on the Cython side, fairly clean, and be dramatically less horrible than all the ad-hoc templating stuff people do now. Presumably there'd be strict limits on how much backwards compatibility we'd be willing to guarantee for code that went poking around in the AST by hand, but a small handful of functions like my notional "replace_ctype" would go a long way, and wouldn't impose much of a compatibility burden.
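For concreteness, the intent is that the decorated definition above would behave as if you had written the two specializations out by hand, roughly like this (an illustration of the intended expansion, not real compiler output; it assumes the usual "cimport numpy as np"):

# hand-written equivalents of the sum_double / sum_int the macro would emit
cdef double sum_double(np.ndarray[double] arr):
  cdef double start = 0
  for i in range(arr.size):
    start += arr[i]
  return start

cdef int sum_int(np.ndarray[int] arr):
  cdef int start = 0
  for i in range(arr.size):
    start += arr[i]
  return start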
-- Nathaniel From markflorisson88 at gmail.com Sat Apr 28 23:25:03 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 28 Apr 2012 22:25:03 +0100 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: On 28 April 2012 22:04, Nathaniel Smith wrote: > Was chatting with Wes today about the usual problem many of us have > encountered with needing to use some sort of templating system to > generate code handling multiple types, operations, etc., and a wacky > idea occurred to me. So I thought I'd throw it out here. > > What if we added a simple macro facility to Cython, that worked at the > AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) > Basically some way to write arbitrary Python code into a .pyx file > that gets executed at compile time and can transform the AST, plus > some nice convenience APIs for simple transformations. > > E.g., if we steal the illegal token sequence @@ as our marker, we > could have something like: >
> @@ # alone on a line, starts a block of Python code
> from Cython.MacroUtil import replace_ctype
> def expand_types(placeholder, typelist):
>   def my_decorator(function_name, ast):
>     functions = {}
>     for typename in typelist:
>       new_name = "%s_%s" % (function_name, typename)
>       functions[new_name] = replace_ctype(ast, placeholder, typename)
>     return functions
>   return my_decorator
> @@ # this token sequence cannot occur in Python, so it's a safe end-marker
>
> # Compile-time function decorator
> # Results in two cdef functions named sum_double and sum_int
> @@expand_types("T", ["double", "int"])
> cdef T sum(np.ndarray[T] arr):
>   cdef T start = 0
>   for i in range(arr.size):
>     start += arr[i]
>   return start
>
> I don't know if this is a good idea, but it seems like it'd be very > easy to do on the Cython side, fairly clean, and be dramatically less > horrible than all the ad-hoc templating stuff people do now. > Presumably there'd be strict limits on how much backwards > compatibility we'd be willing to guarantee for code that went poking > around in the AST by hand, but a small handful of functions like my > notional "replace_ctype" would go a long way, and wouldn't impose much > of a compatibility burden. > > -- Nathaniel > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? In general I would like better meta-programming support, maybe even allow defining new operators (although I'm not sure any of it is very pythonic), but for templates I think fused types should be used, or improved when they fall short. Maybe a plugin system could also help people. From wesmckinn at gmail.com Sun Apr 29 03:14:51 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Sat, 28 Apr 2012 21:14:51 -0400 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: On Sat, Apr 28, 2012 at 5:25 PM, mark florisson wrote: > On 28 April 2012 22:04, Nathaniel Smith wrote: >> Was chatting with Wes today about the usual problem many of us have >> encountered with needing to use some sort of templating system to >> generate code handling multiple types, operations, etc., and a wacky >> idea occurred to me. So I thought I'd throw it out here. >> >> What if we added a simple macro facility to Cython, that worked at the >> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.)
>> Basically some way to write arbitrary Python code into a .pyx file >> that gets executed at compile time and can transform the AST, plus >> some nice convenience APIs for simple transformations. >> >> E.g., if we steal the illegal token sequence @@ as our marker, we >> could have something like: >>
>> @@ # alone on a line, starts a block of Python code
>> from Cython.MacroUtil import replace_ctype
>> def expand_types(placeholder, typelist):
>>   def my_decorator(function_name, ast):
>>     functions = {}
>>     for typename in typelist:
>>       new_name = "%s_%s" % (function_name, typename)
>>       functions[new_name] = replace_ctype(ast, placeholder, typename)
>>     return functions
>>   return my_decorator
>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker
>>
>> # Compile-time function decorator
>> # Results in two cdef functions named sum_double and sum_int
>> @@expand_types("T", ["double", "int"])
>> cdef T sum(np.ndarray[T] arr):
>>   cdef T start = 0
>>   for i in range(arr.size):
>>     start += arr[i]
>>   return start
>>
>> I don't know if this is a good idea, but it seems like it'd be very >> easy to do on the Cython side, fairly clean, and be dramatically less >> horrible than all the ad-hoc templating stuff people do now. >> Presumably there'd be strict limits on how much backwards >> compatibility we'd be willing to guarantee for code that went poking >> around in the AST by hand, but a small handful of functions like my >> notional "replace_ctype" would go a long way, and wouldn't impose much >> of a compatibility burden. >> >> -- Nathaniel >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? > > In general I would like better meta-programming support, maybe even > allow defining new operators (although I'm not sure any of it is very > pythonic), but for templates I think fused types should be used, or > improved when they fall short. Maybe a plugin system could also help > people. > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I referenced this problem recently in a blog post (http://wesmckinney.com/blog/?p=467). My main interest these days is in expressing data algorithms. I've unfortunately found myself working around performance problems with fundamental array operations in NumPy so a lot of the Cython work I've done has been in and around this. In lieu of some kind of macro system it seems inevitable that I'm going to need to create some kind of mini array language or otherwise code generation framework (targeting C, Cython, or Fortran). I worry that this is going to end with me creating "yet another APL [or Haskell] implementation" but I really need something that runs inside CPython. And why not? Most of these algorithms could be expressed at a very high level and lead to pretty clean generated C with many of the special cases (contiguous memory, or low dimensions in the case of n-dimensional algorithms) checked and handled in simplified loops.
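To make "code generation framework" concrete, the simplest thing I mean is plain string templating that writes one specialized Cython loop per (operation, dtype) pair out to a .pyx file. A minimal, standalone sketch (hypothetical; the template, names and output file are made up for illustration):

from string import Template

# one specialized reduction loop per (operation, dtype) pair
tmpl = Template('''
cdef $ctype ${opname}_$suffix(np.ndarray[$ctype] arr):
  cdef $ctype acc = $identity
  cdef Py_ssize_t i
  for i in range(arr.shape[0]):
    acc $op arr[i]
  return acc
''')

ops = [("sum", "+=", "0"), ("prod", "*=", "1")]
dtypes = [("np.float64_t", "float64"), ("np.int64_t", "int64")]

with open("reductions.pyx", "w") as f:
  f.write("cimport numpy as np\n")
  for opname, op, identity in ops:
    for ctype, suffix in dtypes:
      f.write(tmpl.substitute(opname=opname, suffix=suffix,
                              ctype=ctype, op=op, identity=identity))

Each generated loop is fully typed, so the C compiler sees a plain specialized loop for every case.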
- Wes From stefan_ml at behnel.de Sun Apr 29 06:50:54 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 29 Apr 2012 06:50:54 +0200 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> <4F9C2F15.3010309@behnel.de> <4F9C3960.2080004@behnel.de> Message-ID: <4F9CC8AE.1050901@behnel.de> mark florisson, 28.04.2012 21:55: > On 28 April 2012 19:39, Stefan Behnel wrote: >> mark florisson, 28.04.2012 20:18: >>> On 28 April 2012 18:55, Stefan Behnel wrote: >>>> mark florisson, 27.04.2012 22:38: >>>>> On 27 April 2012 21:16, Stefan Behnel wrote: >>>>>> What about this deal: we remove the hard bootstrap dependency on the fused >>>>>> types code (and maybe other Cython specific features) and require its >>>>>> compilation at install time in Py2.4 (and maybe even 2.5). That would allow >>>>>> us to use newer Python syntax (and even Cython supported syntax) there >>>>>> (except for fused types, obviously). Failure to compile the module in >>>>>> Python 2.4/5 at install time would then abort the installation. Bad luck >>>>>> for the user, but easy to fix by installing a newer Python version. >>>>>> >>>>>> That would give us the advantage of not needing to pollute user home >>>>>> directories with shared libraries at runtime (which I would consider a very >>>>>> annoying property). >>>>> >>>>> I think it's fine to require compiling in the installed case (or will >>>>> that be a problem for some package managers?). In the non-installed >>>>> case with python versions smaller than needed, would you prefer a >>>>> pyximport or an error message telling you to install Cython? Because >>>>> for development you really don't want to install every time. >>>> >>>> I think it's fine to require at least Python 2.6 for Cython core >>>> development. Just the installation (basically, what we test in Jenkins >>>> anyway) should work in Py2.4 and shouldn't require any rebuilds at runtime. >>> >>> Well, sometimes stuff works in say, 2.7 but fails in 2.4. In that case >>> you really have to test with the failing python versions, which means >>> you'd have to reinstall every time you want to try the tests again >>> (this is handled automatically for py3k, which runs the 2to3 tool). >> >> The number of times I recently ran tests in Py2.4 myself is really not >> worth mentioning. Most of the time, when something fails there, the error I >> get in Jenkins is so obvious that I just commit an untested fix for it. > > In my experience that still fails quite often, you may still forget > some test or accidentally add some whitespace, and then you're going > to build everything on Jenkins only to realize one hour later that > something is still broken. Maybe it's because of the buffer > differences and because 2.4 is the first thing that runs on Jenkins, > but I test quite often in 2.4. That may be the reason. Still, how much overhead is it really to run "setup.py build_ext -i"? Especially compared to getting the exact same compilation overhead with an import time triggered rebuild? >> I think it's really acceptable to require a run of "setup.py build_ext -i" >> for local developer testing in Py2.4. >> >>> I'm also sure many users just clone from git, add the directory to >>> PYTHONPATH and work from there. >> >> I'm sure there are close to no users who try to do that with Py2.4 these >> days. Maybe there are some who do it with Py2.5, but we are not currently >> considering to break a plain Python run there, AFAICT. 
>> >> I think the normal way users employ Cython is after a proper installation. >> >>> So I guess what I'm saying is, it's fine to mandate compilation at >>> compile time (don't allow flags to disable compilation), and (for me), >>> pyximport is totally fine, but Cython must be workable (all >>> functionality), without needing to install or build, in all versions. >> >> Workable, ok. But if fused types are only available in a compiled installed >> version under Python 2.4, that's maybe not optimal but certainly >> acceptable. Users of Py2.4 should be used to suffering anyway. > > That's a really confusing statement. If they are used to suffering, > they why can't they bear a runtime-compiled module if they didn't > install? :) If a user installs she will have at least one compiled > module in the system, but if she doesn't install, she will also have > one compiled module. > > I'm kind of wondering though, if this is really such a big problem, > then it means we can also never cache any user JIT-compiled code (like > e.g. OpenCL)? That's a different situation because it's the user's decision to use a (caching) JIT compiler in the first place. We don't enforce that. Basically, what I'm saying is: why do something complicated at runtime when we can just do everything normally at install time and be done? Once Cython is installed, it should just work, without needing to rebuild parts of itself. I'm convinced that that will make it a lot more acceptable for package maintainers, sys admins and users. Keep in mind that the installation under PyPy doesn't compile anything and just installs the plain Python modules. Enforcing compilation in Py2.4 is just saying that sys admins of very old systems (which we still want to support, at least in the generated C code) will no longer get the benefit of the pure Python installation feature. It doesn't impact the users at all, that's a feature. >>> That means either not using the with statement, or compiling with >>> pyximport in certain versions in certain situations only (i.e., only >>> in incompatible python version in case the user neither built nor >>> installed Cython). I don't think that's a problem, if people don't >>> like to have those shared library modules (will they even notice?) in >>> their user directories, they can install or build Cython. >> >> Why require the use of pyximport at runtime when we can do everything >> during installation? I really don't see an advantage. > > We will do everything during installation, but not mandate an installation. I think we should in the case at hand. >>> Finally, my attachment to the with statement here is mostly to the >>> aesthetics of the resulting code, rewriting my pull request is not so >>> much work, so we can leave that out of consideration here. >> >> If it's not too much work, that would obviously make things go a lot smoother. > > I'm not sure why we're always making such a fuss over the little > things, but I suppose I'd prefer mandating a compile in 2.4 over not > having the with statement. Ok. > Unless you want to mandate installing in > every version, this still means we can't write any parts of the > compiler in actual Cython code. We're doing just that for Py2.4 now by starting to use illegal syntax. However, since you're changing topics here already, do you actually see any advantage in starting to use non-Python code? I don't think that would be a good idea at all. 
I think we've gotten along quite happily with plain Python code for the compiler itself so far and I don't see an interest in changing that. In particular, I can't see the compiler being in need of any of its own non-Python language features. Stefan From stefan_ml at behnel.de Sun Apr 29 07:11:46 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 29 Apr 2012 07:11:46 +0200 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: <4F9CCD92.5090602@behnel.de> Wes McKinney, 29.04.2012 03:14: > On Sat, Apr 28, 2012 at 5:25 PM, mark florisson wrote: >> On 28 April 2012 22:04, Nathaniel Smith wrote: >>> Was chatting with Wes today about the usual problem many of us have >>> encountered with needing to use some sort of templating system to >>> generate code handling multiple types, operations, etc., and a wacky >>> idea occurred to me. So I thought I'd through it out here. >>> >>> What if we added a simple macro facility to Cython, that worked at the >>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) >>> Basically some way to write arbitrary Python code into a .pyx file >>> that gets executed at compile time and can transform the AST, plus >>> some nice convenience APIs for simple transformations. >>> >>> E.g., if we steal the illegal token sequence @@ as our marker, we >>> could have something like: >>> >>> @@ # alone on a line, starts a block of Python code >>> from Cython.MacroUtil import replace_ctype >>> def expand_types(placeholder, typelist): >>> def my_decorator(function_name, ast): >>> functions = {} >>> for typename in typelist: >>> new_name = "%s_%s" % (function_name, typename) >>> functions[name] = replace_ctype(ast, placeholder, typename) >>> return functions >>> return function_decorator >>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker >>> >>> # Compile-time function decorator >>> # Results in two cdef functions named sum_double and sum_int >>> @@expand_types("T", ["double", "int"]) >>> cdef T sum(np.ndarray[T] arr): >>> cdef T start = 0; >>> for i in range(arr.size): >>> start += arr[i] >>> return start >>> >>> I don't know if this is a good idea, but it seems like it'd be very >>> easy to do on the Cython side, fairly clean, and be dramatically less >>> horrible than all the ad-hoc templating stuff people do now. >>> Presumably there'd be strict limits on how much backwards >>> compatibility we'd be willing to guarantee for code that went poking >>> around in the AST by hand, but a small handful of functions like my >>> notional "replace_ctype" would go a long way, and wouldn't impose much >>> of a compatibility burden. >> >> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? >> >> In general I would like better meta-programming support, maybe even >> allow defining new operators (although I'm not sure any of it is very >> pythonic), but for templates I think fused types should be used, or >> improved when they fall short. Maybe a plugin system could also help >> people. > > I referenced this problem recently in a blog post > (http://wesmckinney.com/blog/?p=467). My main interest these days is > in expressing data algorithms. I've unfortunately found myself working > around performance problems with fundamental array operations in NumPy > so a lot of the Cython work I've done has been in and around this. 
In > lieu of some kind of macro system it seems inevitable that I'm going > to need to create some kind of mini array language or otherwise code > generation framework (targeting C, Cython, or Fortran). I worry that > this is going to end with me creating "yet another APL [or Haskell] > implementation" but I really need something that runs inside CPython. Generally speaking, it's always better to collect and describe use cases first before adding a language feature, especially one that is as complex and far reaching as this. It might well be that fused types can (be made to) work for them, and it might be that a (non AST based) preprocessor step would work. Keeping metaprogramming facilities out of the compiler makes it both more generally versatile (and easier to replace or disable) and keeps both sides simpler. > And why not? Most of these algorithms could be expressed at a very > high level and lead to pretty clean generated C with many of the > special cases (contiguous memory, or low dimensions in the case of > n-dimensional algorithms) checked and handled in simplified loops. That sounds like what you want is a preprocessor that spells out NumPy array configuration options into separate code paths. However, I'm not sure a generic approach would work well (enough?) here. And it also doesn't sound like this needs to be done inside of the compiler. It should be possible to build that on top of fused types as a separate preprocessor. Stefan From d.s.seljebotn at astro.uio.no Sun Apr 29 09:08:59 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 29 Apr 2012 09:08:59 +0200 Subject: [Cython] Wacky idea: proper macros In-Reply-To: <4F9CCD92.5090602@behnel.de> References: <4F9CCD92.5090602@behnel.de> Message-ID: <3e0b9999-b1d2-42e2-aa62-4a4b05a7deac@email.android.com> Stefan Behnel wrote: >Wes McKinney, 29.04.2012 03:14: >> On Sat, Apr 28, 2012 at 5:25 PM, mark florisson wrote: >>> On 28 April 2012 22:04, Nathaniel Smith wrote: >>>> Was chatting with Wes today about the usual problem many of us have >>>> encountered with needing to use some sort of templating system to >>>> generate code handling multiple types, operations, etc., and a >wacky >>>> idea occurred to me. So I thought I'd through it out here. >>>> >>>> What if we added a simple macro facility to Cython, that worked at >the >>>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style >macros.) >>>> Basically some way to write arbitrary Python code into a .pyx file >>>> that gets executed at compile time and can transform the AST, plus >>>> some nice convenience APIs for simple transformations. 
>>>> >>>> E.g., if we steal the illegal token sequence @@ as our marker, we >>>> could have something like: >>>> >>>> @@ # alone on a line, starts a block of Python code >>>> from Cython.MacroUtil import replace_ctype >>>> def expand_types(placeholder, typelist): >>>> def my_decorator(function_name, ast): >>>> functions = {} >>>> for typename in typelist: >>>> new_name = "%s_%s" % (function_name, typename) >>>> functions[name] = replace_ctype(ast, placeholder, typename) >>>> return functions >>>> return function_decorator >>>> @@ # this token sequence cannot occur in Python, so it's a safe >end-marker >>>> >>>> # Compile-time function decorator >>>> # Results in two cdef functions named sum_double and sum_int >>>> @@expand_types("T", ["double", "int"]) >>>> cdef T sum(np.ndarray[T] arr): >>>> cdef T start = 0; >>>> for i in range(arr.size): >>>> start += arr[i] >>>> return start >>>> >>>> I don't know if this is a good idea, but it seems like it'd be very >>>> easy to do on the Cython side, fairly clean, and be dramatically >less >>>> horrible than all the ad-hoc templating stuff people do now. >>>> Presumably there'd be strict limits on how much backwards >>>> compatibility we'd be willing to guarantee for code that went >poking >>>> around in the AST by hand, but a small handful of functions like my >>>> notional "replace_ctype" would go a long way, and wouldn't impose >much >>>> of a compatibility burden. >>> >>> Have you looked at >http://wiki.cython.org/enhancements/metaprogramming ? >>> >>> In general I would like better meta-programming support, maybe even >>> allow defining new operators (although I'm not sure any of it is >very >>> pythonic), but for templates I think fused types should be used, or >>> improved when they fall short. Maybe a plugin system could also help >>> people. >> >> I referenced this problem recently in a blog post >> (http://wesmckinney.com/blog/?p=467). My main interest these days is >> in expressing data algorithms. I've unfortunately found myself >working >> around performance problems with fundamental array operations in >NumPy >> so a lot of the Cython work I've done has been in and around this. In >> lieu of some kind of macro system it seems inevitable that I'm going >> to need to create some kind of mini array language or otherwise code >> generation framework (targeting C, Cython, or Fortran). I worry that >> this is going to end with me creating "yet another APL [or Haskell] >> implementation" but I really need something that runs inside >CPython. > >Generally speaking, it's always better to collect and describe use >cases >first before adding a language feature, especially one that is as >complex >and far reaching as this. > >It might well be that fused types can (be made to) work for them, and >it >might be that a (non AST based) preprocessor step would work. Keeping >metaprogramming facilities out of the compiler makes it both more >generally >versatile (and easier to replace or disable) and keeps both sides >simpler. > > >> And why not? Most of these algorithms could be expressed at a very >> high level and lead to pretty clean generated C with many of the >> special cases (contiguous memory, or low dimensions in the case of >> n-dimensional algorithms) checked and handled in simplified loops. > >That sounds like what you want is a preprocessor that spells out NumPy >array configuration options into separate code paths. However, I'm not >sure >a generic approach would work well (enough?) here. 
And it also doesn't >sound like this needs to be done inside of the compiler. It should be >possible to build that on top of fused types as a separate >preprocessor. Well, *of* *course* it is possible to do it using a preprocessor. The question is whether it would be a nicer experience if Cython had macros, and whether the feature would fit well with Cython. I know I would use Cython more if it had metaprogramming, but you can't cater to everyone. Please, everybody, look at the metaprogramming in Julia for a well done example before we discuss this further (julialang.org)... When done at that level, you don't need to worry about AST APIs either. Wes: for the things you talk about I simply use jinja2 or Tempita to generate C code. If you need to stay closer to NumPy and Python though, I guess you'd generate Cython code. But I guess this is what you do already and you want something nicer? Would somebody be likely to work on this, or is it just a feature request? (I wish I could work on it, but I just can't....I am +1 on macros done right, but -1 on them done badly) Dag > >Stefan >_______________________________________________ >cython-devel mailing list >cython-devel at python.org >http://mail.python.org/mailman/listinfo/cython-devel -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From njs at pobox.com Sun Apr 29 09:42:14 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 29 Apr 2012 08:42:14 +0100 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: On Sat, Apr 28, 2012 at 10:25 PM, mark florisson wrote: > On 28 April 2012 22:04, Nathaniel Smith wrote: >> Was chatting with Wes today about the usual problem many of us have >> encountered with needing to use some sort of templating system to >> generate code handling multiple types, operations, etc., and a wacky >> idea occurred to me. So I thought I'd through it out here. >> >> What if we added a simple macro facility to Cython, that worked at the >> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) >> Basically some way to write arbitrary Python code into a .pyx file >> that gets executed at compile time and can transform the AST, plus >> some nice convenience APIs for simple transformations. >> >> E.g., if we steal the illegal token sequence @@ as our marker, we >> could have something like: >> >> @@ # alone on a line, starts a block of Python code >> from Cython.MacroUtil import replace_ctype >> def expand_types(placeholder, typelist): >> ?def my_decorator(function_name, ast): >> ? ?functions = {} >> ? ?for typename in typelist: >> ? ? ?new_name = "%s_%s" % (function_name, typename) >> ? ? ?functions[name] = replace_ctype(ast, placeholder, typename) >> ? ?return functions >> ?return function_decorator >> @@ # this token sequence cannot occur in Python, so it's a safe end-marker >> >> # Compile-time function decorator >> # Results in two cdef functions named sum_double and sum_int >> @@expand_types("T", ["double", "int"]) >> cdef T sum(np.ndarray[T] arr): >> ?cdef T start = 0; >> ?for i in range(arr.size): >> ? ?start += arr[i] >> ?return start >> >> I don't know if this is a good idea, but it seems like it'd be very >> easy to do on the Cython side, fairly clean, and be dramatically less >> horrible than all the ad-hoc templating stuff people do now. 
>> Presumably there'd be strict limits on how much backwards >> compatibility we'd be willing to guarantee for code that went poking >> around in the AST by hand, but a small handful of functions like my >> notional "replace_ctype" would go a long way, and wouldn't impose much >> of a compatibility burden. >> >> -- Nathaniel >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > > Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? > > In general I would like better meta-programming support, maybe even > allow defining new operators (although I'm not sure any of it is very > pythonic), but for templates I think fused types should be used, or > improved when they fall short. Maybe a plugin system could also help > people. I hadn't seen that, no -- thanks for the link. I have to say that the examples in that link, though, give me the impression of a cool solution looking for a problem. I've never wished I could symbolically differentiate Python expressions at compile time, or create a mutant Python+SQL hybrid language. Actually I guess I've only missed define-syntax once in maybe 10 years of hacking in Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it will peek at the caller's syntax tree to automagically label the axes as "x" and "log(y)", and that can't be done in Python. But that's not exactly a convincing argument for a macro system. But generating optimized code is Cython's whole selling point, and people really are doing klugey tricks with string-based preprocessors just to generate multiple copies of loops in Cython and C. Also, fused types are great, but: (1) IIUC you can't actually do ndarray[fused_type] yet, which speaks to the feature's complexity, and (2) to handle Wes's original example on his blog (duplicating a bunch of code between a "sum" path and a "product" path), you'd actually need something like "fused operators", which aren't even on the horizon. So it seems unlikely that fused types will grow to cover all these cases in the near future. Of course some experimentation would be needed to find the right syntax and convenience functions for this feature too, so maybe I'm just being over-optimistic and it would also turn out to be very complicated :-). But it seems like some simple AST search/replace functions would get you a long way. - N From markflorisson88 at gmail.com Sun Apr 29 11:56:23 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 29 Apr 2012 10:56:23 +0100 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: On 29 April 2012 08:42, Nathaniel Smith wrote: > On Sat, Apr 28, 2012 at 10:25 PM, mark florisson > wrote: >> On 28 April 2012 22:04, Nathaniel Smith wrote: >>> Was chatting with Wes today about the usual problem many of us have >>> encountered with needing to use some sort of templating system to >>> generate code handling multiple types, operations, etc., and a wacky >>> idea occurred to me. So I thought I'd through it out here. >>> >>> What if we added a simple macro facility to Cython, that worked at the >>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) >>> Basically some way to write arbitrary Python code into a .pyx file >>> that gets executed at compile time and can transform the AST, plus >>> some nice convenience APIs for simple transformations. 
>>> E.g., if we steal the illegal token sequence @@ as our marker, we >>> could have something like: >>>
>>> @@ # alone on a line, starts a block of Python code
>>> from Cython.MacroUtil import replace_ctype
>>> def expand_types(placeholder, typelist):
>>>   def my_decorator(function_name, ast):
>>>     functions = {}
>>>     for typename in typelist:
>>>       new_name = "%s_%s" % (function_name, typename)
>>>       functions[new_name] = replace_ctype(ast, placeholder, typename)
>>>     return functions
>>>   return my_decorator
>>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker
>>>
>>> # Compile-time function decorator
>>> # Results in two cdef functions named sum_double and sum_int
>>> @@expand_types("T", ["double", "int"])
>>> cdef T sum(np.ndarray[T] arr):
>>>   cdef T start = 0
>>>   for i in range(arr.size):
>>>     start += arr[i]
>>>   return start
>>>
>>> I don't know if this is a good idea, but it seems like it'd be very >>> easy to do on the Cython side, fairly clean, and be dramatically less >>> horrible than all the ad-hoc templating stuff people do now. >>> Presumably there'd be strict limits on how much backwards >>> compatibility we'd be willing to guarantee for code that went poking >>> around in the AST by hand, but a small handful of functions like my >>> notional "replace_ctype" would go a long way, and wouldn't impose much >>> of a compatibility burden. >>> >>> -- Nathaniel >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> >> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? >> >> In general I would like better meta-programming support, maybe even >> allow defining new operators (although I'm not sure any of it is very >> pythonic), but for templates I think fused types should be used, or >> improved when they fall short. Maybe a plugin system could also help >> people. > > I hadn't seen that, no -- thanks for the link. > > I have to say that the examples in that link, though, give me the > impression of a cool solution looking for a problem. I've never wished > I could symbolically differentiate Python expressions at compile time, > or create a mutant Python+SQL hybrid language. Actually I guess I've > only missed define-syntax once in maybe 10 years of hacking in > Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it > will peek at the caller's syntax tree to automagically label the axes > as "x" and "log(y)", and that can't be done in Python. But that's not > exactly a convincing argument for a macro system. > > But generating optimized code is Cython's whole selling point, and > people really are doing klugey tricks with string-based preprocessors > just to generate multiple copies of loops in Cython and C. > > Also, fused types are great, but: (1) IIUC you can't actually do > ndarray[fused_type] yet, which speaks to the feature's complexity, and What? Yes you can do that. > (2) to handle Wes's original example on his blog (duplicating a bunch > of code between a "sum" path and a "product" path), you'd actually > need something like "fused operators", which aren't even on the > horizon. So it seems unlikely that fused types will grow to cover all > these cases in the near future. Although it doesn't handle contiguity or dimensional differences, currently the efficient fused operator is a function pointer.
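Something along these lines, spelled out for the float64 case only (an illustrative sketch, not code from an actual project):

cimport numpy as np

cdef np.float64_t add(np.float64_t a, np.float64_t b):
  return a + b

cdef np.float64_t mul(np.float64_t a, np.float64_t b):
  return a * b

# a single loop, parametrized by the reducer instead of duplicated per operator
cdef np.float64_t reduce1d(np.ndarray[np.float64_t] arr,
                           np.float64_t (*reducer)(np.float64_t, np.float64_t),
                           np.float64_t initial):
  cdef np.float64_t acc = initial
  cdef Py_ssize_t i
  for i in range(arr.shape[0]):
    acc = reducer(acc, arr[i])
  return acc

# sum: reduce1d(arr, add, 0.0); product: reduce1d(arr, mul, 1.0)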
Wouldn't passing in a float64_t (*reducer)(float64_t, float64_t) work in this case (in the face of multiple types, you can have fused parameters in the function pointer as well)? I agree with Dag that Julia has nice metaprogramming support, maybe functions could take arbitrary compile time expressions as extra arguments. > Of course some experimentation would be needed to find the right > syntax and convenience functions for this feature too, so maybe I'm > just being over-optimistic and it would also turn out to be very > complicated :-). But it seems like some simple AST search/replace > functions would get you a long way. > > - N > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Sun Apr 29 12:12:08 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 29 Apr 2012 11:12:08 +0100 Subject: [Cython] dynamically compile and import pure python modules In-Reply-To: <4F9CC8AE.1050901@behnel.de> References: <4F6B868A.1000802@behnel.de> <4F9AFEA6.8030003@behnel.de> <4F9C2F15.3010309@behnel.de> <4F9C3960.2080004@behnel.de> <4F9CC8AE.1050901@behnel.de> Message-ID: On 29 April 2012 05:50, Stefan Behnel wrote: > mark florisson, 28.04.2012 21:55: >> On 28 April 2012 19:39, Stefan Behnel wrote: >>> mark florisson, 28.04.2012 20:18: >>>> On 28 April 2012 18:55, Stefan Behnel wrote: >>>>> mark florisson, 27.04.2012 22:38: >>>>>> On 27 April 2012 21:16, Stefan Behnel wrote: >>>>>>> What about this deal: we remove the hard bootstrap dependency on the fused >>>>>>> types code (and maybe other Cython specific features) and require its >>>>>>> compilation at install time in Py2.4 (and maybe even 2.5). That would allow >>>>>>> us to use newer Python syntax (and even Cython supported syntax) there >>>>>>> (except for fused types, obviously). Failure to compile the module in >>>>>>> Python 2.4/5 at install time would then abort the installation. Bad luck >>>>>>> for the user, but easy to fix by installing a newer Python version. >>>>>>> >>>>>>> That would give us the advantage of not needing to pollute user home >>>>>>> directories with shared libraries at runtime (which I would consider a very >>>>>>> annoying property). >>>>>> >>>>>> I think it's fine to require compiling in the installed case (or will >>>>>> that be a problem for some package managers?). In the non-installed >>>>>> case with python versions smaller than needed, would you prefer a >>>>>> pyximport or an error message telling you to install Cython? Because >>>>>> for development you really don't want to install every time. >>>>> >>>>> I think it's fine to require at least Python 2.6 for Cython core >>>>> development. Just the installation (basically, what we test in Jenkins >>>>> anyway) should work in Py2.4 and shouldn't require any rebuilds at runtime. >>>> >>>> Well, sometimes stuff works in say, 2.7 but fails in 2.4. In that case >>>> you really have to test with the failing python versions, which means >>>> you'd have to reinstall every time you want to try the tests again >>>> (this is handled automatically for py3k, which runs the 2to3 tool). >>> >>> The number of times I recently ran tests in Py2.4 myself is really not >>> worth mentioning. Most of the time, when something fails there, the error I >>> get in Jenkins is so obvious that I just commit an untested fix for it. 
>> In my experience that still fails quite often, you may still forget >> some test or accidentally add some whitespace, and then you're going >> to build everything on Jenkins only to realize one hour later that >> something is still broken. Maybe it's because of the buffer >> differences and because 2.4 is the first thing that runs on Jenkins, >> but I test quite often in 2.4. > > That may be the reason. > > Still, how much overhead is it really to run "setup.py build_ext -i"? > Especially compared to getting the exact same compilation overhead with an > import time triggered rebuild? > The compilation overhead is irrelevant, the important difference is that one is *automatic* and hence preferable. Knowing myself, I will forget that module X, Y and Z need to be compiled, which means when I change the code I don't see the changes take effect, because I forgot to rebuild the modules. After a while I will start to question my sanity and insert a print statement at module level, only to realize that I had to rebuild after every single change. >>> I think it's really acceptable to require a run of "setup.py build_ext -i" >>> for local developer testing in Py2.4. >>> >>>> I'm also sure many users just clone from git, add the directory to >>>> PYTHONPATH and work from there. >>> >>> I'm sure there are close to no users who try to do that with Py2.4 these >>> days. Maybe there are some who do it with Py2.5, but we are not currently >>> considering to break a plain Python run there, AFAICT. >>> >>> I think the normal way users employ Cython is after a proper installation. >>> >>>> So I guess what I'm saying is, it's fine to mandate compilation at >>>> compile time (don't allow flags to disable compilation), and (for me), >>>> pyximport is totally fine, but Cython must be workable (all >>>> functionality), without needing to install or build, in all versions. >>> >>> Workable, ok. But if fused types are only available in a compiled installed >>> version under Python 2.4, that's maybe not optimal but certainly >>> acceptable. Users of Py2.4 should be used to suffering anyway. >> >> That's a really confusing statement. If they are used to suffering, >> then why can't they bear a runtime-compiled module if they didn't >> install? :) If a user installs she will have at least one compiled >> module in the system, but if she doesn't install, she will also have >> one compiled module. >> >> I'm kind of wondering though, if this is really such a big problem, >> then it means we can also never cache any user JIT-compiled code (like >> e.g. OpenCL)? > That's a different situation because it's the user's decision to use a > (caching) JIT compiler in the first place. We don't enforce that. > Basically, what I'm saying is: why do something complicated at runtime when > we can just do everything normally at install time and be done? Once Cython > is installed, it should just work, without needing to rebuild parts of > itself. I'm convinced that that will make it a lot more acceptable for > package maintainers, sys admins and users. > Keep in mind that the installation under PyPy doesn't compile anything and > just installs the plain Python modules. Enforcing compilation in Py2.4 is > just saying that sys admins of very old systems (which we still want to > support, at least in the generated C code) will no longer get the benefit > of the pure Python installation feature. It doesn't impact the users at > all, that's a feature.
> > >>>> That means either not using the with statement, or compiling with >>>> pyximport in certain versions in certain situations only (i.e., only >>>> in incompatible python version in case the user neither built nor >>>> installed Cython). I don't think that's a problem, if people don't >>>> like to have those shared library modules (will they even notice?) in >>>> their user directories, they can install or build Cython. >>> >>> Why require the use of pyximport at runtime when we can do everything >>> during installation? I really don't see an advantage. >> >> We will do everything during installation, but not mandate an installation. > > I think we should in the case at hand. > > >>>> Finally, my attachment to the with statement here is mostly to the >>>> aesthetics of the resulting code, rewriting my pull request is not so >>>> much work, so we can leave that out of consideration here. >>> >>> If it's not too much work, that would obviously make things go a lot smoother. >> >> I'm not sure why we're always making such a fuss over the little >> things, but I suppose I'd prefer mandating a compile in 2.4 over not >> having the with statement. > > Ok. > > >> Unless you want to mandate installing in >> every version, this still means we can't write any parts of the >> compiler in actual Cython code. > > We're doing just that for Py2.4 now by starting to use illegal syntax. > > However, since you're changing topics here already, do you actually see any > advantage in starting to use non-Python code? I don't think that would be a > good idea at all. I think we've gotten along quite happily with plain > Python code for the compiler itself so far and I don't see an interest in > changing that. In particular, I can't see the compiler being in need of any > of its own non-Python language features. I don't currently either, although we do compile the compiler for speed. So I can see use cases where pxd overlays (which are imho one of the worst features of Cython as they suddenly can turn your classes into cdef classes without you being aware when working in the .py file) fall short for more time-consuming parts of the compiler. > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel Anyway, we clearly don't agree, so I suppose I'll change my code as I think installing is worse than writing 2.4-compatible code, and hope to retain the readability. From stefan_ml at behnel.de Mon Apr 30 08:48:15 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 30 Apr 2012 08:48:15 +0200 Subject: [Cython] [cython-users] cimport numpy fails with Python 3 semantics In-Reply-To: References: <15456913.1041.1335634688690.JavaMail.geo-discussion-forums@vbli11> <4F9C3BDC.9040108@behnel.de> Message-ID: <4F9E35AF.6050209@behnel.de> mark florisson, 28.04.2012 21:57: > On 28 April 2012 19:50, Stefan Behnel wrote: >> mark florisson, 28.04.2012 20:33: >>> I think each module should have its own language level, so I think >>> that's a bug. I think the rules should be: >>> >>> - if passed as command line argument, use that for all cimported >>> modules, unless they define their own language level through the >>> directive >>> - if set as a directive, the language level will apply only to that module >> >> That's how it works. We don't run the tests with language level 3 in >> Jenkins because the majority of the tests is not meant to be run with Py3 >> semantics. Maybe it's time to add a numpy_cy3 test.
>> >> If there are more problems than just this (which was a bug in numpy.pxd), >> we may consider setting language level 2 explicitly in numpy.pxd. > > Ah, great. Do we have any documentation for that? We do now. ;) However, I'm not sure cimported .pxd files should always inherit the language_level setting. It's somewhat of a grey area because user provided .pxd files would benefit from it since they likely all use the same language level as the main module, whereas the Cython shipped (and otherwise globally installed) .pxd files wouldn't gain anything and could potentially break. I think we may want to keep the current behaviour and set the language level explicitly in the few shipped .pxd files that are not language level agnostic (i.e. those that actually contain code). Stefan From markflorisson88 at gmail.com Mon Apr 30 15:24:20 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 30 Apr 2012 14:24:20 +0100 Subject: [Cython] [cython-users] Conditional import in pure Python mode In-Reply-To: References: Message-ID: On 30 April 2012 13:14, Ian Bell wrote: > > On Sun, Apr 29, 2012 at 10:58 PM, mark florisson > wrote: >> >> On 29 April 2012 01:33, Ian Bell wrote: >> > Hello Cython users, >> > >> > I haven't the foggiest idea how easy this would be to implement, or how >> > to >> > do it in the first place, but my idea is to be able to have some sort of >> > idiom like >> > >> > if cython.compiled: >> > ??? cython.import('from libc.math cimport sin') >> > else: >> > ??? from math import sin >> > >> > that would allow for a pure Python file to still have 100% CPython code >> > in >> > it, but when it goes through the .py+.pxd-->.pyd compilation, use the >> > math.h >> > version instead.? Is there any way that this could work?? I can (and >> > will) >> > hack together a really nasty preprocessor that can edit the .py file at >> > Cython build time, but it is a nasty hack, and very un-Pythonic.? For >> > me, >> > the computational efficiency using the math.h trig functions justify a >> > bit >> > of nastiness. >> > >> > Regards, >> > Ian >> >> I think a cython.cimport would be nice. If you want to have a crack at >> it, you may want to look in Cython/Compiler/ParseTreeTransforms.py at >> the TransformBuiltinMethods transform, and also at the >> Cython/Shadow.py module. > > > Mark, > > Any chance you can give me a wee bit more help?? I can see how you need to > add a cimport function in Shadow.py.? Supposing I pass it an import string > that it will parse for the import, whereto from there?? There's quite a lot > of code to dig through. > > Ian > Certainly, it's great to see some enthusiasm. So first, it's important to determine how you want it. You can either simulate importing the rightmost module from the package, or you can simulate importing everything from the package. Personally I think if you say cython.cimport_module("libc.stdio") you want conceptually the stdio module to be returned. Maybe a 'cimport_star' function could cimport everything from the scope, but I'm not sure how great an idea that is in pure mode (likely not a good one). So lets assume we want to use the following syntax: stdio = cython.cimport_module("libc.stdio"). In the TransformBuiltinMethods you add another case to the visit_SimpleCallNode, which will match for the function name "cimport_module" (which is already known to be used as an attribute of the Cython module). TransformBuiltinMethods is a subclass of Visitor.EnvTransform, which keeps track of the scope (e.g. 
(e.g. if the node is in a function, the local function scope, etc). This class is a subclass of CythonTransform, which has a Main.Context set up in Pipeline.py, accessible through the 'self.context' attribute. Main.Context has methods to find and process pxd files, i.e. through the find_pxd_file and the process_pxd methods (or maybe the find_module method will work here).

So you want to validate that the string that gets passed to the 'cimport_module' function call is a string literal, and that this is in fact happening from the module scope. You can then create an Entry with an as_module attribute and put it in the current scope. The as_module attribute holds the scope of the cimported pxd file, which means it's a "cimported module". You can do this through the 'declare_module' method of the current scope (self.current_env().declare_module(...)) (see Symtab.py for that method).

I hope that helps you get set up somewhat; don't hesitate to bug the cython-devel mailing list with any further questions. BTW, we probably want to formally discuss the exact syntax and semantics before implementing it, so I think it will be a good idea to summarize what you want and how you want it on the cython-dev mailing list.

Good luck :)

From wesmckinn at gmail.com  Mon Apr 30 15:49:32 2012
From: wesmckinn at gmail.com (Wes McKinney)
Date: Mon, 30 Apr 2012 09:49:32 -0400
Subject: [Cython] Wacky idea: proper macros
In-Reply-To: 
References: 
Message-ID: 

On Sun, Apr 29, 2012 at 5:56 AM, mark florisson wrote:
> On 29 April 2012 08:42, Nathaniel Smith wrote:
>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson wrote:
>>> On 28 April 2012 22:04, Nathaniel Smith wrote:
>>>> Was chatting with Wes today about the usual problem many of us have encountered with needing to use some sort of templating system to generate code handling multiple types, operations, etc., and a wacky idea occurred to me. So I thought I'd throw it out here.
>>>>
>>>> What if we added a simple macro facility to Cython, that worked at the AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) Basically some way to write arbitrary Python code into a .pyx file that gets executed at compile time and can transform the AST, plus some nice convenience APIs for simple transformations.
>>>>
>>>> E.g., if we steal the illegal token sequence @@ as our marker, we could have something like:
>>>>
>>>> @@ # alone on a line, starts a block of Python code
>>>> from Cython.MacroUtil import replace_ctype
>>>> def expand_types(placeholder, typelist):
>>>>   def my_decorator(function_name, ast):
>>>>     functions = {}
>>>>     for typename in typelist:
>>>>       new_name = "%s_%s" % (function_name, typename)
>>>>       functions[new_name] = replace_ctype(ast, placeholder, typename)
>>>>     return functions
>>>>   return my_decorator
>>>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker
>>>>
>>>> # Compile-time function decorator
>>>> # Results in two cdef functions named sum_double and sum_int
>>>> @@expand_types("T", ["double", "int"])
>>>> cdef T sum(np.ndarray[T] arr):
>>>>   cdef T start = 0
>>>>   for i in range(arr.size):
>>>>     start += arr[i]
>>>>   return start
>>>>
>>>> I don't know if this is a good idea, but it seems like it'd be very easy to do on the Cython side, fairly clean, and be dramatically less horrible than all the ad-hoc templating stuff people do now.
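(As a point of reference, the "ad-hoc templating stuff" being replaced typically looks something like the following -- a made-up sketch of a string-based preprocessor, not code from this thread; the file and variable names are invented:)

    # gen_sums.py -- stamp out one Cython function per C type and write
    # them to an include file; purely illustrative names throughout.
    sum_template = '''
    cdef %(ctype)s sum_%(name)s(np.ndarray[%(ctype)s] arr):
        cdef %(ctype)s start = 0
        cdef Py_ssize_t i
        for i in range(arr.shape[0]):
            start += arr[i]
        return start
    '''

    with open("sums.pxi", "w") as f:
        for name, ctype in [("double", "double"), ("int", "int")]:
            f.write(sum_template % {"name": name, "ctype": ctype})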
>>>> Presumably there'd be strict limits on how much backwards >>>> compatibility we'd be willing to guarantee for code that went poking >>>> around in the AST by hand, but a small handful of functions like my >>>> notional "replace_ctype" would go a long way, and wouldn't impose much >>>> of a compatibility burden. >>>> >>>> -- Nathaniel >>>> _______________________________________________ >>>> cython-devel mailing list >>>> cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>> >>> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? >>> >>> In general I would like better meta-programming support, maybe even >>> allow defining new operators (although I'm not sure any of it is very >>> pythonic), but for templates I think fused types should be used, or >>> improved when they fall short. Maybe a plugin system could also help >>> people. >> >> I hadn't seen that, no -- thanks for the link. >> >> I have to say that the examples in that link, though, give me the >> impression of a cool solution looking for a problem. I've never wished >> I could symbolically differentiate Python expressions at compile time, >> or create a mutant Python+SQL hybrid language. Actually I guess I've >> only missed define-syntax once in maybe 10 years of hacking in >> Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it >> will peek at the caller's syntax tree to automagically label the axes >> as "x" and "log(y)", and that can't be done in Python. But that's not >> exactly a convincing argument for a macro system. >> >> But generating optimized code is Cython's whole selling point, and >> people really are doing klugey tricks with string-based preprocessors >> just to generate multiple copies of loops in Cython and C. >> >> Also, fused types are great, but: (1) IIUC you can't actually do >> ndarray[fused_type] yet, which speaks to the feature's complexity, and > > What? Yes you can do that. I haven't been able to get ndarray[fused_t] to work as we've discussed off-list. In your own words "Unfortunately, the automatic buffer dispatch didn't make it into 0.16, so you need to manually specialize". I'm a bit hamstrung by other users needing to be able to compile pandas using the latest released Cython. >> (2) to handle Wes's original example on his blog (duplicating a bunch >> of code between a "sum" path and a "product" path), you'd actually >> need something like "fused operators", which aren't even on the >> horizon. So it seems unlikely that fused types will grow to cover all >> these cases in the near future. > > Although it doesn't handle contiguity or dimensional differences, > currently the efficient fused operator is a function pointer. Wouldn't > passing in a float64_t (*reducer)(float64_t, float64_t) work in this > case (in the face of multiple types, you can have fused parameters in > the function pointer as well)? I have to think that using function pointers everywhere is going to lose out to "inlined" C. Maybe gcc is smart enough to optimize observed code paths. 
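(For concreteness, the function-pointer pattern mark suggests above would look roughly like this in Cython -- a sketch with invented names such as group_reduce, not code from the thread:)

    cimport numpy as np
    import numpy as np

    ctypedef np.float64_t float64_t

    cdef float64_t add(float64_t a, float64_t b):
        return a + b

    cdef void group_reduce(np.ndarray[float64_t] result,   # zero-initialized
                           np.ndarray[np.int64_t] labels,
                           np.ndarray[float64_t] data,
                           float64_t (*reducer)(float64_t, float64_t)):
        # The "fused operator" is the reducer pointer; every element goes
        # through an indirect call, which is exactly what Wes worries the
        # C compiler will not inline.
        cdef Py_ssize_t i
        for i in range(labels.shape[0]):
            result[labels[i]] = reducer(result[labels[i]], data[i])

(Called as group_reduce(result, labels, data, add) to get group sums.)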
In other words, you don't want

for (i = 0; i < nlabels; i++) {
    lab = labels[i];
    result[lab] = reducer(sumx[i], data[i]);
}

when you can have

for (i = 0; i < nlabels; i++) {
    lab = labels[i];
    result[lab] = sumx[i] + data[i];
}

I guess I should start writing some C code and actually measuring the performance gap, as I might be completely off-base here; what you want eventually is to look at the array data graph and "rewrite" it to better leverage data parallelism (parallelize the pieces that you can) and cache efficiency.

The bigger problem with all this is that I want to avoid an accumulation of ad hoc solutions.

>
> I agree with Dag that Julia has nice metaprogramming support, maybe functions could take arbitrary compile time expressions as extra arguments.
>
>> Of course some experimentation would be needed to find the right syntax and convenience functions for this feature too, so maybe I'm just being over-optimistic and it would also turn out to be very complicated :-). But it seems like some simple AST search/replace functions would get you a long way.
>>
>> - N
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

From stefan_ml at behnel.de  Mon Apr 30 15:55:18 2012
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Mon, 30 Apr 2012 15:55:18 +0200
Subject: [Cython] [cython-users] Conditional import in pure Python mode
In-Reply-To: 
References: 
Message-ID: <4F9E99C6.1010703@behnel.de>

mark florisson, 30.04.2012 15:24:
> So let's assume we want to use the following syntax: stdio = cython.cimport_module("libc.stdio").
>
> In the TransformBuiltinMethods you add another case to the visit_SimpleCallNode

That seems way too late in the pipeline to me (see Pipeline.py). I think this is better (although maybe still not perfectly) suited for "InterpretCompilerDirectives".

> we probably want to formally discuss the exact syntax and semantics before implementing it, so I think it will be a good idea to summarize what you want and how you want it on the cython-dev mailing list.

Absolutely.

Stefan

From markflorisson88 at gmail.com  Mon Apr 30 17:10:31 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 30 Apr 2012 16:10:31 +0100
Subject: [Cython] [cython-users] Conditional import in pure Python mode
In-Reply-To: <4F9E99C6.1010703@behnel.de>
References: <4F9E99C6.1010703@behnel.de>
Message-ID: 

On 30 April 2012 14:55, Stefan Behnel wrote:
> mark florisson, 30.04.2012 15:24:
>> So let's assume we want to use the following syntax: stdio = cython.cimport_module("libc.stdio").
>>
>> In the TransformBuiltinMethods you add another case to the visit_SimpleCallNode
>
> That seems way too late in the pipeline to me (see Pipeline.py). I think this is better (although maybe still not perfectly) suited for "InterpretCompilerDirectives".
>

Ah good point, it seems to run pretty late. It needs to be before the declarations are analyzed.

>> we probably want to formally discuss the exact syntax and semantics before implementing it, so I think it will be a good idea to summarize what you want and how you want it on the cython-dev mailing list.
>
> Absolutely.
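(Putting the thread's proposal together, the user-facing idiom would read roughly as follows -- cython.compiled is an existing pure-mode attribute, while cython.cimport_module is the hypothetical API under discussion here, not an existing function:)

    import cython

    if cython.compiled:
        # hypothetical API from this thread
        libc_math = cython.cimport_module("libc.math")
        sin = libc_math.sin
    else:
        from math import sin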
> > Stefan > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From markflorisson88 at gmail.com Mon Apr 30 17:19:59 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Mon, 30 Apr 2012 16:19:59 +0100 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: On 30 April 2012 14:49, Wes McKinney wrote: > On Sun, Apr 29, 2012 at 5:56 AM, mark florisson > wrote: >> On 29 April 2012 08:42, Nathaniel Smith wrote: >>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson >>> wrote: >>>> On 28 April 2012 22:04, Nathaniel Smith wrote: >>>>> Was chatting with Wes today about the usual problem many of us have >>>>> encountered with needing to use some sort of templating system to >>>>> generate code handling multiple types, operations, etc., and a wacky >>>>> idea occurred to me. So I thought I'd through it out here. >>>>> >>>>> What if we added a simple macro facility to Cython, that worked at the >>>>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) >>>>> Basically some way to write arbitrary Python code into a .pyx file >>>>> that gets executed at compile time and can transform the AST, plus >>>>> some nice convenience APIs for simple transformations. >>>>> >>>>> E.g., if we steal the illegal token sequence @@ as our marker, we >>>>> could have something like: >>>>> >>>>> @@ # alone on a line, starts a block of Python code >>>>> from Cython.MacroUtil import replace_ctype >>>>> def expand_types(placeholder, typelist): >>>>> ?def my_decorator(function_name, ast): >>>>> ? ?functions = {} >>>>> ? ?for typename in typelist: >>>>> ? ? ?new_name = "%s_%s" % (function_name, typename) >>>>> ? ? ?functions[name] = replace_ctype(ast, placeholder, typename) >>>>> ? ?return functions >>>>> ?return function_decorator >>>>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker >>>>> >>>>> # Compile-time function decorator >>>>> # Results in two cdef functions named sum_double and sum_int >>>>> @@expand_types("T", ["double", "int"]) >>>>> cdef T sum(np.ndarray[T] arr): >>>>> ?cdef T start = 0; >>>>> ?for i in range(arr.size): >>>>> ? ?start += arr[i] >>>>> ?return start >>>>> >>>>> I don't know if this is a good idea, but it seems like it'd be very >>>>> easy to do on the Cython side, fairly clean, and be dramatically less >>>>> horrible than all the ad-hoc templating stuff people do now. >>>>> Presumably there'd be strict limits on how much backwards >>>>> compatibility we'd be willing to guarantee for code that went poking >>>>> around in the AST by hand, but a small handful of functions like my >>>>> notional "replace_ctype" would go a long way, and wouldn't impose much >>>>> of a compatibility burden. >>>>> >>>>> -- Nathaniel >>>>> _______________________________________________ >>>>> cython-devel mailing list >>>>> cython-devel at python.org >>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>> >>>> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? >>>> >>>> In general I would like better meta-programming support, maybe even >>>> allow defining new operators (although I'm not sure any of it is very >>>> pythonic), but for templates I think fused types should be used, or >>>> improved when they fall short. Maybe a plugin system could also help >>>> people. >>> >>> I hadn't seen that, no -- thanks for the link. 
>>> >>> I have to say that the examples in that link, though, give me the >>> impression of a cool solution looking for a problem. I've never wished >>> I could symbolically differentiate Python expressions at compile time, >>> or create a mutant Python+SQL hybrid language. Actually I guess I've >>> only missed define-syntax once in maybe 10 years of hacking in >>> Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it >>> will peek at the caller's syntax tree to automagically label the axes >>> as "x" and "log(y)", and that can't be done in Python. But that's not >>> exactly a convincing argument for a macro system. >>> >>> But generating optimized code is Cython's whole selling point, and >>> people really are doing klugey tricks with string-based preprocessors >>> just to generate multiple copies of loops in Cython and C. >>> >>> Also, fused types are great, but: (1) IIUC you can't actually do >>> ndarray[fused_type] yet, which speaks to the feature's complexity, and >> >> What? Yes you can do that. > > I haven't been able to get ndarray[fused_t] to work as we've discussed > off-list. In your own words "Unfortunately, the automatic buffer > dispatch didn't make it into 0.16, so you need to manually > specialize". I'm a bit hamstrung by other users needing to be able to > compile pandas using the latest released Cython. Well, as I said, it does work, but you need to tell Cython which type you meant. If you don't want to do that, you have to use this branch: https://github.com/markflorisson88/cython/tree/_fused_dispatch_rebased . This never made it in since we had no consensus on whether to allow the compiler to bootstrap itself and because of possible immaturity of the branch. So what doesn't work is automatic dispatch for Python functions (def functions and the object version of a cpdef function). They don't automatically select the right specialization for buffer arguments. Anything else should work, otherwise it's a bug. Note also that figuring out which specialization to call dynamically (i.e. not from Cython space at compile time, but from Python space at runtime) has non-trivial overhead on top of just argument unpacking. But you can't say "doesn't work" without giving a concrete example of what doesn't work besides automatic dispatch, and how it fails. >>> (2) to handle Wes's original example on his blog (duplicating a bunch >>> of code between a "sum" path and a "product" path), you'd actually >>> need something like "fused operators", which aren't even on the >>> horizon. So it seems unlikely that fused types will grow to cover all >>> these cases in the near future. >> >> Although it doesn't handle contiguity or dimensional differences, >> currently the efficient fused operator is a function pointer. Wouldn't >> passing in a float64_t (*reducer)(float64_t, float64_t) work in this >> case (in the face of multiple types, you can have fused parameters in >> the function pointer as well)? > > I have to think that using function pointers everywhere is going to > lose out to "inlined" C. Maybe gcc is smart enough to optimize > observed code paths. In other words, you don't want > > for (i = 0; i < nlabels; i++) { > ? ?lab = labels[i]; > ? ?result[lab] = reducer(sumx[i], data[i]); > } > > when you can have > > for (i = 0; i < nlabels; i++) { > ? ?lab = labels[i]; > ? 
result[lab] = sumx[i] + data[i];
> }
>
> I guess I should start writing some C code and actually measuring the performance gap as I might be completely off-base here; what you want eventually is to look at the array data graph and "rewrite" it to better leverage data parallelism (parallelize pieces that you can) and cache efficiency.
>
> The bigger problem with all this is that I want to avoid an accumulation of ad hoc solutions.

It's probably a good idea to check the overhead first, but I agree it will likely be non-trivial. In that sense, I would like something similar to Julia, like

def func(runtime_arguments, etc, $compile_time_expr):
    ...
    use $compile_time_expr(sumx[i], data[i])
    # or maybe cython.ceval(compile_time_expr, {'op1': ..., 'op2': ...})

Maybe such functions should be restricted to Cython space, though. Fused types weren't designed to handle this in any way; they are only there to support different types, not operators. If you would have objects you could give them all different methods, so this is really somewhat of a special case.

>>
>> I agree with Dag that Julia has nice metaprogramming support, maybe functions could take arbitrary compile time expressions as extra arguments.
>>
>>> Of course some experimentation would be needed to find the right syntax and convenience functions for this feature too, so maybe I'm just being over-optimistic and it would also turn out to be very complicated :-). But it seems like some simple AST search/replace functions would get you a long way.
>>>
>>> - N
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel

From markflorisson88 at gmail.com  Mon Apr 30 17:22:05 2012
From: markflorisson88 at gmail.com (mark florisson)
Date: Mon, 30 Apr 2012 16:22:05 +0100
Subject: [Cython] Wacky idea: proper macros
In-Reply-To: 
References: 
Message-ID: 

On 30 April 2012 14:49, Wes McKinney wrote:
> On Sun, Apr 29, 2012 at 5:56 AM, mark florisson wrote:
>> On 29 April 2012 08:42, Nathaniel Smith wrote:
>>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson wrote:
>>>> On 28 April 2012 22:04, Nathaniel Smith wrote:
>>>>> Was chatting with Wes today about the usual problem many of us have encountered with needing to use some sort of templating system to generate code handling multiple types, operations, etc., and a wacky idea occurred to me. So I thought I'd throw it out here.
>>>>>
>>>>> What if we added a simple macro facility to Cython, that worked at the AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) Basically some way to write arbitrary Python code into a .pyx file that gets executed at compile time and can transform the AST, plus some nice convenience APIs for simple transformations.
>>>>>
>>>>> E.g., if we steal the illegal token sequence @@ as our marker, we could have something like:
>>>>>
>>>>> @@ # alone on a line, starts a block of Python code
>>>>> from Cython.MacroUtil import replace_ctype
>>>>> def expand_types(placeholder, typelist):
>>>>>   def my_decorator(function_name, ast):
>>>>>     functions = {}
>>>>>     for typename in typelist:
>>>>>       new_name = "%s_%s" % (function_name, typename)
>>>>>       functions[new_name] = replace_ctype(ast, placeholder, typename)
>>>>>     return functions
>>>>>   return my_decorator
>>>>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker
>>>>>
>>>>> # Compile-time function decorator
>>>>> # Results in two cdef functions named sum_double and sum_int
>>>>> @@expand_types("T", ["double", "int"])
>>>>> cdef T sum(np.ndarray[T] arr):
>>>>>   cdef T start = 0
>>>>>   for i in range(arr.size):
>>>>>     start += arr[i]
>>>>>   return start
>>>>>
>>>>> I don't know if this is a good idea, but it seems like it'd be very easy to do on the Cython side, fairly clean, and be dramatically less horrible than all the ad-hoc templating stuff people do now. Presumably there'd be strict limits on how much backwards compatibility we'd be willing to guarantee for code that went poking around in the AST by hand, but a small handful of functions like my notional "replace_ctype" would go a long way, and wouldn't impose much of a compatibility burden.
>>>>>
>>>>> -- Nathaniel
>>>>> _______________________________________________
>>>>> cython-devel mailing list
>>>>> cython-devel at python.org
>>>>> http://mail.python.org/mailman/listinfo/cython-devel
>>>>
>>>> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ?
>>>>
>>>> In general I would like better meta-programming support, maybe even allow defining new operators (although I'm not sure any of it is very pythonic), but for templates I think fused types should be used, or improved when they fall short. Maybe a plugin system could also help people.
>>>
>>> I hadn't seen that, no -- thanks for the link.
>>>
>>> I have to say that the examples in that link, though, give me the impression of a cool solution looking for a problem. I've never wished I could symbolically differentiate Python expressions at compile time, or create a mutant Python+SQL hybrid language. Actually I guess I've only missed define-syntax once in maybe 10 years of hacking in Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it will peek at the caller's syntax tree to automagically label the axes as "x" and "log(y)", and that can't be done in Python. But that's not exactly a convincing argument for a macro system.
>>>
>>> But generating optimized code is Cython's whole selling point, and people really are doing klugey tricks with string-based preprocessors just to generate multiple copies of loops in Cython and C.
>>>
>>> Also, fused types are great, but: (1) IIUC you can't actually do ndarray[fused_type] yet, which speaks to the feature's complexity, and
>>
>> What? Yes you can do that.
>
> I haven't been able to get ndarray[fused_t] to work as we've discussed off-list. In your own words "Unfortunately, the automatic buffer dispatch didn't make it into 0.16, so you need to manually specialize". I'm a bit hamstrung by other users needing to be able to compile pandas using the latest released Cython.

Also, if something doesn't work in 0.16, please report it. We reverted the numpy changes that broke pandas, so if there are other changes that break stuff, please inform us.
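(For context, the "manual specialization" being discussed means indexing the fused function with a concrete type instead of relying on automatic buffer dispatch -- roughly like this sketch, with invented names:)

    cimport numpy as np
    import numpy as np

    ctypedef fused join_t:
        np.float64_t
        np.int64_t

    def head(np.ndarray[join_t] arr):
        # one body, compiled once per type listed in join_t
        return arr[0]

    # The caller picks the specialization explicitly by indexing with a type:
    #     head[np.float64_t](np.arange(3.0))
    # whereas automatic dispatch would let you simply write head(np.arange(3.0)).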
>>> (2) to handle Wes's original example on his blog (duplicating a bunch >>> of code between a "sum" path and a "product" path), you'd actually >>> need something like "fused operators", which aren't even on the >>> horizon. So it seems unlikely that fused types will grow to cover all >>> these cases in the near future. >> >> Although it doesn't handle contiguity or dimensional differences, >> currently the efficient fused operator is a function pointer. Wouldn't >> passing in a float64_t (*reducer)(float64_t, float64_t) work in this >> case (in the face of multiple types, you can have fused parameters in >> the function pointer as well)? > > I have to think that using function pointers everywhere is going to > lose out to "inlined" C. Maybe gcc is smart enough to optimize > observed code paths. In other words, you don't want > > for (i = 0; i < nlabels; i++) { > ? ?lab = labels[i]; > ? ?result[lab] = reducer(sumx[i], data[i]); > } > > when you can have > > for (i = 0; i < nlabels; i++) { > ? ?lab = labels[i]; > ? ?result[lab] = sumx[i] + data[i]; > } > > I guess I should start writing some C code and actually measuring the > performance gap as I might completely be off-base here; what you want > eventually is to look at the array data graph and "rewrite" it to > better leverage data parallelism (parallelize pieces that you can) and > cache efficiency. > > The bigger problem with all this is that I want to avoid an > accumulation of ad hoc solutions. > >> >> I agree with Dag that Julia has nice metaprogramming support, maybe >> functions could take arbitrary compile time expressions as extra >> arguments. >> >>> Of course some experimentation would be needed to find the right >>> syntax and convenience functions for this feature too, so maybe I'm >>> just being over-optimistic and it would also turn out to be very >>> complicated :-). But it seems like some simple AST search/replace >>> functions would get you a long way. >>> >>> - N >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From wesmckinn at gmail.com Mon Apr 30 18:30:32 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 30 Apr 2012 12:30:32 -0400 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: On Mon, Apr 30, 2012 at 11:19 AM, mark florisson wrote: > On 30 April 2012 14:49, Wes McKinney wrote: >> On Sun, Apr 29, 2012 at 5:56 AM, mark florisson >> wrote: >>> On 29 April 2012 08:42, Nathaniel Smith wrote: >>>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson >>>> wrote: >>>>> On 28 April 2012 22:04, Nathaniel Smith wrote: >>>>>> Was chatting with Wes today about the usual problem many of us have >>>>>> encountered with needing to use some sort of templating system to >>>>>> generate code handling multiple types, operations, etc., and a wacky >>>>>> idea occurred to me. So I thought I'd through it out here. >>>>>> >>>>>> What if we added a simple macro facility to Cython, that worked at the >>>>>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) 
>>>>>> Basically some way to write arbitrary Python code into a .pyx file >>>>>> that gets executed at compile time and can transform the AST, plus >>>>>> some nice convenience APIs for simple transformations. >>>>>> >>>>>> E.g., if we steal the illegal token sequence @@ as our marker, we >>>>>> could have something like: >>>>>> >>>>>> @@ # alone on a line, starts a block of Python code >>>>>> from Cython.MacroUtil import replace_ctype >>>>>> def expand_types(placeholder, typelist): >>>>>> ?def my_decorator(function_name, ast): >>>>>> ? ?functions = {} >>>>>> ? ?for typename in typelist: >>>>>> ? ? ?new_name = "%s_%s" % (function_name, typename) >>>>>> ? ? ?functions[name] = replace_ctype(ast, placeholder, typename) >>>>>> ? ?return functions >>>>>> ?return function_decorator >>>>>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker >>>>>> >>>>>> # Compile-time function decorator >>>>>> # Results in two cdef functions named sum_double and sum_int >>>>>> @@expand_types("T", ["double", "int"]) >>>>>> cdef T sum(np.ndarray[T] arr): >>>>>> ?cdef T start = 0; >>>>>> ?for i in range(arr.size): >>>>>> ? ?start += arr[i] >>>>>> ?return start >>>>>> >>>>>> I don't know if this is a good idea, but it seems like it'd be very >>>>>> easy to do on the Cython side, fairly clean, and be dramatically less >>>>>> horrible than all the ad-hoc templating stuff people do now. >>>>>> Presumably there'd be strict limits on how much backwards >>>>>> compatibility we'd be willing to guarantee for code that went poking >>>>>> around in the AST by hand, but a small handful of functions like my >>>>>> notional "replace_ctype" would go a long way, and wouldn't impose much >>>>>> of a compatibility burden. >>>>>> >>>>>> -- Nathaniel >>>>>> _______________________________________________ >>>>>> cython-devel mailing list >>>>>> cython-devel at python.org >>>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>>> >>>>> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? >>>>> >>>>> In general I would like better meta-programming support, maybe even >>>>> allow defining new operators (although I'm not sure any of it is very >>>>> pythonic), but for templates I think fused types should be used, or >>>>> improved when they fall short. Maybe a plugin system could also help >>>>> people. >>>> >>>> I hadn't seen that, no -- thanks for the link. >>>> >>>> I have to say that the examples in that link, though, give me the >>>> impression of a cool solution looking for a problem. I've never wished >>>> I could symbolically differentiate Python expressions at compile time, >>>> or create a mutant Python+SQL hybrid language. Actually I guess I've >>>> only missed define-syntax once in maybe 10 years of hacking in >>>> Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it >>>> will peek at the caller's syntax tree to automagically label the axes >>>> as "x" and "log(y)", and that can't be done in Python. But that's not >>>> exactly a convincing argument for a macro system. >>>> >>>> But generating optimized code is Cython's whole selling point, and >>>> people really are doing klugey tricks with string-based preprocessors >>>> just to generate multiple copies of loops in Cython and C. >>>> >>>> Also, fused types are great, but: (1) IIUC you can't actually do >>>> ndarray[fused_type] yet, which speaks to the feature's complexity, and >>> >>> What? Yes you can do that. >> >> I haven't been able to get ndarray[fused_t] to work as we've discussed >> off-list. 
In your own words "Unfortunately, the automatic buffer dispatch didn't make it into 0.16, so you need to manually specialize". I'm a bit hamstrung by other users needing to be able to compile pandas using the latest released Cython.
>
> Well, as I said, it does work, but you need to tell Cython which type you meant. If you don't want to do that, you have to use this branch: https://github.com/markflorisson88/cython/tree/_fused_dispatch_rebased . This never made it in since we had no consensus on whether to allow the compiler to bootstrap itself and because of possible immaturity of the branch.
>
> So what doesn't work is automatic dispatch for Python functions (def functions and the object version of a cpdef function). They don't automatically select the right specialization for buffer arguments. Anything else should work; otherwise it's a bug.
>
> Note also that figuring out which specialization to call dynamically (i.e. not from Cython space at compile time, but from Python space at runtime) has non-trivial overhead on top of just argument unpacking. But you can't say "doesn't work" without giving a concrete example of what doesn't work besides automatic dispatch, and how it fails.
>

Sorry, I meant automatic dispatch re "doesn't work", and I want to reiterate how much I appreciate the work you're doing. To give some context, my code is riddled with stuff like this:

lib.inner_join_indexer_float64
lib.inner_join_indexer_int32
lib.inner_join_indexer_int64
lib.inner_join_indexer_object

where the only difference between these functions is the type of the buffer in the two arrays passed in. I have a template string for these functions that looks like this:

inner_join_template = """@cython.wraparound(False)
@cython.boundscheck(False)
def inner_join_indexer_%(name)s(ndarray[%(c_type)s] left,
                                ndarray[%(c_type)s] right):
    '''
...

I would _love_ to replace this with fused types.

In any case, lately I've been sort of yearning for the kinds of things you can do with an APL-variant like J. Like here's a groupby in J:

   labels
1 1 2 2 2 3 1
   data
3 4 5.5 6 7.5 _2 8.3
   labels </. data
┌───────┬─────────┬──┐
│3 4 8.3│5.5 6 7.5│_2│
└───────┴─────────┴──┘

Here < is box and /. is categorize.

Replacing the box < operator with +/ (sum), I get the group sums:

   labels +/ /. data
15.3 19 _2

Have 2-dimensional data?

   data
 0  1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31 32 33 34
35 36 37 38 39 40 41
42 43 44 45 46 47 48
   labels </."1 data
┌────────┬────────┬──┐
│0 1 6   │2 3 4   │5 │
├────────┼────────┼──┤
│7 8 13  │9 10 11 │12│
├────────┼────────┼──┤
│14 15 20│16 17 18│19│
├────────┼────────┼──┤
│21 22 27│23 24 25│26│
├────────┼────────┼──┤
│28 29 34│30 31 32│33│
├────────┼────────┼──┤
│35 36 41│37 38 39│40│
├────────┼────────┼──┤
│42 43 48│44 45 46│47│
└────────┴────────┴──┘
   labels +//."1 data
  7   9  5
 28  30 12
 49  51 19
 70  72 26
 91  93 33
112 114 40
133 135 47

However, J and other APLs are interpreted. If you generate C or JIT-compile, I think you can do really well performance-wise and have very expressive code for writing data algorithms without all this boilerplate.

>>>> (2) to handle Wes's original example on his blog (duplicating a bunch of code between a "sum" path and a "product" path), you'd actually need something like "fused operators", which aren't even on the horizon. So it seems unlikely that fused types will grow to cover all these cases in the near future.
>>>
>>> Although it doesn't handle contiguity or dimensional differences, currently the efficient fused operator is a function pointer. Wouldn't passing in a float64_t (*reducer)(float64_t, float64_t) work in this case (in the face of multiple types, you can have fused parameters in the function pointer as well)?
>>
>> I have to think that using function pointers everywhere is going to lose out to "inlined" C. Maybe gcc is smart enough to optimize observed code paths. In other words, you don't want
>>
>> for (i = 0; i < nlabels; i++) {
>>     lab = labels[i];
>>     result[lab] = reducer(sumx[i], data[i]);
>> }
>>
>> when you can have
>>
>> for (i = 0; i < nlabels; i++) {
>>     lab = labels[i];
>>     result[lab] = sumx[i] + data[i];
>> }
>>
>> I guess I should start writing some C code and actually measuring the performance gap as I might be completely off-base here; what you want eventually is to look at the array data graph and "rewrite" it to better leverage data parallelism (parallelize pieces that you can) and cache efficiency.
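(For readers who don't speak J, the 1-D groupby above corresponds to this NumPy sketch:)

    import numpy as np

    labels = np.array([1, 1, 2, 2, 2, 3, 1])
    data = np.array([3, 4, 5.5, 6, 7.5, -2, 8.3])   # J's _2 is -2

    # labels </. data -- box (group) the data by label
    groups = [data[labels == k] for k in np.unique(labels)]

    # labels +/ /. data -- sum within each group
    sums = np.array([g.sum() for g in groups])
    print(sums)   # [ 15.3  19.   -2. ]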
>> >> The bigger problem with all this is that I want to avoid an >> accumulation of ad hoc solutions. > > It's probably a good idea to check the overhead first, but I agree it > will likely be non-trivial. In that sense, I would like something > similar to julia like > > def ?func(runtime_arguments, etc, $compile_time_expr): > ? ?... > ? ? ? ? use $compile_time_expr(sumx[i], data[i]) # or maybe > cython.ceval(compile_time_expr, {'op1': ..., 'op2': ...}) > > Maybe such functions should only be restricted to Cython space, > though. Fused types weren't designed to handle this in any way, they > are only there to support different types, not operators. If you would > have objects you could give them all different methods, so this is > really rather somewhat of a special case. > >>> >>> I agree with Dag that Julia has nice metaprogramming support, maybe >>> functions could take arbitrary compile time expressions as extra >>> arguments. >>> >>>> Of course some experimentation would be needed to find the right >>>> syntax and convenience functions for this feature too, so maybe I'm >>>> just being over-optimistic and it would also turn out to be very >>>> complicated :-). But it seems like some simple AST search/replace >>>> functions would get you a long way. >>>> >>>> - N >>>> _______________________________________________ >>>> cython-devel mailing list >>>> cython-devel at python.org >>>> http://mail.python.org/mailman/listinfo/cython-devel >>> _______________________________________________ >>> cython-devel mailing list >>> cython-devel at python.org >>> http://mail.python.org/mailman/listinfo/cython-devel >> _______________________________________________ >> cython-devel mailing list >> cython-devel at python.org >> http://mail.python.org/mailman/listinfo/cython-devel > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel From d.s.seljebotn at astro.uio.no Mon Apr 30 22:49:01 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 30 Apr 2012 22:49:01 +0200 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: Message-ID: <4F9EFABD.5080408@astro.uio.no> On 04/30/2012 06:30 PM, Wes McKinney wrote: > On Mon, Apr 30, 2012 at 11:19 AM, mark florisson > wrote: >> On 30 April 2012 14:49, Wes McKinney wrote: >>> On Sun, Apr 29, 2012 at 5:56 AM, mark florisson >>> wrote: >>>> On 29 April 2012 08:42, Nathaniel Smith wrote: >>>>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson >>>>> wrote: >>>>>> On 28 April 2012 22:04, Nathaniel Smith wrote: >>>>>>> Was chatting with Wes today about the usual problem many of us have >>>>>>> encountered with needing to use some sort of templating system to >>>>>>> generate code handling multiple types, operations, etc., and a wacky >>>>>>> idea occurred to me. So I thought I'd through it out here. >>>>>>> >>>>>>> What if we added a simple macro facility to Cython, that worked at the >>>>>>> AST level? (I.e. I'm talking lisp-style macros, *not* C-style macros.) >>>>>>> Basically some way to write arbitrary Python code into a .pyx file >>>>>>> that gets executed at compile time and can transform the AST, plus >>>>>>> some nice convenience APIs for simple transformations. 
>>>>>>> >>>>>>> E.g., if we steal the illegal token sequence @@ as our marker, we >>>>>>> could have something like: >>>>>>> >>>>>>> @@ # alone on a line, starts a block of Python code >>>>>>> from Cython.MacroUtil import replace_ctype >>>>>>> def expand_types(placeholder, typelist): >>>>>>> def my_decorator(function_name, ast): >>>>>>> functions = {} >>>>>>> for typename in typelist: >>>>>>> new_name = "%s_%s" % (function_name, typename) >>>>>>> functions[name] = replace_ctype(ast, placeholder, typename) >>>>>>> return functions >>>>>>> return function_decorator >>>>>>> @@ # this token sequence cannot occur in Python, so it's a safe end-marker >>>>>>> >>>>>>> # Compile-time function decorator >>>>>>> # Results in two cdef functions named sum_double and sum_int >>>>>>> @@expand_types("T", ["double", "int"]) >>>>>>> cdef T sum(np.ndarray[T] arr): >>>>>>> cdef T start = 0; >>>>>>> for i in range(arr.size): >>>>>>> start += arr[i] >>>>>>> return start >>>>>>> >>>>>>> I don't know if this is a good idea, but it seems like it'd be very >>>>>>> easy to do on the Cython side, fairly clean, and be dramatically less >>>>>>> horrible than all the ad-hoc templating stuff people do now. >>>>>>> Presumably there'd be strict limits on how much backwards >>>>>>> compatibility we'd be willing to guarantee for code that went poking >>>>>>> around in the AST by hand, but a small handful of functions like my >>>>>>> notional "replace_ctype" would go a long way, and wouldn't impose much >>>>>>> of a compatibility burden. >>>>>>> >>>>>>> -- Nathaniel >>>>>>> _______________________________________________ >>>>>>> cython-devel mailing list >>>>>>> cython-devel at python.org >>>>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>>>> >>>>>> Have you looked at http://wiki.cython.org/enhancements/metaprogramming ? >>>>>> >>>>>> In general I would like better meta-programming support, maybe even >>>>>> allow defining new operators (although I'm not sure any of it is very >>>>>> pythonic), but for templates I think fused types should be used, or >>>>>> improved when they fall short. Maybe a plugin system could also help >>>>>> people. >>>>> >>>>> I hadn't seen that, no -- thanks for the link. >>>>> >>>>> I have to say that the examples in that link, though, give me the >>>>> impression of a cool solution looking for a problem. I've never wished >>>>> I could symbolically differentiate Python expressions at compile time, >>>>> or create a mutant Python+SQL hybrid language. Actually I guess I've >>>>> only missed define-syntax once in maybe 10 years of hacking in >>>>> Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it >>>>> will peek at the caller's syntax tree to automagically label the axes >>>>> as "x" and "log(y)", and that can't be done in Python. But that's not >>>>> exactly a convincing argument for a macro system. >>>>> >>>>> But generating optimized code is Cython's whole selling point, and >>>>> people really are doing klugey tricks with string-based preprocessors >>>>> just to generate multiple copies of loops in Cython and C. >>>>> >>>>> Also, fused types are great, but: (1) IIUC you can't actually do >>>>> ndarray[fused_type] yet, which speaks to the feature's complexity, and >>>> >>>> What? Yes you can do that. >>> >>> I haven't been able to get ndarray[fused_t] to work as we've discussed >>> off-list. In your own words "Unfortunately, the automatic buffer >>> dispatch didn't make it into 0.16, so you need to manually >>> specialize". 
I'm a bit hamstrung by other users needing to be able to >>> compile pandas using the latest released Cython. >> >> Well, as I said, it does work, but you need to tell Cython which type >> you meant. If you don't want to do that, you have to use this branch: >> https://github.com/markflorisson88/cython/tree/_fused_dispatch_rebased >> . This never made it in since we had no consensus on whether to allow >> the compiler to bootstrap itself and because of possible immaturity of >> the branch. >> >> So what doesn't work is automatic dispatch for Python functions (def >> functions and the object version of a cpdef function). They don't >> automatically select the right specialization for buffer arguments. >> Anything else should work, otherwise it's a bug. >> >> Note also that figuring out which specialization to call dynamically >> (i.e. not from Cython space at compile time, but from Python space at >> runtime) has non-trivial overhead on top of just argument unpacking. >> But you can't say "doesn't work" without giving a concrete example of >> what doesn't work besides automatic dispatch, and how it fails. >> > > Sorry, I meant automatic dispatch re "doesn't work" and want to > reiterate how much I appreciate the work you're doing. To give some > context, my code is riddled with stuff like this: > > lib.inner_join_indexer_float64 > lib.inner_join_indexer_int32 > lib.inner_join_indexer_int64 > lib.inner_join_indexer_object > > where the only difference between these functions is the type of the > buffer in the two arrays passed in. I have a template string for these > functions that looks like this: > > inner_join_template = """@cython.wraparound(False) > @cython.boundscheck(False) > def inner_join_indexer_%(name)s(ndarray[%(c_type)s] left, > ndarray[%(c_type)s] right): > ''' > ... > > I would _love_ to replace this with fused types. > > In any case, lately I've been sort of yearning for the kinds of things > you can do with an APL-variant like J. Like here's a groupby in J: > > labels > 1 1 2 2 2 3 1 > data > 3 4 5.5 6 7.5 _2 8.3 > labels ?????????????????????? > ?3 4 8.3?5.5 6 7.5?_2? > ?????????????????????? > > Here< is box and /. is categorize. > > Replacing the box< operator with +/ (sum), I get the group sums: > > labels +/ /. data > 15.3 19 _2 > > Have 2-dimensional data? > > data > 0 1 2 3 4 5 6 > 7 8 9 10 11 12 13 > 14 15 16 17 18 19 20 > 21 22 23 24 25 26 27 > 28 29 30 31 32 33 34 > 35 36 37 38 39 40 41 > 42 43 44 45 46 47 48 > labels ?????????????????????? > ?0 1 6 ?2 3 4 ?5 ? > ?????????????????????? > ?7 8 13 ?9 10 11 ?12? > ?????????????????????? > ?14 15 20?16 17 18?19? > ?????????????????????? > ?21 22 27?23 24 25?26? > ?????????????????????? > ?28 29 34?30 31 32?33? > ?????????????????????? > ?35 36 41?37 38 39?40? > ?????????????????????? > ?42 43 48?44 45 46?47? > ?????????????????????? > > labels +//."1 data > 7 9 5 > 28 30 12 > 49 51 19 > 70 72 26 > 91 93 33 > 112 114 40 > 133 135 47 > > However, J and other APLs are interpreted. If you generate C or > JIT-compile I think you can do really well performance wise and have > very expressive code for writing data algorithms without all this > boilerplate. I know how you feel. On one hand I really like metaprogramming; on the other hand I think it is very difficult to get right when done compile-time (just look at C++ -- I've heard that D is a bit better though I didn't really try it). JIT is really the way to go. 
It is one thing that a JIT could optimize the case where you pass a callback to a function and inline it run-time. But even if it doesn't get that fancy, it'd be great to just be able to write something like "cython.eval(s)" and have that be compiled (I guess you could do that now, but the sheer overhead of the C compiler and all the .so files involved means nobody would sanely use that as the main way of stringing together something like pandas). I think that's really the way to get "Pythonic" metaprogramming, where you mix and match runtime and compile-time, and can hook into arbitrary Python code in the meta-programming step. Guido may even accept some syntax hooks to more easily express macros without resorting to strings, if I got the right impression at PyData on his take on DSLs. Without a JIT, all we seem to come up with will be kludges. So I'm sceptical about more metaprogramming features in Cython now since it will be outdated in five years -- by then, somebody will either have gotten our stuff together and JIT-ed both Python and Cython, or we'll all be using something else (like R or Julia). Dag From njs at pobox.com Mon Apr 30 22:55:15 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 30 Apr 2012 21:55:15 +0100 Subject: [Cython] Wacky idea: proper macros In-Reply-To: <4F9EFABD.5080408@astro.uio.no> References: <4F9EFABD.5080408@astro.uio.no> Message-ID: On Mon, Apr 30, 2012 at 9:49 PM, Dag Sverre Seljebotn wrote: > JIT is really the way to go. It is one thing that a JIT could optimize the > case where you pass a callback to a function and inline it run-time. But > even if it doesn't get that fancy, it'd be great to just be able to write > something like "cython.eval(s)" and have that be compiled (I guess you could > do that now, but the sheer overhead of the C compiler and all the .so files > involved means nobody would sanely use that as the main way of stringing > together something like pandas). The overhead of running a fully optimizing compiler over pandas on every import is pretty high, though. You can come up with various caching mechanisms, but they all mean introducing some kind of compile time/run time distinction. So I'm skeptical we'll just be able to get rid of that concept, even in a brave new LLVM/PyPy/Julia world. -- Nathaniel From wesmckinn at gmail.com Mon Apr 30 22:56:13 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 30 Apr 2012 16:56:13 -0400 Subject: [Cython] Wacky idea: proper macros In-Reply-To: <4F9EFABD.5080408@astro.uio.no> References: <4F9EFABD.5080408@astro.uio.no> Message-ID: On Mon, Apr 30, 2012 at 4:49 PM, Dag Sverre Seljebotn wrote: > On 04/30/2012 06:30 PM, Wes McKinney wrote: >> >> On Mon, Apr 30, 2012 at 11:19 AM, mark florisson >> ?wrote: >>> >>> On 30 April 2012 14:49, Wes McKinney ?wrote: >>>> >>>> On Sun, Apr 29, 2012 at 5:56 AM, mark florisson >>>> ?wrote: >>>>> >>>>> On 29 April 2012 08:42, Nathaniel Smith ?wrote: >>>>>> >>>>>> On Sat, Apr 28, 2012 at 10:25 PM, mark florisson >>>>>> ?wrote: >>>>>>> >>>>>>> On 28 April 2012 22:04, Nathaniel Smith ?wrote: >>>>>>>> >>>>>>>> Was chatting with Wes today about the usual problem many of us have >>>>>>>> encountered with needing to use some sort of templating system to >>>>>>>> generate code handling multiple types, operations, etc., and a wacky >>>>>>>> idea occurred to me. So I thought I'd through it out here. >>>>>>>> >>>>>>>> What if we added a simple macro facility to Cython, that worked at >>>>>>>> the >>>>>>>> AST level? (I.e. 
I'm talking lisp-style macros, *not* C-style >>>>>>>> macros.) >>>>>>>> Basically some way to write arbitrary Python code into a .pyx file >>>>>>>> that gets executed at compile time and can transform the AST, plus >>>>>>>> some nice convenience APIs for simple transformations. >>>>>>>> >>>>>>>> E.g., if we steal the illegal token sequence @@ as our marker, we >>>>>>>> could have something like: >>>>>>>> >>>>>>>> @@ # alone on a line, starts a block of Python code >>>>>>>> from Cython.MacroUtil import replace_ctype >>>>>>>> def expand_types(placeholder, typelist): >>>>>>>> ?def my_decorator(function_name, ast): >>>>>>>> ? ?functions = {} >>>>>>>> ? ?for typename in typelist: >>>>>>>> ? ? ?new_name = "%s_%s" % (function_name, typename) >>>>>>>> ? ? ?functions[name] = replace_ctype(ast, placeholder, typename) >>>>>>>> ? ?return functions >>>>>>>> ?return function_decorator >>>>>>>> @@ # this token sequence cannot occur in Python, so it's a safe >>>>>>>> end-marker >>>>>>>> >>>>>>>> # Compile-time function decorator >>>>>>>> # Results in two cdef functions named sum_double and sum_int >>>>>>>> @@expand_types("T", ["double", "int"]) >>>>>>>> cdef T sum(np.ndarray[T] arr): >>>>>>>> ?cdef T start = 0; >>>>>>>> ?for i in range(arr.size): >>>>>>>> ? ?start += arr[i] >>>>>>>> ?return start >>>>>>>> >>>>>>>> I don't know if this is a good idea, but it seems like it'd be very >>>>>>>> easy to do on the Cython side, fairly clean, and be dramatically >>>>>>>> less >>>>>>>> horrible than all the ad-hoc templating stuff people do now. >>>>>>>> Presumably there'd be strict limits on how much backwards >>>>>>>> compatibility we'd be willing to guarantee for code that went poking >>>>>>>> around in the AST by hand, but a small handful of functions like my >>>>>>>> notional "replace_ctype" would go a long way, and wouldn't impose >>>>>>>> much >>>>>>>> of a compatibility burden. >>>>>>>> >>>>>>>> -- Nathaniel >>>>>>>> _______________________________________________ >>>>>>>> cython-devel mailing list >>>>>>>> cython-devel at python.org >>>>>>>> http://mail.python.org/mailman/listinfo/cython-devel >>>>>>> >>>>>>> >>>>>>> Have you looked at >>>>>>> http://wiki.cython.org/enhancements/metaprogramming ? >>>>>>> >>>>>>> In general I would like better meta-programming support, maybe even >>>>>>> allow defining new operators (although I'm not sure any of it is very >>>>>>> pythonic), but for templates I think fused types should be used, or >>>>>>> improved when they fall short. Maybe a plugin system could also help >>>>>>> people. >>>>>> >>>>>> >>>>>> I hadn't seen that, no -- thanks for the link. >>>>>> >>>>>> I have to say that the examples in that link, though, give me the >>>>>> impression of a cool solution looking for a problem. I've never wished >>>>>> I could symbolically differentiate Python expressions at compile time, >>>>>> or create a mutant Python+SQL hybrid language. Actually I guess I've >>>>>> only missed define-syntax once in maybe 10 years of hacking in >>>>>> Python-the-language: it's neat how if you do 'plot(x, log(y))' in R it >>>>>> will peek at the caller's syntax tree to automagically label the axes >>>>>> as "x" and "log(y)", and that can't be done in Python. But that's not >>>>>> exactly a convincing argument for a macro system. >>>>>> >>>>>> But generating optimized code is Cython's whole selling point, and >>>>>> people really are doing klugey tricks with string-based preprocessors >>>>>> just to generate multiple copies of loops in Cython and C. 
>>>>>> >>>>>> Also, fused types are great, but: (1) IIUC you can't actually do >>>>>> ndarray[fused_type] yet, which speaks to the feature's complexity, and >>>>> >>>>> >>>>> What? Yes you can do that. >>>> >>>> >>>> I haven't been able to get ndarray[fused_t] to work as we've discussed >>>> off-list. In your own words "Unfortunately, the automatic buffer >>>> dispatch didn't make it into 0.16, so you need to manually >>>> specialize". I'm a bit hamstrung by other users needing to be able to >>>> compile pandas using the latest released Cython. >>> >>> >>> Well, as I said, it does work, but you need to tell Cython which type >>> you meant. If you don't want to do that, you have to use this branch: >>> https://github.com/markflorisson88/cython/tree/_fused_dispatch_rebased >>> . This never made it in since we had no consensus on whether to allow >>> the compiler to bootstrap itself and because of possible immaturity of >>> the branch. >>> >>> So what doesn't work is automatic dispatch for Python functions (def >>> functions and the object version of a cpdef function). They don't >>> automatically select the right specialization for buffer arguments. >>> Anything else should work, otherwise it's a bug. >>> >>> Note also that figuring out which specialization to call dynamically >>> (i.e. not from Cython space at compile time, but from Python space at >>> runtime) has non-trivial overhead on top of just argument unpacking. >>> But you can't say "doesn't work" without giving a concrete example of >>> what doesn't work besides automatic dispatch, and how it fails. >>> >> >> Sorry, I meant automatic dispatch re "doesn't work" and want to >> reiterate how much I appreciate the work you're doing. To give some >> context, my code is riddled with stuff like this: >> >> lib.inner_join_indexer_float64 >> lib.inner_join_indexer_int32 >> lib.inner_join_indexer_int64 >> lib.inner_join_indexer_object >> >> where the only difference between these functions is the type of the >> buffer in the two arrays passed in. I have a template string for these >> functions that looks like this: >> >> inner_join_template = """@cython.wraparound(False) >> @cython.boundscheck(False) >> def inner_join_indexer_%(name)s(ndarray[%(c_type)s] left, >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ndarray[%(c_type)s] right): >> ? ? ''' >> ... >> >> I would _love_ to replace this with fused types. >> >> In any case, lately I've been sort of yearning for the kinds of things >> you can do with an APL-variant like J. Like here's a groupby in J: >> >> ? ?labels >> 1 1 2 2 2 3 1 >> ? ?data >> 3 4 5.5 6 7.5 _2 8.3 >> ? ?labels> ?????????????????????? >> ?3 4 8.3?5.5 6 7.5?_2? >> ?????????????????????? >> >> Here< ?is box and /. is categorize. >> >> Replacing the box< ?operator with +/ (sum), I get the group sums: >> >> ? ?labels +/ /. data >> 15.3 19 _2 >> >> Have 2-dimensional data? >> >> ? ?data >> ?0 ?1 ?2 ?3 ?4 ?5 ?6 >> ?7 ?8 ?9 10 11 12 13 >> 14 15 16 17 18 19 20 >> 21 22 23 24 25 26 27 >> 28 29 30 31 32 33 34 >> 35 36 37 38 39 40 41 >> 42 43 44 45 46 47 48 >> ? ?labels> ?????????????????????? >> ?0 1 6 ? ?2 3 4 ? ?5 ? >> ?????????????????????? >> ?7 8 13 ??9 10 11 ?12? >> ?????????????????????? >> ?14 15 20?16 17 18?19? >> ?????????????????????? >> ?21 22 27?23 24 25?26? >> ?????????????????????? >> ?28 29 34?30 31 32?33? >> ?????????????????????? >> ?35 36 41?37 38 39?40? >> ?????????????????????? >> ?42 43 48?44 45 46?47? >> ?????????????????????? >> >> ? ?labels +//."1 data >> ? 7 ? 
9 ?5 >> ?28 ?30 12 >> ?49 ?51 19 >> ?70 ?72 26 >> ?91 ?93 33 >> 112 114 40 >> 133 135 47 >> >> However, J and other APLs are interpreted. If you generate C or >> JIT-compile I think you can do really well performance wise and have >> very expressive code for writing data algorithms without all this >> boilerplate. > > > I know how you feel. On one hand I really like metaprogramming; on the other > hand I think it is very difficult to get right when done compile-time (just > look at C++ -- I've heard that D is a bit better though I didn't really try > it). > > JIT is really the way to go. It is one thing that a JIT could optimize the > case where you pass a callback to a function and inline it run-time. But > even if it doesn't get that fancy, it'd be great to just be able to write > something like "cython.eval(s)" and have that be compiled (I guess you could > do that now, but the sheer overhead of the C compiler and all the .so files > involved means nobody would sanely use that as the main way of stringing > together something like pandas). > > I think that's really the way to get "Pythonic" metaprogramming, where you > mix and match runtime and compile-time, and can hook into arbitrary Python > code in the meta-programming step. > > Guido may even accept some syntax hooks to more easily express macros > without resorting to strings, if I got the right impression at PyData on his > take on DSLs. > > Without a JIT, all we seem to come up with will be kludges. So I'm sceptical > about more metaprogramming features in Cython now since it will be outdated > in five years -- by then, somebody will either have gotten our stuff > together and JIT-ed both Python and Cython, or we'll all be using something > else (like R or Julia). > > Dag > > _______________________________________________ > cython-devel mailing list > cython-devel at python.org > http://mail.python.org/mailman/listinfo/cython-devel I feel pretty strongly that we need a JIT, but it really needs to run inside CPython. The PyPy approach doesn't seem right to me, and Julia is nice but you're essentially starting from scratch (I am going to do some benchmarking / explorations to see how good Julia's JIT is). I don't have the JIT-fu to do this myself, and I probably won't have the bandwidth to work on it for a couple of years. I might be able to fund its development sooner rather than later, though. - Wes From wesmckinn at gmail.com Mon Apr 30 22:57:26 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 30 Apr 2012 16:57:26 -0400 Subject: [Cython] Wacky idea: proper macros In-Reply-To: References: <4F9EFABD.5080408@astro.uio.no> Message-ID: On Mon, Apr 30, 2012 at 4:55 PM, Nathaniel Smith wrote: > On Mon, Apr 30, 2012 at 9:49 PM, Dag Sverre Seljebotn > wrote: >> JIT is really the way to go. It is one thing that a JIT could optimize the >> case where you pass a callback to a function and inline it run-time. But >> even if it doesn't get that fancy, it'd be great to just be able to write >> something like "cython.eval(s)" and have that be compiled (I guess you could >> do that now, but the sheer overhead of the C compiler and all the .so files >> involved means nobody would sanely use that as the main way of stringing >> together something like pandas). > > The overhead of running a fully optimizing compiler over pandas on > every import is pretty high, though. You can come up with various > caching mechanisms, but they all mean introducing some kind of compile > time/run time distinction. 

I'd be perfectly OK with just having to compile pandas's "data engine"
and generate loads of C/C++ code. JIT-compiling little array
expressions would be cool too. I've got enough of an itch that I might
have to start scratching pretty soon.

From d.s.seljebotn at astro.uio.no  Mon Apr 30 23:32:58 2012
From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn)
Date: Mon, 30 Apr 2012 23:32:58 +0200
Subject: [Cython] Wacky idea: proper macros
In-Reply-To: 
References: <4F9EFABD.5080408@astro.uio.no>
Message-ID: <339e72d3-3ef5-44d9-ac89-79dd494cc460@email.android.com>

Wes McKinney wrote:

>On Mon, Apr 30, 2012 at 4:55 PM, Nathaniel Smith wrote:
>> On Mon, Apr 30, 2012 at 9:49 PM, Dag Sverre Seljebotn
>> wrote:
>>> JIT is really the way to go. It is one thing that a JIT could
>>> optimize the case where you pass a callback to a function and inline
>>> it at run-time. But even if it doesn't get that fancy, it'd be great
>>> to just be able to write something like "cython.eval(s)" and have
>>> that be compiled (I guess you could do that now, but the sheer
>>> overhead of the C compiler and all the .so files involved means
>>> nobody would sanely use that as the main way of stringing together
>>> something like pandas).
>>
>> The overhead of running a fully optimizing compiler over pandas on
>> every import is pretty high, though. You can come up with various
>> caching mechanisms, but they all mean introducing some kind of
>> compile time/run time distinction. So I'm skeptical we'll just be
>> able to get rid of that concept, even in a brave new LLVM/PyPy/Julia
>> world.
>>
>> -- Nathaniel
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>
>I'd be perfectly OK with just having to compile pandas's "data engine"
>and generate loads of C/C++ code. JIT-compiling little array
>expressions would be cool too. I've got enough of an itch that I might
>have to start scratching pretty soon.

I think a good start is this: myself, I'd look into just using Jinja2
to generate all the Cython code, rather than those horrible Python
interpolated strings... That should give you something that's at least
rather pleasant for you to work with once you are used to it (even if
it is a bit horrible to newcomers to the code base).

You can even check in the generated sources.

And we've discussed letting Cython be smart with templating languages
and report errors on the corresponding line in the original template;
such features will certainly be accepted once somebody codes them up.

(I can give you my breakdown of how I eliminated templating languages
other than Jinja2 for this purpose tomorrow, if you are interested.)

Dag

>_______________________________________________
>cython-devel mailing list
>cython-devel at python.org
>http://mail.python.org/mailman/listinfo/cython-devel

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
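To make the suggestion concrete, here is a minimal sketch of generating the `inner_join_indexer_*` specializations with Jinja2 instead of %-interpolated strings. The template body, dtype list, and output filename are illustrative assumptions, not pandas's actual generator:

```python
# Hypothetical generator: render one Cython function per dtype from a
# single Jinja2 template, then write a .pyx that can be checked in.
from jinja2 import Template

inner_join_template = Template("""\
{% for name, c_type in dtypes %}
@cython.wraparound(False)
@cython.boundscheck(False)
def inner_join_indexer_{{ name }}(ndarray[{{ c_type }}] left,
                                  ndarray[{{ c_type }}] right):
    '''
    ...
    '''
{% endfor %}
""")

dtypes = [
    ("float64", "float64_t"),
    ("int32", "int32_t"),
    ("int64", "int64_t"),
    ("object", "object"),
]

# Checking the rendered source into the repository, as suggested above,
# keeps the build working for users on a released Cython.
with open("generated_joins.pyx", "w") as f:
    f.write(inner_join_template.render(dtypes=dtypes))
```

Unlike bare string interpolation, the template has real loops and inheritance, and the line-oriented output is what would let Cython map an error back to the originating template line, as discussed above.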
From wstein at gmail.com  Mon Apr 30 23:36:42 2012
From: wstein at gmail.com (William Stein)
Date: Mon, 30 Apr 2012 14:36:42 -0700
Subject: [Cython] Wacky idea: proper macros
In-Reply-To: <339e72d3-3ef5-44d9-ac89-79dd494cc460@email.android.com>
References: <4F9EFABD.5080408@astro.uio.no>
	<339e72d3-3ef5-44d9-ac89-79dd494cc460@email.android.com>
Message-ID: 

On Mon, Apr 30, 2012 at 2:32 PM, Dag Sverre Seljebotn wrote:
>
> Wes McKinney wrote:
>
>>On Mon, Apr 30, 2012 at 4:55 PM, Nathaniel Smith wrote:
>>> On Mon, Apr 30, 2012 at 9:49 PM, Dag Sverre Seljebotn
>>> wrote:
>>>> JIT is really the way to go. It is one thing that a JIT could
>>>> optimize the case where you pass a callback to a function and
>>>> inline it at run-time. But even if it doesn't get that fancy, it'd
>>>> be great to just be able to write something like "cython.eval(s)"
>>>> and have that be compiled (I guess you could do that now, but the
>>>> sheer overhead of the C compiler and all the .so files involved
>>>> means nobody would sanely use that as the main way of stringing
>>>> together something like pandas).
>>>
>>> The overhead of running a fully optimizing compiler over pandas on
>>> every import is pretty high, though. You can come up with various
>>> caching mechanisms, but they all mean introducing some kind of
>>> compile time/run time distinction. So I'm skeptical we'll just be
>>> able to get rid of that concept, even in a brave new
>>> LLVM/PyPy/Julia world.
>>>
>>> -- Nathaniel
>>> _______________________________________________
>>> cython-devel mailing list
>>> cython-devel at python.org
>>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>>I'd be perfectly OK with just having to compile pandas's "data engine"
>>and generate loads of C/C++ code. JIT-compiling little array
>>expressions would be cool too. I've got enough of an itch that I might
>>have to start scratching pretty soon.
>
> I think a good start is this: myself, I'd look into just using Jinja2
> to generate all the Cython code, rather than those horrible Python
> interpolated strings... That should give you something that's at least
> rather pleasant for you to work with once you are used to it (even if
> it is a bit horrible to newcomers to the code base).
>
> You can even check in the generated sources.
>
> And we've discussed letting Cython be smart with templating languages
> and report errors on the corresponding line in the original template;
> such features will certainly be accepted once somebody codes them up.
>
> (I can give you my breakdown of how I eliminated templating languages
> other than Jinja2 for this purpose tomorrow, if you are interested.)

Can you point us to a good example of you using Jinja2 for this
purpose?  I'm a big fan of Jinja2 in general (e.g., for HTML)...

> Dag
>
>>_______________________________________________
>>cython-devel mailing list
>>cython-devel at python.org
>>http://mail.python.org/mailman/listinfo/cython-devel
>
> --
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

-- 
William Stein
Professor of Mathematics
University of Washington
http://wstein.org