[Cython] Utilities, cython.h, libcython
mark florisson
markflorisson88 at gmail.com
Wed Oct 5 16:18:11 CEST 2011
On 5 October 2011 14:54, mark florisson <markflorisson88 at gmail.com> wrote:
> On 5 October 2011 08:38, Robert Bradshaw <robertwb at math.washington.edu> wrote:
>> On Wed, Oct 5, 2011 at 12:16 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
>>> mark florisson, 04.10.2011 23:19:
>>>>
>>>> So I propose that after fused types gets merged we try to move as many
>>>> utility codes as possible to their utility code files (unless they are
>>>> used in pending pull requests or other branches). Preferably this will
>>>> be done in one or a few commits. How should we split up the work?
>>>
>>> I would propose that new utility code gets moved out into utility files
>>> right away (if doable, given the current state of the infrastructure), and
>>> that existing utility code gets moved when it gets modified or when someone
>>> feels like it. Until we really get to the point of wanting to create a
>>> separate shared library etc., there's no need to hurry with the move.
>>>
>>>
>>>> We could actually move things before fused types get merged, as long
>>>> as we don't touch binding_cfunc_utility_code.
>>>
>>> Another reason not to hurry, right?
>>>
>>>
>>>> Before we go there, Stefan, do we still want to implement the header
>>>> .ini style which can list dependencies and such?
>>>
>>> I think we'll eventually need that, but that also depends a bit on the
>>> question whether we want to (or can) build a shared library or not. See
>>> below.
>>>
>>>
>>>> Another issue is that Cython compile time is increasing with the
>>>> addition of control flow and cython utilities. If you use fused types
>>>> you're also going to combinatorially add more compile time.
>>>
>>> I don't see that locally - a compiled Cython is hugely fast for me. In
>>> comparison, the C compiler literally takes ages to compile the result. An
>>> external shared library may or may not help with both - in particular, it is
>>> not clear to me what makes the C compiler slow. If the compile time is
>>> dominated by the number of inlined functions (which is not unlikely), a
>>> shared library + header file will not make a difference.
>>>
>>>
>>>> I'm sure
>>>> this came up earlier, but I really think we should have a libcython
>>>> and a cython.h. libcython (a shared library) should contain any common
>>>> Cython-specific code not meant to be inlined, and cython.h any types,
>>>> macros and inline functions etc.
>>>
>>> This has a couple of implications though. In order to support this on the
>>> user side, we have to build one shared library per installed package in
>>> order to avoid any Cython versioning issues. Just installing a versioned
>>> "libcython_x.y.z.so" globally isn't enough, especially during development,
>>> but also at deployment time. Different packages may use different CFLAGS or
>>> Cython options, which may have an impact on the result. Encoding all
>>> possible factors in the file name will be cumbersome and may mean that we
>>> still end up with a number of installed Cython libraries that correlates
>>> with the number of installed Cython based packages.
>>
>> That's a good point. Perhaps an easier first target is to have one
>> "libcython" per package (with a randomized or project-specific name).
>> Longer-term, I think the goal of one libcython per version is a
>> reasonable one, for deployment at least. Exceptional packages (e.g.
>> that require a special set of CFLAGS rather than the ones Python was
>> built with) can either bundle their own or forgo any sharing of code
>> as it is done now, and features that can't be easily normalized across
>> (Cython and C) compilation options would remain in project-specific
>> generated .c files.
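A minimal sketch of how a project-specific library name might be derived, assuming the name should change whenever the Cython version, the C compiler flags, or the Cython options change. The function `libcython_name` and the naming scheme are hypothetical illustrations, not anything Cython provides.

```python
import hashlib


def libcython_name(package, cython_version, cflags, options):
    """Build a hypothetical per-package shared-library name.

    Every factor that can change the generated code (Cython version,
    C compiler flags, Cython options) feeds into a short hash, so two
    builds that differ in any of them get different file names.
    """
    key = "|".join([
        cython_version,
        # Keep CFLAGS in their given order: later flags can override earlier ones.
        " ".join(cflags),
        # Sort options by name so dict ordering does not change the hash.
        ",".join("%s=%s" % (k, options[k]) for k in sorted(options)),
    ])
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]
    version = cython_version.replace(".", "_")
    return "libcython_%s_%s_%s" % (package, version, digest)


if __name__ == "__main__":
    print(libcython_name("mypkg", "0.15.1", ["-O2", "-g"], {"boundscheck": False}))
```

This also illustrates Stefan's concern: every distinct combination of flags and options yields a distinct library, so the number of installed libraries can grow with the number of build configurations.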
>>
>>> Next, we may not know at build time which set of Cython modules is in the
>>> package. This may be less of an issue if we rely on "cythonize()" in
>>> setup.py to compile all modules beforehand (assuming that the user doesn't
>>> call it twice, once for *.pyx, once for *.py, for example), but even if we
>>> know all modules, we'd still have to figure out the complete set of utility
>>> code used by all modules in order to build an adapted library with only the
>>> necessary code used in the package. So we'd always end up with a complete
>>> library with all utility code, which is only really interesting for larger
>>> packages with several Cython modules.
>>
>> Yes, I'm thinking we would create relatively complete libraries,
>> though if we did things on a per-package level perhaps we could do
>> some pruning. We could still conditionally put some of the utility
>> code (especially the rarely used or shared stuff) into each module.
>
> Yeah that would be nice. I actually think we shouldn't do anything on
> a per-package level, only a bunch of modules with related stuff
> (conversion utilities/exception raising etc in one module,
> buffer/memoryview utilities in another etc). We've been living with
> huge files until now; I don't think we suddenly need to actively start
> pruning for a little bit of memory.
>
> I think the module approach would also be easy to implement, as the
> infrastructure for external cdef functions/classes importing/exporting
> is already there.
>
>>> I agree with Robert that a CEP would be needed for this, both for clearing
>>> up the implications and actual use cases (I know that Sage is a reasonable
>>> use case, but it's also a rather special case).
>>>
>>>
>>>> This will decrease Cython and C
>>>> compile time, and will also make executables smaller.
>>>
>>> I don't see how this actually impacts executables. However, a self-contained
>>> executable is a value in itself.
>>
>> As an example, we're starting to have full utility types, e.g. for
>> generators and/or CyFunction. Lots of the utility code (e.g. loading
>> modules, raising exceptions, etc.) could be shared as well. For
>> something like Sage that could be a significant savings, and it could
>> be a big boon for cython.inline as well.
>>
>>>> This could be
>>>> enabled using a command line option to Cython, as well as with
>>>> distutils; eventually we may decide to make it the default (let's
>>>> figure that out later). Preferably libcython.so would be installed
>>>> alongside libpython.so and cython.h inside the Python include
>>>> directory.
>>>
>>> I don't see this happening. It's easy for Python (there is only one Python
>>> running at a time, with one libpython loaded), but it's a lot less safe for
>>> different versions of a Cython library that are used by different modules
>>> inside of the running Python. For example, we'd have to version all visible
>>> symbols in operating systems with flat namespaces, in order to support
>>> loading multiple versions of the library.
>>
>> Which is another advantage to "linking" via the cimport mechanisms.
>>
>>>> Lastly, I think we also should figure out a way to serialize Entry
>>>> objects from CythonUtilities, which could easily and swiftly be loaded
>>>> when creating the cython scope. It's quite a pain to declare all
>>>> entries for utilities you write manually
>>>
>>> Why would you declare them manually? I thought everything would be moved out
>>> into the utility code files?
>>>
>>>
>>>> so what I mostly did was
>>>> parse the utility up to and including AnalyseDeclarationsTransform,
>>>> and then retrieve the entries from there.
>>>
>>> Sounds like a drawback regarding the processing time, but may still be a
>>> reasonable way to do it. I would expect that it won't be hard to pickle the
>>> resulting dict of entries into a cache file and rebuild it only when one of
>>> the utility files changes.
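A minimal sketch of the cache Stefan suggests: pickle the dict of entries together with a fingerprint of the utility files, and rebuild only when that fingerprint changes. It assumes the entries are picklable (real Entry objects may need extra work for that), and `build_entries` is a hypothetical stand-in for the expensive step of parsing the utilities up to AnalyseDeclarationsTransform. It fingerprints file contents rather than modification times.

```python
import hashlib
import os
import pickle


def fingerprint(paths):
    """Hash the names and contents of all utility files."""
    h = hashlib.sha1()
    for path in sorted(paths):
        h.update(path.encode("utf-8"))
        with open(path, "rb") as f:
            h.update(f.read())
    return h.hexdigest()


def load_entries(utility_files, cache_path, build_entries):
    """Return the entry dict, rebuilding it only if a utility file changed.

    build_entries (hypothetical) does the expensive parsing and must
    return a picklable dict.
    """
    key = fingerprint(utility_files)
    if os.path.exists(cache_path):
        try:
            with open(cache_path, "rb") as f:
                cached_key, entries = pickle.load(f)
            if cached_key == key:
                return entries
        except Exception:
            pass  # unreadable or stale-format cache: fall through and rebuild
    entries = build_entries(utility_files)
    with open(cache_path, "wb") as f:
        pickle.dump((key, entries), f, pickle.HIGHEST_PROTOCOL)
    return entries
```

On a warm cache, loading the cython scope then costs one file hash pass plus one unpickle instead of a parse of every utility file.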
>>
>> +1
>>
>> It'd be great to be able to do this for the many .pxd files in Sage as
>> well. Parsing .pxd files is a huge portion of the compilation of the
>> Sage library.
>>
>> - Robert
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>
I expect it will also speed up the test runner quite a lot, which
takes forever, as there are lots of small doctests. <offtopic>On an
unrelated note, it'd be great if we could run individual doctests in
parallel, I know py.test can do that, maybe nosetests as well. It'd be
great if there were a plugin that supported Cython (as well as C
extension modules) that could run them, and an additional plugin that
could make it work with our various test modes and
directives.</offtopic>