[Cython] Utilities, cython.h, libcython

Thu Oct 6 02:05:21 CEST 2011

On Wednesday, October 5, 2011, mark florisson wrote:

> On 5 October 2011 08:16, Stefan Behnel <stefan_ml at behnel.de> wrote:
> > mark florisson, 04.10.2011 23:19:
> >>
> >> So I propose that after fused types gets merged we try to move as many
> >> utility codes as possible to their utility code files (unless they are
> >> used in pending pull requests or other branches). Preferably this will
> >> be done in one or a few commits. How should we split up the work
> >
> > I would propose that new utility code gets moved out into utility files
> > right away (if doable, given the current state of the infrastructure), and
> > that existing utility code gets moved when it gets modified or when someone
> > feels like it. Until we really get to the point of wanting to create a
> > separate shared library etc., there's no need to hurry with the move.
> >
> >
> >> We could actually move things before fused types get merged, as long
> >> as we don't touch binding_cfunc_utility_code.
> >
> > Another reason not to hurry, right?
> >
> >
> >> Before we go there, Stefan, do we still want to implement the header
> >> .ini style which can list dependencies and such?
> >
> > I think we'll eventually need that, but that also depends a bit on the
> > question whether we want to (or can) build a shared library or not. See
> > below.
> >
> >
> >> Another issue is that Cython compile time is increasing with the
> >> addition of control flow and cython utilities. If you use fused types
> >> you're also going to combinatorially add more compile time.
> >
> > I don't see that locally - a compiled Cython is hugely fast for me. In
> > comparison, the C compiler literally takes ages to compile the result. An
> > external shared library may or may not help with both - in particular, it is
> > not clear to me what makes the C compiler slow. If the compile time is
> > dominated by the number of inlined functions (which is not unlikely), a
> > shared library + header file will not make a difference.
> >
>
> Have you tried with the memoryviews merged? For example, if I have this code:
>
> from libc.stdlib cimport malloc
> cdef int[:] slice = <int[:10]> <int *> malloc(sizeof(int) * 10)
>
> [0] [14:45] ~  ➤ time cython test.pyx
> cython test.pyx  2.61s user 0.08s system 99% cpu 2.695 total
> [0] [14:45] ~  ➤ time zsh compile
> zsh compile  1.88s user 0.06s system 99% cpu 1.946 total
>
> where 'compile' is the script that invokes the same gcc command
> distutils uses. As you can see, it took Cython more than 2.5 seconds to
> compile this code (simply because the memoryview utilities get
> included). The C compiler does it quite a lot faster here. This
> obviously depends largely on your code; you can probably have it the
> other way around as well.
>

Anything we can do to cache/dedupe things here would be great.
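
To repeat the comparison above on other inputs, a small timing harness along these lines might help. This is only a sketch: time_command and compare_stages are names made up here, and the commands in the example are placeholders that run anywhere. In practice one would substitute the cython invocation and the gcc command that distutils prints. It measures wall-clock time only, not the user/system split that the shell's time reports.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (elapsed wall-clock seconds, exit status)."""
    start = time.perf_counter()
    status = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode
    return time.perf_counter() - start, status


def compare_stages(stages):
    """Time each named command once; return {name: (seconds, exit_status)}."""
    return {name: time_command(argv) for name, argv in stages.items()}


# Placeholder commands (assumption): replace them with, for example,
# ["cython", "test.pyx"] and the C compiler command for the generated file.
example_stages = {
    "stage 1": [sys.executable, "-c", "pass"],
    "stage 2": [sys.executable, "-c", "import json"],
}

for name, (seconds, status) in compare_stages(example_stages).items():
    print(f"{name}: {seconds:.2f}s (exit status {status})")
```

A single run is noisy, so repeating each command a few times and taking the minimum would give a fairer picture.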

> >> I'm sure
> >> this came up earlier, but I really think we should have a libcython
> >> and a cython.h. libcython (a shared library) should contain any common
> >> Cython-specific code not meant to be inlined, and cython.h any types,
> >> macros and inline functions etc.
> >
> > This has a couple of implications though. In order to support this on the
> > user side, we have to build one shared library per installed package in
> > order to avoid any Cython versioning issues. Just installing a versioned
> > "libcython_x.y.z.so" globally isn't enough, especially during development,
> > but also at deployment time. Different packages may use different CFLAGS or
> > Cython options, which may have an impact on the result. Encoding all
> > possible factors in the file name will be cumbersome and may mean that we
> > still end up with a number of installed Cython libraries that correlates
> > with the number of installed Cython based packages.
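
To make the file-name concern concrete, here is a rough sketch of what encoding the build factors into the library name could look like. Everything in it is hypothetical: library_tag, the naming scheme, the package names and the example flags are invented for illustration, and nothing like this exists in Cython. It only shows that once flags and options enter the name, the number of distinct libraries tends to follow the number of differently configured packages.

```python
import hashlib


def library_tag(cython_version, cflags, options):
    """Build a hypothetical library file name from the build-relevant factors."""
    # Flag order is kept as given (later flags can override earlier ones);
    # options are sorted by name so dict ordering does not change the result.
    payload = repr((tuple(cflags), sorted(options.items())))
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:8]
    return f"libcython_{cython_version}_{digest}.so"


# Made-up packages: (Cython version, CFLAGS, Cython options).
packages = {
    "pkg_a": ("0.15.1", ["-O2"], {"boundscheck": True}),
    "pkg_b": ("0.15.1", ["-O3", "-march=native"], {"boundscheck": True}),
    "pkg_c": ("0.15.1", ["-O2"], {"boundscheck": False}),
    "pkg_d": ("0.15.1", ["-O2"], {"boundscheck": True}),
}

libraries = {library_tag(*build) for build in packages.values()}
print(len(packages), "packages ->", len(libraries), "distinct libraries")
```

Only pkg_a and pkg_d share a library here, because they are configured identically.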
>
> Hm, I think the CFLAGS are unimportant so long as they are compatible
> with Python. When the user compiles a Cython extension module with
> extra CFLAGS, this doesn't affect libpython. Similarly, the Cython
> utilities are really not the user's responsibility, so libcython
> doesn't need to be compiled with the same flags as the extension
> module. If still wanted, the user could either recompile Python with
> different CFLAGS (which means libcython will get those as well), or
> not use libcython at all. CFLAGS should really only pertain to user
> code, not to the Cython library, which the user shouldn't be concerned
> about.
>
> > Next, we may not know at build time which set of Cython modules is in the
> > package. This may be less of an issue if we rely on "cythonize()" in
> > setup.py to compile all modules beforehand (assuming that the user doesn't
> > call it twice, once for *.pyx, once for *.py, for example), but even if we
> > know all modules, we'd still have to figure out the complete set of utility
> > code used by all modules in order to build an adapted library with only the
> > necessary code used in the package. So we'd always end up with a complete
> > library with all utility code, which is only really interesting for larger
> > packages with several Cython modules.
> >
> > I agree with Robert that a CEP would be needed for this, both for clearing
> > up the implications and actual use cases (I know that Sage is a reasonable
> > use case, but it's also a rather special case).
> >
> >
> >> This will decrease Cython and C
> >> compile time, and will also make executables smaller.
> >
> > I don't see how this actually impacts executables. However, a self-contained
> > executable is a value in itself.
> >
> >
> >> This could be
> >> enabled using a command line option to Cython, as well as with
> >> distutils; eventually we may decide to make it the default (let's
> >> figure that out later). Preferably libcython.so would be installed
> >> alongside libpython.so and cython.h inside the Python include
> >> directory.
> >
> > I don't see this happening. It's easy for Python (there is only one Python
> > running at a time, with one libpython loaded), but it's a lot less safe for
> > different versions of a Cython library that are used by different modules
> > inside of the running Python. For example, we'd have to version all visible
> > symbols in operating systems with flat namespaces, in order to support
> > loading multiple versions of the library.
> >
> >
> >> Lastly, I think we also should figure out a way to serialize Entry
> >> objects from CythonUtilities, which could easily and swiftly be loaded
> >> when creating the cython scope. It's quite a pain to declare all
> >> entries for utilities you write manually
> >
> > Why would you declare them manually? I thought everything would be moved out
> > into the utility code files?
> >
>
> Right, the code is in the utility files. However, the cython scope
> needs to have the entries of the classes and functions of the
> utilities. For example, the user may write
>
> cimport cython
>
> cdef cython.array myobject
>
> For this to work, we need an 'array' entry, which we don't have yet,
> as the utility code will be parsed at code generation time if an entry
> of that utility code (which doesn't exist yet!) is used.
>
> >> so what I mostly did was
> >> parse the utility up to and including AnalyseDeclarationsTransform,
> >> and then retrieve the entries from there.
> >
> > Sounds like a drawback regarding the processing time, but may still be a
> > reasonable way to do it. I would expect that it won't be hard to pickle the
> > resulting dict of entries into a cache file and rebuild it only when one of
> > the utility files changes.
>
> Exactly. I'm not sure about pickle though, but the details don't
> matter. Pickle is certainly easy as long as you don't change your
> interface (which we most certainly will, though).
>
We can version the cache to handle this.
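
A minimal sketch of such a versioned cache, under several assumptions: CACHE_FORMAT, file_stamps, load_entries and the build_entries callback are hypothetical names, the entries are assumed to be picklable, and staleness is detected from file modification time and size rather than from content. It is not how Cython stores anything today.

```python
import os
import pickle

# Hypothetical format number: bump it whenever the pickled classes change,
# so that caches written by an older interface are discarded.
CACHE_FORMAT = 1


def file_stamps(paths):
    """Map each utility file path to (mtime in ns, size in bytes)."""
    stamps = {}
    for path in paths:
        info = os.stat(path)
        stamps[path] = (info.st_mtime_ns, info.st_size)
    return stamps


def load_entries(cache_path, utility_paths, build_entries):
    """Return cached entries, rebuilding when the cache is missing or stale."""
    stamps = file_stamps(utility_paths)
    try:
        with open(cache_path, "rb") as f:
            cached = pickle.load(f)
        if cached.get("format") == CACHE_FORMAT and cached.get("stamps") == stamps:
            return cached["entries"]
    except (OSError, EOFError, pickle.UnpicklingError,
            AttributeError, ImportError, KeyError):
        pass  # missing, truncated or incompatible cache: rebuild below

    # build_entries stands in for parsing the utility files up to the
    # declaration analysis step and collecting the resulting entries.
    entries = build_entries(utility_paths)
    with open(cache_path, "wb") as f:
        pickle.dump(
            {"format": CACHE_FORMAT, "stamps": stamps, "entries": entries}, f
        )
    return entries
```

The format number covers interface changes, and the stamps cover edits to the utility files; either mismatch simply triggers a rebuild.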

- Robert