[Distutils] People want CPAN :-)

David Cournapeau david at ar.media.kyoto-u.ac.jp
Sun Nov 8 14:22:07 CET 2009


Georg Brandl wrote:
>
> One thing about CPAN (and Haskell's libraries on hackage) that I think
> many people see favorably, even though it's only superficial, is the
> more-or-less consistent hierarchical naming for the individual packages
> (or the contained modules in Haskell).  Compared with that, the Python
> package namespace looks untidy.

That's true, but there is not much we can do on this one, so I did not
mention it.

>
> Note that the downloadable distutils manual has 94 pages and *should* be
> enough to explain the basics of packaging.  It has to be updated, of
> course, once the more advanced mechanisms are part of the core.

The manual is too complicated for simple tasks, and not very useful for
complex ones, mostly because distutils does not follow the "only one way
to do things" mantra. I can help improve the distutils documentation for
the build part, which is mostly undocumented (things like how to create a
new command to build ctypes extensions, etc.).

>
> Me too.  Though it would be Snakebite + serious sandboxing.

Sandboxing is of course needed, but that's a known problem, and people
have already thought hard about it. The openSUSE build system, albeit
Linux-specific, works quite well, for example. For environment
sandboxing, chroot works on every Unix I know (including Mac OS X);
security is more challenging, and I don't have any expertise there.
Windows is more difficult to handle, though (maybe Windows people know
good sandboxing solutions short of a full-blown VM).

> What you're saying there about Cabal is exactly my experience.  It is very
> nice to work with, and I've not yet seen a conceptual failure.
>
> But we're not far from that, with a static metadata file.

Several people have claimed this so far, but I don't understand why:
could you expand on this? My impression is that the focus is mostly on
version specification and install/build requirements in the static data,
but to me that's a tiny detail. I want something like .cabal files, where
you can specify documentation, data, source files, etc. Something like
what I started to prototype here:

http://github.com/cournape/toydist/

To take an example you are well familiar with, you can fully describe
sphinx with it, and the conversion is mostly automatic. This is not even
500 LOC. With this kind of design, you can use different build systems
on top of it (there is, for example, unpublished code in toydist to use a
scons-based build system instead of distutils, as is currently done).
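
As a rough illustration of the kind of static description discussed above,
here is a minimal sketch of a declarative file and a parser for it. The
field names (Name, Packages, DataFiles, Docs) and the "Key: value" syntax
are assumptions made up for this example; they are not toydist's or
Cabal's actual format.

```python
# Hypothetical static package description. The syntax and field names are
# illustrative assumptions, not the real toydist or .cabal format.
DESCRIPTION = """\
Name: example
Version: 0.1
Packages: example, example.util
DataFiles: example/data/defaults.cfg
Docs: doc/index.rst, doc/usage.rst
"""

# Fields whose value is a comma-separated list rather than a single string.
LIST_FIELDS = {"Packages", "DataFiles", "Docs"}


def parse_description(text):
    """Parse 'Key: value' lines into a dict; list fields are split on commas."""
    result = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("line %d: expected 'Key: value'" % lineno)
        key, value = key.strip(), value.strip()
        if key in LIST_FIELDS:
            result[key] = [v.strip() for v in value.split(",") if v.strip()]
        else:
            result[key] = value
    return result


info = parse_description(DESCRIPTION)
```

Because the result is plain data, any tool (a distutils wrapper, a
scons-based builder, an installer) could read it without executing a
setup.py.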

>
>> I won't rehearse it here, but basically:
>>     - distutils is too complex for simple packages, and too inflexible
>> for complex ones. Adding new features to distutils is a painful
>> experience. Even autotools with its mix of 100 000 lines autogenerated
>> shell code, perl, m4 is more pleasant.
>
> Really? 

Sure, the perl/shell/awk/m4 mix is painful, but at least the result is
reasonably robust, and can be extended.

>  I would have assumed that even writing a whole new distutils
> command/build step can't be more painful than adding the equivalent to
> an autotools-style build system, being Python after all.  However, I've
> never done such a thing, so I have to believe you.

I will expand on that, because I think few people understand the problem
here, and it is maybe the main source of frustration for core numpy
developers as far as distutils is concerned. True, writing your own
command is easy. But it has many failure modes:
 - if you extend an existing command, you have to take care whether you
   run under setuptools or distutils (and soon distribute will make this
   worse). Those commands will not work the same when you run them under
   paver either.
 - the division into subcommands is painful, and the abstraction does not
   make much sense IMHO. Recently, I needed to access simple things like
   the library filename (foo -> libfoo.a/foo.lib/etc.) and the install
   prefix, but those are not accessible to each command. The install
   prefix was particularly painful: it took me several hours to get it
   to work right with distutils in-place builds and develop mode on all
   platforms. All this is trivially easy to get with autotools, scons or
   waf. Every new feature I needed to add to numpy.distutils was an
   unpleasant experience. I had to read the distutils sources (for every
   supported python version), run it on several platforms, and get it
   working by trial and error.
 - if you want to add a new source file extension, you have to rewrite
   the build_ext or build_src command, and you often cannot reuse the
   base class methods.
 - etc.
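
To make the library filename example above concrete, here is a minimal
sketch of the mapping in question (foo -> libfoo.a or foo.lib). The
function name and the "posix"/"msvc" platform tags are hypothetical,
chosen for this example only; this is not distutils code.

```python
def static_library_filename(name, platform):
    """Return the conventional static library file name for `name`.

    `platform` is a hypothetical short tag ("posix" or "msvc") used only
    in this sketch; it is not a value defined by distutils.
    """
    if platform == "msvc":
        return name + ".lib"        # MSVC convention: foo.lib
    if platform == "posix":
        return "lib" + name + ".a"  # Unix convention: libfoo.a
    raise ValueError("unknown platform: %r" % (platform,))
```

The point of the complaint is not that this mapping is hard to write, but
that a value this simple should be available to every build step.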

Also, the distutils code is horrible: you don't really know what's public
and what's not, and most attributes are added at runtime (and sometimes
differ depending on the platform). Often, you get strange errors with the
exception swallowed, and that happens only on some platforms for some
users; in that case, the only way to debug it is to be able to run on
their platform. When you write extensions to distutils, this contributes
to the whole unpleasant experience.

>
> Coming back to Cabal, do you know how easy it is to customize its build
> steps?

No, I don't. I know you have to use makefile/autoconf for complex
packages (for example, the gtk wrapper for haskell does not use cabal
AFAIK).

But I think the only thing which matters is to have a simple base system
which you can interoperate with. It would not make sense to require a
standard build/packaging system to support fortran or most of what we
need in numpy. For example, I have written numscons to get away from
distutils: it enables the use of scons for all the building part, and can
build complicated packages such as numpy and scipy on many platforms
(including windows), but we cannot use it as our main tool because you
can't easily interoperate with distutils (to get sdist, bdist_wininst,
etc. working). A system which would make this possible would already be
great; such a system would be both simpler and more reliable than current
distutils IMHO.
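
One way to picture the interoperation described above is a packaging step
that depends only on static data, with the build step delegated to a
replaceable backend. The sketch below is purely illustrative: every name
in it (sdist_file_list, RecordingBackend, run) is hypothetical, and it is
not numscons, toydist or distutils code.

```python
# All names in this sketch are hypothetical illustrations.

def sdist_file_list(description):
    """Files to ship in a source archive, derived only from static data."""
    files = []
    for key in ("sources", "data_files", "docs"):
        files.extend(description.get(key, []))
    return sorted(files)


class RecordingBackend:
    """Stand-in for a real build tool (scons, waf, make...).

    It only records which sources it was asked to build.
    """

    def __init__(self):
        self.built = []

    def build(self, description):
        self.built.extend(description.get("sources", []))


def run(description, backend):
    """Build with the given backend; packaging does not depend on it."""
    backend.build(description)
    return sdist_file_list(description)


description = {
    "sources": ["pkg/__init__.py", "pkg/_core.c"],
    "data_files": ["pkg/data/table.txt"],
    "docs": ["doc/index.rst"],
}
backend = RecordingBackend()
archive_files = run(description, backend)
```

Swapping RecordingBackend for another object with a build() method leaves
the source archive contents unchanged, which is the property that matters
here.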

>
> That's what we're heading towards, I think.

Guido wanted to know how scientific python people feel about the whole
situation, and my own impression is that we are moving further away from
what we need. I don't think anything based on distutils can help us. This
is not to criticize the work of Tarek, PJE and others: I understand that
distutils and setuptools solve a lot of problems for many people, and I
may just be in a minority.

David

