[Distutils] People want CPAN :-)

Pauli Virtanen pav at iki.fi
Sat Nov 7 13:03:14 CET 2009


Sat, 2009-11-07 at 00:14 +0200, Alex Grönholm wrote:
[clip: problems in distributing scientific Python packages]
> I for one did not understand the problem. What does CPAN have that
> PyPI doesn't?
> It is natural for packages (distributions, in distutils terms) to
> have dependencies on each other. Why is this a problem?

Personally, I've had trouble not so much with PyPI as with the rest of
the toolchain.

What's special about scientific software is that

- They're usually not pure Python
- They need support not only for C, but also for e.g. Fortran compilers
- It may be necessary to build them on platforms
  where libraries etc. are in non-standard places
- It may be useful to be able to build them with non-gcc compilers
- They may need to ship more data files etc. than plain Python modules
- Python is a newcomer on the scientific scene.
  Not all people want to spend time on installation problems.
  Not all people are experienced Python users.

So the following things are more likely to hurt when distributing
these Python modules:

1. Incomplete documentation for distutils.

   For example, where can you find out what the `package_data` option
   of setup() expects as input? What if you have your package in
   src/packagename and data files under data/? What are the paths
   given to it relative to?

   The Distribute documentation is starting to look quite
   reasonable -- so documentation is becoming less of a problem.
   But it still seems to assume that the reader is familiar with
   distutils.
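
   To illustrate (a minimal sketch; the project layout and file names
   are hypothetical), the patterns given to package_data are resolved
   relative to the package directory, not the project root:

       # setup.py -- hypothetical project with a src/ layout
       from distutils.core import setup

       setup(
           name='mypkg',
           version='0.1',
           packages=['mypkg'],
           package_dir={'': 'src'},   # mypkg lives in src/mypkg
           # These globs are relative to src/mypkg, so a top-level
           # data/ directory is not reachable this way; such files
           # have to be moved inside the package or shipped via
           # data_files / MANIFEST.in instead.
           package_data={'mypkg': ['tables/*.dat']},
       )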

2. Magic.

   For example, what decides which files are included by sdist?
   It appears this depends on (i) what's in the autogenerated
   *.egg-info/SOURCES.txt, (ii) whether you are using SVN and
   setuptools, (iii) any package_data etc. options, and
   (iv) MANIFEST or maybe MANIFEST.in.

   IMHO, the system is too byzantine in ordinary matters,
   which increases the number of things you need to learn.

3. Many layers: distutils, setuptools, numpy.distutils.

   Numpy has its own distutils extensions, primarily for Fortran
   support.
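
   As an illustration, a minimal numpy.distutils sketch (package and
   file names hypothetical) for building one Fortran extension:

       # setup.py -- hypothetical package with a single Fortran module
       from numpy.distutils.core import setup, Extension

       setup(
           name='fpkg',
           version='0.1',
           packages=['fpkg'],
           ext_modules=[
               # numpy.distutils recognizes the Fortran source and
               # drives a Fortran compiler; plain distutils can't.
               Extension('fpkg._flib', sources=['src/flib.f90']),
           ],
       )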

4. Inflexibility.

   The toolchain is a bit inflexible: suppose you need to do
   something "custom" during the build, say, detect sizeof(long double)
   and add a corresponding #define to the build options. Again, finding
   out how to do this properly takes time.
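
   One possible way to do it, sketched under the assumption that a
   ctypes probe is an acceptable stand-in for a real compile-time
   check (names are hypothetical; this is not the officially blessed
   mechanism):

       # setup.py fragment
       import ctypes
       from distutils.core import setup, Extension
       from distutils.command.build_ext import build_ext

       class my_build_ext(build_ext):
           def build_extensions(self):
               # Probe the size as seen by the running interpreter;
               # assumes we are not cross-compiling.
               size = ctypes.sizeof(ctypes.c_longdouble)
               for ext in self.extensions:
                   ext.define_macros.append(
                       ('SIZEOF_LONG_DOUBLE', str(size)))
               build_ext.build_extensions(self)

       setup(
           name='mypkg',
           ext_modules=[Extension('mypkg._core', ['src/_core.c'])],
           cmdclass={'build_ext': my_build_ext},
       )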

5. Distutils, and tools derived from it, have bad failure modes.

   This hurts most when building extension modules. Given the many
   layers, and the fact that the build is driven by software that few
   really understand, it's difficult to understand and fix even simple
   errors.

   Suppose a build fails because your C or Fortran compiler gets passed
   a flag it doesn't like. How do you work around this?

   Suppose you have a library installed in a non-standard location.
   How do you tell distutils to look for it in the correct place?
   (The answer is to use the "build_ext" command separately and pass it
   -L, but this is difficult to find out, as "build" does not accept -L.)
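
   Concretely, such an invocation (with a hypothetical library path)
   looks something like

       python setup.py build_ext -L/opt/mylib/lib build

   since the -L option is only understood by the build_ext command
   itself.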


The last one is quite annoying in practice: given the heterogeneous
environments out there, it's not easy to make your package buildable on
all possible platforms where people might want to use it. When people
run into problems, they are stumped by the complexity of distutils.


The above concerns only building packages -- perhaps there is more to
say about other parts as well. Also, I don't really have much experience
with CPAN or CRAN, so I can't say how much better or worse off Python is
here.

-- 
Pauli Virtanen




