Announcing toydist: improving the distribution and packaging situation
(warning, long post)

Hi there,

As some of you already know, the packaging and distribution of scientific python packages has been a constant source of frustration. Open source is about making it easy for anyone to use software how they see fit, and I think the python packaging infrastructure has not been very successful for people not intimately familiar with python. A few weeks ago, after Guido visited Berkeley and was told how those issues were still there for the scientific community, he wrote an email asking whether current efforts on distutils-sig will be enough (see http://aspn.activestate.com/ASPN/Mail/Message/distutils-sig/3775972).

Several of us have been participating in this discussion, but I feel like the divide between current efforts on distutils-sig and us (the SciPy community) is not getting smaller. At best, their efforts will be more work for us to track the new distribute fork, and more likely, it will be all for nothing as it won't solve any deep issue. To be honest, most of what is considered on distutils-sig sounds like anti-goals to me.

Instead of keeping up with the frustrating process of "improving" distutils, I think we have enough smart people and manpower in the scientific community to go with our own solution. I am convinced it is doable because R and haskell, with much smaller communities than python, managed to pull off something which is miles ahead of pypi. The SciPy community is hopefully big enough that a SciPy-specific solution may reach critical mass. Ideally, I wish we had something with the following capabilities:
 - easy to understand tools
 - an http-based package repository ala CRAN, which would be easy to mirror and back up (through rsync-like tools)
 - decoupling the building, packaging and distribution of code and data
 - reliable install/uninstall/query of what is installed locally
 - facilities for building windows/mac os x binaries
 - making the life of OS vendors (Linux, *BSD, etc...) easier

The packaging part
==================

Speaking is easy, so I started coding part of this toolset, called toydist (temporary name), which I presented at Scipy India a few days ago:

http://github.com/cournape/toydist/

Toydist is more or less a rip off of cabal (http://www.haskell.org/cabal/), and consists of three parts:
 - a core which builds a package description from a declarative file similar to cabal files. The file is almost purely declarative, and can be parsed so that no arbitrary code is executed, thus making it easy to sandbox package builds (e.g. on a build farm).
 - a set of command line tools to configure, build, install, and build installers (egg only for now) etc... from the declarative file
 - backward compatibility tools: a tool to convert an existing setup.py to the new format has been written, and a tool to use distutils through the new format, for backward compatibility with complex distutils extensions, should be relatively easy.

The core idea is to make the format just rich enough to describe most packages out there, but simple enough that interfacing it with external tools is possible and reliable. As a regular contributor to scons, I am all too aware that a build tool is a very complex beast to get right, and repeating their efforts does not make sense. Typically, I envision that complex packages such as numpy, scipy or matplotlib would use make/waf/scons for the build - in a sense, toydist is written so that writing something like numscons would be easier. OTOH, most if not all scikits should be buildable from a purely declarative file.
To give you a feel of the format, here is a snippet for the grin package from Robert K. (automatically converted):

Name: grin
Version: 1.1.1
Summary: A grep program configured the way I like it.
Description:
    ==== grin ====
    I wrote grin to help me search directories full of source code. The
    venerable GNU grep_ and find_ are great tools, but they fall just a
    little short for my normal use cases.
    <snip>
License: BSD
Platforms: UNKNOWN
Classifiers:
    License :: OSI Approved :: BSD License,
    Development Status :: 5 - Production/Stable,
    Environment :: Console,
    Intended Audience :: Developers,
    Operating System :: OS Independent,
    Programming Language :: Python,
    Topic :: Utilities,
ExtraSourceFiles:
    README.txt,
    setup.cfg,
    setup.py,

Library:
    InstallDepends:
        argparse,
    Modules:
        grin,

Executable: grin
    module: grin
    function: grin_main

Executable: grind
    module: grin
    function: grind_main

Although still very much experimental at this point, toydist already makes some things much easier than with distutils/setuptools:
 - path customization for any target can be done easily: you can easily add an option in the file so that configure --mynewdir=value works and is accessible at every step.
 - making packages FHS compliant is not a PITA anymore, and the scheme can be adapted to any OS, be it traditional FHS-like unix, mac os x, windows, etc...
 - all the options are accessible at every step (no more distutils commands nonsense)
 - data files can finally be handled correctly and consistently, instead of the 5 or 6 magic methods currently available in distutils/setuptools/numpy.distutils
 - building eggs does not involve setuptools anymore
 - not much coupling between package description and build infrastructure (building extensions is actually done through distutils ATM).

Repository
==========

The goal here is to have something like CRAN (http://cran.r-project.org/web/views/), ideally with a build farm so that whenever anyone submits a package to our repository, it would automatically be checked, and built for windows/mac os x and maybe a few major linux distributions. One could investigate the build service from open suse to that end (http://en.opensuse.org/Build_Service), which is based on xen VMs to build installers in a reproducible way.

Installed package db
====================

I believe that the current open source enstaller package from Enthought can be a good starting point. It is based on eggs, but eggs are only used as a distribution format (eggs are never installed as eggs AFAIK). You can easily remove packages, query installed versions, etc... Since toydist produces eggs, interoperation between toydist and enstaller should not be too difficult.

What's next ?
=============

At this point, I would like to ask for help and comments, in particular:
 - Does all this make sense, or is it hopelessly intractable ?
 - Besides the points I have mentioned, what else do you think is needed ?
 - There has already been some work for the scikits webportal, but I think we should bypass pypi entirely (the current philosophy of not enforcing consistent metadata does not make much sense to me, and is the opposite of most other similar systems out there).
 - I think a build farm for at least windows packages would be a killer feature, and enough incentive to push some people to use our new infrastructure. It would be good to have a windows guy familiar with windows sandboxing/virtualization to do something there.
(The people working on the opensuse build service have started working on windows support.)
 - I think being able to automatically convert most scientific packages is a significant feature, and the conversion needs to be more robust - so anyone is welcome to try converting an existing setup.py with toydist (see the toydist readme).

thanks,

David
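To make the "parsed so that no arbitrary code is executed" point concrete, here is a minimal sketch of reading top-level fields from a file like the grin snippet above. It is only an illustration - it is not toydist's actual parser, and it ignores the nested sections (Library, Executable) that the real format supports:

def parse_fields(text):
    """Collect top-level 'Key: value' fields; nothing is exec'd or imported."""
    fields = {}
    for line in text.splitlines():
        if line[:1].isspace() or ":" not in line:
            continue  # skip blank, continuation and nested lines
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

snippet = "Name: grin\nVersion: 1.1.1\nLicense: BSD\n"
print(parse_fields(snippet))  # {'Name': 'grin', 'Version': '1.1.1', 'License': 'BSD'}

The whole file stays data: a buggy or malicious toysetup.info can at worst produce a wrong package description, not run code on the build machine.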
David wrote:
Repository
==========
The goal here is to have something like CRAN (http://cran.r-project.org/web/views/), ideally with a build farm so that whenever anyone submits a package to our repository, it would automatically be checked, and built for windows/mac os x and maybe a few major linux distributions. One could investigate the build service from open suse to that end (http://en.opensuse.org/Build_Service), which is based on xen VMs to build installers in a reproducible way.
Do you mean here automatic generation of Ubuntu debs, Debian debs, Windows MSI installers, Windows EXE installers, and so on? (If so then great!) If this is the goal, I wonder if, when one looks outside of Python-land, one might find something that already does this -- there are a lot of different package formats, "Linux meta-distributions", "install everywhere packages" and so on. Of course, toydist could have any such tool as a backend/in a pipeline.
What's next ?
=============
At this point, I would like to ask for help and comments, in particular:
 - Does all this make sense, or is it hopelessly intractable ?
 - Besides the points I have mentioned, what else do you think is needed ?
Hmm. What I miss is a discussion of the other native libraries which the Python libraries need to bundle. Is it assumed that one wants to continue linking C and Fortran code directly into Python .so modules, like the scipy library currently does?

Let me take CHOLMOD (sparse Cholesky) as an example.
 - The Python package cvxopt uses it, simply by linking about 20 C files directly into the Python-loadable module (.so) which goes into the Python site-packages (or wherever). This makes sure it just works, but it doesn't feel like the right way at all.
 - scikits.sparse.cholmod OTOH simply specifies libraries=["cholmod"], and leaves it up to the end-user to make sure it is installed. Linux users with root access can simply apt-get it, but it is a pain for everybody else (Windows, Mac, non-root Linux).
 - Currently I'm making a Sage SPKG for CHOLMOD. This essentially gets the job done by not bothering about the problem, not even using the OS-installed Python.

Something that would spit out Sage SPKGs, Ubuntu debs, Windows installers, and so on, with Python code, C/Fortran code or a mix (and put everything in the place preferred by the system in question), seems ideal. Of course one would still need to make sure that the code builds properly everywhere, but just solving the distribution part of this would be a huge step ahead.

What I'm saying is that this is a software distribution problem in general, and I'm afraid that Python-specific solutions are too narrow.

Dag Sverre
On Tue, Dec 29, 2009 at 3:49 AM, Dag Sverre Seljebotn
Do you mean here automatic generation of Ubuntu debs, Debian debs, Windows MSI installers, Windows EXE installers, and so on? (If so then great!)
Yes (although this is not yet implemented). In particular on windows, I want to implement a scheme so that you can convert from eggs to .exe and vice versa, so people can still install as exe (or msi), even though the method would default to eggs.
If this is the goal, I wonder if, when one looks outside of Python-land, one might find something that already does this -- there are a lot of different package formats, "Linux meta-distributions", "install everywhere packages" and so on.
Yes, there are things like 0install or autopackage. I think those are doomed to fail as long as they are not supported thoroughly by the distributions. Instead, my goal here is much simpler: producing rpm/deb. It does not solve every issue (install by non-root, multiple parallel versions), but one has to be realistic :) I think automatically built rpm/deb, plus easy integration with native methods, can solve a lot of issues already.
- Currently I'm making a Sage SPKG for CHOLMOD. This essentially gets the job done by not bothering about the problem, not even using the OS-installed Python.
Something that would spit out Sage SPKGs, Ubuntu debs, Windows installers, and so on, with Python code, C/Fortran code or a mix (and put everything in the place preferred by the system in question), seems ideal. Of course one would still need to make sure that the code builds properly everywhere, but just solving the distribution part of this would be a huge step ahead.
On windows, this issue may be solved using eggs: enstaller has a feature where DLLs put in a special location of an egg are installed in python such that they are found by the OS loader. One could have mechanisms based on $ORIGIN + rpath on linux to solve this issue for local installs on Linux, etc...

But again, one has to be realistic about the goals. With toydist, I want to remove the whole pile of magic and hacks built on top of distutils so that people can again hack their own solutions, as it should have been from the start (that's a big plus of python in general). It won't magically solve every issue out there, but it would hopefully help people to make their own. Bundling solutions like SAGE, EPD, etc... are still the most robust ways to deal with those issues in general, and I do not intend to replace those.
What I'm saying is that this is a software distribution problem in general, and I'm afraid that Python-specific solutions are too narrow.
Distribution is a hard problem. Instead of pushing a very narrow (and mostly ill-founded) view of how people should do things, like distutils/setuptools/pip/buildout do, I want people to be able to build their own solutions. No more "use this magic stick v 4.0.3.3.14svn1234, trust me, it works, you don't have to understand", which is too prevalent with those tools and has always felt deeply unpythonic to me.

David
On Tue, Dec 29, 2009 at 3:49 AM, Dag Sverre Seljebotn
wrote: Do you mean here automatic generation of Ubuntu debs, Debian debs, Windows MSI installers, Windows EXE installers, and so on? (If so then great!)
Yes (although this is not yet implemented). In particular on windows, I want to implement a scheme so that you can convert from eggs to .exe and vice versa, so people can still install as exe (or msi), even though the method would default to eggs.
If this is the goal, I wonder if, when one looks outside of Python-land, one might find something that already does this -- there are a lot of different package formats, "Linux meta-distributions", "install everywhere packages" and so on.
Yes, there are things like 0install or autopackage. I think those are doomed to fail as long as they are not supported thoroughly by the distributions. Instead, my goal here is much simpler: producing rpm/deb. It does not solve every issue (install by non-root, multiple parallel versions), but one has to be realistic :)
I think automatically built rpm/deb, plus easy integration with native methods, can solve a lot of issues already.
- Currently I'm making a Sage SPKG for CHOLMOD. This essentially gets the job done by not bothering about the problem, not even using the OS-installed Python.
Something that would spit out Sage SPKGs, Ubuntu debs, Windows installers, and so on, with Python code, C/Fortran code or a mix (and put everything in the place preferred by the system in question), seems ideal. Of course one would still need to make sure that the code builds properly everywhere, but just solving the distribution part of this would be a huge step ahead.
On windows, this issue may be solved using eggs: enstaller has a feature where DLLs put in a special location of an egg are installed in python such that they are found by the OS loader. One could have mechanisms based on $ORIGIN + rpath on linux to solve this issue for local installs on Linux, etc...
But again, one has to be realistic about the goals. With toydist, I want to remove the whole pile of magic and hacks built on top of distutils so that people can again hack their own solutions, as it should have been from the start (that's a big plus of python in general). It won't magically solve every issue out there, but it would hopefully help people to make their own.
Bundling solutions like SAGE, EPD, etc... are still the most robust ways to deal with those issues in general, and I do not intend to replace those.
What I'm saying is that this is a software distribution problem in general, and I'm afraid that Python-specific solutions are too narrow.
Distribution is a hard problem. Instead of pushing a very narrow (and mostly ill-founded) view of how people should do things, like distutils/setuptools/pip/buildout do, I want people to be able to build their own solutions. No more "use this magic stick v 4.0.3.3.14svn1234, trust me, it works, you don't have to understand", which is too prevalent with those tools and has always felt deeply unpythonic to me.
Thanks, this cleared things up, and I like the direction this is heading. Thanks a lot for doing this! Dag Sverre
Hi,

In the toydist proposal/release notes, I would address 'what does toydist do better' more explicitly.

**** A big problem for science users is that numpy does not work with pypi + (easy_install, buildout or pip) and python 2.6. ****

Working with the rest of the python community as much as possible is likely a good goal. At least getting numpy to work with the latest tools would be great. An interesting read is the history of python packaging here: http://faassen.n--tree.net/blog/view/weblog/2009/11/09/0

Buildout is what a lot of the python community are using now. Getting numpy to work nicely with buildout and pip would be a good start. numpy used to work with buildout in python2.5, but not with 2.6. buildout lets other team members get up to speed with a project by running one command. It installs things in the local directory, not system wide, so you can have different dependencies per project.

Plenty of good work is going on with python packaging. Lots of the python community are not using compiled packages however, so the requirements are different. There are a lot of people (thousands) working with the python packaging system, improving it, and building tools around it. Distribute for example has many committers, as do buildout/pip. eg, there are fifty or so buildout plugins, which people use to customise their builds ( see the buildout recipe list on pypi at http://pypi.python.org/pypi?:action=browse&show=all&c=512 ).

There are build farms for windows and OSX packages uploaded to pypi. Start uploading pre-releases to pypi, and you get these for free (once you make numpy compile out of the box on those compile farms). There are compile farms for other OSes too... like ubuntu/debian, macports etc. Some distributions even automatically download, compile and package new releases once they spot a new file on your ftp/web site. Speeding up the release cycle to be continuous can let people take advantage of all these tools together. If you get your tests running after the build step, all of these distributions also turn into test farms :)
 - pypm: http://pypm.activestate.com/list-n.html#numpy
 - ubuntu PPA: https://launchpad.net/ubuntu/+ppas
 - the snakebite project: http://www.snakebite.org/ (seems mostly dead... but they have a lot of hardware)
 - suse build service: https://build.opensuse.org/
 - pony-build: http://wiki.github.com/ctb/pony-build

zope and pygame also have their own build/test farms - they are two other compiled python package projects - as do a number of other python projects (eg twisted...). Projects like pony-build should hopefully make it easier for people to run their own build farms, independently of the main projects. You just really need a script to (download, build, test, post results), and then post a link to your mailing list... and someone will be able to run a build farm (a rough sketch of such a script follows below).

Documentation projects are being worked on to document, give tutorials, and make python packaging easier all round. As witnessed by the 20 or so releases on pypi every day (and growing), lots of people are using the python packaging tools successfully.

Documenting how people can make numpy addon libraries (plugins) would encourage people to do so. Currently there is no documentation from the numpy community, or encouragement to do so. This, combined with numpy being broken with python2.6+pypi, will result in fewer science related packages.
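Something like this 2009-era Python 2 sketch is all that (download, build, test, post results) script needs - the checkout and report URLs below are placeholders, not real services:

import subprocess, urllib

def run(cmd, cwd=None):
    p = subprocess.Popen(cmd, cwd=cwd, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT)
    out, _ = p.communicate()
    return p.returncode, out

log = []
for cmd, cwd in [
    (["svn", "co", "http://example.org/project/trunk", "project"], None),  # download
    (["python", "setup.py", "build"], "project"),                          # build
    (["python", "setup.py", "test"], "project"),                           # test
]:
    code, out = run(cmd, cwd)
    log.append("%s -> exit %d\n%s" % (" ".join(cmd), code, out))
    if code != 0:
        break

# post the results where others can see them (placeholder URL)
urllib.urlopen("http://example.org/report", urllib.urlencode({"log": "".join(log)}))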
There are still a whole lot of people not releasing on pypi though; there are thousands of projects on the pygame.org website that are not on pypi, for example, and likely many hundreds or thousands of scientific projects not listed on there either. Given all of these projects not on pypi, obviously things could be improved. The pygame.org website also shows that community specific websites are very helpful. A science view of pypi would make it much more useful - so people don't have to look through web/game/database etc packages. Here is a view of 535 science/engineering related packages on pypi now: http://pypi.python.org/pypi?:action=browse&c=385 and 458 science/research packages: http://pypi.python.org/pypi?:action=browse&show=all&c=40 So there are already hundreds of science related packages, and hundreds of people making them, for pypi. Not too bad.

Distribution of applications is another issue that needs improving: letting people share applications without needing to install a whole bunch of things. Think about sending applications to your grandma. Do you ask her to download python, grab these libraries, do this... do that? It would be much better if you could give her a url, and away you go!

Bug tracking, and diff tracking between distributions, is an area where many projects can improve. Searching through the distributions' bug trackers, and the diffs they apply to the core, dramatically helps packages get updated. So does maintaining good communication with the different distributions' packagers.

I'm not sure making a separate build tool is a good idea. I think going with the rest of the python community, and improving the tools there, is a better idea.

cheers,

pps. some notes on toydist itself:
 - toydist convert is cool for people converting a setup.py. This means that most people can try out toydist right away. But what does it gain the people who convert their setup.py files?
 - a toydist convert that generates a setup.py file might be cool :) It could also generate a Makefile and a configure script :)
 - arbitrary code execution happens when building or testing with toydist; the source packaging part, however, does not. Compiling, running and testing the code happens most of the time anyway, so moving the sandboxing to the OS is more useful, as are reviews, trust and reputation of different packages.
 - it should be possible to build this toydist functionality as a distutils/distribute/buildout extension.
 - extending toydist? How are extensions made? There are 175 buildout packages which extend buildout, and many that extend distutils/setuptools - so extension of build tools is a necessary thing.
 - scripting builds in python for python developers is easier than scripting in a different new language.
On Tue, Dec 29, 2009 at 10:27 PM, René Dudfield
Hi,
In the toydist proposal/release notes, I would address 'what does toydist do better' more explicitly.
**** A big problem for science users is that numpy does not work with pypi + (easy_install, buildout or pip) and python 2.6. ****
Working with the rest of the python community as much as possible is likely a good goal.
Yes, but it is hopeless. Most of what is being discussed on distutils-sig is useless for us, and what matters is ignored at best. I think most people on distutils-sig are misguided, and I don't think the community is representative of people concerned with packaging anyway - most of the participants seem to be around web development, and are mostly dismissive of others' concerns (OS packagers, etc...).

I want to note that I am not starting this out of thin air - I know most of the distutils code very well, and I have been mostly the sole maintainer of numpy.distutils for 2 years now. I have written extensive distutils extensions, in particular numscons, which is able to fully build numpy, scipy and matplotlib on every platform that matters.

Simply put, distutils code is horrible (this is an objective fact) and flawed beyond repair (this is more controversial). IMHO, it has almost no useful feature, except being standard. If you want a more detailed explanation of why I think distutils and all tools on top of it are deeply flawed, you can look here: http://cournape.wordpress.com/2009/04/01/python-packaging-a-few-observations...
numpy used to work with buildout in python2.5, but not with 2.6. buildout lets other team members get up to speed with a project by running one command. It installs things in the local directory, not system wide. So you can have different dependencies per project.
I don't think it is a very useful feature, honestly. It seems to me that they created a huge infrastructure to split packages into tiny pieces, and then try to get them back together, imagining that multiple installed versions are a replacement for backward compatibility. Anyone with extensive packaging experience knows that's a deeply flawed model in general.
Plenty of good work is going on with python packaging.
That's the opposite of my experience. What I care about is:
 - tools which are hackable and easily extensible
 - robust install/uninstall
 - a real, DAG-based build system
 - explicitness and repeatability

None of this is supported by the tools, and the current directions go even further away. When I have to explain at length why the command-based design of distutils is a nightmare to work with, I don't feel very confident that the current maintainers are aware of the issues, for example. It shows that they never had to extend distutils much.
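To make the "DAG-based build system" item concrete: targets and their prerequisites form a directed acyclic graph, and the tool derives the build order from that graph instead of running a fixed sequence of commands. A toy sketch (the file names are made up, and there is no staleness checking or cycle detection):

def topo_order(deps):
    """deps maps target -> list of prerequisites; returns a valid build order."""
    order, seen = [], set()
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for d in deps.get(node, []):
            visit(d)          # prerequisites are ordered first
        order.append(node)
    for node in deps:
        visit(node)
    return order

deps = {"grin.egg": ["grin.pyc"], "grin.pyc": ["grin.py"], "grin.py": []}
print(topo_order(deps))  # ['grin.py', 'grin.pyc', 'grin.egg']

A real tool hangs rebuild decisions (timestamps, content hashes) and parallelism off this graph; distutils commands, by contrast, hard-code their ordering.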
There are build farms for windows and OSX packages uploaded to pypi. Start uploading pre-releases to pypi, and you get these for free (once you make numpy compile out of the box on those compile farms). There are compile farms for other OSes too... like ubuntu/debian, macports etc. Some distributions even automatically download, compile and package new releases once they spot a new file on your ftp/web site.
I am familiar with some of those systems (PPA and opensuse build service in particular). One of the goals of my proposal is to make it easier to interoperate with those tools.

I think Pypi is mostly useless. The lack of enforced metadata is a big no-no IMHO. The fact that Pypi is miles behind CRAN, for example, is quite significant. I want CRAN for scientific python, and I don't see Pypi becoming it in the near future. The point of having our own Pypi-like server is that we could do the following:
 - enforcing metadata
 - making it easy to extend the service to support our needs
It is interesting to note that one of the maintainers of pypm has recently quit the discussion about Pypi, most likely out of frustration with the other participants.
Documentation projects are being worked on to document, give tutorials, and make python packaging easier all round. As witnessed by the 20 or so releases on pypi every day (and growing), lots of people are using the python packaging tools successfully.
This does not mean much IMO. Uploading on Pypi is almost required to use virtualenv, buildout, etc... An interesting metric is not how many packages are uploaded, but how much it is used by people other than developers.
I'm not sure making a separate build tool is a good idea. I think going with the rest of the python community, and improving the tools there is a better idea.
It has been tried, and IMHO it has proved to be a failure. You can look at the recent discussion (the one started by Guido in particular).
pps. some notes on toydist itself.
 - toydist convert is cool for people converting a setup.py. This means that most people can try out toydist right away. But what does it gain the people who convert their setup.py files?
Not much ATM, except that it is easier to write a toysetup.info compared to a setup.py IMO, and that it supports a simple way to include data files (something which is currently *impossible* to do without writing your own distutils extensions). It also has the ability to build eggs without using setuptools (I consider not using setuptools a feature, given the too many failure modes of this package). The main goals though are to make it easier to build your own tools on top of it, and to integrate with real build systems.
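For illustration, here is a minimal sketch of the "eggs without setuptools" idea: an egg is essentially a zip archive of the package contents plus metadata under EGG-INFO/. The helper below is made up (not toydist's API), the metadata is stripped down to two fields, and the python version tag is hard-coded:

import zipfile

def build_egg(name, version, py_files):
    pkg_info = "Metadata-Version: 1.0\nName: %s\nVersion: %s\n" % (name, version)
    egg_path = "%s-%s-py2.6.egg" % (name, version)
    z = zipfile.ZipFile(egg_path, "w", zipfile.ZIP_DEFLATED)
    try:
        for f in py_files:
            z.write(f)                             # pure-python modules go in as-is
        z.writestr("EGG-INFO/PKG-INFO", pkg_info)  # minimal metadata
    finally:
        z.close()
    return egg_path

# e.g. build_egg("grin", "1.1.1", ["grin.py"])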
- a toydist convert that generates a setup.py file might be cool :)
toydist started like this, actually: you would write a setup.py file which loads the package from toysetup.info, and can be converted to a dict argument to distutils.core.setup. I have not updated it recently, but that's definitely on the TODO list for a first alpha, as it would enable people to benefit from the format, with 100 % backward compatibility with distutils.
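Such a bridge setup.py could look roughly like this - PackageDescription and its attributes are placeholder names for illustration, not toydist's actual API:

from distutils.core import setup
from toydist.core import PackageDescription  # assumed import path

pkg = PackageDescription.from_file("toysetup.info")  # parsed, not executed
setup(name=pkg.name,
      version=pkg.version,
      description=pkg.summary,
      py_modules=list(pkg.py_modules))

Distutils then handles the actual build/install, so "python setup.py install" keeps working for users who have never heard of toydist.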
- arbitrary code execution happens when building or testing with toydist.
You are right for testing, but wrong for building. As long as the build is entirely driven by toysetup.info, you only have to trust toydist (which is not safe ATM, but that's an implementation detail), and your build tools of course. Obviously, if you have a package which uses an external build tool on top of toysetup.info (as will be required for numpy itself, for example), all bets are off. But I think that's a tiny fraction of the interesting packages for scientific computing. Sandboxing is particularly an issue on windows - I don't know a good solution for windows sandboxing, outside of full vms, which are heavyweight.
- it should be possible to build this toydist functionality as a distutils/distribute/buildout extension.
No, it cannot, at least as far as distutils/distribute are concerned (I know nothing about buildout). Extending distutils is horrible, and fragile in general. Even autotools with its mix of generated sh scripts through m4 and perl is a breeze compared to distutils.
- extending toydist? How are extensions made? There are 175 buildout packages which extend buildout, and many that extend distutils/setuptools - so extension of build tools is a necessary thing.
See my answer earlier about interoperation with build tools. cheers, David
hello again,
On Tue, Dec 29, 2009 at 2:22 PM, David Cournapeau
On Tue, Dec 29, 2009 at 10:27 PM, René Dudfield
wrote: Hi,
In the toydist proposal/release notes, I would address 'what does toydist do better' more explicitly.
**** A big problem for science users is that numpy does not work with pypi + (easy_install, buildout or pip) and python 2.6. ****
Working with the rest of the python community as much as possible is likely a good goal.
Yes, but it is hopeless. Most of what is being discussed on distutils-sig is useless for us, and what matters is ignored at best. I think most people on distutils-sig are misguided, and I don't think the community is representative of people concerned with packaging anyway - most of the participants seem to be around web development, and are mostly dismissive of other's concerns (OS packagers, etc...).
Sitting down with Tarek (who is one of the current distutils maintainers) in Berlin, we had a little discussion about packaging over pizza and beer... and he was quite mindful of OS packagers' problems and issues. He was also interested to hear about game developers' issues with packaging (which are different again from scientific users'... but similar in many ways). However these systems were developed by the zope/plone/web crowd, so they are naturally going to be thinking a lot about zope/plone/web issues.

Debian and ubuntu packages are mostly useless for them because of their age. Waiting a couple of years for your package to be released is just not an option (waiting even an hour for bug fixes is sometimes not an option). Also isolation of packages is needed for machines that have 100s of different applications running, written by different people, each with dozens of packages used by each application.

Tools like checkinstall and stdeb ( http://pypi.python.org/pypi/stdeb/ ) can help with older style packaging systems like deb/rpm. I think perhaps if toydist included something like stdeb - not as an extension to distutils, but as a standalone tool (like toydist) - there would be fewer problems with it.

One thing the various zope related communities do is make sure all the relevant and needed packages are built/tested by their compile farms. This makes pypi work for them a lot better than a non-coordinated effort does. There are also lots of people trying out new versions all of the time.
I want to note that I am not starting this out of thin air - I know most of the distutils code very well, and I have been mostly the sole maintainer of numpy.distutils for 2 years now. I have written extensive distutils extensions, in particular numscons, which is able to fully build numpy, scipy and matplotlib on every platform that matters.
Simply put, distutils code is horrible (this is an objective fact) and flawed beyond repair (this is more controversial). IMHO, it has almost no useful feature, except being standard.
yes, I have also battled with distutils over the years. However it is simpler than autotools (for me... maybe distutils has perverted my fragile mind), and works on more platforms for python than any other current system. It is much worse for C/C++ modules though. It needs dependency and configuration tools for it to work better (like what many C/C++ projects hack into distutils themselves). Monkey patching and extensions are especially a problem... as is the horrible code quality of distutils by modern standards. However distutils has had more tests and testing systems added, so that refactoring/cleaning up of distutils can happen more easily.
If you want a more detailed explanation of why I think distutils and all tools on top are deeply flawed, you can look here:
http://cournape.wordpress.com/2009/04/01/python-packaging-a-few-observations...
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil! Leave my python site-packages directory alone, I say... especially don't let setuptools infect it :) Many people currently find that the multiple-versions-in-isolation approach works well for them - so for some use cases the tools are working wonderfully.
numpy used to work with buildout in python2.5, but not with 2.6. buildout lets other team members get up to speed with a project by running one command. It installs things in the local directory, not system wide. So you can have different dependencies per project.
I don't think it is a very useful feature, honestly. It seems to me that they created a huge infrastructure to split packages into tiny pieces, and then try to get them back together, imagining that multiple installed versions are a replacement for backward compatibility. Anyone with extensive packaging experience knows that's a deeply flawed model in general.
Science is supposed to allow repeatability. Without the same versions of packages, repeating experiments is harder. This is a big problem in science, and multiple versions of packages in _isolation_ can help get to a solution to the repeatability problem.

Just pick some random paper and try to reproduce their results. It's generally very hard, unless the software is quite well packaged. Especially for graphics related papers, there are often many different types of environments, so setting up the environments to try out their techniques, and verify results quickly, is difficult.

Multiple versions are not a replacement for backwards compatibility, just a way to avoid the problem in the short term to avoid being blocked. If a new package version breaks your app, then you can either pin it to an old version, fix your app, or fix the package. It is also not a replacement for building on stable high quality components, but it helps you work with less stable, and less high quality, components - at a much faster rate of change, with a much larger dependency list.
Plenty of good work is going on with python packaging.
That's the opposite of my experience. What I care about is:
 - tools which are hackable and easily extensible
 - robust install/uninstall
 - a real, DAG-based build system
 - explicitness and repeatability
None of this is supported by the tools, and the current directions go even further away. When I have to explain at length why the command-based design of distutils is a nightmare to work with, I don't feel very confident that the current maintainers are aware of the issues, for example. It shows that they never had to extend distutils much.
All agreed! I'd add to the list parallel builds/tests (make -j 16), and outputting to native build systems, eg xcode and msvc projects, and makefiles.

It would be interesting to know your thoughts on buildout recipes ( see creating recipes http://www.buildout.org/docs/recipe.html ). They seem to work better from my perspective. However, that is probably because of isolation: the recipes are only used by those projects that require them, so the chance of them interacting is lower, as they are not installed in the main python. How will you handle toydist extensions so that multiple extensions do not have problems with each other? I don't think this is possible without isolation, and even then it's still a problem.

Note, the section in the distutils docs on creating command extensions is only around three paragraphs. There is also no central place to go looking for extra commands (that I know of), or a place to document and share each other's command extensions. Many of the methods for extending distutils are not very well documented either. For example, 'how do you change compiler command line arguments for certain source files?' Basic things like that are possible with distutils, but not documented (very well).
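For comparison, this is roughly the boilerplate distutils demands for even a do-nothing custom command (the command name is made up; this much is in the docs, while anything further - like per-source compiler flags - you reverse-engineer from the distutils source):

from distutils.core import setup, Command

class print_name(Command):
    description = "print the distribution name"
    user_options = []              # required attribute, even when empty

    def initialize_options(self):
        pass                       # required hook

    def finalize_options(self):
        pass                       # required hook

    def run(self):
        print(self.distribution.get_name())

setup(name="example", version="0.1",
      cmdclass={"print_name": print_name})
# usage: python setup.py print_name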
There are build farms for windows and OSX packages uploaded to pypi. Start uploading pre-releases to pypi, and you get these for free (once you make numpy compile out of the box on those compile farms). There are compile farms for other OSes too... like ubuntu/debian, macports etc. Some distributions even automatically download, compile and package new releases once they spot a new file on your ftp/web site.
I am familiar with some of those systems (PPA and opensuse build service in particular). One of the goal of my proposal is to make it easier to interoperate with those tools.
yeah, cool.
I think Pypi is mostly useless. The lack of enforced metadata is a big no-no IMHO. The fact that Pypi is miles behind CRAN, for example, is quite significant. I want CRAN for scientific python, and I don't see Pypi becoming it in the near future.
The point of having our own Pypi-like server is that we could do the following:
 - enforcing metadata
 - making it easy to extend the service to support our needs
Yeah, cool. Many other projects have their own servers too - pygame.org, plone, etc etc - which meet their own needs. Patches are accepted for pypi btw.

What type of enforcement of metadata, and how would it help? I imagine this could be done in a number of ways with pypi:
 - a distutils command extension that people could use.
 - changing the pypi source code.
 - checking the metadata for certain packages, then emailing their authors about issues (a minimal checker sketch follows below).
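As a sketch of that last option: the PKG-INFO file that sdists and eggs already carry uses RFC-822 style headers, so a checker only needs the stdlib email parser. The required-field list here is just an example, not a proposed policy:

from email.parser import Parser

REQUIRED = ["Name", "Version", "Summary", "License", "Platform"]

def check_pkg_info(path):
    msg = Parser().parse(open(path))
    problems = []
    for field in REQUIRED:
        value = msg.get(field)
        if not value or value == "UNKNOWN":  # distutils' default placeholder
            problems.append("missing or UNKNOWN: %s" % field)
    return problems

# e.g. for problem in check_pkg_info("PKG-INFO"): print(problem)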
It is interesting to note that one of the maintainers of pypm has recently quit the discussion about Pypi, most likely out of frustration with the other participants.
yeah, big mailing list discussions hardly ever help I think :) oops, this is turning into one.
Documentation projects are being worked on to document, give tutorials, and make python packaging easier all round. As witnessed by the 20 or so releases on pypi every day (and growing), lots of people are using the python packaging tools successfully.
This does not mean much IMO. Uploading on Pypi is almost required to use virtualenv, buildout, etc... An interesting metric is not how many packages are uploaded, but how much it is used by people other than developers.
Yeah, it only means that there are lots of developers able to use the packaging system to put their own packages up there. However there are over 500 science related packages on there now - which is pretty cool. A way to measure packages being used would be by downloads, and by which packages depend on which other packages. I think the science ones would be reused less than normal, since a much higher percentage are C/C++ based, and are likely to be more fragile packages.
I'm not sure making a separate build tool is a good idea. I think going with the rest of the python community, and improving the tools there is a better idea.
It has been tried, and IMHO it has proved to be a failure. You can look at the recent discussion (the one started by Guido in particular).
I don't think 500+ science related packages is a total failure really.
pps. some notes on toydist itself.
 - toydist convert is cool for people converting a setup.py. This means that most people can try out toydist right away. But what does it gain the people who convert their setup.py files?
Not much ATM, except that it is easier to write a toysetup.info compared to a setup.py IMO, and that it supports a simple way to include data files (something which is currently *impossible* to do without writing your own distutils extensions). It also has the ability to build eggs without using setuptools (I consider not using setuptools a feature, given the too many failure modes of this package).
yeah, I always make my packages avoid setuptools by default. However I use command line arguments to enable the setuptools features required (eggs, bdist_mpkg etc etc). Having a tool to create eggs without setuptools would be great in itself. Definitely list this in the feature list :)
The main goals though are to make it easier to build your own tools on top of it, and to integrate with real build systems.
yeah, cool.
- a toydist convert that generates a setup.py file might be cool :)
toydist started like this, actually: you would write a setup.py file which loads the package from toysetup.info, and can be converted to a dict argument to distutils.core.setup. I have not updated it recently, but that's definitely on the TODO list for a first alpha, as it would enable people to benefit from the format, with 100 % backward compatibility with distutils.
yeah, cool. That would let you develop things incrementally too, and still have toydist be useful for the whole development period until it catches up with the features of distutils needed.
- arbitrary code execution happens when building or testing with toydist.
You are right for testing, but wrong for building. As long as the build is entirely driven by toysetup.info, you only have to trust toydist (which is not safe ATM, but that's an implementation detail), and your build tools of course.
If you execute build tools on arbitrary code, then arbitrary code execution is easy for someone who wants to do bad things. Trust, and secondarily sandboxing, are the best ways to solve these problems imho.
Obviously, if you have a package which uses an external build tool on top of toysetup.info (as will be required for numpy itself for example), all bets are off. But I think that's a tiny fraction of the interesting packages for scientific computing.
yeah, currently about 1/5th of science packages use C/C++/fortran/cython etc (see http://pypi.python.org/pypi?:action=browse&c=40 - 110/458 on that page). There seem to be a lot more using C/C++ compared to other types of packages on there (eg zope3 packages list 0 out of 900 packages using C/C++).

So the high number of C/C++ science related packages on pypi demonstrates that better C/C++ tools for scientific packages are a big need. Especially getting compile/testing farms for all these packages. Compile farms are a bigger need here than for pure python packages - since C/C++ is MUCH harder to write/test in a portable way. I would say it is close to impossible to get code to work without errors on multiple platforms without quite good knowledge of them. There are many times with pygame development that I make changes on an osx, windows or linux box, commit the change, then wait for the compile/tests to run on the build farm ( http://thorbrian.com/pygame/builds.php ). Releasing packages otherwise makes the process *heaps* longer... and many times I still get errors on different platforms, despite many years of multi platform coding.
Sandboxing is particularly an issue on windows - I don't know a good solution for windows sandboxing, outside of full vms, which are heavyweight.
yeah, VMs are the way to go - if only to make each build start from a fresh install. However I think automated distributed building, and trust, are more useful: ie, only build those packages where you trust the authors, and let anyone download, build and then post their build/test results. MS have given out copies of windows to some members of the python community in the past to set up VMs for building.

By automated distributed building, I mean what happens with mailing lists usually - where people post their test results when they have a problem - except in a more automated manner. Adding a 'Do you want to upload your build/test results?' at the end of a setup.py for subversion builds would give you dozens or hundreds of test results daily from all sorts of machines. Making it easy for people to set up package builders which also upload their packages somewhere gives you distributed package building, in a fairly safe automated manner. (more details here: http://renesd.blogspot.com/2009/09/python-build-bots-down-maybe-they-need.ht... )
- it should be possible to build this toydist functionality as a distutils/distribute/buildout extension.
No, it cannot, at least as far as distutils/distribute are concerned (I know nothing about buildout). Extending distutils is horrible, and fragile in general. Even autotools with its mix of generated sh scripts through m4 and perl is a breeze compared to distutils.
- extending toydist? How are extensions made? There are 175 buildout packages which extend buildout, and many that extend distutils/setuptools - so extension of build tools is a necessary thing.
See my answer earlier about interoperation with build tools.
I'm still not clear on how toydist will be extended. I am however, a lot clearer about its goals. cheers,
On Wednesday 30 December 2009 06:15:45 René Dudfield wrote:
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil!
You have stated this several times, but is there any evidence that this is the desire of the majority of users? In the scientific community, interactive experimentation is critical and users are typically not seasoned systems administrators. For such users, almost all packages installed after installing python itself are packages they use. In particular, all I want to do is to use apt/yum to get the packages (or ask my sysadmin, who rightfully has no interest in learning the intricacies of python package installation, to do so) and continue with my work. "Packages-in-isolation" is for people whose job is to run server farms, not interactive experimenters.
Leave my python site-packages directory alone, I say... especially don't let setuptools infect it :) Many people currently find that the multiple-versions-in-isolation approach works well for them - so for some use cases the tools are working wonderfully.
More power to them. But for the rest of us, that approach is too much hassle.
Science is supposed to allow repeatability. Without the same versions of packages, repeating experiments is harder.
Really? IME, this is not the case. Simulations in signal processing are typically run with two different kinds of data sets:
 - random data for Monte Carlo simulations
 - well-known and widely available test streams

For both kinds of data sets, reimplementation of the same algorithms is rarely, if ever, affected by the versions of packages, primarily because of the wide variety of tool sets (and even more versions) that are in use.
This is a big problem in science, and multiple versions of packages in _isolation_ can help get to a solution to the repeatability problem.
Package versions are, at worst, a very minor distraction in solving the repeatability problem. Usually, the main issues are unclear descriptions of the algorithms and unstated assumptions.
Just pick some random paper and try to reproduce their results. It's generally very hard, unless the software is quite well packaged.
In scientific experimentation, it is folly to rely on software from the author of some random paper. In signal processing, almost every critical algorithm is re-implemented, and usually in a different language. The only exceptions are when the software can be validated with a large amount of test data, but this is very rare. Usually, you use some package to get started in your current environment. If it works (i.e., results meet your quality metric), you then build on it. If it does not work (even if only due to version incompatibility), you usually jettison it and either find an alternative or rewrite it.
Multiple versions are not a replacement for backwards compatibility, just a way to avoid the problem in the short term to avoid being blocked. If a new package version breaks your app, then you can either pin it to an old version, fix your app, or fix the package. It is also not a replacement for building on stable high quality components, but helps you work with less stable, and less high quality components - at a much faster rate of change, with a much larger dependency list.
This is a software engineer + systems administrator solution. In larger institutions, this is absolutely unworkable if you rely on IT for package management/installation.
Plenty of good work is going on with python packaging.
That's the opposite of my experience. What I care about is:
 - tools which are hackable and easily extensible
 - robust install/uninstall
 - a real, DAG-based build system
 - explicitness and repeatability
None of this is supported by the tools, and the current directions go even further away. When I have to explain at length why the command-based design of distutils is a nightmare to work with, I don't feel very confident that the current maintainers are aware of the issues, for example. It shows that they never had to extend distutils much.
All agreed! I'd add to the list parallel builds/tests (make -j 16), and outputting to native build systems. eg, xcode, msvc projects, and makefiles.
Essentially out of frustration with distutils and setuptools, I have migrated to CMake for pretty much all my build systems (except for a few scons ones I haven't had to touch for a while) since it supports all the features mentioned above. Even dealing with CMake's god-awful "scripting language" is better than dealing with distutils. I am very happy to see David C's efforts to finally get away from distutils, but I am worried that a cross-platform build system that has all the features that he wants is simply beyond the scope of 1-2 people unless they work on it full time for a year or two.
yeah, currently about 1/5th of science packages use C/C++/fortran/cython etc (see http://pypi.python.org/pypi?:action=browse&c=40 - 110/458 on that page). There seem to be a lot more using C/C++ compared to other types of packages on there (eg zope3 packages list 0 out of 900 packages using C/C++).
So the high number of C/C++ science related packages on pypi demonstrates that better C/C++ tools for scientific packages are a big need. Especially getting compile/testing farms for all these packages. Compile farms are a bigger need here than for pure python packages - since C/C++ is MUCH harder to write/test in a portable way. I would say it is close to impossible to get code to work without errors on multiple platforms without quite good knowledge of them.
Not sure that that is quite true. C++ is not a very popular language around here, but the combination of boost+Qt+python+scipy+hdf5+h5py has made virtually all of my platform-specific code vanish (with the exception of some platform-specific stuff in my CMake scripts). Regards, Ravi
On Wed, Dec 30, 2009 at 9:26 AM, Ravi
On Wednesday 30 December 2009 06:15:45 René Dudfield wrote:
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil!
I don't think this is an appropriate analogy, and hyperbolic statements like "threads are evil!" are unlikely to persuade a scientific audience.
You have stated this several times, but is there any evidence that this is the desire of the majority of users? In the scientific community, interactive experimentation is critical and users are typically not seasoned systems administrators. For such users, almost all packages installed after installing python itself are packages they use. In particular, all I want to do is to use apt/yum to get the packages (or ask my sysadmin, who rightfully has no interest in learning the intricacies of python package installation, to do so) and continue with my work. "Packages-in-isolation" is for people whose job is to run server farms, not interactive experimenters.
I agree.
Leave my python site-packages directory alone I say... especially don't let setuptools infect it :)
There are already mechanisms in place for this. "python setup.py install --user" or "easy_install --prefix=/usr/local" for example. Darren
On Wed, Dec 30, 2009 at 2:26 PM, Ravi
On Wednesday 30 December 2009 06:15:45 René Dudfield wrote:
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil!
You have stated this several times, but is there any evidence that this is the desire of the majority of users? In the scientific community, interactive experimentation is critical and users are typically not seasoned systems administrators. For such users, almost all packages installed after installing python itself are packages they use. In particular, all I want to do is to use apt/yum to get the packages (or ask my sysadmin, who rightfully has no interest in learning the intricacies of python package installation, to do so) and continue with my work. "Packages-in-isolation" is for people whose job is to run server farms, not interactive experimenters.
500+ packages on pypi. Provide a counter point, otherwise the evidence is against your position - overwhelmingly.
René Dudfield wrote:
On Wed, Dec 30, 2009 at 2:26 PM, Ravi
wrote: On Wednesday 30 December 2009 06:15:45 René Dudfield wrote:
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil! You have stated this several times, but is there any evidence that this is the desire of the majority of users? In the scientific community, interactive experimentation is critical and users are typically not seasoned systems administrators. For such users, almost all packages installed after installing python itself are packages they use. In particular, all I want to do is to use apt/yum to get the packages (or ask my sysadmin, who rightfully has no interest in learning the intricacies of python package installation, to do so) and continue with my work. "Packages-in-isolation" is for people whose job is to run server farms, not interactive experimenters.
500+ packages on pypi. Provide a counter point, otherwise the evidence is against your position - overwhelmingly.
?!? Wouldn't you need to measure the number of downloads (and also figure out something else to measure that relative to)? Uploading something to PyPI is easy enough to do and probably done by default by a lot of package authors -- that doesn't mean that it is the main distribution method. -- Dag Sverre
On Wed, Dec 30, 2009 at 1:47 PM, René Dudfield
On Wed, Dec 30, 2009 at 2:26 PM, Ravi
wrote: On Wednesday 30 December 2009 06:15:45 René Dudfield wrote:
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil!
You have stated this several times, but is there any evidence that this is the desire of the majority of users? In the scientific community, interactive experimentation is critical and users are typically not seasoned systems administrators. For such users, almost all packages installed after installing python itself are packages they use. In particular, all I want to do is to use apt/yum to get the packages (or ask my sysadmin, who rightfully has no interest in learning the intricacies of python package installation, to do so) and continue with my work. "Packages-in-isolation" is for people whose job is to run server farms, not interactive experimenters.
500+ packages on pypi. Provide a counter point, otherwise the evidence is against your position - overwhelmingly.
The number of packages on pypi has no implication for whether I have only one version of each or several in different isolated environments. Actually, I have only one version of almost all the packages that I use, and those are on the python path (and many of them are easy_installed from pypi, or installed from exes). The only ones I have in several versions are the ones I work on myself, or where I track a repository.

Josef
On Wed, Dec 30, 2009 at 12:47, René Dudfield
wrote: 500+ packages on pypi. Provide a counterpoint; otherwise the evidence is against your position - overwhelmingly.
Linux distributions, which are much, much more popular than any collection of packages on PyPI you might care to name. Isolated environments have their uses, but they are the exception, not the rule. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Wed, Dec 30, 2009 at 7:08 PM, Robert Kern
wrote: Linux distributions, which are much, much more popular than any collection of packages on PyPI you might care to name. Isolated environments have their uses, but they are the exception, not the rule.
Wrong. pypi has way more python packages than any linux distribution - 8500+ listed, compared to how many in debian?
On Wed, Dec 30, 2009 at 11:10 AM, René Dudfield
wrote: Wrong. pypi has way more python packages than any linux distribution - 8500+ listed, compared to how many in debian?
Debian has over 30k packages. But I think he was talking about popularity, not the number of packages.
On Wed, Dec 30, 2009 at 11:13 AM, Keith Goodman
wrote: Debian has over 30k packages. But I think he was talking about popularity, not the number of packages.
Oh, 30k is all packages, not just python.
On Wed, Dec 30, 2009 at 13:10, René Dudfield
wrote: Wrong. pypi has way more python packages than any linux distribution - 8500+ listed, compared to how many in debian?
I said "more popular". As in "more users", not "more packages". But if you insist, Debian has ~30000 or so, depending on the architecture and release and how you count. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Thu, Dec 31, 2009 at 3:47 AM, René Dudfield
wrote: 500+ packages on pypi. Provide a counterpoint; otherwise the evidence is against your position - overwhelmingly.
Number of packages is a useless metric for measuring the success of something like pypi. I don't even know why someone would think it is an interesting number. Note that CRAN has several times more packages, and the R community is much smaller than python's, if you care about that number. Haskell has ~2000 packages, and hackageDB ("haskell's pypi") is much smaller than python's. You really should not try to make the point that Pypi is working for the scipy community: I know there is a bias in conferences and mailing lists, but the consensus is overwhelmingly "pypi does not work very well for us". The issue keeps coming up. I think trying to convince us otherwise is counterproductive at best. David
David Cournapeau wrote:
You really should not try to make the point that Pypi is working for the scipy community:
I think the evidence supports that pypi is useful, and therefore better than nothing -- which in no way means it couldn't be much better. My personal experience is that I always try:

    easy_install whatever

first, and when it works, I'm happy and a bit surprised. It virtually never works for more complex packages, particularly on OS X:
- PIL
- scipy
- matplotlib
- netcdf4
- gdal
I don't know if wxPython is on PyPi; I've never even tried. Many of these fail because of other non-python dependencies. So yes -- we really could use something better! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Thu, Dec 31, 2009 at 10:50 AM, Christopher Barker
I think the evidence supports that pypi is useful
Yes - I stressed that it was not working well for the scipy community, not that it was not working at all for python.
, and therefore better than nothing -- which in no way means it couldn't be much better.
My personal experience is that I always try:
easy_install whatever
first, and when it works, I'm happy and a bit surprised.
To say that you are happy when it works says a lot about your low expectations, no? My main point of disagreement with most discussions on distutils-sig is that I think the lack of robustness is rooted in the way the tools are conceived and used, whereas others think it can be fixed by adding more features. This, and the refusal to learn how other communities do it, is why I am considering starting with my own solution. David
On Wed, Dec 30, 2009 at 8:15 PM, René Dudfield
Sitting down with Tarek (who is one of the current distutils maintainers) in Berlin, we had a little discussion about packaging over pizza and beer... and he was quite mindful of OS packagers' problems and issues.
This has been said many times on distutils-sig, but no concrete action has ever been taken in that direction. For example, toydist already supports the FHS better than distutils, and is more flexible. I have tried several times to explain why this matters on distutils-sig, but then you have the peanut gallery interfering with unrelated nonsense (like claims that it would break windows, as if it could not be implemented independently). Also, retrofitting support for --*dir options in distutils would be *very* difficult, unless you are ready to break backward compatibility (there are 6 ways to install data files, for example, and each of them has corner cases - it is a real pain to support this correctly in the convert command of toydist, and you simply cannot recover the missing information needed to comply with the FHS in every case).
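For context, the --*dir granularity in question is the autoconf convention, where every install location can be overridden independently; a typical invocation looks like this (the paths are purely illustrative):

    ./configure --prefix=/usr \
                --libdir=/usr/lib64 \
                --sysconfdir=/etc \
                --datadir=/usr/share \
                --mandir=/usr/share/man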
However these systems were developed by the zope/plone/web crowd, so they are naturally going to be thinking a lot about zope/plone/web issues.
Agreed - it is natural that they care about their problems first; that's how it works in open source. What I find difficult is when our concerns are constantly dismissed by people who have no clue about our issues - and who later claim we are not cooperative.
Debian and ubuntu packages for them are mostly useless because of their age.
That's where a build farm comes in. This is a known issue; that's why the build service and PPAs exist in the first place.
I think perhaps if toydist included something like stdeb as not an extension to distutils, but a standalone tool (like toydist) there would be less problems with it.
That's pretty much how I intend to do things. Currently, in toydist, you can do something like:

    from toydist.core import PackageDescription

    pkg = PackageDescription.from_file("toysetup.info")
    # pkg now gives you access to metadata, as well as extensions,
    # python modules, etc...

I think this gives almost everything that is needed to implement a sdist_dsc command. Contrary to the Distribution class in distutils, this class would not need to be subclassed/monkey-patched by extensions, as it only cares about the description, and is 100% uncoupled from the build part.
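To make that concrete, a standalone sdist_dsc-like tool could be a short script on top of this class. A rough sketch - only PackageDescription.from_file comes from the snippet above; the name and summary attributes are assumed here for illustration:

    from toydist.core import PackageDescription

    def debian_control(info_file):
        # build a minimal debian/control stanza from the static
        # metadata, without ever executing a setup.py
        pkg = PackageDescription.from_file(info_file)
        return "\n".join(["Source: python-%s" % pkg.name,
                          "Section: python",
                          "Description: %s" % pkg.summary])

    print(debian_control("toysetup.info"))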
yes, I have also battled with distutils over the years. However it is simpler than autotools (for me... maybe distutils has perverted my fragile mind), and works on more platforms for python than any other current system.
Autotools certainly works on more platforms (windows notwithstanding), if only because python itself is built with autoconf. Distutils' simplicity is a trap: it is simpler only if you restrict yourself to what distutils gives you. Don't get me wrong, autotools are horrible, but I have never encountered cases where I had to spend hours to do trivial tasks, as has been the case with distutils. Numpy's build system would be much, much easier to implement through autotools, and would be much more reliable.
However distutils has had more tests and testing systems added, so that refactoring/cleaning up of distutils can happen more easily.
You can't refactor distutils without breaking backward compatibility, because distutils has no API - the whole implementation is the API. That's one of the fundamental disagreements I and other scipy devs have with the current contributors on distutils-sig: the starting point (distutils) and the goal are so far away from each other that getting there step by step is hopeless.
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil!
I don't find the comparison very helpful (for one, you can share data between processes, whereas virtualenvs cannot see each other AFAIK).
Science is supposed to allow repeatability. Without the same versions of packages, repeating experiments is harder. Repeatability is a big problem in science, and multiple versions of packages in _isolation_ can help get to a solution.
I don't think that's true - at least it does not reflect my experience at all. But then, I don't pretend to have an extensive experience either. From most of my discussions at scipy conferences, I know most people are dissatisfied with the current python solutions.
Plenty of good work is going on with python packaging.
That's the opposite of my experience. What I care about is:
- tools which are hackable and easily extensible
- robust install/uninstall
- a real, DAG-based build system
- explicitness and repeatability
None of this is supported by the current tools, and the current directions go even further away. When I have to explain at length why the command-based design of distutils is a nightmare to work with, I don't feel very confident that the current maintainers are aware of the issues. It shows that they have never had to extend distutils much.
All agreed! I'd add to the list parallel builds/tests (make -j 16), and outputting to native build systems, e.g. xcode, msvc projects, and makefiles.
Yep - I got quite far with numscons already. It cannot be used as a general solution, but as a dev tool for my own work on numpy/scipy it has been a huge time saver, especially given its top-notch dependency tracking system. It supports parallel builds, and I can build full debug builds of scipy in under a minute on a fast machine. That's a real productivity booster.
How will you handle toydist extensions so that multiple extensions do not have problems with each other? I don't think this is possible without isolation, and even then it's still a problem.
By doing it mostly the Unix way, through protocols and file formats, not through APIs. A good API is hard, but for build tools it is much, much harder. When talking about extensions, I mostly think about the following:
- adding a new compiler/new platform
- adding a new installer format
- adding a new kind of source file/target (say ctypes extensions, cython compilation, etc...)
Instead of using classes for compilers/tools, I am considering using python modules for each tool, with each tool registered through a source file extension (associate a function to ".c", for example). Actual compilation steps would be done through strings ("$CC ...."). The system would be kept simple, because for complex projects one should forward all this to a real build system (like waf or scons). There is also the problem of post/pre hooks, adding new steps to toymaker: I have not thought much about this, but I like waf's way of doing it, and it may be applicable. In waf, the main script (called wscript) defines a function for each build step:

    def configure():
        pass

    def build():
        pass

    ...

and undefined functions are considered unmodified. What I know for sure is that the distutils way of extending through inheritance does not work at all. As soon as two extensions subclass the same base class, you're done.
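As an illustration of the module-per-tool idea, the registry could be as small as the sketch below (all names hypothetical, not toydist's actual API; the command template is a plain string, as described above):

    import os

    TOOLS = {}

    def register_tool(ext, func):
        # associate a handler function with a source file extension
        TOOLS[ext] = func

    def c_compile(source, env):
        # expand a command template string; no class hierarchy involved
        return "%(CC)s %(CFLAGS)s -c %(source)s" % dict(env, source=source)

    register_tool(".c", c_compile)

    env = {"CC": "gcc", "CFLAGS": "-O2"}
    print(TOOLS[os.path.splitext("foo.c")[1]]("foo.c", env))
    # -> gcc -O2 -c foo.c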
Yeah, cool. Many other projects have their own servers too. pygame.org, plone, etc etc, which meet their own needs. Patches are accepted for pypi btw.
Yes, but how long before the patch is accepted and deployed?
What type of enforcements of metadata, and how would they help? I imagine this could be done in a number of ways to pypi:
- a distutils command extension that people could use
- change pypi source code
- check the metadata for certain packages, then email their authors telling them about issues
First, packages with malformed metadata would be rejected, and it would not be possible to register a package without uploading the sources. I simply do not want to publish a package which does not even have a name or a version, for example. The current way of doing things in pypi is insane if you ask me. For example, if you want to install a package with its dependencies, you need to download the package, which may be on another website, and you need to execute setup.py just to know its dependencies. This has so many failure modes, I don't understand how this can seriously be considered, really. Every other system has an index to do this kind of thing (curiously, both EPD and pypm have an index as well AFAIK). Again, a typical example of NIH, with inferior solutions implemented in the case of python.
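For example, with mandated metadata, resolving dependencies becomes a pure lookup in a locally mirrored index - no code execution involved. A sketch, with an index format invented purely for illustration (one "name;version;comma-separated-deps" line per package):

    def load_index(path):
        index = {}
        with open(path) as f:
            for line in f:
                name, version, deps = line.strip().split(";")
                index[name] = (version, deps.split(",") if deps else [])
        return index

    def closure(index, name, seen=None):
        # the full dependency set of a package, from the index alone
        seen = seen if seen is not None else set()
        if name not in seen:
            seen.add(name)
            for dep in index[name][1]:
                closure(index, dep, seen)
        return seen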
yeah, cool. That would let you develop things incrementally too, and still have toydist be useful for the whole development period until it catches up with the features of distutils needed.
Initially, toydist was started to show that writing something compatible with distutils without being tied to distutils was possible.
If you execute build tools on arbitrary code, then arbitrary code execution is easy for someone who wants to do bad things.
Well, you could surely exploit build tool bugs. But at least I can query metadata and package features in a safe way - and this is very useful already (cf. my point about being able to query package metadata in one "query").
and many times I still get errors on different platforms, despite many years of multi platform coding.
Yes, that's a difficult process. We cannot fix this - but having automatically built (and hopefully tested) installers on major platforms would be a significant step in the right direction. That's one of the killer features of CRAN (whenever you submit a package to CRAN, a windows installer is built and tested). cheers, David
On Tue, Dec 29, 2009 at 10:27 PM, René Dudfield
Buildout is what a lot of the python community are using now.
I would like to note that buildout is a solution to a problem that I don't care to solve. This issue is particularly difficult to explain to people accustomed to buildout, in my experience - I have not found a way to explain it very well yet. Buildout and virtualenv all work by sandboxing from the system python: the sandboxes do not see each other, which may be useful for development, but as a deployment solution for the casual user who may not be familiar with python, it is useless. A scientist who installs numpy, scipy, etc... to try things out wants to have everything available in one python interpreter, and does not want to jump between different virtualenvs and whatnot to try different packages. This has strong consequences on how you look at things from a packaging POV:
- uninstall is crucial
- a package bringing down python is a big no-no (this happens way too often when you install things through setuptools)
- if something fails, the recovery should be trivial
- the person doing the installation may not know much about python
- you cannot use sandboxing as a replacement for backward compatibility (that's why I don't care much about all the discussion about versioning - I don't think it is very useful as long as python itself does not support it natively)
In the context of ruby, this article makes a similar point: http://www.madstop.com/ruby/ruby_has_a_distribution_problem.html David
On Tue, Dec 29, 2009 at 11:34:44PM +0900, David Cournapeau wrote:
Buildout and virtualenv all work by sandboxing from the system python: the sandboxes do not see each other, which may be useful for development, but as a deployment solution for the casual user who may not be familiar with python, it is useless. A scientist who installs numpy, scipy, etc... to try things out wants to have everything available in one python interpreter, and does not want to jump between different virtualenvs and whatnot to try different packages.
I think that you are pointing out a large source of misunderstanding in packaging discussions. People behind setuptools, pip or buildout care about having a working ensemble of packages that delivers an application (often a web application)[1]. You and I, and many scientific developers, see libraries as building blocks that need to be assembled by the user, the scientist using them to do new science. Thus the idea of isolation is not something that we can accept, because it means that we are restricting the user to a set of libraries. Our definition of user is not the same as the user targeted by buildout. Our user does not push buttons, but writes code. However, unlike the developer targeted by buildout and distutils, our user does not want or need to learn about packaging. Trying to make the debate clearer... Gaël [1] I know your position on why simply focusing on sandboxing working ensembles of libraries is not a replacement for backward compatibility, and will only create impossible problems in the long run. While I agree with you, this is not my point here.
On Tue, Dec 29, 2009 at 10:55 AM, Gael Varoquaux
I wanted to say the same thing. Pylons, during its active development time, required a different combination of versions of several different packages almost every month; virtualenv and pip are the only solutions if you don't want to spend all your time updating. In the last half year, I started to have a similar problem with numpy trunk and scipy and the rest, but I hope this will be only temporary, and might not really be a problem for the end user. Additionally, for obtaining packages from pypi, I never had problems with pure python packages, or packages that had complete binary installers (e.g. wxpython or matplotlib). However, the standard case for scientific packages, with different build dependencies, is often a pain. (A nice example that I never tried is http://fenics.org/wiki/Installing_DOLFIN_on_Windows - the website doesn't respond, but it looks like it takes a week to install all the required source packages). On pypm.activestate.com scipy, matplotlib and mayavi all fail, scipy because of missing lapack/blas. That's also a reason why CRAN is nice: it has automatic platform-specific binary installation. Any improvement will be very welcome, especially if we start with a more widespread use of cython. I'm reluctant to use cython in statsmodels, exactly to avoid any build and distribution problems, even though it would be very useful. Josef
On Tue, Dec 29, 2009 at 2:34 PM, David Cournapeau
wrote: I would like to note that buildout is a solution to a problem that I don't care to solve. This issue is particularly difficult to explain to people accustomed to buildout, in my experience - I have not found a way to explain it very well yet.
Hello, The main problem buildout solves is getting developers up to speed very quickly on a project. They should be able to call one command and get dozens of packages, and everything else needed, ready to go, completely isolated from the rest of the system. If a project does not want to upgrade to the latest versions of packages, they do not have to. This reduces the dependency problem a lot, as one package does not have to block waiting on 20 other packages. It makes iterating packages daily, or even hourly, unproblematic - even with dozens of different packages used. This is not theoretical; many projects iterate this quickly and do not have problems. Backwards compatibility is of course a great thing to keep up... but harder to do with dozens of packages, some of which are third-party ones. For example, some people are running pygame applications written 8 years ago that are still running today on the latest versions of pygame. I don't think people in the python world understand API and ABI compatibility as much as those in the C world. However, buildout is a solution to their problem, and allows them to iterate quickly with many participants on many different projects. Many of these people work on maybe 20-100 different projects at once, and some machines may be running that many applications at once too. So using the system python's packages is completely out of the question for them.
A scientist who installs numpy, scipy, etc... to try things out wants to have everything available in one python interpreter, and does not want to jump between different virtualenvs and whatnot to try different packages.
It is very easy to include a dozen packages in a buildout, so that you have all the packages required. Anyway... here is a skeleton buildout project that uses numpy if anyone wants to have a play. http://renesd.blogspot.com/2009/12/buildout-project-that-uses-numpy.html cheers,
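For readers who have not seen one, the heart of such a skeleton is a small buildout.cfg; the common zc.recipe.egg pattern looks something like this (a sketch, adapt to taste):

    [buildout]
    parts = py

    [py]
    recipe = zc.recipe.egg
    interpreter = py
    eggs =
        numpy

Running ./bin/buildout then generates a ./bin/py interpreter with the listed eggs on its path, isolated from the system python.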
On Wed, Dec 30, 2009 at 3:36 AM, René Dudfield
wrote: The main problem buildout solves is getting developers up to speed very quickly on a project. They should be able to call one command and get dozens of packages, and everything else needed, ready to go, completely isolated from the rest of the system.
This is all great, but I don't care about solving this issue - it is a *developer* issue. I don't mean it is not important, it is just totally out of scope. The developer issues I care about are much more fine-grained (correct dependency handling between targets, toolchain customization, etc...). Note however that hopefully, by simplifying the packaging tools, the problems you see with numpy on 2.6 would be less common. The whole distutils/setuptools/distribute stack is hopelessly intractable, given how messy the code is.
It is very easy to include a dozen packages in a buildout, so that you have all the packages required.
I think there is a confusion - I mostly care about *end users*. People who may not have compilers, who want to be able to easily upgrade one package, etc... David
On Tue, Dec 29, 2009 at 11:20 PM, David Cournapeau
wrote: Note however that hopefully, by simplifying the packaging tools, the problems you see with numpy on 2.6 would be less common. The whole distutils/setuptools/distribute stack is hopelessly intractable, given how messy the code is.
The numpy issue is because of the change in package handling for 2.6, which numpy 1.3 was not developed for.
I think there is a confusion - I mostly care about *end users*. People who may not have compilers, who want to be able to easily upgrade one package, etc...
I was just describing the problems that buildout solves (for others). If I have a project that depends on numpy and 12 other packages, I can send it to other people who can get their project up and running fairly quickly (assuming everything installs ok). btw, numpy 1.4 works with buildout! (at least on my ubuntu box) sweet :)

    cd /tmp/
    bzr branch lp:numpybuildout
    cd numpybuildout/trunk/
    python bootstrap.py -d
    ./bin/buildout
    ./bin/py
    >>> import numpy
    >>> numpy.__file__
    '/tmp/numpybuildout/trunk/eggs/numpy-1.4.0-py2.6-linux-i686.egg/numpy/__init__.pyc'
David Cournapeau wrote:
Buildout and virtualenv all work by sandboxing from the system python: the sandboxes do not see each other, which may be useful for development,
And certain kinds of deployment, like web servers or installed tools.
but as a deployment solution for the casual user who may not be familiar with python, it is useless. A scientist who installs numpy, scipy, etc... to try things out wants to have everything available in one python interpreter, and does not want to jump between different virtualenvs and whatnot to try different packages.
Absolutely true -- which is why Python desperately needs package version selection of some sort. I've been tooting this horn on and off for years, but never got any interest at all from the core python developers. I see putting packages in with no version like having non-versioned dynamic libraries in a system -- i.e. dll hell. If I have a bunch of stuff running just fine with the various package versions I've installed, but then I start working on something (maybe just testing, maybe something more real) that requires the latest version of a package, I have a few choices:
- install the new package and hope I don't break too much
- use something like virtualenv, which requires a lot of overhead to set up and use (my evidence is personal: despite working with a team that uses it, somehow I've never gotten around to using it for my dev work, even though, in theory, it should be a good solution)
- setuptools does supposedly support multiple version installs and selection, but it's ugly and poorly documented enough that I've never figured out how to use it
This has been addressed with a handful of ad-hoc solutions: wxPython has wxversion.select, I think PyGTK has something similar, and who knows what else. It would be really nice to have a standard solution available. Note that the usual response I've gotten is to use py2exe or something to distribute, so you're defining the whole stack. That's good for some things, but not all (though py2app's "alias" bundles are nice), and really pretty worthless for development. Also, many, many packages are a pain to use with py2exe and friends anyway (see my forthcoming other long post...)
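For reference, the wxPython ad-hoc solution mentioned above looks like this (the version string is illustrative):

    import wxversion
    wxversion.select("2.8")   # must be called before the first "import wx"
    import wx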
- you cannot use sandboxing as a replacement for backward compatibility (that's why I don't care much about all the discussion about versioning - I don't think it is very useful as long as python itself does not support it natively).
could be -- I'd love to have Python support it natively, though wxversion isn't too bad. -Chris
On Tue, Dec 29, 2009 at 6:34 AM, David Cournapeau
Buildout and virtualenv all work by sandboxing from the system python: the sandboxes do not see each other, which may be useful for development, but as a deployment solution for the casual user who may not be familiar with python, it is useless. A scientist who installs numpy, scipy, etc... to try things out wants to have everything available in one python interpreter, and does not want to jump between different virtualenvs and whatnot to try different packages.
What I do -- and documented for people in my lab to do -- is set up one virtualenv in my user account, and use it as my default python. (I 'activate' it from my login scripts.) The advantage of this is that easy_install (or pip) just works, without any hassle about permissions etc. This should be easier, but I think the basic approach is sound. "Integration with the package system" is useless; the advantage of distribution packages is that distributions can provide a single coherent system with consistent version numbers across all packages, etc., and the only way to "integrate" with that is to, well, get the packages into the distribution. On another note, I hope toydist will provide a "source prepare" step that allows arbitrary code to be run on the source tree (for, e.g., cython->C conversion, ad-hoc template languages, etc.). IME this is a very common pain point with distutils; there is just no good way to do it, and it has to be supported in the distribution utility in order to get everything right. In particular:
-- Generated files should never be written to the source tree itself, but only the build directory
-- Building from a source checkout should run the "source prepare" step automatically
-- Building a source distribution should also run the "source prepare" step, and stash the results in such a way that, when later building from the source distribution, this step can be skipped
This is a common requirement for user convenience, and necessary if you want to avoid arbitrary code execution during builds. And if you just set up the distribution util so that the only place you can specify arbitrary code execution is in the "source prepare" step, then even people who know nothing about packaging will automatically get all of the above right. Cheers, -- Nathaniel
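A rough sketch of what such a "source prepare" step could do for the cython case, assuming the cython command-line tool is on the path (the function and directory names here are hypothetical):

    import os
    import subprocess

    def prepare_sources(src_dir, build_dir):
        # generate C files from cython sources into the build
        # directory, leaving the source tree untouched
        for root, dirs, files in os.walk(src_dir):
            for name in files:
                if name.endswith(".pyx"):
                    src = os.path.join(root, name)
                    rel = os.path.relpath(src, src_dir)
                    out = os.path.join(build_dir,
                                       os.path.splitext(rel)[0] + ".c")
                    if not os.path.isdir(os.path.dirname(out)):
                        os.makedirs(os.path.dirname(out))
                    subprocess.check_call(["cython", "-o", out, src])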
On Sun, Jan 03, 2010 at 03:05:54AM -0800, Nathaniel Smith wrote:
What I do -- and documented for people in my lab to do -- is set up one virtualenv in my user account, and use it as my default python. (I 'activate' it from my login scripts.) The advantage of this is that easy_install (or pip) just works, without any hassle about permissions etc. This should be easier, but I think the basic approach is sound. "Integration with the package system" is useless; the advantage of distribution packages is that distributions can provide a single coherent system with consistent version numbers across all packages, etc., and the only way to "integrate" with that is to, well, get the packages into the distribution.
That works because either you use packages that don't have many hard-core compiled dependencies, or these are already installed. Think about installing VTK or ITK this way, or even something simpler such as umfpack. I think you would lose most of your users. In my lab, I do lose users on such packages, actually. Besides, what you are describing is possible without package isolation; it is simply the use of a per-user local site-packages, which is now semi-automatic in python 2.6 via the '.local' directory. I do agree that, in a research lab, this is a best practice. Gaël
On Sun, Jan 3, 2010 at 8:05 PM, Nathaniel Smith
wrote:
What I do -- and documented for people in my lab to do -- is set up one virtualenv in my user account, and use it as my default python. (I 'activate' it from my login scripts.) The advantage of this is that easy_install (or pip) just works, without any hassle about permissions etc.
It just works if you happen to be able to build everything from sources. That alone means you ignore the majority of the users I intend to target. No other community (except maybe Ruby) pushes those isolated install solutions as general deployment solutions. If it were such a great idea, other people would have picked them up.
This should be easier, but I think the basic approach is sound. "Integration with the package system" is useless; the advantage of distribution packages is that distributions can provide a single coherent system with consistent version numbers across all packages, etc., and the only way to "integrate" with that is to, well, get the packages into the distribution.
Another way is to provide our own repository for a few major distributions, with automatically built packages. This is how most open source providers work. Miguel de Icaza explains this well: http://tirania.org/blog/archive/2007/Jan-26.html I hope we will be able to reuse much of the opensuse build service infrastructure.
On another note, I hope toydist will provide a "source prepare" step that allows arbitrary code to be run on the source tree (for, e.g., cython->C conversion, ad-hoc template languages, etc.). IME this is a very common pain point with distutils; there is just no good way to do it, and it has to be supported in the distribution utility in order to get everything right. In particular:
-- Generated files should never be written to the source tree itself, but only the build directory
-- Building from a source checkout should run the "source prepare" step automatically
-- Building a source distribution should also run the "source prepare" step, and stash the results in such a way that, when later building from the source distribution, this step can be skipped
This is a common requirement for user convenience, and necessary if you want to avoid arbitrary code execution during builds.
Build directories are hard to implement right. I don't think toydist will support this directly. IMO, those advanced builds warrant a real build tool - one main goal of toydist is to make integration with waf or scons much easier. Both waf and scons have the concept of a build directory, which should do everything you described. David
On Sun, Jan 3, 2010 at 4:23 AM, David Cournapeau
wrote:
It just works if you happen to be able to build everything from sources. That alone means you ignore the majority of the users I intend to target.
No other community (except maybe Ruby) pushes those isolated install solutions as general deployment solutions. If it were such a great idea, other people would have picked them up.
AFAICT, R works more-or-less identically (once I convinced it to use a per-user library directory); install.packages() builds from source, and doesn't automatically pull in and build random C library dependencies. I'm not advocating the 'every app in its own world' model that virtualenv's designers had in mind, but virtualenv is very useful for giving each user their own world. Normally I only use a fraction of virtualenv's power this way, but sometimes it's handy that they've solved the more general problem -- I can easily move my environment out of the way and rebuild if I've done something stupid, or experiment with new python versions in isolation, or whatever. And when you *do* have to reproduce some old environment -- if only to test that the new improved environment gives the same results -- then it's *really* handy.
This should be easier, but I think the basic approach is sound. "Integration with the package system" is useless; the advantage of distribution packages is that distributions can provide a single coherent system with consistent version numbers across all packages, etc., and the only way to "integrate" with that is to, well, get the packages into the distribution.
Another way is to provide our own repository for a few major distributions, with automatically built packages. This is how most open source providers work. Miguel de Icaza explains this well:
http://tirania.org/blog/archive/2007/Jan-26.html
I hope we will be able to reuse much of the opensuse build service infrastructure.
Sure, I'm aware of the opensuse build service, have built third-party packages for my projects, etc. It's a good attempt, but also has a lot of problems, and when talking about scientific software it's totally useless to me :-). First, I don't have root on our compute cluster. Second, even if I did I'd be very leery about installing third-party packages because there is no guarantee that the version numbering will be consistent between the third-party repo and the real distro repo -- suppose that the distro packages 0.1, then the third party packages 0.2, then the distro packages 0.3; will upgrades be seamless? What if the third party screws up the version numbering at some point? Debian has "epochs" to deal with this, but third-parties can't use them and maintain compatibility. What if the person making the third party packages is not an expert on these random distros that they don't even use? Will bug reporting tools work properly? Distros are complicated. Third, while we shouldn't advocate that people screw up backwards compatibility, version skew is a real issue. If I need one version of a package and my lab-mate needs another and we have submissions due tomorrow, then filing bugs is a great idea but not a solution. Fourth, even if we had expert maintainers taking care of all these third-party packages and all my concerns were answered, I couldn't convince our sysadmin of that; he's the one who'd have to clean up if something went wrong, and we don't have a big budget for overtime. Let's be honest -- scientists, on the whole, suck at IT infrastructure, and small individual packages are not going to be very expertly put together. IMHO any real solution should take this into account, keep them sandboxed from the rest of the system, and focus on providing the most friendly and seamless sandbox possible.
Build directories are hard to implement right. I don't think toydist will support this directly. IMO, those advanced builds warrant a real build tool - one main goal of toydist is to make integration with waf or scons much easier. Both waf and scons have the concept of a build directory, which should do everything you described.
Maybe I was unclear -- proper build directory handling is nice, Cython/Pyrex's distutils integration gets it wrong (not their fault; distutils is just impossible to do anything sensible with, as you've said), and I've never found build directories hard to implement (perhaps I'm missing something). But what I'm really talking about is having a "pre-build" step that integrates properly with the source and binary packaging stages, and that's not something waf or scons have any particular support for, AFAIK. -- Nathaniel
On Sun, Jan 3, 2010 at 17:42, Nathaniel Smith
wrote:
AFAICT, R works more-or-less identically (once I convinced it to use a per-user library directory); install.packages() builds from source, and doesn't automatically pull in and build random C library dependencies.
That's not quite the same. That is the R equivalent of Python's recent per-user site-packages feature (every user gets their own sandbox), not virtualenv (every project gets its own sandbox). The former feature has a long history in the multiuser UNIX world and is not really controversial. http://www.python.org/dev/peps/pep-0370/ -- Robert Kern
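The PEP 370 feature in question is visible from a stock python 2.6:

    import site
    print(site.USER_SITE)
    # packages installed with "python setup.py install --user" land here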
On Mon, Jan 4, 2010 at 8:42 AM, Nathaniel Smith
wrote:
AFAICT, R works more-or-less identically (once I convinced it to use a per-user library directory); install.packages() builds from source, and doesn't automatically pull in and build random C library dependencies.
As mentioned by Robert, this is different from the usual virtualenv approach. Per-user app installation is certainly a useful (and uncontroversial) feature. And R does support automatically-built binary installers.
Sure, I'm aware of the opensuse build service, have built third-party packages for my projects, etc. It's a good attempt, but also has a lot of problems, and when talking about scientific software it's totally useless to me :-). First, I don't have root on our compute cluster.
True, non-root install is a problem. Nothing *prevents* dpkg from running in a non-root environment in principle, if the package itself does not require it, but it is not really supported by the tools ATM.
Second, even if I did I'd be very leery about installing third-party packages because there is no guarantee that the version numbering will be consistent between the third-party repo and the real distro repo -- suppose that the distro packages 0.1, then the third party packages 0.2, then the distro packages 0.3; will upgrades be seamless? What if the third party screws up the version numbering at some point? Debian has "epochs" to deal with this, but third-parties can't use them and maintain compatibility.
Actually, at least with .deb-based distributions, this issue has a solution. As packages have their own version in addition to the upstream version, PPA-built packages get their own versions. https://help.launchpad.net/Packaging/PPA/BuildingASourcePackage Of course, this assumes a simple versioning scheme in the first place, instead of the cluster-fck that versioning has become within python packages (again, the scheme used in python is much more complicated than everyone else's, and it seems that nobody has ever stopped and thought for 5 minutes about the consequences, and whether this complexity was a good idea in the first place).
What if the person making the third party packages is not an expert on these random distros that they don't even use?
I think simple rules/conventions plus build farms would solve most issues. The problem is that if you allow total flexibility as input, then automatic and simple solutions become impossible. Certainly, PPA and the build service provide a much better experience than anything pypi has ever given me.
Third, while we shouldn't advocate that people screw up backwards compatibility, version skew is a real issue. If I need one version of a package and my lab-mate needs another and we have submissions due tomorrow, then filing bugs is a great idea but not a solution.
Nothing prevents you from using virtualenv in that case. (I may sound dismissive of those tools, but I am really not - I use them myself. What I strongly react to is when they are pushed as the de facto, standard method.)
Fourth, even if we had expert maintainers taking care of all these third-party packages and all my concerns were answered, I couldn't convince our sysadmin of that; he's the one who'd have to clean up if something went wrong, and we don't have a big budget for overtime.
I am not advocating using only packaged, binary installers. I am advocating using them as much as possible where it makes sense - on windows and mac os x in particular. Toydist also aims at making it easier to build customized installs. Although not yet implemented, a --user-like scheme would be quite simple to implement, because the toydist installer internally uses an autoconf-like description of directories (of which --user is a special case). If you need sandboxed or customized installs, toydist will not prevent it. It is certainly my intention to make it possible to use virtualenv and co (you already can, by building eggs, actually). I hope that by having our own "SciPi", we can have a more reliable approach. For example, the static dependency description plus mandated metadata would make this much easier and more robust, as there would be no need to run a setup.py to get the dependencies. If you look at hackageDB (http://hackage.haskell.org/packages/hackage.html), they have a very simple index structure, which makes it easy to download it entirely and reuse it locally to avoid any internet access.
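As a sketch of what such static, mandated metadata could look like - the field names are invented here in the spirit of a cabal-like file, not toydist's actual grammar:

    Name: example-scikit
    Version: 0.1.0
    Summary: an example package description
    DependsOn: numpy >= 1.3, scipy

    Library:
        Packages: example_scikit, example_scikit.core

A client (or "SciPi" itself) can parse this without running any code, which is what makes a downloadable index possible.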
Let's be honest -- scientists, on the whole, suck at IT infrastructure, and small individual packages are not going to be very expertly put together. IMHO any real solution should take this into account, keep them sandboxed from the rest of the system, and focus on providing the most friendly and seamless sandbox possible.
I agree packages will not always be well put together - but I don't see why this would be worse than the current situation. I also strongly disagree about sandboxing as the solution of choice. For most users, having only one install of most packages is the typical use-case. Once you start sandboxing, you create artificial barriers between the sandboxes, and this becomes too complicated for most users IMHO.
Maybe I was unclear -- proper build directory handling is nice, Cython/Pyrex's distutils integration gets it wrong (not their fault; distutils is just impossible to do anything sensible with, as you've said), and I've never found build directories hard to implement.
It is simple if you have a good infrastructure in place (node abstraction, etc...), but that infrastructure is hard to get right.
But what I'm really talking about is having a "pre-build" step that integrates properly with the source and binary packaging stages, and that's not something waf or scons have any particular support for, AFAIK.
Could you explain with a concrete example what a pre-build stage would look like? I don't think I understand what you want. cheers, David
Nathaniel Smith
On Sun, Jan 3, 2010 at 4:23 AM, David Cournapeau
wrote: Another way is to provide our own repository for a few major distributions, with automatically built packages. This is how most open source providers work. Miguel de Icaza explains this well:
http://tirania.org/blog/archive/2007/Jan-26.html
I hope we will be able to reuse much of the opensuse build service infrastructure.
Sure, I'm aware of the opensuse build service, have built third-party packages for my projects, etc. It's a good attempt, but also has a lot of problems, and when talking about scientific software it's totally useless to me :-). First, I don't have root on our compute cluster.
I use Sage for this very reason, and others use EPD or FEMHub or Python(x,y) for the same reasons. Rolling this into the Python package distribution scheme seems backwards though, since a lot of binary packages that have nothing to do with Python are used as well -- the Python stuff is simply thin wrappers around what should ideally be located in /usr/lib or similar (but is nowadays compiled into the Python extension .so because of distribution problems).

To solve the exact problem you (and I) have, I think the best solution is to integrate the tools mentioned above with what David is planning (SciPI etc.). Or, if that isn't good enough, find a generic "userland package manager" that has nothing to do with Python (I'm sure a dozen half-finished ones must have been written, but I didn't look), finish it, and connect it to SciPI.

What David does (I think) is separate the concerns. This makes the task feasible, and also has the advantage of convenience for the people who *do* want to use Ubuntu, Red Hat or whatever to roll out scientific software on hundreds of clients. Dag Sverre
On Mon, Jan 4, 2010 at 5:48 PM, Dag Sverre Seljebotn wrote:
Rolling this into the Python package distribution scheme seems backwards though, since a lot of binary packages that have nothing to do with Python are used as well
Yep, exactly.
To solve the exact problem you (and I) have, I think the best solution is to integrate the tools mentioned above with what David is planning (SciPI etc.). Or, if that isn't good enough, find a generic "userland package manager" that has nothing to do with Python (I'm sure a dozen half-finished ones must have been written, but I didn't look), finish it, and connect it to SciPI.
You have 0install, autopackage and klik, to cite the ones I know about. I wish people had looked at those before rolling toy solutions to complex problems.
What David does (I think) is separate the concerns.
Exactly - you've described this better than I did. David
Hi David,
On Mon, Dec 28, 2009 at 9:03 AM, David Cournapeau wrote:

Executable: grin
    module: grin
    function: grin_main

Executable: grind
    module: grin
    function: grind_main
Have you thought at all about operations that are currently performed by post-installation scripts? For example, it might be desirable for the ipython or MayaVi windows installers to create a folder in the Start menu that contains links to the executable and the documentation. This is probably a secondary issue at this point in toydist's development, but I think it is an important feature in the long run. Also, have you considered support for package extras (package variants in Ports, allowing you to specify features that pull in additional dependencies, like traits[qt4])? Enthought makes good use of them in ETS, and I think they would be worth keeping. Darren
On Wed, Dec 30, 2009 at 11:26 PM, Darren Dale wrote:
Have you thought at all about operations that are currently performed by post-installation scripts? For example, it might be desirable for the ipython or MayaVi windows installers to create a folder in the Start menu that contains links to the executable and the documentation. This is probably a secondary issue at this point in toydist's development, but I think it is an important feature in the long run.
The main problem I see with post hooks is how to support them in installers. For example, you would have a function which does the post install, and declare it as a post-install hook through a decorator:

    @hook.post_install
    def myfunc():
        pass

The main issue is how to communicate data - that's a major issue in every build system I know of (scons' solution is ugly: every function takes an env argument, which is basically a giant global variable).
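On the data-communication problem, one possibility (a rough sketch, not anything toydist implements; all names here are hypothetical) is to hand each hook an explicit context object instead of sharing a scons-style global environment:

    # Hypothetical sketch: hooks get an explicit context argument
    # rather than reading from one giant global environment.
    _post_install_hooks = []

    def post_install(func):
        # decorator registering func as a post-install hook
        _post_install_hooks.append(func)
        return func

    class InstallContext(object):
        def __init__(self, prefix, installed_files):
            self.prefix = prefix
            self.installed_files = list(installed_files)

    @post_install
    def create_shortcuts(ctx):
        # e.g. a windows installer could create Start menu entries here
        for path in ctx.installed_files:
            print("would link: %s" % path)

    def run_post_install_hooks(ctx):
        for hook in _post_install_hooks:
            hook(ctx)

    run_post_install_hooks(InstallContext("/usr/local", ["bin/grin"]))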
Also, have you considered support for package extras (package variants in Ports, allowing you to specify features that pull in additional dependencies like traits[qt4])? Enthought makes good use of them in ETS, and I think they would be worth keeping.
The declarative format may declare flags as follows:

    Flag: c_exts
        Description: Build (optional) C extensions
        Default: false

    Library:
        if flag(c_exts):
            Extension: foo
                sources: foo.c

And this is automatically available at the configure stage. It can be used anywhere in Library, not just for Extension (you could use it within the Requires section). I am considering adding more than Flag (flags are boolean), if that does not make the format too complex. The use case I have in mind is something like:

    toydist configure --with-lapack-dir=/opt/mkl/lib

which I have wished to implement for numpy for ages. David
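For what it is worth, a tiny sketch (hypothetical, not toydist's code) of how such a configure option could be parsed and turned into a variable the package description can reference:

    from optparse import OptionParser

    # parse "toydist configure --with-lapack-dir=..." style options
    parser = OptionParser(prog="toydist configure")
    parser.add_option("--with-lapack-dir", dest="lapack_dir", default=None,
                      help="directory containing the LAPACK libraries")
    options, args = parser.parse_args(["--with-lapack-dir=/opt/mkl/lib"])

    # the value becomes a configure-time variable for the build description
    variables = {"lapack_dir": options.lapack_dir}
    print(variables)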
On Wed, Dec 30, 2009 at 11:26 PM, Darren Dale wrote:
Also, have you considered support for package extras (package variants in Ports, allowing you to specify features that pull in additional dependencies like traits[qt4])? Enthought makes good use of them in ETS, and I think they would be worth keeping.
Does this example cover what you have in mind? I am not so familiar with this feature of setuptools:

    Name: hello
    Version: 1.0

    Library:
        BuildRequires: paver, sphinx, numpy
        if os(windows)
            BuildRequires: pywin32
        Packages: hello
        Extension: hello._bar
            sources: src/hellomodule.c
        if os(linux)
            Extension: hello._linux_backend
                sources: src/linbackend.c

Note that instead of os(os_name), you can use flag(flag_name), where flags are boolean variables which can be user-defined:

http://github.com/cournape/toydist/blob/master/examples/simples/conditional/...
http://github.com/cournape/toydist/blob/master/examples/var_example/toysetup...

David
On Wed, Dec 30, 2009 at 11:16 AM, David Cournapeau wrote:

Does this example cover what you have in mind? I am not so familiar with this feature of setuptools:
I should defer to the description of extras in the setuptools documentation. It is only a few paragraphs long: http://peak.telecommunity.com/DevCenter/setuptools#declaring-extras-optional... Darren
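For reference, a minimal sketch of what declaring such an extra looks like in a setup.py (following the pattern in the setuptools documentation; the names are illustrative, not ETS's actual setup):

    from setuptools import setup

    setup(
        name="traits",
        version="1.0",
        packages=["traits"],
        # extras: named groups of optional dependencies; requiring
        # "traits[qt4]" pulls in the qt4 group below as well
        extras_require={
            "qt4": ["PyQt4"],
        },
    )

A dependent project can then declare install_requires=["traits[qt4]"] to opt into the variant.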
On Thu, Dec 31, 2009 at 6:06 AM, Darren Dale
I should defer to the description of extras in the setuptools documentation. It is only a few paragraphs long:
http://peak.telecommunity.com/DevCenter/setuptools#declaring-extras-optional...
Ok, so there are two issues related to this feature:
- supporting variants at the build stage
- supporting different variants of the same package in the dependency graph at install time

The first issue is definitely supported - I fixed a bug in toydist to support it correctly, and this will be used when converting setuptools-based setup.py files which use the features argument. The second issue is more challenging: it complicates dependency handling quite a bit, and may cause difficult situations at dependency resolution time. This becomes particularly messy if you mix packages you build yourself with packages grabbed from a repository. I wonder if there is a simpler solution which would give a similar feature set. cheers, David
On Sat, Jan 02, 2010 at 11:32:00AM +0900, David Cournapeau wrote:
[snip] - supporting different variants of the same package in the dependency graph at install time
[snip]
The second issue is more challenging. It complicates the dependency handling quite a bit, and may cause difficult situations to happen at dependency resolution time. This becomes particularly messy if you mix packages you build yourself with packages grabbed from a repository. I wonder if there is a simpler solution which would give a similar feature set.
AFAICT, in Debian, the same feature is provided via virtual packages: you would have, for instance:

    python-matplotlib
    python-matplotlib-basemap

It is interesting to note that the same source package may be used to generate both binary, end-user packages. And happy new year! Gaël
On Sat, Jan 2, 2010 at 4:58 PM, Gael Varoquaux wrote:

AFAICT, in Debian, the same feature is provided via virtual packages: you would have, for instance:

    python-matplotlib
    python-matplotlib-basemap
I don't think virtual packages entirely fix the issue. AFAIK, virtual packages have two uses:
- handling dependencies where several packages may each resolve one particular dependency in an equivalent way (a good example is LAPACK: both liblapack and libatlas provide the lapack feature)
- closer to this discussion, you can build several variants of the same package, and each variant resolves the dependency on a virtual package capturing the commonalities.

For example, say we have two numpy packages, one built with lapack (python-numpy-full), the other without (python-numpy-core). What happens when a package foo depends on numpy-full, but numpy-core is installed? AFAICS, this can only work as long as the set containing every variant can be ordered (in the conventional set-ordering sense), and the dependency can be satisfied by the smallest one. cheers, David
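Schematically, in Debian control-file terms (hypothetical package names, just to illustrate the problem):

    Package: python-numpy-core
    Provides: python-numpy

    Package: python-numpy-full
    Provides: python-numpy

    Package: foo
    Depends: python-numpy-full

A dependency on the virtual name python-numpy could be satisfied by either variant, but once foo names the concrete python-numpy-full, an installed python-numpy-core cannot satisfy it, and the resolver has to swap one variant for the other.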
participants (11):
- Christopher Barker
- Dag Sverre Seljebotn
- Darren Dale
- David Cournapeau
- Gael Varoquaux
- josef.pktd@gmail.com
- Keith Goodman
- Nathaniel Smith
- Ravi
- René Dudfield
- Robert Kern