[Distutils] People want CPAN :-)

Kevin Teague kevin at bud.ca
Sat Nov 7 02:56:25 CET 2009


Since part of my job includes assisting bioinformaticians with
installing packages and providing libraries which they can use, I'll
chime in with a couple of major thrusts behind the statement "CPAN >
PyPI":

1) Automatic dependency handling.

Python still doesn't officially support automatic dependency handling.
The addition of the 'Requires' field specifying import names and not
the project distribution names was unfortunate. Thankfully PJE gave us
'install_requires' and Tarek and others are helping to PEP that field
into the official package metadata.

So now it's possible to install many Python projects today with
automatic dependency handling. However, for scientific python
projects, there has been little movement towards this. Many projects
still have setup.py scripts that are written such that they can only
be installed with manual hand-holding. BioPython is one such example
where running 'python setup.py install' asks a series of interactive
questions on the command prompt ... which I imagine was written that
way because Python didn't have any dependency-handling tools at the
time the setup.py was created, so they essentially wrote a
mini-dependency handler in setup.py. This is a common pattern in many
scientific python projects (and it sucks!). This means that you can't
write a python project, state that you require BioPython, and
automatically install that dependency when your project is installed.
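
For illustration, here is a minimal sketch of what such a declaration
could look like in a setup.py. The project name "myproject" is
hypothetical, and "biopython" is assumed to be the dependency's
distribution name; it only works end to end if the dependency's own
setup.py can run without prompting.

```python
# setup.py sketch for a hypothetical project named "myproject".
# "biopython" is assumed to be the distribution name of the dependency.

METADATA = dict(
    name="myproject",
    version="0.1",
    py_modules=["myproject"],
    # Distribution names (not import names) that an installer should
    # fetch and install automatically alongside this project.
    install_requires=["biopython"],
)


def main():
    # Imported here so the metadata above can be inspected without
    # setuptools being installed.
    from setuptools import setup

    setup(**METADATA)


# A real setup.py would call main() at module level.
```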

With the CPAN client being ubiquitous in Perl, any author who wrote a
Makefile.PL that behaved like that would be harangued until they fixed
it. But things are moving in the right direction in Python, so we'll get
there, just a decade or so later than Perl. But better late than
never!

Client-wise, I think Python is starting to do really well (if you are
fortunate enough to only need to install projects with well-written
setup.py scripts). Our bioinformaticians use Buildout or
pip+virtualenv, depending on which tool is better suited to them (e.g.
whether they just need some libraries to run against, or are a team
collaborating on a more complex web application or a project that
includes some non-Python parts as well). In this area I think CPAN's
first-mover advantage is working against Perl, as inertia and the
"CPAN is better" mantra have meant that tools that allow repeatable
installations, like Buildout or pip, either don't exist for Perl or
are very immature and rarely used. I'd much rather declaratively state
up-front (and store in version control) which packages I need and what
configuration those packages may need, then grab a coffee while the
tool does its job, than sit there guiding an interactive session along
like the CPAN client requires.
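
As a rough sketch of that declarative style with pip: the pinned
versions below are hypothetical placeholders that would normally live
in a requirements file under version control, and the helper names are
my own, not part of pip.

```python
import subprocess
import sys

# Hypothetical pins, for illustration only.
PINNED_REQUIREMENTS = ["biopython==1.52", "numpy==1.3.0"]


def pip_install_command(requirements):
    """Build a non-interactive pip command for the given specifiers."""
    return [sys.executable, "-m", "pip", "install"] + list(requirements)


def install(requirements=PINNED_REQUIREMENTS):
    """Run pip; raises CalledProcessError if the installation fails."""
    subprocess.check_call(pip_install_command(requirements))
```

Calling install() inside a fresh virtualenv is the "grab a coffee"
step: nothing is asked on the prompt, and the same pins give the same
environment next time.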

Being able to run private PyPI servers is a nice PyPI advantage on the
server side. Bioinformaticians need to share packages where the code
either can't be publicly released, or they simply don't spend enough
time coding to want to release code which "anyone could see".


2) The python packaging "morass" as a scapegoat

Yeah, so installing, creating and distributing Python packages is a
bit of a morass. The fact that Python has a much larger Standard
Library than Perl means that out of the box Python is easier, but
folks using Perl are forced to learn how to install libraries much
earlier on, so they get over that initial hump of "how to deal with
3rd party libraries" a bit earlier. And as many have noted, the
documentation surrounding Python packaging is not the greatest.

However, in my experience scientists can be notorious for not really
wanting to think about what code they are running or what version it
is or how they should be deploying it. Their brains are already quite
full trying to deal with the complex scientific problems they're
working on, and there is no glamour or kudos for being "good at
installing software" for a scientist. They often take the same
attitude to automated testing, but at least the scientist who says,
"ah, automated testing is a waste of time and adds little value" may
feel quite a bit of peer pressure to change this opinion. But a
scientist who says, "ah, packaging and installation is a morass and a
waste of time" tends only to be met with, "amen! You're better off not
even trying, you'll just be wasting your time". So it doesn't take much
for them to throw their arms up and say, "augh! It's hopeless!" and
they have no shortage of scapegoats in Python packaging right now! But
our organization has Perl-using scientists and Ruby-using scientists
and they love to say the same thing: "Library management is too hard
with Perl" or "Ruby gems suck" and "Java CLASSPATH management is a
nightmare".



> Is the work on distutils-sig going to be
> enough? Or do we need some other kind of work in addition? Do we need
> more than PyPI?

I think the current works-in-progress will eventually get us equal to
or greater than CPAN. But given the history and design of Distutils,
it is sometimes tempting to say, "Ah, let's just chuck the whole thing
out and start with something clean and well-designed." Of course,
pragmatically I don't think such an effort would do much more than
generate a lot of hand-wringing.

However, if we really had the interests of the scientific community in
mind, we'd be thinking about the fact that scientists write code in
Python, Ruby, Perl, R and whatever language happens to be handy and
gets the job done. From their perspective, if they happen to use two
different programming languages, they need to learn two different
packaging formats and two different package management tools. If
packaging formats and installation formats were standardized enough
that they worked for multiple languages, that would be something that
many might consider jumping ship for. And it would provide a level
playing field for all F/OSS languages. It's a terribly ambitious
project, but I can dream, can't I?

(and yes, we do use system package managers such as RPM+Yum and
Dpkg+apt-get where it makes sense - but those tools and formats are
really more geared towards sysadmins and I've never seen them used in
a way that integrates well with the requirements and workflow of a
developer or scientist)

