I think investigating these options is a great idea, especially since it would mean the install script doesn't have to be updated every time numpy or matplotlib releases a new version. We can let the install script stay stable on a timescale of several months and rely on other packaging for people who want up-to-the-minute installs of a particular package.

As for conda vs canopy, I think we should experiment with one first and polish it, rather than attempting both at the same time.  Since Nathan has more experience with conda than I do with canopy, I'd be +1 on going down the conda route first.


On Thu, Aug 29, 2013 at 12:50 PM, Nathan Goldbaum <nathan12343@gmail.com> wrote:
I think both canopy and anaconda are good solutions that will create a useful python environment out of the box with a minimum of fuss on a wide variety of platforms.

I have more experience with anaconda so my preference for our main installation avenue is there. A concern with canopy is the somewhat arbitrary choice of packages in the free (as in beer) and non-free distributions.

Anaconda has a very liberal redistribution license (http://docs.continuum.io/anaconda/eula.html).  We could conceivably distribute it ourselves on yt-project.org.  I know Matt is also looking into a miniconda (http://repo.continuum.io/miniconda/) based deployment, which should work well.  I think we stand a good chance of being included with the standard anaconda distribution or via conda.

Continuum has also set up binstar to allow projects to upload packaged versions of software, which are then available via conda.  One issue with binstar is that since we have C dependencies, we'll need buildbots to create packages for all supported OS/python combos (i.e. linux/2.7, OSX/2.7, linux/3.3, OSX/3.3).

Making the install script more lightweight is a good idea, although we'll still be susceptible to external changes in our hard dependencies. Numpy and matplotlib have historically been tough to keep up with.

On Thu, Aug 29, 2013 at 12:26 PM, Matthew Turk <matthewturk@gmail.com> wrote:
Hi all,

We need to figure out yt packaging.  This is becoming increasingly
hard, particularly as the number of dependencies grows.  (The upgrade
to IPython 1.0 and Matplotlib 1.3.0 has caused several issues, which
spurred this discussion.)

As it stands, we mainly provide yt through the install script.  Every
time a new version comes out, we check compatibility, we update the
install script, and we deploy that.  Unfortunately, as packages evolve
externally to yt, this results in occasional breakages, new (implicit)
dependencies, and complexity that grows super-exponentially.  I like
the install script, and it is what I use, but I think we need to
re-strategize.  It was built many years ago when packaging was a
different landscape, and when we needed a way to get a relatively
small number of dependencies onto a relatively small set of systems.

Every day, it seems, brings another problem with the install script.
Not all of these are our fault.  But more importantly, I don't think
we should be spending our time on them, when we can only bandaid
something for so long before it's not workable.

That being said, installation is the single biggest impediment to
people using yt, so we need to ensure it is still easy and simple.

There are a few options for other installation procedures.  I would
like to retain a stripped down version of the install script for ease
and simplicity, but removing many of the optional installs and
focusing instead on the core packages.

So here are the options.  I'd prefer we choose *one* as the primary
method, and then we (potentially) demonstrate how to use the others.
As a note, part of this process will also be the relicensing as BSD
and shoring up our source-based installations, ensuring that they are
correctly packaged, following best-practices guidelines for Python
source.  I believe I may have dropped the ball somewhat on that front.

 * Conda / Anaconda: This package manager is gaining traction, and I
think that once relicensing is done we stand a good chance of being
included in the base install.  This would mean that someone could
download Conda and just use it.  Even without that inclusion, however,
I've heard good things.  Conda is based on binary distributions, but
we could also manage our own packaging (potentially in an automated
way) and update with some frequency.  Conda is also somewhat tied to
the Wakari platform, and being part of Conda would mean being
available on the IPython-in-the-cloud that is Wakari.  I believe this
works well on supers.
 * Canopy: This is the Enthought package manager, which Sam has had
some good experience with.  I do not have a feeling for how it
works on supers.
 * Source-only: This is the way some packages are managed, but it is
essentially giving up, and while I think it is a good way to go
forward, I'm not sure we'll ever be trivially pip-installable.
 * Keep trying to plug holes as they come up in the install script.

What I think would be very productive is to hear people's experiences
with these package managers.  Sam, Nathan, anybody?

Focusing on a platform-specific manager (brew, macports, apt, rpm) is
a non-starter; they are good options, and we should develop a protocol
for supporting platform-specific packaging systems, but they
bottleneck quite seriously on person-time and we should think
carefully before we tie ourselves to one.


PS The period in the subject line was editorial.  I'd very much like
to settle on a path for all of this stuff; packaging remains one of
the hardest issues in scientific python, as Software Carpentry has
noted time and again.  We're now pushing the install script, which is
great for clusters, but it's a remnant of a time before packaging in
Python was as mature as it is now, and before we had as many corner
cases as we do now -- not because they didn't exist, but because we
didn't have enough users to see them.
yt-dev mailing list
