[AstroPy] Co-ordinating Python astronomy libraries?

Perry Greenfield perry at stsci.edu
Wed Jul 7 13:42:02 EDT 2010


On Jul 6, 2010, at 6:01 PM, Joe Harrington wrote:

> We've hit this topic (monolithic vs. Balkanized) several times over
> the past 5 years or so in various forms.  Tom, you have hit many of
> the main issues in your posting (licenses, maintenance, management
> preferences, etc.).  However, what you are contemplating doing at the
> end is unfortunately something that seems to be a disease: making Yet
> Another Specialized Python Distro(TM).  There are already too many
> Sages and EPDs, and take it from one trying to get SciPy documented,
> managing a monolithic entity is a nightmare down the road when some of
> the packages go off maintenance, yet everyone depends on them.
>
> What I think we want from the users' perspective is an ability to say,
> *in the terms of their system's native package manager*, any of:
>
> give me this/these specific package(s)
> give me this/these group(s) of packages
> give me all the packages
>
> Then, we can publish metapackages that depend on groups of add-ons by
> topic (file format readers, coordinates/WCS/time, measurement
> extraction, spectral modeling, orbits, planets, galaxies, stars, etc.)
> and one that depends on all of the group packages, and voilà, you can
> have Sage, EPD, or whatever fall out as a trivial consequence.
>
I wish it were that simple. It isn't.

There is a reason why there are "too many" Sages and EPDs: there
isn't a good alternative yet. In fact, packaging your own stack is
what Guido himself has recommended as a reasonable solution to the
distribution problem for significant applications:

http://www.archive.org/details/ucb_py4science_2009_11_04_Guido_van_Rossum
(about 90 minutes into the video is where Guido discusses this issue,  
and 95 minutes where he specifically recommends packaging Python with  
applications)

Even RPMs aren't sufficient for everyone, since installing them
requires root, and we do see a significant number of users who don't
have root. RPMs also don't solve dependency conflicts.

> What is critical here is something you did not mention in your
> posting, and that is solving the current difficulty of building a good
> Python stack from OS packages (e.g., debs), or even from package
> sources.  The problem is that some of the packages do not play well
> with others.  For example, HDF5 libraries were particularly
> problematic a year ago, and often compiled code complained because two
> different packages wanted different, specific versions of the same C
> or FORTRAN add-on library.  Some of the code plain didn't work as
> advertized, or at all.  In the end, it takes my very skilled system
> manager more than a week to do it, each time we do it, which is about
> once a year.
>
> The root of the problem is that there was no centralized build and
> test suite, nobody managing unified, integrated build testing and
> resolving the problems with the code maintainers.
>
I think that's only part of the problem, and the idealized solution
you suggest is also part of the problem. It is important to realize
that the general dependency issue is a very difficult one (it has
been shown to be NP-complete). So centralized at what level? All
software in the world? Then the coordination, synchronization, and
testing issues become impossible to manage. (A Linux distribution is
an illustration of what it takes to do that; it's a lot of work, and
it means the software included is often well out of date as a result.)
If not, where is the line drawn? Regardless of where you draw it, you  
will have people complain that it doesn't include what they need.
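
To make the difficulty concrete, here is a minimal sketch of a
brute-force resolver. It is only an illustration, not how Sage, EPD,
RPM, or any real package manager works. The package names, version
numbers, and the INDEX layout are all hypothetical, chosen to mimic
the "two packages want different versions of the same C library"
situation described above.

```python
from itertools import product

# Hypothetical index (all names and versions are made up):
# package -> version -> {dependency: set of acceptable versions}
INDEX = {
    "pkg_a": {"2.1": {"hdf5lib": {"1.6"}}, "2.2": {"hdf5lib": {"1.8"}}},
    "pkg_b": {"1.3": {"hdf5lib": {"1.8"}}},
    "hdf5lib": {"1.6": {}, "1.8": {}},
}


def consistent(choice):
    """Return True if every chosen version's requirements are satisfied."""
    for pkg, ver in choice.items():
        for dep, allowed in INDEX[pkg][ver].items():
            if choice.get(dep) not in allowed:
                return False
    return True


def resolve(packages, pins=None):
    """Try every combination of versions and return the consistent ones.

    `pins` optionally restricts a package to one version, e.g. because
    some other software on the machine already requires it.
    """
    pins = pins or {}
    names = sorted(packages)
    candidates = [[pins[n]] if n in pins else sorted(INDEX[n]) for n in names]
    solutions = []
    for combo in product(*candidates):
        choice = dict(zip(names, combo))
        if consistent(choice):
            solutions.append(choice)
    return solutions


if __name__ == "__main__":
    wanted = ["pkg_a", "pkg_b", "hdf5lib"]
    print(resolve(wanted))                          # one workable combination
    print(resolve(wanted, pins={"pkg_a": "2.1"}))   # no workable combination
```

The search space is the product of the number of versions of each
package, so this exhaustive approach stops being practical very
quickly as the set of packages grows. Real resolvers use smarter
search, but the worst case remains hard, and a pinned version
elsewhere on the system can leave no solution at all.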

Sage, EPD, and other stacks are different sets of software that do
this centralized building and testing. This is the essence of what
they provide. It is natural that there have to be different versions
to satisfy different communities if the problem is impossible to
solve globally. Sage includes much that most astronomers don't need,
and doesn't include much that they do need. EPD is closer to what we
need and we'll take a look at it as a possible basis for a
distribution.

> A year ago at SciPy'09, I pointed people to the build and test suite
> NSF requires for all their software projects.  I think a few people
> looked at it at the time.  Configured correctly (which takes work), it
> will build packages on umpteen different linuxes and a few other
> systems and package them in the native format.
>
We took a look at it after SciPy'09 and concluded that it wasn't  
suitable.

> I think that the extended SciPy stack as a whole should be organized
> around such a system, but there seems to be little taste for it among
> the developer-heavy SciPy leadership.  I think we can have a bit more
> practical vision, and at least for our own stuff (loosely defined) we
> should organize around principles that include these:
>
> - build and test everything frequently
> - manage the namespace so that all can play together
>  - we really need to get everyone to agree to this or there will be
>    conflicts, it's only a matter of time
> - produce LPUs (least packageable units) in binary format for all OSes
> - produce topical meta-packages
> - produce one or more mega-packages that are equivalent to Sage or EPD
> - do it all for the native installers of all OSes
> - provide and enforce standards for docs and licenses
> - review the code
> - plan together and put out RFPs for needed codes
> - make sure procedures don't stifle innovators
> - document the whole, but lightly
> - provide a web community site for discussions, examples, reviews
>   of code
> - locate and manage it so that it is owned by the community and
>  survives long-term
>  - ensure all jobs doable by at least 3 people
>  - document procedures well
>  - have formalized community governance and leadership
>  - have a solid funding model
> - agree to hang together, even if you don't like something!
>
> With those (and perhaps other) goals in mind, we should then look at
> decisions like where/how to host it and what kind of wiki to use.
>
> Also, I keep thinking that this is best solved by joining forces with
> other scientific communities.  The build-and-test part is hard, but
> once implemented, it scales fantastically.  Again, all of SciPy should
> be doing this.  We should at least build so that if others want to
> join, they have a place to fit in.  This will need to be considered
> when doing package naming conventions.  Generic names should be
> avoided so that two can play in the same sandbox.
>
> Even if we don't do the full thing from the start, we should plan it
> out and build as though that's where we're eventually going.
>
> AAS splinter meeting, anyone?

There is much in this that is Good. Yet doing all this requires a
great deal of work for which resources currently aren't available. We
can make a lot of progress and still not accomplish most of these
things. We should be careful to prioritize the activities that matter
most given the resources available.

Perry
