[Python-Dev] Sumo

Stephen J. Turnbull stephen at xemacs.org
Thu May 27 17:56:33 CEST 2010


Paul Moore writes:
 > On 27 May 2010 00:11, geremy condra <debatem1 at gmail.com> wrote:
 > > I'm not clear, you seem to be arguing that there's a market for many
 > > augmented python distributions but not one. Why not just have one
 > > that includes the best from each domain?
 > 
 > Because that's "bloat". You later argue that a web designer wouldn't
 > care if his "distribution" included numpy. OK, maybe, but if my needs
 > are simply futures, cx_Oracle and pywin32, I *would* object to
 > downloading many megabytes of other stuff just to get those three.
 > It's a matter of degree.

So don't do that.  Go to PyPI and get just what you need.

The point of the sumo is that there are people and organizations with
more bandwidth/diskspace than brains (or to be more accurate, they
have enough bandwidth that optimizing bandwidth is a poor use of their
brains).

XEmacs has used a separate distribution for packages for over a
decade, and it's been a very popular feature.  Since originally all
packages were part of Emacs (and still are in GNU Emacs), the package
distribution is a single source hierarchy (like the Python stdlib).
So there are three ways of acquiring packages: check out the sources
and build and install them, download individual pre-built packages,
and download the sumo of all pre-built packages.  The sumos are very
popular.

The reason is simple.  A distribution of all Emacs packages ever made
would still probably be under 1GB.  This just isn't a lot of bandwidth
or disk space when people are schlepping around DVD images, even BD
images.  A Python sumo would probably be much bigger (multiple GB)
than XEmacs's (about 120MB, IIRC), but it would still be a negligible
amount of resources *for some people/organizations*.

And I have to support the "organizational constraints" argument here.
Several people have told me that (strangely enough, given its rather
random nature, both in what is provided and the quality) getting the
sumo certified by their organization was less trouble than getting
XEmacs itself certified, and neither was all that much more effort
than getting a single package certified.

Maintaining a sumo would be a significant effort.  The XEmacs rule is
that we allow someone to add a package to the distro if they promise
to maintain it for a couple years, or if we think it matters enough
that we'll accept the burden.  We're permissive enough that there are
at least 4 different MUAs in the distribution, several IRC clients,
two TeX modes, etc., etc.  Still, just maintaining contact with
"external maintainers" (who do go AWOL regularly) and dealing with
issues where somebody wants to upgrade (e.g.) "vcard", which is provided
by "gnus", but doesn't want to deal with "gnus", and so on, takes time,
thought, and sometimes improvement in the distribution
infrastructure.
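
To make the "vcard"/"gnus" kind of problem concrete, here is a minimal
Python sketch.  It assumes a made-up index that maps each sumo package
to the libraries it bundles; the index contents and the helper name
`providers_of` are purely illustrative and are not XEmacs's actual
packaging data or tools.

```python
# Hypothetical index: sumo package -> libraries it bundles.
# The entries are illustrative only, not real XEmacs package contents.
BUNDLED = {
    "gnus": ["gnus", "vcard"],
    "vm": ["vm"],
    "mail-lib": ["smtpmail"],
}


def providers_of(library, index=BUNDLED):
    """Return the sorted names of the packages that bundle `library`."""
    return sorted(pkg for pkg, libs in index.items() if library in libs)


# Anyone who wants a newer "vcard" has to deal with every package listed.
print(providers_of("vcard"))
```

A maintainer could run a query like this before accepting an upgrade,
to see which bundled packages the change would drag in.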

It's not clear to me that Python users would benefit that much over
and above the current stdlib, which provides a huge amount of
functionality, of somewhat uneven but generally high quality.  But I
certainly think significant additional benefit would be gained; the
question is whether it is worth the effort.  It's worth discussing.

 > I don't believe that there's evidence that aggregation (except in the
 > context of specialist areas) does provide additional utility.

We'll just have to agree to disagree, then.  Plenty of evidence has
been provided; it just doesn't happen to apply to you.  Fine, but I
wish you'd make the "to me" part explicit, because I know that it does
apply to others, many of them, from their personal testimony, both
related to XEmacs and to Python.

 > PS One thing I haven't seen made clear - in my view, the hypothetical
 > "sumo" is a single aggregated distribution of Python
 > modules/packages/extensions. It would NOT include core Python and the
 > stdlib (in contrast to Enthought or ActivePython). I get the
 > impression that other people may be thinking in terms of a full Python
 > distribution, like those 2 cases. We probably ought to be clear which
 > we're talking about.

On the XEmacs model, it would not include core Python, but it would
include much of the stdlib.  The reason is that the stdlib makes
commitments to compatibility that the sumo would not need to.  So the
sumo might include (a) recent, relatively experimental versions of
stdlib packages (yes, this kind of duplication is a pain, but (some)
users do want it) and (b) packages which are formally separate but
duplicate functionality in the stdlib (eg, ipaddr and netaddr) -- in
some cases the sumo distro would want to make adjustments so they can
co-exist.
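
As a rough illustration of the duplication issue, the sketch below shows
how search order decides which copy of a module wins when two trees ship
the same name.  It assumes, purely for illustration, that a sumo would
live in its own directory next to a stable tree; the directory names
"stable" and "sumo", the module name `ipdemo`, and the helpers
`find_provider` and `demo` are all hypothetical.  The only library call
relied on is `importlib.machinery.PathFinder.find_spec`.

```python
import importlib
import importlib.machinery
import os
import tempfile


def find_provider(module_name, search_dirs):
    """Return the first directory in search_dirs that provides module_name."""
    for directory in search_dirs:
        # PathFinder.find_spec searches only the directories it is given.
        spec = importlib.machinery.PathFinder.find_spec(module_name, [directory])
        if spec is not None:
            return directory
    return None


def demo():
    """Create two throwaway trees that both ship `ipdemo`, then resolve it."""
    with tempfile.TemporaryDirectory() as root:
        stable = os.path.join(root, "stable")  # stands in for the stdlib
        sumo = os.path.join(root, "sumo")      # stands in for the sumo tree
        for directory, version in ((stable, "1.0"), (sumo, "2.0-experimental")):
            os.makedirs(directory)
            with open(os.path.join(directory, "ipdemo.py"), "w") as f:
                f.write("VERSION = %r\n" % version)
        importlib.invalidate_caches()
        sumo_first = find_provider("ipdemo", [sumo, stable])
        stable_first = find_provider("ipdemo", [stable, sumo])
        return os.path.basename(sumo_first), os.path.basename(stable_first)


# Whichever tree comes first in the search order supplies the module.
print(demo())
```

Under this assumption, a user who wants the experimental copy puts the
sumo tree first, and one who wants the stable copy puts it last; making
two differently named packages (like ipaddr and netaddr) co-exist is a
separate matter that this sketch does not address.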

I wouldn't recommend building a production system on top of a sumo in
any case.  But (given resources to maintain multiple Python development
installations) it is a good environment for experimentation, because
not only batteries but screwdrivers and duct tape are supplied.


