[Python-ideas] Increasing public package discoverability (was: Adding jsonschema to the standard library)

Stephen J. Turnbull stephen at xemacs.org
Thu May 28 03:31:13 CEST 2015


Demian Brecht writes:

 > This is probably a silly idea, but given the above quote and the
 > new(er) focus on pip and distributed packages, has there been any
 > discussion around perhaps deprecating (and entirely removing from a
 > Python 4 release) non-builtin packages and modules?

Of course there has, including in parallel to your post.  It's a dead
obvious idea.  I'd point to threads, but none of the ones I remember
would be of great use; the same ideas and suggestions that were
advanced before have been reproduced here.

The problems are that the devil is in the details, which are rarely
specified, and that the change would have a huge impact on
relationships in the community.  For example, in the context of a
relatively short, time-based release cycle, I do recall the debates
mentioned by Nick over corporate environments where "Python" (the
CPython distribution) is approved as a single package, so stdlib
facilities are automatically available to "Python" users, but other
packages would need to be approved on a package-by-package basis.
There's significant overhead to each such application, so a big
stdlib is a real efficiency win in those environments.

OK, you say, so we automatically bundle the separate stdlib current at
a given point in time with the less frequently released Python core
distribution.  Now, in the Department of Devilish Details, do those
"same core + new stdlib" bundles get the core version number, the
stdlib version number (which now must be different!), or a separate
bundle version number?  In the Bureau of Relationship Impacts, if I
were a fascist QA/security person, I would surely view that bundle as
a new release requiring a new iteration of the security vetting
process (relationship impact).  Maybe the departments doing such
vetting are not as fascist as I would be, but we'd have to find out,
wouldn't we?  If we just went ahead with this process and discovered
later that 80% of the people who were depending on the "Python"
package can no longer benefit from the bundling because the tarball
labelled "Python-X.Y" no longer refers to an unchanging artifact,
that would be sad.

And although that is the most frequently cited drag on a core/stdlib
release-cycle split, I'm sure there are plenty of others.  Is it
worth the effort to try to discover and address all/most/some of
them?  And which ones should we address, given that we don't even
know what problems might exist yet?

 > I would think that if there was a system similar to Django Packages
 > that made discoverability/importing of packages as easy as using
 > those in the standard library, having a distributed package model
 > where bug fixes and releases could be done out of band with CPython
 > releases would likely be more beneficial to the end users. If there
 > was a “recommended packages” framework, perhaps there could also be
 > buildbots put to testing interoperability of the recommended
 > package set.

I don't think either "recommended packages" or buildbots scales much
beyond Django (and I wonder whether buildbots would even scale to the
Django packages ecosystem).  But the Python ecosystem includes all of
Django already, plus NumPy, SciPy, Pandas, Twisted, eGenix's mx*
stuff, a dozen more or less popular ORMs, a similar number of web
frameworks more or less directly competing with Django itself, and all
the rest of the cast of thousands on PyPI.
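
For concreteness, the kind of "recommended packages" set Demian
describes would presumably boil down to a pinned manifest plus an
interoperability check run on buildbots.  Here is a minimal sketch;
the package names and pins are purely illustrative, not an actual
recommendation:

    # Illustrative only: a "recommended packages" set expressed as
    # exact version pins, plus the crudest possible interoperability
    # check (install everything together, then import each package).
    import importlib
    import subprocess
    import sys

    RECOMMENDED = {
        "requests": "2.7.0",
        "jsonschema": "2.4.0",
        "six": "1.9.0",
    }

    def install_and_smoke_test(packages):
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install"]
            + ["%s==%s" % (name, ver) for name, ver in packages.items()])
        for name in packages:
            # A real buildbot would run each package's test suite
            # against the rest of the set; importing is only a
            # sanity check.
            importlib.import_module(name)

    install_and_smoke_test(RECOMMENDED)

Even that much tooling is the easy part; deciding what belongs in the
set and keeping the pins current is where the scaling problem just
described really bites.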

At the present time, I think we need to accept that integration of a
system, even one that implements a single application, has a shallow
learning curve.  It takes quite a bit of time to become aware of needs
(my initial reaction was "json-schema in the stdlib? YAGNI!!"), and
some time and a bit of Google-foo to translate needs to search
keywords.  After that, the Googling goes rapidly -- that's a solved
problem, thank you very much DEC AltaVista.  Then you hit the multiple
implementations wall, and after recovering consciousness, you start
moving forward again slowly, evaluating alternatives and choosing one.

And that doesn't mean you're done, because those integration decisions
will not be set in stone.  E.g., for Mailman's 3.0 release, Barry
decided to swap out two mission-critical modules, the ORM and the REST
generator -- after the first beta was released!  Granted, Mailman 3.0
has had an extremely long release process, but the example remains
relevant -- such reevaluations occur in .2 or .9 releases all the
time.  Except for Googling, none of these tasks are solved problems:
the system integrator has to go through the whole process again with
each new system, or within an existing system when the relative
strengths of the chosen modules vs. the alternatives change
dramatically.  In that last case, it's true that choosing keywords is
probably trivial, and the alternative pruning goes faster, but
retrofitting the whole system to the new! improved! alternative!!
module may be pretty painful -- and there's no guarantee it will
succeed.

IMO, fiddling with the Python release and distribution process is
unlikely to solve any of the above problems, and is likely to be a
step backward for some users.  Of course at some point we decide that
the benefits to other users, the developers, and the release
engineers outweigh the costs to the users who don't like the change,
but it's never a no-brainer.


