[Distutils] PEP 439 and pip bootstrap updated

Donald Stufft donald at stufft.io
Sat Jul 13 02:02:00 CEST 2013


On Jul 12, 2013, at 7:14 PM, Vinay Sajip <vinay_sajip at yahoo.co.uk> wrote:

> Donald Stufft <donald <at> stufft.io> writes:
> 
>> I could probably be convinced about something that makes handling versions
>> easier going into the standard lib, but that's about it.
> 
> That seems completely arbitrary to me. Why just versions? Why not, for
> example, support for the wheel format? Why not agreed metadata formats?

As I said in my email, because it's more or less standalone and it has the
greatest utility outside of installers/builders/archivers/indexes.

> 
>> There's a few reasons that I don't want these things added to the stdlib
>> themselves.
>> 
>> One of the major ones is that of "agility". We've seen with distutils how
>> impossible it can be to make improvements to the system. Now some of this
> 
> You say that, but setuptools, the poster child of packaging, improved quite
> a lot on distutils. I'm not convinced that it would have been as successful
> if there were no distutils in the stdlib, but of course you may disagree.

I've looked at many other languages that had widely successful packaging tools
which weren't added to the standard library until they were ubiquitous and
stable, something the new tools for Python are not. So I don't think adding
them to the standard library is required.

And setuptools improved it *outside* of the standard library while distutils
itself stagnated. I would venture to guess that if distutils *hadn't* been in the
standard library, setuptools could have simply been patches to distutils
instead of needing to essentially "replace" distutils while happening to reuse
some of its functionality. So pointing to setuptools just exposes the fact that
improving it inside the standard library was hard enough that it was done
externally.

> 
> I'm well aware of the "the stdlib is where software goes to die" school of
> thought, and I have considerable sympathy for where it's coming from, but
> let's not throw the baby out with the bathwater. The agility argument could
> be made for lots of areas of functionality, to the point where you just
> basically never add anything new to the stdlib because you're worried about
> an inability to cope with change. Also, it doesn't seem right to point to
> particular parts of the stdlib which were hard to adapt to changing
> requirements and draw the conclusion that all software added to the stdlib
> would be equally hard to adapt. Of course one could look at a specific piece
> of software and assess its adaptability, but otherwise, isn't it verging on
> just arm-waving?

Well, I am of the mind that the standard library is where software goes to die, and
I'm also of the mind that a smaller standard library plus a strong packaging story
and ecosystem is far superior. But that's not what I'm advocating here. A key point
for almost every other part of the standard library is that if it stagnates, falls behind,
or is unable to adapt, you simply don't use it. That is not a hard thing to do for
something like httplib, urllib2, urllib, etc., because it's exactly what people have *done* in
projects like requests. One person's choice to use urllib in their software has
little to no bearing on someone else who might choose to use requests.

However, a packaging system needs interoperability. If there is no interoperability,
my choice of a particular packaging tool DRASTICALLY affects you if you
want to use my software at all. A huge thing I've been trying to push for is decoupling
packaging from a specific implementation so that we have a "protocol" (a la HTTP)
and not a "tool" (a la distutils). However, the allure of working to the implementation
rather than the standard is fairly high when there is a singular blessed implementation.

> 
>> is made better with the way the new system is being designed  with versioned
>> metadata but it doesn't completely go away. We can look at Python's past to
>> see just how long any individual version sticks around and we can assume that
>> if something gets added now that particular version will be around for a long
>> time.
> 
> That doesn't mean that overall improvements can't take place in the stdlib.
> For example, getopt -> optparse -> argparse.

It's funny you picked an example where improvements *couldn't* take place and
the entire system had to be thrown out and a new one written. getopt had to become a
new module named optparse, which had to become a new module named argparse,
in order to make changes. I don't think we need distutils, distlib, futurelib,
even-further-futurelib, and I think that would make packaging even more confusing than it
needs to be. This also ties in with the above: one person's use of getopt instead
of argparse doesn't drastically affect another person using a different one.

> 
>> Another is because of how long it can take a new version of Python to become
>> "standard", especially in the 3.x series since the entire 3.x series itself
>> isn't standard, any changes made to the standard lib won't be usable for
>> years and years. This can be mitigated by releasing a backport on PyPI, but
>> if every version of Python but the latest one is going to require installing
>> these libs from PyPI in order to usefully interact with the "world", then you
>> might as well just require all versions of Python to install bits from PyPI.
> 
> Well, other approaches have been looked at - for example, accepting things
> into the stdlib but warning users about the provisional nature of some APIs.

Provisional APIs still exist in that version of Python, and the only way someone
would get a new one is by installing a package. I think this makes the problem
even *worse*, because now you're adding APIs to the standard library that have
a good chance of needing to change, and people will still need to install
a package (with no good way to communicate to someone that they need to
update it, since it's a standard library module and not a versioned, installed package).

> 
> I think that where interoperability between different packaging tools is
> needed, that's where the argument for something in the stdlib is strongest,
> as Brett said.

You can gain interoperability in a few ways. One way is to just pick an implementation
and make that the standard. Another is to define *actual* standards. The second
is harder and requires more thought and work, but it means that completely
different software can work together. It means that something written in Ruby
can easily work with a Python package without shelling out to Python and without
trying to copy all the implementation details and having to guess which ones are
significant.
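
To make that concrete with a rough sketch: because the wheel format is specified as
a zip archive containing an RFC 822-style METADATA file, anything with a zip reader
and a header parser can inspect a package, no packaging tool required. (The wheel
filename below is made up; this is an illustration of the idea, not blessed code.)

    import zipfile
    from email.parser import Parser

    def read_wheel_metadata(path):
        with zipfile.ZipFile(path) as whl:
            # The spec puts metadata at <name>-<version>.dist-info/METADATA.
            meta_name = next(n for n in whl.namelist()
                             if n.endswith(".dist-info/METADATA"))
            return Parser().parsestr(whl.read(meta_name).decode("utf-8"))

    meta = read_wheel_metadata("example-1.0-py2.py3-none-any.whl")
    print(meta["Name"], meta["Version"])

The same exercise is just as easy from Ruby or anything else with a zip library,
which is the whole point of a documented format over a blessed implementation.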

> 
>> Yet another is by blessing a particular implementation, that implementations
>> behaviors become the standard (indeed the way the PEP system generally works
>> for this is once it's been added to the standard lib the PEP is a historical
>> document and the documentation becomes the standard). However packaging is
> 
> That's because the PEP is needed to advocate the inclusion in the stdlib and
> as a record of the discussion and rationale for accepting/rejecting whatever
> was advocated, but there's no real benefit in keeping the PEP updated as the
> stdlib component gets refined from its real-world exposure through being in
> the stdlib.

And that's fine for a certain class of problems. It's not that useful for something
where you want interoperability outside of that tool. How terrible would it be if
HTTP were defined as "whatever Apache does, that's what HTTP is"?

> 
>> not like Enums or urllibs, or smtp. We are essentially defining a protocol,
>> one that non Python tools will be expected to use (for Debian and RPMs for
>> example). We are using these PEPs more like a RFC than a proposal to include
>> something in the stdlib.
> 
> But we can assume that there will either be N different implementations of
> everything in the RFCs from the ground up, by N different tools, or ideally
> one canonical implementation in the stdlib that the tool makers can use (but
> are not forced to use if they don't want to). You might say that if there
> were some kick-ass implementation of these RFCs on PyPI people would just
> gravitate to it and the winner would be obvious, but I don't see things
> working like that. In the web space, look at HTTP Request/Response objects
> as an example: Pyramid, Werkzeug, Django all have their own, don't really
> interoperate in practice (though it was a goal of WSGI), and there's very
> little to choose between them technically. Just a fair amount of duplicated
> effort on something so low-level, which would have been better spent on
> truly differentiating features.

A singular blessed tool in the standard library incentivizes the standard becoming
an implementation detail. I *want* there to be multiple implementations written by
different people working on different "slices" of the problem. That incentivizes doing
the extra work on PEPs and other documents so that we maintain a highly documented
standard. It's true that adding something to the standard library doesn't rule that out,
but it provides an incentive against doing standards properly, because it's easier and
simpler to just change the implementation.

> 
>> There's also the case of usefulness. You mention some code that can parse the
> JSON metadata and validate it. Well, presumably we'll have the metadata for
>> 2.0 set down by the time 3.4 comes around. So sure 3.4 could have that, but
>> then maybe we release metadata 2.1 and now 3.4 can only parse _some_ of the
>> metadata. Maybe we release a metadata 3.0 and now it can't parse any
>> metadata. But even if it can parse the metadata what does it do with it? The
>> major places you'd be validating the metadata (other than merely consuming
>> it) is either on the tools that create packages or in PyPI performing checks
>> on a valid file upload. In the build tool case they are going to either need
>> to write their own code for actually creating the package or, more likely,
>> they'll reuse something like distlib. If those tools are already going to be
>> using a distlib-like library then we might as just keep the validation code
>> in there.
> 
> Is that some blessed-by-being-in-the-stdlib kind of library that everyone
> uses, or one of several balkanised versions a la HTTP Request / Response? If
> it's not somehow blessed, why should a particular packaging project use it,
> even if it's technically up to the job?

It's not blessed, and a particular packaging project should use it if it fits their
needs and they want to use it, or not use it if they don't. Standards exist for a
reason: so you can have multiple implementations that all work together.
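
As a toy illustration of what such a standalone, unblessed validator could look like,
here's a sketch using the third-party jsonschema library from PyPI; the schema
fragment is invented for this example and is not the real metadata 2.0 schema.

    import jsonschema

    # Invented fragment for illustration only, not the actual metadata 2.0 schema.
    METADATA_SCHEMA = {
        "type": "object",
        "required": ["metadata_version", "name", "version"],
        "properties": {
            "metadata_version": {"type": "string"},
            "name": {"type": "string"},
            "version": {"type": "string"},
        },
    }

    def validate_metadata(metadata):
        # Raises jsonschema.ValidationError if the document doesn't conform.
        jsonschema.validate(instance=metadata, schema=METADATA_SCHEMA)

    validate_metadata({"metadata_version": "2.0",
                       "name": "example",
                       "version": "1.0"})

Any tool that wants this can depend on it from PyPI; nothing about it needs to
live in the standard library.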

> 
>> Now the version parsing stuff which I said I could be convinced is slightly
>> different. It is really sort of it's own thing. It's not dependent on the
>> other pieces of packaging to be useful, and it's not versioned. It's also the
>> only bit that's really useful on it's own. People consuming the (future) PyPI
>> API could use it to fully depict the actual metadata so it's kind of like
>> JSON itself in that regard.
> 
> That's only because some effort has gone into looking at version
> comparisons, ordering, pre-/post-/dev-releases, etc. and considering the
> requirements in some detail. It looks OK now, but so did PEP 386 to many
> people who hadn't considered the ordering of dev versions of
> pre-/post-releases. Who's to say that some other issue won't come up that we
> haven't considered? It's not a reason for doing nothing.

I didn't make any claims as to its stability or the amount of testing that went into
it. My willingness to be convinced stems primarily from the fact that it's a side
piece of the whole packaging infrastructure and toolchain, and it's also the piece
that is most likely to be useful on its own.
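
To show what I mean by "useful on its own", here's a deliberately over-simplified
sketch: ordering plain N(.N)* release versions needs nothing else from the packaging
toolchain. (The real scheme also has to handle pre-/post-/dev-releases, which is
exactly why a shared, well-tested implementation is worth having.)

    def version_key(version):
        # "1.4.10" -> (1, 4, 10); tuple comparison then gives numeric ordering,
        # so "1.10" correctly sorts after "1.9.1" (unlike plain string sorting).
        return tuple(int(part) for part in version.split("."))

    releases = ["1.10", "1.2", "1.9.1"]
    print(sorted(releases, key=version_key))  # ['1.2', '1.9.1', '1.10']
    print(max(releases, key=version_key))     # '1.10'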

> 
>> The installer side of things the purist side of me doesn't like adding it to
>> the standard library for all the same reasons but the pragmatic side of me
>> wants it there because it enables fetching the other bits that are needed for
>> "pip install X" to be a reasonable official response to these kind of
>> questions. But I pushed for and still believe that if a prerequisite for
>> doing that involves "locking" in pip or any of it's dependencies by adding
>> them to the standard library then I am vehemently against doing it.
> 
> Nobody seems to be suggesting doing that, though.

I was trying to explain that my position here doesn't apply only to distlib but
to the entire toolchain.

> 
> Regards,
> 
> Vinay Sajip
> 


-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
