[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Donald Stufft donald at stufft.io
Mon Oct 5 14:44:04 CEST 2015


On October 5, 2015 at 2:29:35 AM, Nathaniel Smith (njs at pobox.com) wrote:
> > Does that sound right?

Off the bat, I'm perfectly fine with `pip install archive-made-from-vcs.tar.gz`
working the same as `pip install git+https://github.com/foo/bar.git`. The fact
that one is a VCS and one isn't is immaterial to me; that's just a transport
mechanism for an arbitrary collection of files and directories. These arbitrary
collections of files and directories can only provide a very minimal amount of
information about the thing that is contained inside of them, and indeed there
may even be multiple things inside, each rooted within a different
subdirectory of the top level arbitrary collection of files.

I agree with Paul though, I don't see any way this arbitrary collection of
files/directories can be distributed on PyPI. PyPI is not just a place to dump
whatever random archives you want for a package, it's specifically a place for
Python package files (sdists, wheels, and eggs right now for the most part).

It sounds like maybe you're just looking for a way to make it so that pip no
longer makes these "arbitrary files/directories" installs work by treating them 
like unpacked sdists and instead follows some supported path. If so, that is
reasonable to me (and something I planned on getting around to). If that is
what you're shooting for, I think it got confused by trying to mix in the sdist
concept, as our sdists and wheels are not really for human consumption any
more.

I don't think that it makes sense for pip to go directly from a VCS[1] to a
Wheel in the grand scheme of things. Right now we kind of do it, but that's
because we just treat them like an unpacked sdist [2]. Long term, though, I don't
think that is the correct way to do things. We (I?) want to minimize the
different installation "paths" that can be taken, so that the main variable
when you do a ``pip install`` is how far along that path we already are. My
ideal path looks something like this:

    VCS -> Source Wheel [3] -> Wheel -> Installed
    \-> Inplace Installation [4]
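
As a rough illustration of "the main variable is how far along the path we
already are", here is a minimal Python sketch. The stage names mirror the
diagram above (the inplace branch is left out); `Stage` and `remaining_steps`
are hypothetical names made up for this illustration and are not part of pip.

```python
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    # Ordered stages of the single, non-inplace installation path.
    VCS = 0
    SOURCE_WHEEL = 1
    WHEEL = 2
    INSTALLED = 3


def remaining_steps(start: Stage) -> List[str]:
    """Return the transitions still needed to get from `start` to INSTALLED."""
    stages = list(Stage)
    return [
        f"{a.name} -> {b.name}"
        for a, b in zip(stages[start:], stages[start + 1:])
    ]
```

Starting from a binary wheel leaves a single step, while starting from a VCS
install walks the whole path; nothing else about the path changes.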

So in that regard, I think that the only things (in my ideal world) people
should be uploading to PyPI are source wheels (mandatory? [5]) and binary
wheels (optional?). I don't know if downstream would prefer to use a source
wheel or an arbitrary collection of files that may or may not be in a tarball,
but I'm focused primarily on the needs of our toolchain, while still trying to
make sure we don't end up in a situation that hurts downstream as well.

I also don't think it makes sense for pip to ever really install these items,
no matter what the transport mechanism is (VCS, tarball, unpacked on disk)
without being explicitly pointed to by a URL or file path. There are obviously
some backwards compatibility issues here, because we can't just stop fetching
.tar.gz links or anything, but I think the expectation should be that these
items are only ever directly installed, not installed as part of the dependency
resolution process. In that vein, they don't really participate in the
dependency resolution process either (we'll install their dependencies and what
not, but we'll assume that since you're pointing us to an explicit archive that
we aren't going to resolve that particular dependency to anything other than
what you've explicitly given us).
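
To make the "directly installed, never resolved" idea concrete, here is a
small sketch. It is not how pip decides anything today: `is_direct_reference`
and `candidates_for` are hypothetical helpers, the detection rule is a
deliberately crude heuristic, and `index` is just a dict standing in for
whatever the resolver would normally consult.

```python
from typing import Dict, List

# Assumption for this sketch: these suffixes mark a local archive.
ARCHIVE_SUFFIXES = (".tar.gz", ".zip")


def is_direct_reference(requirement: str) -> bool:
    """Heuristic: True if the user pointed at a URL, a path, or an archive."""
    if "://" in requirement:
        return True
    if requirement.startswith((".", "/")):
        return True
    return requirement.endswith(ARCHIVE_SUFFIXES)


def candidates_for(requirement: str, index: Dict[str, List[str]]) -> List[str]:
    """Return what the resolver is allowed to choose from."""
    if is_direct_reference(requirement):
        # Explicitly given: the only candidate is the thing itself.
        return [requirement]
    # Named requirement: every known release is a candidate.
    return index.get(requirement, [])
```

The design point is the first branch: an explicit URL or path is its own and
only candidate, so the usual selection among releases never applies to it.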

If we're only installing these items when explicitly being given them by a
direct URL or path, then a lot of the worries about non-static metadata no
longer exist, because as a user you're explicitly opting into installing
something that isn't a real package, but is a VCS install of something that
could be a package. Which we'll fetch, turn into a source wheel, and then turn
that into a binary wheel (or do an inplace install).

I also don't think it makes sense for these VCS installs to directly support
outputting multiple different source wheels, but instead rely on the fact that
pip lets you say "install this artifact, but first CD into a particular
directory". So you'd essentially just structure the filesystem of your VCS
install to make independent little mini projects that could be independently
packaged into their own VCS installs, but are all contained within some larger
thing and we need to install a specific sub directory. I'm happy to be
convinced otherwise, but given that this is a sort of edge case that a project
will need this and we already have the subdirectory support I think it's
simpler to just leverage that.
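
For reference, the subdirectory support mentioned above is spelled as a
`subdirectory=` option in the URL fragment, for example
`git+https://github.com/foo/bar.git#egg=pkg&subdirectory=pkg_dir`. The sketch
below only shows how such a URL can be pulled apart with the standard library;
`split_subdirectory` is a hypothetical helper, not pip's own parser, and `pkg`
and `pkg_dir` are made-up names.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def split_subdirectory(url: str) -> Tuple[str, Optional[str]]:
    """Return (url_without_fragment, subdirectory_or_None)."""
    parts = urlsplit(url)
    # The fragment is formatted like a query string: key=value&key=value.
    options = parse_qs(parts.fragment)
    subdirectory = options.get("subdirectory", [None])[0]
    return parts._replace(fragment="").geturl(), subdirectory
```

A repository holding several mini projects would then be installed one
subdirectory at a time, each with its own URL.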

Given my desire to reduce the number of installation paths we actually support,
I think that trying to standardize this concept of a VCS install depends first
on standardizing the concept of a source wheel (or sdist 2.0). This is because,
in my mind, the two main things you can do with a VCS install, from pip's point
of view, are to do an in-place installation or to create a source wheel.

I also agree with Paul that we should not be adding new formats to PyPI that
do not support static metadata. Which specific metadata makes sense for that
particular format (like ABI doesn't make sense for a source wheel/sdist) is
going to be format dependent, but there should never be a "run this bit of code
to find out X" situation. The closest we should get is "that type of metadata
doesn't make sense for X, you need to build X into a Y for it". PyPI needs
static metadata [6], and static metadata makes other tools easier to operate
as well.
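
To show what "static" buys a tool, here is a minimal sketch that reads the
name and version out of an archive without running any project code. It
assumes the binary wheel layout, where metadata lives in a
`*.dist-info/METADATA` file with email-style headers; a source wheel format
does not exist yet, so this is only a stand-in. `read_static_metadata` is a
hypothetical helper, and the tiny in-memory archive is fabricated for the demo.

```python
import io
import zipfile
from email.parser import Parser


def read_static_metadata(wheel_file) -> dict:
    """Read Name and Version from a wheel's METADATA file; no code is run."""
    with zipfile.ZipFile(wheel_file) as archive:
        metadata_name = next(
            name for name in archive.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        text = archive.read(metadata_name).decode("utf-8")
    message = Parser().parsestr(text)
    return {"name": message["Name"], "version": message["Version"]}


# Usage: build a tiny in-memory archive standing in for a real wheel.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "bar-1.0.dist-info/METADATA",
        "Metadata-Version: 2.1\nName: bar\nVersion: 1.0\n",
    )
buffer.seek(0)
demo = read_static_metadata(buffer)
```

Everything the function needs is in the file itself, which is the property
being asked for: no "run this bit of code to find out X".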


[1] I'm going to just call these VCS installs, but I mean any install that is
    not a fully formed Python package and is instead just an arbitrary
    collection of files and directories where we can have a minimal amount of
    control over the structure.

[2] This goes back to my belief that one of the "original sins" of distutils
    and setuptools was blurring the lines between the different phases in the
    life cycle of a package.

[3] Unlike Paul, I think a Source Wheel is actually a decent name for this
    concept. It's similar to .rpm and .src.rpm in that world, and I think it
    makes it more obvious that this item isn't an installable item in its own
    right, that it exists in order to produce binary wheels. However, this
    concept is currently being handled by the sdist "format", however
    ambiguously defined that currently is. I also think it makes it a bit
    easier to get rid of the ambiguous "package", since we can just call them
    all "wheels" which is easier to say than "distribution".

[4] Even though I want to reduce the number of paths we take, I don't think
    we'll ever be able to reasonably get rid of the inplace installation path.
    There will probably need to be some back and forth about exactly which
    parts of the inplace install the installer should be responsible for and
    which parts the build tool is responsible for.

[5] Mandatory in the sense of, if you're going to upload any files to PyPI you
    must upload one of these, not mandatory in the absolute sense. In a future
    world, ideally the only thing people will need to upload is a source wheel
    and we'll have a build farm that will take that and automatically produce
    binary wheels from them.

[6] In reality, we don't have static metadata today, and we get by. This is
    mostly because we force uploads to include static metadata alongside the
    upload and we present that instead. In the future I want to move us to a
    situation where you *just* upload the file, and PyPI inspects the file for
    all of the metadata it needs.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



