On May 30, 2017, at 6:34 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On 30 May 2017 at 17:07, Donald Stufft <donald@stufft.io> wrote:
It can require that it is either an unfiltered tree or an unpacked sdist, since that is how a lot of build-time tools treat it now. They handle an sdist differently from a VCS source; pbr is one example. Swapping out a call to setup.py for an internal shim that calls a Python API doesn't change anything here: randomly filtering out some files from a VCS checkout will break random projects. We either only support copying the whole directory, or we add support for something that enables this in the PEP. There is basically no middle ground that isn't worse than one of those two options for PEP 517 style packages.

I also don't think that creating an sdist should be an optional part of the build interface, but things added in later PEPs can only be added as optional, not mandatory. There is already automation that relies on handling sdists (for example the Travis deployment-to-PyPI code path) that will be unable to support this new standard without either ONLY supporting wheels or adding custom code for each individual build tool (unlikely to happen). The effect will be that either people simply can't use this spec, or we incentivize releasing only wheels to PyPI instead of both wheels and an sdist.

I don't see any way, really, for this PEP to move forward without sdist support that doesn't cause major regressions.

Is your concern that there's no explicit statement in the PEP saying
that build backends MUST NOT assume they will have access to version
control metadata directories in the source tree, since that source
tree may have come from an sdist rather than VCS checkout?

Aside from that possibility, I otherwise don't follow this chain of
reasoning, as I don't see how PEP 517 has any impact whatsoever on the
sdist build step.

My concern is that it's literally impossible for the most common tooling outside of setuptools that acts at the build stage to function if we just randomly start filtering out files. For example, take setuptools_scm: it assumes that one of two cases is true:

1) I am in a VCS checkout where I can run ``git`` commands to compute my version number and determine which files should be added to the sdist, installed, etc.
2) I am in an sdist where the above information was "baked" in at sdist creation time, so access to the .git/ directory is no longer needed.

Those are the only two situations where it works. The "bad" case, performance-wise, comes from the fact that a VCS checkout often has A LOT of files that typically don't need to be copied over, but that we copy anyway because filtering them out broke things when we tried it previously. These files can make an entire ``pip install .`` take over a minute on a slow hard drive. One of the biggest offenders is .tox/, another is .git/, and another common one is large chunks of demo data that doesn't get added to the sdist.
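To make the two cases concrete, here is a rough sketch of the kind of fallback setuptools_scm performs (illustrative only, not its actual implementation):

    # Illustrative sketch only -- not setuptools_scm's real code.
    import os
    import subprocess

    def guess_version(root="."):
        if os.path.isdir(os.path.join(root, ".git")):
            # Case 1: a VCS checkout -- ask git directly.
            out = subprocess.check_output(
                ["git", "describe", "--tags", "--dirty"], cwd=root
            )
            return out.decode().strip()
        pkg_info = os.path.join(root, "PKG-INFO")
        if os.path.isfile(pkg_info):
            # Case 2: an unpacked sdist -- the version was baked in at
            # sdist creation time.
            with open(pkg_info) as f:
                for line in f:
                    if line.startswith("Version:"):
                        return line.split(":", 1)[1].strip()
        # A partially filtered copy of a checkout hits neither case.
        raise RuntimeError("neither a VCS checkout nor an unpacked sdist")

A partially filtered copy of a checkout hits neither branch, which is exactly the breakage I'm describing.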


Status quo:

- pip (et al) can use setup.py to build from an unfiltered source tree
- pip (et al) can use setup.py to build from an sdist
- creation of the sdist is up to the software publisher, so if pip or
another frontend wants to do an out of tree build, it copies the
entire unfiltered tree

Post PEP 517:

- pip (et al) can use setup.py to build from an unfiltered source tree
- pip (et al) can use setup.py to build from an sdist
- pip (et al) can use the pyproject.toml build-backend setting to
build from an unfiltered source tree
- pip (et al) can use the pyproject.toml build-backend setting to
build from an sdist
- creation of the sdist is still up to the software publisher, so if
pip or another frontend wants to do an out of tree build, it still
copies the entire unfiltered tree
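For concreteness, the build-backend setting referred to above lives in the [build-system] table of pyproject.toml; a minimal illustration (the backend name is just an example, any PEP 517 backend could be named) looks something like:

    [build-system]
    # PEP 518: what the frontend must install before invoking the backend.
    requires = ["setuptools", "wheel"]
    # PEP 517: the importable object the frontend calls to do the build.
    build-backend = "setuptools.build_meta"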


Status quo is also that Travis CI, Gem Fury, etc. can produce and upload an sdist using ``setup.py sdist``. Pip is not the only consumer of setup.py that needs to be able to operate on a VCS checkout and do things with it. Ignoring this just means that we solve the problem of standardizing access for pip's current use case and tell these other use cases to go pound sand.
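That automation is typically nothing more than a few lines of declarative CI config, roughly along these lines (a sketch of the Travis PyPI deploy provider; the credentials are placeholders):

    # Sketch of the relevant part of a .travis.yml
    deploy:
      provider: pypi
      user: example-user
      password:
        secure: "<encrypted token>"
      distributions: "sdist bdist_wheel"

There is no hook in that model for per-build-tool custom code; it just invokes the standard sdist step and uploads the result.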



Post a to-be-written sdist-source-tree-export PEP:

- pip (et al) can use setup.py to build from an unfiltered source tree
- pip (et al) can use setup.py to build from an sdist
- pip (et al) can use the pyproject.toml build-backend setting to
build from an unfiltered source tree
- pip (et al) can use the pyproject.toml build-backend setting to
build from an sdist
- pip (et al) can use a new pyproject.toml setting defined in that PEP
("source-filter" perhaps?) to export a filtered version of a source
tree, otherwise they fall back on copying the entire unfiltered tree
(similar to the way build-backend falls back to setuptools & wheel if
not otherwise specified)

That approach would decouple the export backends from the build
backends, so we might even end up with a common VCS-aware source
exporter that projects could nominate (e.g. by adding this
functionality to twine), rather than every build backend having to
define its own source export logic.
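To make that concrete, a purely hypothetical sketch (no such PEP exists yet, and both the setting name and the value below are placeholders, with twine standing in for whatever export tool a project nominates):

    [build-system]
    requires = ["setuptools", "wheel"]
    build-backend = "setuptools.build_meta"
    # Hypothetical setting, not defined by any existing PEP: a separately
    # nominated tool that knows how to export a filtered source tree.
    source-filter = "twine.source_export"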

Note that I'm also fine with pip as a project saying that it will only
ship support for the build-backend interface once the source filtering
interface is also defined and implemented.

I'm just saying that I don't see a close enough link between "here is
how to build this component from source" and "here is how to export a
filtered source tree for this component from an unfiltered VCS
checkout" for it to make sense to define them as part of the same
backend API. The only connection I'm aware of is that it makes sense
for projects to ensure that their source filtering when creating an
sdist isn't leaving out any files needed by their build process, but
that's no different from making sure that your build process produces
a wheel file that actually works when installed.


I don't think there is any value in decoupling the generation of what goes into an sdist from the tool that builds it. If we did that, I suspect that 100% of the time the exact same tool would be used to handle both anyway (or people just wouldn't bother to put in the extra effort to produce sdists). Trying to split them up serves only to make the entire toolchain harder and more complicated for people who aren't steeped in packaging lore to figure out: what goes where, and what does what. The fact that we already have different mechanisms just to control what goes into an sdist (MANIFEST.in) versus what gets installed (package_data) confuses people; splitting these two steps further apart is only going to make that worse.
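To illustrate the existing confusion: today a project that ships a data file often has to declare it twice, in two different formats, for two different steps (a minimal sketch; the package and file names are made up):

    # MANIFEST.in -- controls what ends up in the sdist
    include mypkg/data/example.json

    # setup.py -- controls what gets installed with the package
    from setuptools import setup
    setup(
        name="mypkg",
        packages=["mypkg"],
        package_data={"mypkg": ["data/example.json"]},
    )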

Keeping the two together also completely sidesteps the problems around "well, what if only the sdist tool is defined but the build tool isn't?" or "what if only the build tool is defined but the sdist tool isn't?".

The only value I can even think of is that some of the code is going to be reusable, but we already have a perfectly serviceable way of allowing code reuse: publish a library and have the end tools consume it.



Donald Stufft