On Thu, Jun 1, 2017 at 5:34 AM, Donald Stufft <donald@stufft.io> wrote:

On Jun 1, 2017, at 3:44 AM, Paul Moore <p.f.moore@gmail.com> wrote:

On 1 June 2017 at 01:08, Donald Stufft <donald@stufft.io> wrote:
A sdist is a .tar.gz or a .zip file with a directory structure like (along
with whatever additional files the project needs in the sdist):

I'm confused. Isn't this basically what PEP 517 says already? You've
added some details and clarification, but that could just as easily be
done in a separate document/PEP. The details aren't needed for PEP 517

Yes, it’s basically what PEP 517 says already just more specific and detailed. I don’t know what more people want from “defining what an sdist is”, because that’s basically all an sdist is. I’ve always been of the opinion that PEP 517 is already defining (and then modifying) what an sdist is and I don’t know what more people would want.

PEP 517 needs to do it because PEP 517 wants to change the definition of what a sdist is, and you can’t really change the definition without in fact defining the new thing. I mean we could make a new PEP that just defines sdist (minus the pyproject.toml part) then make PEP 517 extend that PEP and add the pyproject.toml… but that seems kind of silly to me? Splitting it out into its own PEP gains us nothing and to me, feels like extra process for process’s sake.

PEP 518's pyproject.toml only specifies a single table that matters, `build-system`. Can we just add a blurb to PEP 517 that says something to the effect of "If the following sub-table exists, its `directory` key can be used to pre-populate the metadata_directory of `get_wheel_metadata` automatically":

[build-system.metadata]
directory = "some_dist_info_directory/"

(pulled from the spec in 517 about what get_wheel_metadata is supposed to do)
Then we could default that directory to something obvious, like the aforementioned ./DIST-INFO or ./.dist-info, or whatever. Isn't such a directory expected to contain enough information to create a wheel anyway, like {package-name} and {version} via METADATA? And isn't it typically included in sdists already? If it has a SOURCE-RECORD file [new], then pip and friends can use that to know which files are needed for the build, and can use pyproject.toml (if it exists) for creating and/or updating it for later sdist generation. In the simple (pure Python) case, every normal file in a wheel is also in the sdist, verbatim, with no additional artifacts of any kind and only additional metadata. The build doesn't care whether things like LICENCE are in the tree.

If there is no static SOURCE-RECORD, pip and friends fall back to a wholesale copy of the input source. The build backend's `get_wheel_metadata` (if defined) can update or backfill missing information within the METADATA file, and create the WHEEL file (or save that for `build_wheel`), if it finds that the metadata_directory it was handed, seeded from the static `metadata.directory` location referenced in pyproject.toml, is incomplete.
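The seeding and SOURCE-RECORD ideas above could be sketched roughly as follows. Everything here is an assumption layered on a proposal: the `[build-system.metadata]` `directory` key, the `DIST-INFO` default, and the SOURCE-RECORD format (treated here like a wheel RECORD: CSV rows of `path,hash,size`, with hashes written as `sha256=` plus unpadded urlsafe base64). `metadata_source_dir`, `record_hash` and `stage_source` are hypothetical names, and `metadata_source_dir` takes the already-parsed pyproject.toml; on Python 3.11+ `tomllib.load` on the file opened in binary mode gives that dict.

```python
import base64
import csv
import hashlib
import shutil
from pathlib import Path
from typing import List

# Hypothetical default, one of the names floated in this thread.
DEFAULT_METADATA_DIR = "DIST-INFO"


def metadata_source_dir(pyproject: dict, source_dir: Path) -> Path:
    """Where to seed metadata_directory from: the proposed (not standardized)
    [build-system.metadata] directory key, else DIST-INFO."""
    metadata = pyproject.get("build-system", {}).get("metadata", {})
    return source_dir / metadata.get("directory", DEFAULT_METADATA_DIR)


def record_hash(data: bytes) -> str:
    # Wheel RECORD style: urlsafe base64 of the sha256 digest, padding stripped.
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def stage_source(source_dir: Path, build_dir: Path, metadata_dir: Path) -> List[str]:
    """Copy what the build needs into build_dir and return the copied paths.

    With a SOURCE-RECORD (assumed CSV rows of path,hash,size), copy only the
    listed files and check any recorded hashes; without one, copy everything.
    """
    record = metadata_dir / "SOURCE-RECORD"
    if not record.is_file():
        # Fallback: wholesale copy of the input source.
        shutil.copytree(source_dir, build_dir)
        return sorted(
            p.relative_to(build_dir).as_posix()
            for p in build_dir.rglob("*")
            if p.is_file()
        )
    copied = []
    with record.open(newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            rel_path = row[0]
            expected = row[1] if len(row) > 1 else ""
            data = (source_dir / rel_path).read_bytes()
            # Only verify when the record actually carries a hash.
            if expected and record_hash(data) != expected:
                raise ValueError(f"hash mismatch for {rel_path}")
            dest = build_dir / rel_path
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(data)
            copied.append(rel_path)
    return copied
```

With no SOURCE-RECORD present, `stage_source` degrades to the wholesale `shutil.copytree` copy described above, so a project that ships nothing new still builds.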

In the end, the build frontend logic would look something like:

(It also seems like `get_wheel_metadata` should maybe return the final .dist-info directory it decided on, or we should just settle on DIST-INFO and be done with this name-version.dist-info nonsense already... It should possibly be a required build API function, with the understanding that `build_wheel` might update it.)

* Is build-system.metadata.directory defined?
YES: copy to {metadata_directory}/DIST-INFO
NO: mkdir {metadata_directory}/DIST-INFO

* Does {metadata_directory}/DIST-INFO/SOURCE-RECORD exist?
YES: use that to isolate/prune/copy source tree for initial build, if desired, and also confirm hashes, if any
NO: do nothing

(we have something that might look like an sdist, but possibly incomplete [e.g. still no METADATA])

* Is build-backend.MODULE.get_build_requires defined?
YES: make sure those things exist then
NO: do nothing

* Is build-backend.MODULE.get_wheel_metadata defined?
YES: call it like PEP 517 says, DIST-INFO is ready for updating
NO: do nothing

(we have something that might look like an sdist, but possibly incomplete [e.g. still no METADATA])

* Is build-backend.MODULE.build_wheel defined?
YES: call it like PEP 517 says, replace RECORD with the final record from build?
NO: do nothing

* Is {metadata_directory}/DIST-INFO/* valid and the resultant whl as well?
YES: YAY! \o/

* Does {metadata_directory}/DIST-INFO/SOURCE-RECORD exist [it must reference pyproject.toml too!]?
YES: use that to prune files when creating a proper sdist AFTER the build
NO: sdist is original source tree + {metadata_directory}/DIST-INFO - RECORD(?)

(we have enough information to produce a complete sdist that could be used to generate a valid wheel again)
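A rough Python sketch of that frontend flow, under several assumptions: the hook names come from this email, their signatures are modeled loosely on the PEP 517 draft rather than copied from it, the backend is any object with optional hook attributes, and the SOURCE-RECORD pruning and hash checks are left out to keep it short. `run_frontend` is a hypothetical name.

```python
import shutil
from pathlib import Path


def run_frontend(backend, work_dir: Path, static_metadata=None) -> dict:
    """Walk the YES/NO steps against a backend object (hooks are optional)."""
    metadata_directory = work_dir / "metadata"
    dist_info = metadata_directory / "DIST-INFO"

    # Is build-system.metadata.directory defined? Copy it, else start empty.
    if static_metadata is not None and Path(static_metadata).is_dir():
        shutil.copytree(static_metadata, dist_info)
    else:
        dist_info.mkdir(parents=True)

    # get_build_requires: a real frontend would install these; we only collect them.
    requires = []
    if hasattr(backend, "get_build_requires"):
        requires = list(backend.get_build_requires({}))

    # get_wheel_metadata: the backend may update or backfill DIST-INFO in place.
    if hasattr(backend, "get_wheel_metadata"):
        backend.get_wheel_metadata(str(metadata_directory), {})

    # build_wheel: point the backend at the prepared metadata directory.
    wheel_name = None
    if hasattr(backend, "build_wheel"):
        wheel_dir = work_dir / "wheels"
        wheel_dir.mkdir(parents=True, exist_ok=True)
        wheel_name = backend.build_wheel(str(wheel_dir), {}, str(metadata_directory))

    # Minimal validity check only; real validation would inspect the wheel too.
    has_metadata = (dist_info / "METADATA").is_file()
    return {"requires": requires, "wheel": wheel_name, "has_metadata": has_metadata}
```

Using `hasattr` mirrors the "NO: do nothing" branches: a backend that defines none of the hooks still gets a DIST-INFO directory, it just never gains METADATA or a wheel.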

Because the build itself can output additional source files that may be desirable to include in an sdist later, I honestly don't think you can pass through a "proper" sdist before a wheel. I think you can do that 99% of the time, but some builds using Cython and friends could actually have a custom initial build that generates standard .h/.c/.py files, and even outputs an alternative pyproject.toml that *no longer needs* a custom build backend. Or the build just drops it from SOURCE-RECORD outright once the custom step is done, because some of the artifacts are enough to rebuild a wheel next time. It seems to me the only order that can possibly be correct is:

1. VCS checkout
2. partial sdist, but still likely an sdist, no promises!
3. wheel
4. proper sdist from generated SOURCE-RECORD, or updated static SOURCE-RECORD, or just original source tree + DIST-INFO
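Step 4 (together with the final SOURCE-RECORD check in the flow above) might look roughly like the following. This assumes SOURCE-RECORD is a CSV whose first column is a path relative to the source tree, that generated files such as Cython's .c output were written into that tree, and that the archive uses the usual `{name}-{version}/` top-level directory; `sdist_members` and `write_sdist` are hypothetical helpers, not part of any PEP.

```python
import csv
import tarfile
from pathlib import Path
from typing import List, Tuple


def sdist_members(source_dir: Path, dist_info: Path) -> List[Tuple[Path, str]]:
    """Return (file on disk, path inside the sdist) pairs.

    With a SOURCE-RECORD, only listed files go in and pyproject.toml must be
    among them. Without one: the source tree plus DIST-INFO minus RECORD.
    """
    record = dist_info / "SOURCE-RECORD"
    if record.is_file():
        with record.open(newline="") as f:
            paths = [row[0] for row in csv.reader(f) if row]
        if "pyproject.toml" not in paths:
            raise ValueError("SOURCE-RECORD must reference pyproject.toml")
        return [(source_dir / p, p) for p in paths]
    members = [
        (p, p.relative_to(source_dir).as_posix())
        for p in sorted(source_dir.rglob("*"))
        # Skip DIST-INFO here if it lives inside the tree; it is added below.
        if p.is_file() and dist_info not in p.parents
    ]
    members += [
        (p, "DIST-INFO/" + p.name)
        for p in sorted(dist_info.iterdir())
        if p.is_file() and p.name != "RECORD"
    ]
    return members


def write_sdist(name: str, version: str, members, out_dir: Path) -> Path:
    """Pack the members under a top-level {name}-{version}/ directory."""
    base = f"{name}-{version}"
    target = out_dir / f"{base}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        for path, arcname in members:
            tar.add(path, arcname=f"{base}/{arcname}")
    return target
```

Because the member list is computed after the build, a generated `pkg/mod.c` ends up in the sdist just like a hand-written file, which is the point of putting the "proper" sdist last.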

I don't see a way to get a 100% valid sdist without first building the project and effectively asking the build backend (via its SOURCE-RECORD, if any) "Well golly, you did a build! What, *from both the source tree and build artifacts*, is important for wrapping up into a redistributable?"

Maybe I'm overlooking something yuge (I've tried to follow this discussion, and have sort of checked out of Python lately, but I'm fairly well-versed in packaging lore and code), but in general I think we really are making sdists way way... way scarier than they need to be. They're pretty much whatever the build tells you is important for redistribution, at the end, with as much static metadata as possible, to the point of possibly obviating their need for pyproject.toml in the first place... maybe this aspect is what is hanging everyone up?

A redistributable source does not need to be as flexible as the original VCS input. An sdist is pinned to a specific version of a project, whereas VCS represents all possible versions (albeit only one is checked out), and sdists *are not* wheels! The same expectations need not apply. Two sdists of the same version might not be identical; one might request the custom build backend via pyproject.toml, and the other might have already done some of the steps for whatever reason. Authors must decide which is more appropriate for sharing.

This ended up longer than I meant, but hopefully it's not all noise.


C Anthony