On Thu, Jun 1, 2017 at 1:22 PM, Donald Stufft <donald@stufft.io> wrote:

On Jun 1, 2017, at 2:12 PM, C Anthony Risinger <c@anthonyrisinger.com> wrote:

Because the build itself can output additional source files that may be desirable to include in an sdist later, I honestly don't think you can pass through a "proper" sdist before a wheel. I think you can do that 99% of the time, but some builds using Cython and friends could actually have a custom initial build that generates standard .h/.c/.py files, and even outputs an alternative pyproject.toml that *no longer needs* a custom build backend. Or it just deletes it outright from SOURCE-RECORD once the custom build is done, because some artifacts are enough to rebuild a wheel next time. It seems to me the only possibly correct order is:

1. VCS checkout
2. partial sdist, but still likely an sdist, no promises!
3. wheel
4. proper sdist from generated SOURCE-RECORD, or updated static SOURCE-RECORD, or just original source tree + DIST-INFO

I don't see a way to get a 100% valid sdist without first building the project and effectively asking the build backend (via its SOURCE-RECORD, if any) "Well golly, you did a build! What, *from both the source tree and build artifacts*, is important for wrapping up into a redistributable?"

Maybe I'm overlooking something yuge (I've tried to follow this discussion, and have sort of checked out of Python lately, but I'm fairly well-versed in packaging lore and code), but in general I think we really are making sdists way way... way scarier than need be. They're pretty much whatever the build tells you is important for redistribution, at the end, with as much static metadata as possible, to the point of possibly obviating their need for a pyproject.toml in the first place... maybe this aspect is what is hanging everyone up? A redistributable source does not need to be as flexible as the original VCS input. An sdist is pinned to a specific version of a project, whereas VCS represents all possible versions (albeit only one is checked out), and sdists *are not* wheels! The same expectations need not apply. Two sdists of the same version might not be identical; one might request the custom build backend via pyproject.toml, and the other might have already done some of the steps for whatever reason. Authors must decide which is more appropriate for sharing.


Do any projects build a copy of the library and use that for influencing what gets copied into the sdist today? As far as I am aware there is not any that do that. I think the standard thing to do in Cython is to produce the .c files as part of the sdist process, which is perfectly fine to do. With the newer PEPs it doesn’t _need_ to do that, since you can depend on Cython in your build steps and just ship the pyx files (although you’re still free to compute the .c files AOT and include them in the sdist).

I admit I don't know of any either. Nor did I know the standard expectation in Cython was to generate things in the sdist phase. I have not personally used it for a project and was only using it as an example of a process that produces potentially redistributable artifacts.

What you are saying, though, I think only emphasizes previous comments about needing to pin down, once and for all, what it means to "be an sdist" (which I agree with), right now, and to decide who is responsible for constructing it. We don't need to go on a rampage pronouncing new formats or file requirements like LICENCE... just state the status quo and accept it.

Maybe we can start with simply defining an sdist as "some tree with a DIST-INFO". I'll avoid the package-name.dist-info for now and the problems therein, unless there is a simple consensus there.
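As a rough illustration of how little that definition demands, here is a minimal Python sketch that treats any directory with a top-level `DIST-INFO` directory as an sdist. The function name `looks_like_sdist` is hypothetical, and the uppercase `DIST-INFO` layout is only the working assumption from this thread, not an existing standard.

```python
import tempfile
from pathlib import Path


def looks_like_sdist(root: Path) -> bool:
    """Hypothetical check for the working definition proposed above:
    an sdist is "some tree with a DIST-INFO" directory at its top level.
    Nothing else (setup.py, pyproject.toml, LICENCE) is required."""
    return (Path(root) / "DIST-INFO").is_dir()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(looks_like_sdist(Path(tmp)))  # no DIST-INFO yet
        (Path(tmp) / "DIST-INFO").mkdir()
        print(looks_like_sdist(Path(tmp)))  # now it qualifies
```

Whatever goes inside DIST-INFO (METADATA, a SOURCE-RECORD, and so on) is deliberately left open here.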

From that, there seems to be only a small gap in the build API hooks, and a missing "[sdist-system]" phase (maybe that doesn't sound as nice as build-system) that I believe would be a small PEP or an addition to 517. In all honesty, I think *both* the sdist-system and the build-system probably need to be consulted to fully populate DIST-INFO... this can be the `build_sdist` hook in both. Honestly, as you have clarified, Cython is *not* a build-system! It's an sdist-system. The C compiler is the expected build-system. For many build-systems, `build_sdist` might even be a no-op, but it might still want the opportunity to make adjustments before starting.

It seems reasonable to me that both systems simply begin from a static DIST-INFO, if any, then work together to populate and update it. Something like setuptools_scm, as an sdist-system, might only generate a barebones METADATA with name and version info, then the selected build-system comes in and fills the rest. Or something like the Cython sdist-system might generate more source files and not even touch DIST-INFO, then the selected build-system comes in and fills the rest. Neither has anything to do with "the build".
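To make the two scenarios concrete, here is a minimal Python sketch under loudly hypothetical assumptions: the hook names (`make_scm_like_hook`, `cython_like_hook`, `build_system_hook`, `populate_dist_info`) and the dict-based tree and metadata are invented for illustration and are not part of PEP 517, setuptools_scm, or Cython. It starts from the static DIST-INFO, runs the sdist-system, then lets the build-system fill only what is still missing.

```python
from typing import Callable, Dict

# Hypothetical types: a source tree as {relative path: contents} and
# DIST-INFO metadata as {field: value}. Neither "sdist-system" nor these
# hook signatures exist in PEP 517; they only model the proposal above.
Tree = Dict[str, str]
Metadata = Dict[str, str]
Hook = Callable[[Tree, Metadata], None]


def make_scm_like_hook(name: str, version: str) -> Hook:
    """setuptools_scm-style sdist-system: contributes only Name and Version.
    A real tool would derive the version from VCS tags; here it is passed in."""
    def hook(tree: Tree, metadata: Metadata) -> None:
        metadata.setdefault("Name", name)
        metadata.setdefault("Version", version)
    return hook


def cython_like_hook(tree: Tree, metadata: Metadata) -> None:
    """Cython-style sdist-system: generates .c files, never touches metadata."""
    for path in [p for p in tree if p.endswith(".pyx")]:  # copy keys before mutating
        tree[path[: -len(".pyx")] + ".c"] = f"/* generated from {path} */"


def build_system_hook(tree: Tree, metadata: Metadata) -> None:
    """build-system phase: fills in only the fields still missing."""
    metadata.setdefault("Metadata-Version", "2.1")
    metadata.setdefault("Summary", "")


def populate_dist_info(tree: Tree, static: Metadata,
                       sdist_hook: Hook, build_hook: Hook) -> Metadata:
    metadata = dict(static)     # begin from the static DIST-INFO, if any
    sdist_hook(tree, metadata)  # sdist-system runs first
    build_hook(tree, metadata)  # then the build-system fills the rest
    return metadata


if __name__ == "__main__":
    scm_tree: Tree = {"pkg/__init__.py": ""}
    print(populate_dist_info(scm_tree, {}, make_scm_like_hook("pkg", "1.2.3"),
                             build_system_hook))
    cy_tree: Tree = {"fast.pyx": "def f(): pass"}
    print(populate_dist_info(cy_tree, {"Name": "fast", "Version": "2.0"},
                             cython_like_hook, build_system_hook))
    print(sorted(cy_tree))
```

Using `setdefault` means no phase overwrites a field that something earlier already declared, which is one possible reading of "fills the rest"; other merge rules are equally plausible.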

Something like Cython is effectively doing a partial build before producing a redistributable source tree, and if we skip that step and go straight to a build via the build-system, then the only real option for sdists at that point is to ask the same backend, post-build, for the important redistributable parts, which may or may not reflect a stable/reproducible input to the same system.

If I select "Cython" as the build-system, but I need to use a different compiler for currently-unsupported-platform-X, I'm going to have a bad day.

--

C Anthony