[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Nathaniel Smith njs at pobox.com
Fri Oct 2 23:45:55 CEST 2015


On Fri, Oct 2, 2015 at 1:03 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 2 October 2015 at 20:02, Marcus Smith <qwcode at gmail.com> wrote:
>>> So wouldn't they then download the sdist, build a wheel as an
>>> intermediate, and then generate the .deb file?
>>
>> the new goal I think was to have standardized metadata immediately available
>> in an sdist, and get away from the model, that you had to run a build step,
>> before you had a metadata artifact.
>> so here,  you'd have to build a wheel (potentially one with binary
>> extensions) just to know what the metadata is? that doesn't sound right.
>
> I'm uncomfortable with the fact that the proposed sdist format has
> more or less no metadata of its own (even the filename format is only
> a recommendation) so (for example) if someone does "pip install
> foo==1.0" I don't see how pip can find a suitable sdist, if no wheel
> is available.

About the filename thing:

The reason that the draft makes the inclusion of package/version info
a SHOULD instead of a MUST is that regardless of what the spec says,
all decent installation tools are going to support doing things like

    curl -L https://github.com/numpy/numpy/archive/master.zip -o numpy-master.zip
    pip install numpy-master.zip

So we can either handle that by saying that "numpy-master.zip" is an
sdist, just not one that we would allow on PyPI (which is what the
current draft does), or we could handle it by saying that
numpy-master.zip is almost-but-not-quite an sdist, and handling it is
a commonly supported extension to the standard. Doesn't really matter
that much either way -- just a matter of terminology. Either way the
sdists on PyPI are obviously going to be named
<package>-<version>.<ext>.
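
To make the filename convention concrete, here is a rough sketch of how
a tool might pull name and version out of such a filename. The function
name, the list of extensions, and the "split on the last hyphen, version
must start with a digit" rule are all assumptions for illustration; the
draft doesn't specify any of them.

```python
# Hypothetical helper: split "<package>-<version>.<ext>" into its parts.
# The extension list and the splitting rule are assumptions, not spec.
KNOWN_EXTS = (".tar.gz", ".tar.bz2", ".zip")

def parse_sdist_filename(filename):
    """Return (name, version), or None if no name/version can be read."""
    for ext in KNOWN_EXTS:
        if filename.endswith(ext):
            stem = filename[:-len(ext)]
            break
    else:
        # Not an archive extension this sketch knows about.
        return None
    name, sep, version = stem.rpartition("-")
    # Require a version that starts with a digit, so that something like
    # "numpy-master" is treated as carrying no version information.
    if not sep or not name or not version[:1].isdigit():
        return None
    return name, version
```

Under these assumptions "numpy-1.10.0.tar.gz" yields a name and version,
while "numpy-master.zip" yields nothing, so a tool would have to get the
name/version some other way.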

For sdists that do have a name/version: it's not really crucial to the
proposal that name/version are only in the filename -- they could be
repeated inside the file as well. The version number in particular
usually needs to be computed at sdist-build-time (from a
__version__.py or whatever -- it's very common that the source of
truth for version numbers is not static), so leaving it out of the
static metadata is nice because it makes sdist-building code much
simpler -- 90% of the time you could just keep the static metadata
file as-is instead of having to rewrite it for each sdist. But that's
just an engineering trade-off; it's not crucial to the concept.
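
A minimal sketch of that trade-off, assuming the version lives in a
simple `__version__ = "..."` assignment (real projects often compute it
in more elaborate ways). The function names and the default extension
are made up for illustration. The point is that the computed version
only ends up in the filename, and the static metadata file is never
touched.

```python
import re

# Hypothetical sketch: the version is read from source at sdist-build time
# and goes only into the filename; no static metadata file is rewritten.
VERSION_RE = re.compile(
    r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE
)

def read_version(version_py_text):
    """Extract the version string from the text of a __version__.py file."""
    match = VERSION_RE.search(version_py_text)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)

def sdist_filename(package, version_py_text, ext=".zip"):
    """Build '<package>-<version><ext>' from a freshly computed version."""
    return "{}-{}{}".format(package, read_version(version_py_text), ext)
```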

> I would rather see an sdist format that can be introspected *without*
> running code or a build tool. Installers and packaging tools like pip
> need to be able to do that - one of the biggest issues with pip's
> current sdist handling is that it can't make any meaningful decisions
> before building at least the egg-info.

Another way to look at this is to say that pip's current handling is
proof that the build-to-get-metadata strategy is viable :-). It would
indeed be nice if this weren't necessary, but the Python packaging
ecosystem has a long history of trying to make simplifying assumptions
that turn out to bite us later... I think this is one of those.

Note that for making installation decisions, name + version aren't
enough: you also need full dependency information. And dependency
information is definitely not fixed at sdist-creation-time.
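
One hypothetical illustration of build-time dependency information
(this scenario is an assumption for the sake of example, not something
taken from the draft): a package with a compiled extension whose
runtime requirement depends on which version of a library happened to
be present when it was built. The function name and the exact
"major.minor" rule are invented.

```python
# Hypothetical illustration: a runtime requirement that is only known after
# the build, because it depends on what the extension was compiled against.
def runtime_requires(built_against):
    """Return the requirements a built wheel would declare.

    `built_against` is the (assumed) version string of the library found
    in the build environment, e.g. "1.9.2".
    """
    major, minor = built_against.split(".")[:2]
    # Require at least the major.minor series that was used for the build.
    return ["numpy>={}.{}".format(major, minor)]
```

In a case like this the same sdist produces different dependency lists
depending on the build environment, so there's nothing static to write
down at sdist-creation-time.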

> Ultimately the question for me is at what point do we require
> packaging tools like pip (and ad-hoc distribution analysis scripts - I
> write a lot of those!) to run code from the package in order to
> continue? I'd like to be able to know, at a minimum, the package name
> and version, as those are needed to make decisions on whether there is
> a conflict with an already installed version.
>
> Basically, why aren't you using a PEP 426-compatible metadata format
> in the sdist (there's no reason not to, you just have to mandate that
> tools used to build sdists generate that form of metadata file)? You
> could also say that source trees SHOULD store metadata in the
> _pypackage directory (in an appropriate defined format, maybe one more
> suited to human editing than JSON) and tools that work on source trees
> (build tools, things that create sdists) SHOULD use that data as the
> definitive source of metadata, rather than using their own
> configuration. I don't see a problem with allowing source trees to
> have some flexibility, but sdists are tool-generated and so could
> easily be required to contain static metadata in a standard format.
>
>>> Is there another proposal I'm unaware for the sdist -> wheel step that is
>>> build tool-agnostic?
>>
>> PEP426 talks about it some
>> https://www.python.org/dev/peps/pep-0426/#metabuild-system
>
> While the metabuild system is a good long-term goal, I'll take
> something that's being developed now over a great idea that no-one has
> time to work on... Wheel came about because Daniel just got on with
> it.
>
> Having said that, I very strongly prefer a sdist proposal that's
> compatible with PEP 426 (at least to the extent that tools like wheel
> already support it). Throwing away all of the work already done on PEP
> 426 doesn't seem like a good plan.

Nothing is being thrown away -- the proposal is just that sdists and
wheels are different things, so we should think of PEP 426 as wheel
metadata, rather than all metadata.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org

