[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Wes Turner wes.turner at gmail.com
Sat Oct 3 01:58:41 CEST 2015


On Oct 2, 2015 5:53 PM, "Paul Moore" <p.f.moore at gmail.com> wrote:
>
> On 2 October 2015 at 23:15, Nathaniel Smith <njs at pobox.com> wrote:
> > "Project" is a pretty messy concept. Obviously in simple cases there's
> > a one-to-one mapping between project <-> wheel <-> importable package,
> > but this breaks down quickly in edge cases.
>
> I mistakenly used "project" in an attempt to avoid confusion resulting
> from me using the word "distribution" as a more general term than the
> way you were using "source distribution" or "binary distribution".
> Clearly I failed and made things more confusing.
>
> I use the term "distribution" in the sense used here
> https://packaging.python.org/en/latest/glossary/#term-distribution-package .
> Note that this is in contrast to the terms "source distribution" and
> "binary distribution" or "built distribution" in the same page.
>
> Sorry for confusing things. I'll stick to the terminology as in the
> PUG glossary from now on.
>
> > Consider a project that builds multiple wheels out of the
> > same source tree. You obviously can't expect that all of these
> > packages will have the same dependencies.
>
> Correct. But a distribution can and should (I believe) have the same
> dependencies for all of the source and built distributions derived
> from it.
>
> > This situation is not common today for Python packages, but the only
> > reason for that is that distutils makes it really hard to do -- it's
> > extremely common in other package ecosystems, and the advantages are
> > obvious. E.g., maybe numpy.distutils should be split into a separately
> > installable package from numpy -- there's no technical reason that
> > this should mean we are now forced to move the code for it into its
> > own VCS repository.
>
> I'm lost here, I'm afraid. Could you rephrase this in terms of the
> definitions from the PUG glossary? It sounds to me like the VCS
> repository is the project, which contains multiple distributions. I
> don't see how that's particularly hard. Each distribution just has its
> own subdirectory (and setup.py) in the VCS repository...
>
> > (I assume that by "platform tags" you mean what PEP 426 calls
> > "environment markers".)
>
> Nope, I mean as defined in PEP 425. The platform tag is part of the
> compatibility tag. Maybe I meant the ABI tag; I don't really follow
> the distinctions.
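For reference, the PEP 425 tags Paul mentions are the three dotted fields at
the end of a wheel filename (python tag, ABI tag, platform tag). A minimal
sketch of pulling them apart; the filename is just an illustrative example,
and this ignores the optional build tag and hyphenated project names:

```python
# Split a wheel filename into its PEP 425 compatibility tags.
# Simplified: assumes no build tag and no hyphen in the project name.
def wheel_tags(filename):
    stem = filename[: -len(".whl")]
    name, version, py_tag, abi_tag, platform_tag = stem.split("-")
    return py_tag, abi_tag, platform_tag

print(wheel_tags("numpy-1.10.0-cp35-cp35m-win_amd64.whl"))
# ('cp35', 'cp35m', 'win_amd64')
```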
>
> > Environment markers are really useful for extending the set of cases
> > that can be handled by a single architecture-dependent wheel. And
> > they're a good fit for that environment, given that wheels can't
> > contain arbitrary code.
> >
> > But they're certainly never going to be adequate to provide a single
> > static description of every possible build configuration of every
> > possible project. And installing an sdist already requires arbitrary
> > code execution, so it doesn't make sense to try to build some
> > elaborate system to avoid arbitrary code execution just for the
> > dependency specification.
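To make the environment-marker case concrete: a marker-conditioned dependency
is a static expression evaluated against the target environment at install
time. A toy sketch (real tools parse the full marker grammar; this handles
only the simple equality form, and the package names are just examples):

```python
# Sketch: how a marker-conditioned dependency resolves at install time.
# Only the simple `name == "value"` marker form is handled here.
requirements = {
    "lxml": None,                           # unconditional
    "pywin32": 'sys_platform == "win32"',   # marker-conditioned
}

def applies(marker, env):
    name, _, value = marker.partition(" == ")
    return env.get(name) == value.strip('"')

env = {"sys_platform": "win32"}  # fixed here; normally read from the interpreter
active = [name for name, marker in requirements.items()
          if marker is None or applies(marker, env)]
print(active)  # ['lxml', 'pywin32']
```

The key property, and the limitation Nathaniel points at, is that `env` can
only contain facts about the installation target, never choices made earlier
at build time.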
> >
> > You're right that in a perfect future world numpy C API related
> > dependencies would be handled by some separate ABI-tracking mechanism
> > similar to how the CPython ABI is tracked, so here are some other
> > examples of why environment markers are inadequate:
> >
> > In the future it will almost certainly be possible to build numpy in
> > two different configurations: one where it expects to find BLAS inside
> > a wheel distributed for this purpose (e.g. this is necessary to
> > provide high-quality windows wheels), and one where it expects to find
> > BLAS installed on the system. This decision will *not* be tied to the
> > platform, but be selectable at build time. E.g., on OS X there is a
> > system-provided BLAS library, but it has some issues. So the default
> > wheels on PyPI will probably act like windows and depend on a
> > BLAS-package that we control, but there will also be individual users
> > who prefer to build numpy in the configuration where it uses the
> > system BLAS, so we definitely need to support both options on OS X.
> > Now the problem: There will never be a single environment marker that
> > you can stick into a wheel or sdist that says "we depend on the
> > 'pyblas' package if the system is OS X (ok) and the user set this flag
> > in this configuration file during the build process (wait wut)".
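The shape of the problem Nathaniel describes, sketched in code: the
dependency set is a function of a build-time flag, so no static marker can
express it. Both the flag name and the "pyblas" package are hypothetical,
used only for illustration:

```python
# Sketch: a dependency selected by a build-time flag rather than an
# environment marker. NUMPY_SYSTEM_BLAS and "pyblas" are hypothetical names.
def build_requires(environ):
    requires = []
    if environ.get("NUMPY_SYSTEM_BLAS") == "1":
        pass  # link against the system BLAS: no extra wheel dependency
    else:
        requires.append("pyblas")  # depend on the BLAS wheel we control
    return requires

print(build_requires({}))                          # ['pyblas']
print(build_requires({"NUMPY_SYSTEM_BLAS": "1"}))  # []
```

Two wheels built from the same sdist on the same platform would therefore
carry different dependency metadata, which is exactly what a single static
sdist-level declaration cannot capture.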
> >
> > Similarly, I think someone was saying in a discussion recently that
> > lxml supports being built either in a mode where it requires libxml be
> > available on the system, or else it can be statically linked. Even if
> > in the future we start having metadata that lets us describe
> > dependencies on external system libraries, it's never going to be the
> > case that we can put the *same* dependency metadata into wheels that
> > are built using these two configurations.
>
> This is precisely the very complex issue that's being discussed under
> the banner of extending compatibility tags in a way that gives a
> viable and practical way of distinguishing binary wheels. You can
> either see that as a discussion about "expanding compatibility tags"
> or "finding something better than compatibility tags". I don't have
> much of a stake in that discussion, as the current compatibility tags
> suit my needs fine, as a Windows user. The issues seem to be around
> Linux and possibly some of the complexities around binary dependencies
> for numerical libraries.
>
> But the key point here is that I see the solution for this as being
> about distinguishing the "right" wheel for the target environment.
> It's not about anything that should reach back to sdists. Maybe a
> solution will involve a PEP 426 metadata enhancement that adds
> metadata that's only valid in binary distributions and not in source
> distributions, but that's fine by me. But it won't replace the
> existing dependency data, which *is* valid at the sdist level.

this would be good to discuss here:
PEP 426: Define a JSON-LD context as part of the proposal #31
https://github.com/pypa/interoperability-peps/issues/31
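Roughly what a JSON-LD metadata document buys you: the @context maps plain
field names onto shared vocabulary URIs. A minimal sketch; the vocabulary
choices here are illustrative only, not taken from PEP 426 or issue #31:

```python
import json

# Hypothetical pydist.jsonld fragment; context/vocabulary names are
# illustrative, not from PEP 426 or the linked issue.
pydist = {
    "@context": {
        "name": "http://schema.org/name",
        "version": "http://schema.org/version",
    },
    "name": "numpy",
    "version": "1.10.0",
}
doc = json.dumps(pydist, indent=2)
print(json.loads(doc)["name"])  # numpy
```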

>
> At least as far as I can see - I'm willing to be enlightened. But your
> argument seems to be that sdist-level dependency information should be
> omitted because more detailed ABI compatibility data *might* be needed
> at the wheel level for some packages. I don't agree with that - we
> still need the existing metadata, even if more might be required in
> specialist cases.

e.g., conditional metadata computed in setup.py (the "blas" extra and
"pyblas" package are hypothetical names):

import sys
extras_require = {}
if sys.platform == "darwin":
    extras_require["blas"] = ["pyblas"]

>
>
> >> [1] If extras and environment markers don't cover the needs of
> >> scientific modules, we need some input into their design from the
> >> scientific community. But again, let's not throw away the work that's
> >> already done.
> >
> > As far as sdists go, you can either cover 90% of the cases by building
> > increasingly elaborate metadata formats, or you can cover 100% of the
> > cases by keeping things simple...

everything could go in one (composed) 'pydist.jsonld'

>
> But your argument seems to be that having metadata generated from
> package build code is "simpler". My strong opinion, based on what I've
> seen of the problems caused by having metadata in an "executable
> setup.py", is that static metadata is far simpler.

static JSON-LD is easily indexable with Warehouse (PostgreSQL),
Elasticsearch, and triplestores

>
> I don't believe that the cost of changing to a new system can be
> justified *without* getting the benefits of static metadata.

there are also external benefits:
canonical package URIs + schema.org/Action

>
> Paul
> _______________________________________________
> Distutils-SIG maillist  -  Distutils-SIG at python.org
> https://mail.python.org/mailman/listinfo/distutils-sig