[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Daniel Holth dholth at gmail.com
Mon Oct 5 15:25:14 CEST 2015


The OP asks for a Python callable interface to pip instead of setup.py's
command line interface. That could be accomplished now by figuring out all
of the arguments that pip will send to setup.py (for setup.py egg_info and
setup.py bdist_wheel, I think), and then by writing a setup.py emulator
that implements those commands by invoking the discovered callables. The
hard part is knowing exactly what that command line interface is: writing
down the list of arguments pip can send to setup.py has been a packaging
TODO for some time.

Suppose someone from the pip team wrote the generic setup.py. It would
implement the command line interface, import a script called, say,
build.py, and invoke Python callables to do egg_info and bdist_wheel. Then
the flit author could implement a couple of functions instead of having to
reverse-engineer a command line interface. This would be an improvement
because build system authors do not know how to extend setuptools or
reverse engineer the necessary setup.py command line interface.
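
A minimal sketch of what such a generic setup.py might look like. The
callable names (egg_info, bdist_wheel), their keyword arguments, and the
two options handled here are assumptions made for illustration; the real
set of arguments pip passes is larger and is exactly the list that still
needs writing down.

```python
"""Generic setup.py shim (sketch): maps a setup.py-style command line
onto Python callables provided by a separate build module."""
import argparse
import importlib


def parse_command_line(argv):
    """Return (command, options) for the two commands this sketch supports.

    Only an illustrative subset of options is parsed; this is not the
    full command line that pip sends to setup.py.
    """
    parser = argparse.ArgumentParser(prog="setup.py")
    sub = parser.add_subparsers(dest="command", required=True)
    egg = sub.add_parser("egg_info")
    egg.add_argument("--egg-base", default=".")
    whl = sub.add_parser("bdist_wheel")
    whl.add_argument("-d", "--dist-dir", default="dist")
    options = vars(parser.parse_args(argv))
    command = options.pop("command")
    return command, options


def main(argv, build_module=None):
    """Dispatch the parsed command to the callable of the same name."""
    if build_module is None:
        # Hypothetical build.py sitting next to setup.py.
        build_module = importlib.import_module("build")
    command, options = parse_command_line(argv)
    hook = getattr(build_module, command)
    return hook(**options)
```

The real setup.py would end by calling main(sys.argv[1:]). With this in
place, build.py only needs to define egg_info(egg_base) and
bdist_wheel(dist_dir) as ordinary functions.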

On Mon, Oct 5, 2015 at 7:28 AM Paul Moore <p.f.moore at gmail.com> wrote:

> OK, I've had a better read of your email now. Responses inline.
>
> On 5 October 2015 at 07:29, Nathaniel Smith <njs at pobox.com> wrote:
> > First, let's drop the word "sdist", it's confusing.
>
> We can't (see below for details). We can deprecate the sdist concept,
> if that's what you want to propose. From what I gather, you're
> proposing deprecating it in favour of a "source wheel" concept. I
> don't have a huge issue with that other than that I don't see the
> necessity - the sdist concept pretty much covers what you want, except
> maybe that it's not clear enough to people outside the packaging
> community how it differs from a VCS checkout.
>
> > I'm starting from the long history and conventions around how people make
> > what I'll call "source releases" (and in a few paragraphs will contrast
> > with "source wheels"). 'Everyone knows' that when you release a new
> > version of some package (in pretty much any language), then one key step
> > is to put together a file called <package>-<version>.<zip or .tar.gz>.
> > And 'everyone knows' that if you see a file that follows this naming
> > convention, and you download it, then what you'll find inside is: a
> > single directory called <package>-<version>/, and inside this directory
> > will be something that's almost like a VCS checkout -- it'll probably
> > contain a README, source files in convenient places to be edited or
> > grepped, etc. The differences from a VCS checkout (if any) will be little
> > convenience stuff -- like ./autogen.sh will have been run already, or
> > there will be an extra file containing a fixed version number instead of
> > it being autodetected, or -DNDEBUG will be in the default CFLAGS, or
> > Cython files will have been pre-translated to C -- but fundamentally it
> > will be similar to a VCS checkout, and switching back and forth between
> > them won't be too jarring. 95% of the time there will be a standard way
> > to build the thing ('./configure && make && make install', or
> > 'python setup.py install', or similar).
>
> Breaking at this point, because that's frankly *not* the reality in
> the Python packaging world (at least not nowadays - I'm not clear to
> what extent you're just talking about history and background here,
> although your reference to Cython makes me think you're talking in
> terms of current practice). It may look like that, but there are some
> fundamental differences.
>
> First and foremost, nobody zips up and publishes their VCS checkout in
> the way you describe. (At least not if they are using the standard
> tools - distutils and setuptools). Instead, they create a "sdist"
> using the "python setup.py sdist" command. I'm sorry, but I'm going to
> carry on using the "sdist" term here, because I'm describing current
> practice and sdists *are* current practice.
>
> The difference between a sdist and what you call a "source release"
> is subtle, precisely because the current sdist format is a bit of a
> mess, but the key point is that all sdists are created by a standard
> process, and conform to a standard naming convention and layout. The
> packaging tools rely on being able to make that assumption, in all
> sorts of ways which we're doing our best to clarify as part of this
> thread, but which honestly have been a little bit implicit up to this
> point.
>
> Further muddying the water is the fact that as you say, pip needs to
> be able to build from a VCS checkout (a directory on the user's local
> system) and we have code in pip that does that - mostly by assuming
> that you can treat a VCS checkout as an unpacked sdist, but there are
> hacks we need to do to make that work (we run setup.py egg_info to get
> the metadata we need, for example, which has implications as we only
> get that data at a later stage than we have it in the sdist case) and
> differences in functionality (develop mode).
>
> At this point I'm not saying that things have to be this way, or even
> that "make a source release however you choose as long as it follows
> these conventions" isn't a viable way forward, but I do think we need
> to agree on our picture of how things are now, or we'll continue
> talking past each other.
>
> > And these kind of source releases have a rich ecosystem around them and
> > serve a wide range of uses: they provide a low-tech archival record
> > (while VCS's come and go), they end up in deb and rpm "original source"
> > bundles, they get downloaded by users and built by hand (maybe with weird
> > configury on top, like a hack to enable cross-compilation) or poked
> > around in by hand, etc. etc. When sdists were originally designed, then
> > "source releases" is what the designers were thinking about.
>
> This, on the other hand, I suspect is not far from the truth. When
> sdists were designed, they were a convenience for bundling the stuff
> needed to do setup.py install later, possibly on a different machine.
>
> But that's a long time ago, and not really relevant now. For better or
> worse. Unless you are suggesting that we go all the way back to that
> original point? Which you may be, but that means discarding the work
> that's been done based on the sdist concept since then. Which leads
> nicely on to...
>
> > Then, easy_install came along, and pulled off a clever hack where when
> > you asked for some package by name, then it would try to automagically go
> > out and track down any relevant source releases and build them all. And
> > it works great, except when it doesn't. And creates massive headaches for
> > everyone trying to work on python packaging afterwards, because source
> > releases were not designed to be used this way.
> >
> > My hypothesis is that the requirements that were confusing me are based
> > around the idea that an sdist should be something designed to slot into
> > this particular use case: i.e., something that pip can automatically grab
> > and work with while solving a dependency resolution problem. Therefore it
> > really needs to have a static name, and static version number, and static
> > dependencies, and must produce exactly one binary wheel that shares all
> > that metadata.
>
> Anyone who knows my history will know that I'm the last person to
> defend setuptools' hacks, but you hit the nail on the head above. When
> it works, it works great (meh, "sufficiently well" :-))
>
> And pip *needs* to do static dependency resolution. We have enough bug
> reports and feature requests asking that we improve the dependency
> resolution process that nobody is going to be happy with anything that
> doesn't allow for at least as much static information as we currently
> have, and ultimately more.
>
> > Let's call this a "source wheel" -- what we're really looking for
> > here is a way to ship extension modules inside something that acts like
> > an architecture-neutral "none-any" wheel.
>
> I don't understand this statement. What do extension modules matter
> here? We need to be able to ship sources in a form that can
> participate in dependency resolution (and any other potential
> discovery processes that may turn up in future) without having to run
> code to do so. The reasons for this are:
>
> 1. Running code is a performance overhead, and possibly even a
> security risk (even trusted code may behave in ways you didn't
> anticipate). We want to do as little as possible of that as we can,
> and in particular we want to discard invalid candidate files without
> running any of their code.
> 2. Running code introduces the possibility of that code failing. We
> don't want end users to have installs fail because code in
> distributions we're going to discard is buggy.
> 3. Repositories like PyPI need to present project metadata, for both
> human and tool consumption - they can only do this if it's available
> statically.
>
> You seem to be thinking that binary wheels are sufficient for this for
> pure-Python code. Look at it the other way - we discard sdists from
> the dependency calculations whenever there's an equivalent binary
> wheel available. That's always for none-any wheels, but less often for
> architecture-dependent wheels. But in order to know that the wheel is
> equivalent, we need to match it with the sdist - so the sdist needs
> the metadata you're trying to argue against providing...
>
> > So: the email that started this thread was a proposal for how to
> > standardize the format of "source releases", and Donald's counter was a
> > proposal for how to standardize the format of "source wheels". Does that
> > sound right?
>
> Well, essentially yes, although I read it as your original email being
> a proposal for a new format to replace sdists, and Donald's and my
> counter is that there's already been a certain amount of thinking
> and design gone into how we move from the current ad-hoc sdist format
> to a better defined and specified "next version", so how does your
> proposal affect that?
>
> It seems that your answer is that you want to bypass that and offer an
> alternative. Is that fair?
>
> For the record, I don't like the term "source wheel" and would prefer
> to stick with "sdist" if appropriate, or choose a term that doesn't
> include the word "wheel" otherwise (as wheels seem to me to be
> strongly, and beneficially, linked in people's minds to the concept of
> a binary release format).
>
> > If so, then some follow-up thoughts:
> >
> > 1) If we design a source wheel format, then I am 100% in favor of the
> > suggestion of giving it a unique extension like "swhl". I'm still a young
> > whippersnapper compared to some, but I've been downloading files named
> > <package>-<version>.<zip or tar.gz> for 20 years, and AFAICR every one of
> > those files unpacked to make a single directory that was laid out like a
> > VCS checkout. Obviously we can change what goes inside, but we should
> > change the naming convention at the same time because otherwise we're
> > just going to confuse people.
>
> I have no problem with making it clear that "sdist version 2" or
> "source wheel" is not the same as a packed VCS checkout. I don't see
> the need for a new term, I'd be happy with "<package>-<version>.sdist"
> as the name. I'd also like to emphasize strongly that PyPI only hosts
> sdists, and *not* source releases - that source releases are typically
> only seen in Python in the form of a VCS checkout or development
> directory.
>
> (There's an implication there that we need to explore, that pip won't
> necessarily gain the ability to be pointed at a non-sdist format
> packed "source release" archive, and download it and process it.
> That's not a given, but I'd like to be sure we are happy with the
> potential re-introduction of confusion over the distinction between a
> sdist/source wheel and a source release that would result).
>
> > 2) I think there's a strong case to be made that Python actually needs
> > standards for *both* source releases and source wheels. There's certainly
> > no logical contradiction -- they're conceptually different things. It
> > sounds like we all agree that "pip" should continue to have a way to
> > build and install an arbitrary VCS checkout, and extending that standard
> > to cover building and installing a classic "source release" would be...
> > almost difficult *not* to do.
>
> As noted above, I'm happy for that discussion to occur. But I'm *not*
> sure the case is as strong as you think. Technically, it's certainly
> not too hard, but the social issues are what concern me. How will we
> explain to someone that they can't upload their file to PyPI because
> it's a "source release" not a "source wheel"? What is the implication
> on people's workflow? How would we explain why people might want to
> make "source releases" *at all*? Personally, I can only see a need in
> my personal experience for a VCS url that people can clone, a packaged
> source artifact that I can upload to PyPI for automatic consumption,
> and (binary) wheels. That second item is a "source wheel" - not a
> "source release".
>
> > And I think that there will continue to be a clear need for source
> > releases even in a world where source wheels exist, because of all those
> > places where source releases get used that aren't
> > automagic-easy_install/pip-builds. For example, most pure Python packages
> > (the ones that already make "none-any" wheels) have no need at all for
> > source wheels, but they still need source releases to serve as archival
> > snapshots. And more complex packages that need build-time configuration
> > (e.g. numpy) will continue to require source releases that can be
> > configured to build wheels that have a variety of different properties
> > (e.g., different dependency metadata), so they can't get by with source
> > wheels alone -- but you can imagine that such projects might reasonably
> > *in addition* provide a source wheel that locks down the same default
> > configuration that gets used for their uploaded binary wheel builds, and
> > is designed for pip to use when trying to resolve dependencies on
> > platforms where a regular binary wheel is unavailable.
> >
> > Pictorially, this world would look like:
> >
> > VCS checkout -> source release
> >      \              \
> >       --------------------+--> in-place install
> >                           |
> >                           +--> wheels -> install
> >                           |
> >                           +--> source wheels -> wheels -> install
>
> I don't see the need for source releases that you do. That's likely
> because I don't deal with the sorts of complex projects you do,
> though, so I'm not dismissing the issue. As I say, my objections are
> mostly non-technical. I do think you should consider how to document
> "what a source release is intended to achieve" in a way that explains
> it to people who don't need the complexity it adds - and with the
> explicit goal of making sure that you dissuade people who *don't* need
> source releases from thinking they do.
>
> > 3) It sounds like we all agree that
> >   - 'pip install <VCS checkout>' should work
>
> Yes.
>
> >   - that there is some crucial metadata that VCS checkouts won't be able
> > to provide without running arbitrary code (e.g. dependencies and version
> > numbers)
>
> I'm still resisting this one, although I can live with "Nathaniel
> tells me so" :-)
>
> >   - that what metadata they do provide (e.g., which arbitrary code to
> > run) should be specified in a human-friendly configuration file
>
> I don't agree to that one particularly, in the sense that I don't
> really care. I'd be happy with a system that said something like that
> for a VCS checkout, pip expects "setup.py egg_info" and "setup.py
> sdist" to work, and produce respectively a set of static metadata in a
> known location, and a properly formatted "source wheel"/sdist file.
> Non-distutils build tools can write a wrapper setup.py that works
> however they prefer. (That's roughly what we have now, BTW).
>
> > Given this, in the big picture it sounds like the only really essentially
> > controversial things about the original proposal might be:
> >   - that 'pip install <tarball of VCS checkout>' should work the same as
> > 'pip install <VCS checkout>' (does anyone actually disagree?)
>
> Yes, to the extent that I want to ensure it's clearly documented how,
> and why, this differs from a sdist/source wheel. But that's not a
> technical issue.
>
> >   - the 1 paragraph describing the "transitional plan" allowing pip to
> > automatically install from these source releases *as part of a dependency
> > resolution plan* (as opposed to when a VCS-checkout-or-equivalent is
> > explicitly given as an install target). Which honestly I don't like
> > either, it's just there as a backcompat measure so that these source
> > releases don't create a regression versus existing sdists -- note that
> > one of the goals of the design was that current sdists could be upgraded
> > into this format by dropping in a single static file (or by an install
> > tool "virtually" dropping in this file when it encounters an old-style
> > sdist -- so you don't need to keep around code to handle both cases
> > separately).
>
> Given that these source releases won't be hosted on PyPI (as the
> proposal currently stands) there's no real need for this - all you
> need to say is that you can point pip at any old URL and take your
> chances :-)
>
> > Does that sound right?
>
> Not really. For me the big controversy is whether we move forward from
> where we are with sdists, or we ignore the current sdist mechanism and
> start over.
>
> A key further question, which I don't think has been stated explicitly
> until I started this email, is what formats will be supported for
> hosting on PyPI. I am against hosting formats that don't support
> static metadata, such as your "source release", as I don't see
> how PyPI would be able to publish the metadata if it weren't static.
>
> And following on from that, we need to agree whether the key formats
> should be required to have a static version. I'm OK with a VCS
> checkout having a dynamically generated version, that's part of the
> "all bets are off" contract over such things (if you don't generate a
> version that reflects every change, you get to deal with the
> consequences) but I don't think that's a reasonable thing to allow in
> "published" formats.
>
> > (Other features of the original proposal include stuff like the lack of
> > trivial metadata like "name" and "description", and the support for
> > generating multiple wheels from one directory. I am explicitly calling
> > these "inessential".)
>
> Not sure what you mean by lack of name/description being
> "inessential". The double negative confuses me. Do you mean you're OK
> with requiring them? Fair enough.
>
> For multiple wheels, I'd tend to consider the opposite to be true -
> it's not that the capability is non-essential, but rather that in
> published formats (source wheel and later in the chain) it's essential
> that one source generates one target.
>
> Paul