Let me see if I can help clarify, so it's not just Donald who says so :-)

It does feel as if we're trying to explain a lot of things that "everybody knows". Clearly not everybody knows, as you don't, but what we're trying to clarify here is the de facto realities of how sdists work, and how people expect them to work. Unfortunately, there's an awful lot of things in the packaging ecosystem that are defined by existing practice, and traditionally haven't been formally documented.

I'm sure it feels as if we're just repeatedly saying "it has to be like that" - but in truth, it's more that what we're saying is the definition of a sdist, as established by existing practice. I wish we could point you to a formal definition of the requirements, but unfortunately they've never been written down. With luck, one of the outcomes here will be that someone will record what a sdist is - but we need to reflect current reality, and not end up reusing the term "sdist" to mean something different from what people currently use it for.

On 4 October 2015 at 19:22, Nathaniel Smith <njs@pobox.com> wrote:
"a sdist is fundamentally different from a VCS checkout",
Specifically, a sdist is built by the packaging tools - at the moment, by "setup.py sdist", but in future by whatever tool(s) may replace distutils/setuptools. So a sdist has a defined format, and we can mandate certain things about it. In particular, we can require files to be present which are in tool-friendly formats, because the tools will build them.

On the other hand, a VCS checkout is fundamentally built by a human, for use by humans. File formats need to be human-editable, and we have to be prepared to work with constraints imposed by workflows and processes *other* than Python packaging tools. So we have much less ability to dictate the format. Your proposal mandates a single directory "owned" by the packaging ecosystem, which follows the git/hg/subversion model, so it's lightweight and low-risk. But you still can't realistically ask the user to maintain package data in (for example) a JSON file in that directory.
"there must be a 1-1 mapping between sdists and wheels",
The fundamental reason is one I know I've mentioned here before - pip implements "pip install <sdist>" by first building a wheel and then installing it. If a sdist generates two wheels, how will pip know which one to install? Also, users expect "pip wheel <sdist>" to produce the wheel corresponding to the sdist. You're proposing to change that expectation - the onus is on you to justify that change.

You need to consider backward compatibility in the wider sense here too - right now, there *is* a one-to-one mapping between a sdist and a wheel. If you want to change that, you need to justify it; it's not enough just to claim that no-one has come up with a persuasive argument to keep things as they are. Change is not a bad thing, and "because we've always done it that way" is not a good argument, but change needs to be justified even so.
"pip needs sdists that have full wheel metadata in static form"
I assume here you're now OK with the distinction between a sdist and a VCS checkout? If you still think we're saying that pip needs static metadata in *VCS checkouts* then please review the comments already made about the difference between a sdist and a VCS checkout. But basically, a sdist is a tool-generated archive that captures the state of the project and allows for *reproducible* builds of that project. If your understanding of what a sdist is differs from this, we need to stop and agree on terminology before going any further.

I will concede that https://packaging.python.org/en/latest/glossary/ doesn't mention the point that a sdist needs to provide reproducible builds. But that's certainly how sdists are used at present, and how people expect them to work. Certainly, if I lost the wheel I'd built from a sdist, I'd expect to just rebuild it from the sdist and get the same wheel.

Pip needs metadata to do dependency resolution. This includes project name, version, and dependency information. We could debate about whether *full* metadata is needed, but I'm not sure what the point is. Once you are recording the stuff that pip needs, why *not* record everything? There are other tools (and ad-hoc scripts) that would benefit from having the full metadata, so why would you make it harder for them to work?

You claim that you want to keep your options open - but to me, it's more important to leave the *user's* options open. If we don't provide certain values, a user who needs that data has to propose a change to the format, wait for it to be implemented, and even then they can't rely on it until all projects move to the new format. Better to just require everything from the start, then users can get at whatever they need.

As far as why the metadata should be static, the current sdist format does actually include static metadata, in the PKG-INFO file. So again we have a case where it's up to you to justify the backward compatibility break.
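
As a rough illustration of what "static" buys a tool, here is a sketch of reading the name, version and dependencies out of a sdist's PKG-INFO without running any of the project's code. It is only a sketch under assumptions: the archive contents, the project name "example-project" and the helper names are made up for the demo, and whether a given sdist's PKG-INFO actually carries Requires-Dist lines depends on the tool that built it.

```python
import io
import tarfile
from email.parser import HeaderParser

# Hypothetical PKG-INFO content, made up for this demo; real files carry more fields.
PKG_INFO = """\
Metadata-Version: 1.2
Name: example-project
Version: 1.0
Requires-Dist: requests
Requires-Dist: six
"""


def make_demo_sdist():
    """Build a tiny in-memory .tar.gz laid out like a sdist (demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = PKG_INFO.encode("utf-8")
        info = tarfile.TarInfo("example-project-1.0/PKG-INFO")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def read_static_metadata(fileobj):
    """Return (name, version, requirements) without executing project code."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            # The top-level file sits one directory down: <name>-<version>/PKG-INFO
            if member.name.count("/") == 1 and member.name.endswith("/PKG-INFO"):
                text = tar.extractfile(member).read().decode("utf-8")
                # PKG-INFO uses RFC 822 style headers, so the email parser can read it.
                msg = HeaderParser().parsestr(text)
                return msg["Name"], msg["Version"], msg.get_all("Requires-Dist") or []
    raise ValueError("no top-level PKG-INFO found")


name, version, requires = read_static_metadata(make_demo_sdist())
print(name, version, requires)
```

Nothing here imports or executes anything shipped by the project; the tool only opens the archive and parses headers, which is the property the dynamic approach gives up.
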
But it's a little less clear-cut here, because you are proposing a new sdist format, so you've already argued for a break with the old format. Also the old format is not typically introspected, it's just used to unpack and run setup.py. So you can reasonably argue that the current state of affairs is irrelevant.

However, we're talking here about whether the metadata should be statically available, or dynamically generated. The key point here is that dynamic metadata requires the tool (pip, my one-off script, whatever) to *run arbitrary code* in order to get the metadata. OK, with signing we can ensure that it's *trusted* code, but it still could do anything the project author wanted, and we can make no assumptions about what it does. That makes a tool's job much harder.

A common bug report for pip is users finding that their installs fail, because setup.py requires numpy to be installed in order to run, and yet pip is running setup.py egg_info precisely to find out what the requirements are. We tell the user that the setup.py is written incorrectly, and they should install numpy and retry the install, but it's not a good user experience. And from a selfish point of view, users blame *pip* for the consequences of a project whose code to generate the metadata is buggy. Those bug reports are a drain on the time of the pip developers, as well as a frustrating experience for the users.

If you want to argue that a VCS checkout, or development directory, needs to generate metadata dynamically, I won't argue. That's fine. But the sdist is a tool-generated snapshot of a *specific* release of a project (maybe "the release I made at 1:15 today for my test build", but still a specific build) and it should be perfectly possible to capture the dynamically generated metadata from the VCS checkout and store it in the sdist when it is built. If you feel that there is metadata that cannot be stored statically in the sdist, could you please give a specific example?
But do remember that a sdist is intended as a *snapshot* of a VCS checkout that can be used to reproducibly build the project - so "the version number needs to include the time of the build" isn't a valid example.

Paul