Hi all,

Again trying to split out some more focused discussion from the big thread about sdists...

One big theme there has been the problem of "sources of truth": e.g. in current sdists, there is a PKG-INFO file that has lots of static metadata in it, but because the "real" version of that metadata is in setup.py, everyone ignores PKG-INFO.

A clear desideratum for a new sdist format is that we avoid this problem, by having static metadata that is actually trustworthy. I see two fundamentally different strategies that we might use to accomplish this. In time-honored mailing list tradition, these are of course the one that I hear other people advocating and the one that I like ;-).

The first strategy is: sdists and the wheels they generate logically share the same metadata; so, we need some mechanism to enforce that whatever static metadata is in the sdist will match the metadata in the resulting wheel. (The wheel might potentially have additional metadata beyond what is in the sdist, but anything that overlaps has to match.) An open question is what this mechanism will look like -- if everyone used distutils/setuptools, then we could write the code in distutils/setuptools so that when it generated wheel metadata, it always copied it directly out of the sdist metadata (when present). But not everyone will use distutils/setuptools, because distutils delenda est. So we need some mechanism to statically analyze an arbitrary build system and prove things about the data it outputs. Which sounds... undecidable. Or we could have some kind of after-the-fact enforcement mechanism, where tools like pip are required -- as the last step when building a wheel from an sdist -- to double-check that all the metadata matches, and if it doesn't then they produce a hard error and refuse to continue. But even this wouldn't necessarily guarantee that PyPI can trust the metadata, since PyPI is not going to run this enforcement mechanism...

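To make the after-the-fact enforcement idea a bit more concrete, here is a rough sketch of what such a check might look like. It assumes both the sdist and the wheel carry their metadata as RFC 822-style "Key: value" text (as PKG-INFO and wheel METADATA files do today). The names check_overlap, MetadataMismatch and IGNORED are made up for illustration, and skipping Metadata-Version is my own assumption; this is not something pip actually implements.

```python
from email import message_from_string


class MetadataMismatch(Exception):
    """Raised when an overlapping field differs between sdist and wheel."""


# Assumption: this field describes the file format rather than the
# project, so it is allowed to differ between sdist and wheel.
IGNORED = {"metadata-version"}


def parse_metadata(text):
    """Parse "Key: value" metadata into {lowercased key: sorted values}."""
    msg = message_from_string(text)
    fields = {}
    for key in {k.lower() for k in msg.keys()}:
        # Fields such as Classifier may repeat, so collect every value.
        fields[key] = sorted(msg.get_all(key))
    return fields


def check_overlap(sdist_text, wheel_text):
    """Hard error if any field present in both places has different values.

    Fields that appear only in the wheel are allowed: the wheel may carry
    additional metadata beyond what the sdist declares.
    """
    sdist = parse_metadata(sdist_text)
    wheel = parse_metadata(wheel_text)
    shared = (sdist.keys() & wheel.keys()) - IGNORED
    bad = sorted(k for k in shared if sdist[k] != wheel[k])
    if bad:
        raise MetadataMismatch("fields differ: " + ", ".join(bad))
```

A build frontend would call check_overlap() as the last step after building the wheel and refuse to continue on MetadataMismatch. Note that this only helps whoever runs the check, which is exactly the limitation mentioned above: PyPI itself never runs it.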
The second strategy is: put static metadata in both sdists and wheels, but treat them as logically distinct things: the static metadata in sdists is the source of truth for information *about that sdist* (sdist name, sdist version, sdist description, sdist authors, etc.), and the static metadata in wheels is the source of truth for information about that wheel, but we think of these as distinct things and don't pretend that we can statically guarantee that they will match. I mean, in practice, they basically always will match. But IMO making this distinction in our minds leads to clearer thinking. When PyPI needs to know the name/version/description for an sdist, it can still do that; and since we've lowered our ambitions to only finding the sdist name instead of the wheel name, it can actually do it reliably in a totally static way, without having to run arbitrary code to validate this. OTOH pip will always have to be prepared to handle the possibility of mismatch between what it was expecting based on the sdist metadata and what it actually got after building it, so we might as well acknowledge that in our mental model.

One potential advantage of this approach is that we might be able to talk ourselves into trusting the existing PKG-INFO as providing static metadata about the sdist, and thus PyPI at least could start trusting it for things like the "description" field, and if we define a new sdist format then it would be possible to generate its static metadata from current setup.py files (e.g. by modifying setuptools's sdist command). Contrast this with the other approach, where getting any kind of static source-of-truth would require rewriting almost all existing setup.py files.

The challenge, of course, is that there are a few places where pip actually does need to know something about wheels based on examining an sdist -- in particular name and version and (controversially) dependencies. But this can/should be addressed explicitly, e.g. by writing down a special rule about the name and version fields.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org