File integrity checking and host blocking for EasyInstall
After thinking over the last week's distutils-sig discussion about security, signatures, etc., I think I have a plan for handling basic file integrity checking and (non-cryptographic) trust management for EasyInstall. It is not a high-security end-to-end solution, but I think it will allow security-conscious persons to take a more "locked down" approach if they want to, while providing everyone else with some baseline protection against corrupted files.
The first part of the plan is to add md5 digest checking to EasyInstall. Because one of EasyInstall's design goals is to make it easy for anybody to publish links to packages, we need to be able to include the md5 digest in a package's URL. I'm thinking we could achieve this via an '#md5=...' fragment identifier. For example, a setuptools source archive URL might be:
The advantage of this approach is that it allows anyone to assert what the md5 of the targeted file is, and it can be asserted in any web page, just by pointing an HREF at the file. EasyInstall could detect the '#md5=' marker, and then use this to verify the file during download.
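To make the '#md5=' idea concrete, here is a minimal Python sketch of how a downloader might pull the expected digest out of a link and check it while copying the file. The names `split_md5` and `copy_and_check` are hypothetical, not EasyInstall code, and the sketch assumes the fragment holds exactly 32 hex digits. Since a URL fragment is never sent to the server, the same link still fetches the file unchanged in a browser or an older tool.

```python
import hashlib
import re
from urllib.parse import urldefrag

# Hypothetical format: the fragment is "md5=" followed by 32 hex digits.
MD5_FRAGMENT = re.compile(r"^md5=([0-9a-fA-F]{32})$")


def split_md5(url):
    """Return (url_without_fragment, expected_md5_or_None)."""
    base, fragment = urldefrag(url)
    match = MD5_FRAGMENT.match(fragment)
    return base, (match.group(1).lower() if match else None)


def copy_and_check(url, src, dst, chunk_size=8192):
    """Copy file-like src to dst, hashing each chunk as it passes through.

    Raises ValueError if the URL carried an md5 and the data doesn't match;
    returns the computed hex digest either way.
    """
    base, expected = split_md5(url)
    digest = hashlib.md5()
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        digest.update(block)
        dst.write(block)
    actual = digest.hexdigest()
    if expected is not None and actual != expected:
        raise ValueError("md5 mismatch for %s: expected %s, got %s"
                         % (base, expected, actual))
    return actual
```

Because the digest is computed incrementally, the file never has to be read twice. A real implementation would download into a temporary file and delete it on a mismatch rather than leave the bad bytes in `dst`.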
The disadvantage, of course, is that PyPI doesn't currently support this: it creates a separate link to a page that displays the md5, and that page's URL doesn't contain anything that connects it back to the distribution file it refers to. I could probably write some kind of parsing hack to work around that for PyPI, but it might be better to add the '#md5=' fragment to PyPI's download links instead.
EasyInstall would also need to grow a --require-md5 option, which would refuse to install anything from a Subversion checkout or any distribution without a known md5 digest.
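The policy behind --require-md5 can be sketched as a single check run before each download. This is a minimal illustration that assumes (hypothetically) Subversion checkouts are recognized by an `svn` or `svn+...` URL scheme; `check_md5_policy` and `MD5Required` are illustrative names, not existing EasyInstall code.

```python
from urllib.parse import urldefrag, urlsplit


class MD5Required(Exception):
    """Raised when --require-md5 is in effect and a source can't be verified."""


def check_md5_policy(url, require_md5):
    """Return the expected md5 from the URL (or None).

    With require_md5 set, refuse Subversion checkouts and any URL that
    carries no '#md5=' fragment.
    """
    # Assumption: checkouts are identified by an "svn" or "svn+..." scheme.
    scheme = urlsplit(url).scheme.lower()
    is_svn = scheme == "svn" or scheme.startswith("svn+")
    _, fragment = urldefrag(url)
    expected = fragment[4:].lower() if fragment.startswith("md5=") else None
    if require_md5:
        if is_svn:
            raise MD5Required("cannot verify a Subversion checkout: %s" % url)
        if expected is None:
            raise MD5Required("no md5 known for %s" % url)
    return expected
```

Without the flag, the function simply reports whatever digest is available, so unverifiable sources still install as before.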
In addition to md5 support in EasyInstall, I propose to also add it to ez_setup; there, however, the md5 values for various distributions will be hardcoded into ez_setup.py itself. (I'll make my "release" script append the md5 digests for new distributions to the end of ez_setup.py.) In this way, the bootstrap installation of setuptools can also be reasonably secured, as long as you trust a particular version of ez_setup.py.
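For the bootstrap case, the known digests travel inside ez_setup.py itself. The sketch below assumes a module-level dictionary keyed by distribution filename; `MD5_DATA` and `validate_md5` are hypothetical names, and the table is deliberately left empty rather than filled with made-up digests. The `strict` flag covers a choice the plan leaves open: refusing unknown files is safer, while skipping them lets an older ez_setup.py still fetch releases it has no digest for.

```python
import hashlib

# Hypothetical table of known-good digests, keyed by distribution filename.
# Under the plan above, the release script would add one entry per new
# setuptools distribution; no real digests are reproduced here.
MD5_DATA = {
    # "setuptools-X.Y-pyZ.Z.egg": "<32 hex digits>",
}


def validate_md5(filename, data, strict=True):
    """Check downloaded bytes against the digest table built into the script."""
    expected = MD5_DATA.get(filename)
    if expected is None:
        if strict:
            raise ValueError("no known md5 for %s; refusing to install" % filename)
        return data  # non-strict: accept files the table doesn't cover
    actual = hashlib.md5(data).hexdigest()
    if actual != expected:
        raise ValueError("md5 validation of %s failed (possible download problem?)"
                         % filename)
    return data
```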
The next part of the plan would be to add an --allow-hosts option to EasyInstall. This would be a list of host wildcards that EasyInstall would be allowed to contact. For example, --allow-hosts=*.python.org would let EasyInstall download or scan pages from PyPI or www.python.org, but not anywhere else. The default, if not specified, would be '*', meaning that any host may be accessed. If EasyInstall finds itself about to download a page or distribution from a host that isn't allowed, it would abort with a message explaining the problem.
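Host wildcards map naturally onto shell-style patterns. This sketch uses the standard library's `fnmatch` module and assumes --allow-hosts takes a comma-separated list (the plan does not fix a syntax); `parse_allow_hosts`, `check_host`, and `HostNotAllowed` are hypothetical names.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


class HostNotAllowed(Exception):
    """Raised when a URL points at a host outside the allowed list."""


def parse_allow_hosts(value):
    """Split a comma-separated --allow-hosts value into lowercase patterns.

    An empty value falls back to '*', i.e. any host.
    """
    return [p.strip().lower() for p in value.split(",") if p.strip()] or ["*"]


def check_host(url, patterns):
    """Return the URL's host if it matches a pattern; otherwise abort."""
    # urlsplit().hostname is already lowercased and has any port removed.
    host = urlsplit(url).hostname or ""
    if not any(fnmatchcase(host, pat) for pat in patterns):
        raise HostNotAllowed(
            "Download of %s blocked: host %r is not in --allow-hosts=%s"
            % (url, host, ",".join(patterns)))
    return host
```

Note that `*.python.org` matches `www.python.org` but not the bare `python.org`, and a URL with no host (such as a `file:` URL) matches only `*`, so local files would need separate treatment.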
This would allow folks like Paul Moore to configure a default --allow-hosts list in their pydistutils.cfg, to prevent EasyInstall from downloading things from just any old place on the Internet. Once he's verified that he trusts a particular site, he can edit pydistutils.cfg and add it, or else manually download the blocked URL, publish it on a trusted intranet host, etc.
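A per-user default could then come from pydistutils.cfg. This sketch assumes the option would live in an `[easy_install]` section as `allow_hosts` (distutils config files accept either dashes or underscores in option names); `default_allow_hosts` is a hypothetical helper, and it parses text instead of a file path only to keep the example self-contained.

```python
import configparser

# Hypothetical pydistutils.cfg contents for a locked-down user.
SAMPLE_CFG = """\
[easy_install]
allow_hosts = *.python.org, intranet.example.com
"""


def default_allow_hosts(cfg_text):
    """Return the allow-hosts patterns from config text, or ['*'] if unset."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # distutils normalizes "allow-hosts" to "allow_hosts", so check both.
    for key in ("allow_hosts", "allow-hosts"):
        if parser.has_option("easy_install", key):
            value = parser.get("easy_install", key)
            return [p.strip() for p in value.split(",") if p.strip()]
    return ["*"]
```

In a real tool the parser would call `ConfigParser.read()` on the user's config file paths instead of `read_string()`; adding a newly trusted site is then a one-line edit to that file.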
So, this is not a complete security solution, as it doesn't deal with end-to-end file integrity, and could easily be subverted by taking over a site somewhere in the middle (e.g. python.org). But until we have more of the cryptographic infrastructure in place, I think this plan could provide us with a good starting point. Comments, anyone?
Phillip J. Eby