[Catalog-sig] A modest proposal for securing PyPI with TUF

Daniel Holth dholth at gmail.com
Thu Mar 14 02:19:12 CET 2013

On Wed, Mar 13, 2013 at 8:11 PM, Trishank Karthik Kuppusamy
<tk47 at students.poly.edu> wrote:
> On 03/13/2013 02:15 PM, Daniel Holth wrote:
>> With all the different kinds of metadata, it's interesting to note
>> that currently TUF seems to only be concerned with the available file
>> names and their integrity. (Some of us will think of PEP 426
>> "PKG-INFO" first when we hear the word metadata.)
> Yes, you are right that the many different kinds of metadata in this
> discussion (TUF metadata, PyPI metadata) make things a little confusing
> sometimes! :))
> My understanding of PEP 426 is that the distribution metadata is specified
> by the developer with the setup.py script.
> To take the running Django example, since the Django developers will sign
> everything under the Django role with their own keys that the D role will
> talk about, setup.py, as well as the generated "PKG-INFO", will be signed by
> the Django developers. This means that pip + TUF will be able to verify
> these distribution metadata indirectly via the source distribution package.
> Does this answer your question?

Thanks, yes. The individual .tar.gz distributions do contain PKG-INFO
but we would eventually like to expose it in a more efficient way.
Then to be suitably paranoid you would also have to check that it
matched the package you downloaded! :(
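
A rough sketch of that check, assuming the index someday serves PKG-INFO
separately from the sdist. The function names and the Example-1.0 archive
are hypothetical, and a byte-for-byte comparison is only the simplest
possible notion of "matched":

```python
import io
import tarfile


def read_pkg_info(sdist_bytes):
    """Return the text of the top-level PKG-INFO inside a .tar.gz sdist."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            # An sdist conventionally holds <name>-<version>/PKG-INFO.
            parts = member.name.split("/")
            if member.isfile() and len(parts) == 2 and parts[1] == "PKG-INFO":
                return archive.extractfile(member).read().decode("utf-8")
    raise ValueError("no top-level PKG-INFO found in archive")


def metadata_matches(sdist_bytes, exposed_pkg_info):
    """Compare separately exposed metadata with the copy inside the sdist."""
    return read_pkg_info(sdist_bytes) == exposed_pkg_info


def build_example_sdist(pkg_info_text):
    """Build a tiny in-memory sdist (hypothetical, for demonstration only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        data = pkg_info_text.encode("utf-8")
        info = tarfile.TarInfo(name="Example-1.0/PKG-INFO")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

In practice sdist_bytes would be the downloaded, already TUF-verified
archive, and exposed_pkg_info whatever the index handed out on the side.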

Also note that on http://crate.io the simple index works the same way
as on pypi, except that the actual packages are on a different (CDN)



>> It looks like the D metadata lists all the filenames for Django, and
>> then Django lists them again with hashes and signatures. Why all the
>> lists? Does every Django release re-assert all the versions of Django
>> that are available on the index?
> Good observation. For D, you are talking about the "paths" attribute here:
> https://updateframework.com/pypi/repository/metadata/targets/packages/source/D.txt
> For Django, you are talking about the "targets" attribute here:
> https://updateframework.com/pypi/repository/metadata/targets/packages/source/D/Django.txt
> Why is "paths" in D listing all the "targets" that Django already talks
> about? Presently, this is because our target delegation tool (signercli.py)
> is being paranoid and making sure that D is explicitly delegating only
> targets matching these "paths".
> However, the TUF specification allows for D to simply say, "I delegate any
> target whatsoever under Django", by setting "paths" to
> "packages/source/D/Django/**":
> https://www.updateframework.com/browser/specs/tuf-spec.txt#L525
>> How might I deal with producing the official source distribution
>> myself and having a friend produce the official Windows build of a
>> package?
> There are a few solutions. You could have your friend produce the official
> Windows build for a package, and then you could sign it, implicitly trusting
> your friend but not publishing that trust.
> A more secure solution would have you delegate that target to your friend.
>> As an aside PyPI has been doubling in size every 1.5 - 2 years.
> Exponential growth strikes again! We have anticipated this, and we have a
> few solutions to curb the growth of TUF metadata. Since TUF metadata is
> simply text, GZIP compression would go a long way. Alternatively, we could
> implement delta updates of TUF metadata.
> The more difficult problem is how to ensure that target delegation structure
> scales with PyPI growth. A good design will keep this in mind and plan
> accordingly.
> Speaking of which, our design document for integrating PyPI with TUF may
> not be terribly easy to understand. (After all, you do need to understand
> TUF first, but TUF is fairly easy once you understand its main ideas.) I
> plan to publish a friendlier document which introduces TUF at a very high
> level and instead discusses more pragmatic issues (such as workflows).
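
To make the two styles of "paths" delegation discussed above concrete,
here is a minimal sketch. It assumes that a pattern ending in "/**" means
"anything under this directory" and that every other entry must match
exactly. That is my reading of this thread, not the matching rules of the
TUF specification or of signercli.py, and the Django file names are made
up for illustration:

```python
def is_delegated(target_path, delegated_paths):
    """Return True if target_path is covered by any delegated path entry."""
    for pattern in delegated_paths:
        if pattern.endswith("/**"):
            # Assumed wildcard form: drop "**" but keep the trailing slash.
            prefix = pattern[:-2]
            if target_path.startswith(prefix) and len(target_path) > len(prefix):
                return True
        elif target_path == pattern:
            # Explicit form: the delegating role names each target.
            return True
    return False


# Explicit listing, as the tool currently produces (hypothetical file name).
explicit = ["packages/source/D/Django/Django-1.5.tar.gz"]
# Wildcard form suggested in the reply above.
wildcard = ["packages/source/D/Django/**"]
```

With the explicit list, D has to be re-signed for every new Django
release; with the wildcard, only the Django role's own metadata changes.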
