I am a research programmer at the NYU School of Engineering. My colleagues
(Trishank Kuppusamy and Justin Cappos) and I are requesting community
feedback on our proposal, "Surviving a Compromise of PyPI." The two-stage
proposal can be reviewed online at:
Summary of the Proposal:
"Surviving a Compromise of PyPI" proposes how the Python Package Index
(PyPI) can be amended to better protect end users from altered or malicious
packages, and to limit the damage to affected users if PyPI is compromised.
The proposed integration makes package managers such as pip more resilient
to various types of attacks on PyPI, and protects end users from attackers
who respond to package requests. Specifically, these PEPs describe how PyPI
processes should be adapted to generate and incorporate repository
metadata: signed text files that describe the packages and metadata
available on PyPI. Package managers download this metadata from PyPI along
with the packages, and use it to verify the authenticity of packages before
they are installed. Changes to PyPI and its tools will be kept minimal by
leveraging a library, The Update Framework
<https://github.com/theupdateframework/tuf>, that generates and
transparently validates the relevant metadata.
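To make "verify the authenticity of packages before they are installed" concrete, here is a minimal sketch of the last step a client performs. It assumes the targets metadata has already had its signatures checked and follows the TUF specification's layout ("signed" → "targets" → path → "length" and "hashes"); the function name and the package path are hypothetical, and a real TUF client also checks expiry, versions and the root of trust, which this sketch omits.

```python
import hashlib
import json


def verify_target(targets_json: str, target_path: str, data: bytes) -> bool:
    """Check downloaded bytes against the length and sha256 hash recorded
    for target_path in (already signature-verified) targets metadata."""
    targets = json.loads(targets_json)["signed"]["targets"]
    entry = targets.get(target_path)
    if entry is None:
        return False  # not listed in signed metadata: refuse to install
    if len(data) != entry["length"]:
        return False  # wrong size, e.g. truncated or padded download
    expected = entry["hashes"].get("sha256")
    return expected is not None and hashlib.sha256(data).hexdigest() == expected


if __name__ == "__main__":
    pkg = b"pretend these are the bytes of a release"
    metadata = json.dumps({"signed": {"targets": {
        "packages/example-1.0.tar.gz": {
            "length": len(pkg),
            "hashes": {"sha256": hashlib.sha256(pkg).hexdigest()},
        }}}})
    print(verify_target(metadata, "packages/example-1.0.tar.gz", pkg))
    print(verify_target(metadata, "packages/example-1.0.tar.gz", pkg + b"!"))
```

A mirror or CDN that alters the file cannot also update the hash, because it does not hold the key that signs the targets metadata.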
The first stage of the proposal (PEP 458
<http://legacy.python.org/dev/peps/pep-0458/>) uses a basic security model
that supports verification of PyPI packages signed with cryptographic keys
stored on PyPI, requires no action from developers and end users, and
protects against malicious CDNs and public mirrors. To support continuous
delivery of uploaded packages, PyPI administrators sign for uploaded
packages with an online key stored on PyPI infrastructure. This level of
security prevents packages from being accidentally or deliberately tampered
with by a mirror or a CDN because the mirror or CDN will not have any of
the keys required to sign for projects.
The second stage of the proposal (PEP 480
<http://legacy.python.org/dev/peps/pep-0480/>) is an extension to the basic
security model (discussed in PEP 458) that supports end-to-end verification
of signed packages. End-to-end signing allows both PyPI and developers to
sign for the packages that are downloaded by end users. If the PyPI
infrastructure were to be compromised, attackers would be unable to serve
malicious versions of these packages without access to the project's
developer key. As in PEP 458, no additional action is required by end
users. However, PyPI administrators will need to periodically (perhaps
every few months) sign metadata with an offline key. PEP 480 also proposes
an easy-to-use key management solution for developers, describes how to
interface with a potential build farm on PyPI infrastructure, and discusses
the security benefits of end-to-end signing. The second stage of the
proposal simultaneously supports real-time project registration and
developer signatures, and when configured to maximize security on PyPI,
fewer than 1% of end users will be at risk even if an attacker controls
PyPI and goes undetected for a month.
We thank Nick Coghlan and Donald Stufft for their valuable contributions,
and Giovanni Bajo and Anatoly Techtonik for their feedback.
PEP 458 & 480 authors.
As a new Twine maintainer I've been running into questions like:
* Now that Warehouse doesn't use "register" anymore, can we deprecate it from distutils, setuptools, and twine? Are any other package indexes or upload tools using it? https://github.com/pypa/twine/issues/311
* It would be nice if Twine could rely on a package index returning HTTP 201 (Created) for a successful upload, and fail on 200 (a response some servers that are not package indexes give to an arbitrary POST request).
I don't see any specification to guide me here, e.g., in the official guidance on hosting one's own package index https://packaging.python.org/guides/hosting-your-own-index/ . PEP 301 is old enough that it's due for an update, and PEP 503 covers only browsing and downloading, not uploading.
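The 201-versus-200 rule in the second bullet could look like the following on the client side. This is a hypothetical sketch of the proposed behavior, not existing Twine code; the function name and messages are made up.

```python
from http import HTTPStatus
from typing import Tuple


def check_upload_response(status_code: int) -> Tuple[bool, str]:
    """Proposed rule: only 201 Created counts as a successful upload."""
    if status_code == HTTPStatus.CREATED:
        return True, "upload accepted"
    if status_code == HTTPStatus.OK:
        # Some servers that are not package indexes answer 200 to any POST,
        # so a bare 200 is not trusted as proof that the file was stored.
        return False, "got 200 OK instead of 201 Created; is this really a package index?"
    return False, f"upload failed with HTTP {status_code}"
```

Reporting the 200 case separately lets the tool tell the user that the URL may point at the wrong kind of server, rather than printing a generic failure.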
I suggest that I write a PEP specifying an API for uploading to a Python package index. This PEP would partially supersede PEP 301 and would document the Warehouse reference implementation. I would write it in collaboration with the Warehouse maintainers who will develop the reference implementation per pypa/warehouse/issues/284 and maybe add a header referring to compliance with this new standard. And I would consult with the maintainers of packaging and distribution tools such as zest.releaser, flit, poetry, devpi, pypiserver, etc.
Per Nick Coghlan's formulation, my specific goal here would be close to:
> Documenting what the current upload API between twine & warehouse actually is, similar to the way PEP 503 focused on describing the status quo, without making any changes to it. That way, other servers (like devpi) and other upload clients have the info they need to help ensure interoperability.
Since Warehouse is trying to redo its various APIs in the next several months, I think it might be more useful to document and work with the new upload API, but I'm open to feedback on this.
After a little conversation here on distutils-sig, I believe my steps would be:
1. start a very early PEP draft with lots of To Be Determined blanks, submit as a PR to the python/peps repo, and share it with distutils-sig
2. ping maintainers of related tools
3. discuss with others at the packaging sprints https://wiki.python.org/psf/PackagingSprints next week
4. revise and get consensus, preferably mostly on this list
5. finalize PEP and get PEP accepted by BDFL-Delegate
6. coordinate with PyPA, maintainers of `distutils`, maintainers of packaging and distribution tools, and documentation maintainers to implement PEP compliance
Thoughts are welcome. I originally posted this at https://github.com/pypa/packaging-problems/issues/128 .
Making download stats available through BigQuery seems like a good idea,
but as things stand it is rather expensive for users. For example, I just
looked at the stats for one of my packages, and a single query reported
840 GB processed (and therefore billed).
Is table clustering used when building these tables? If not, could the
tables be clustered, at least on project name? (See the following about table clustering:
Bandersnatch is the static PyPI mirroring software following PEP 381. It lets you keep a full, static local copy of PyPI (no dynamic APIs).
We've finally cut 3.0 (with a 3.0.1 to change the URL on PyPI to GitHub), which fixes a few bugs and moves the request calls into asyncio thread executors. My original plan was to deprecate XMLRPC and move to a fully asyncio code base on top of aiohttp, but we haven't settled on a new PyPI API to replace XMLRPC.
Please upgrade and test, let me know of any bugs, and we'll roll out fixes.
Notable changes since 2.2.1:
- Move to asyncio executors around request calls - Fixes #81 (on BitBucket)
- Use platform.uname() to support Windows - Fixes #19 (Windows still has a few more bugs before full support)
- Add bandersnatch verify subcommand to re-download + delete unneeded packages - Fixes #8 + many follow-on issues found during testing - Thanks electricworry & tau3 for testing + fixes!
- Introduce more lint checks (black, isort, mypy) in addition to flake8 - Thanks @asottile
- Make tox run lint checks + print test coverage
- Add whitelist + blacklist plugins - Thanks @dwighthubbard
- Add generated documentation - Thanks @dwighthubbard
- Move to requiring Python >= 3.6.1 - Fixes #66
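For anyone unfamiliar with the first change in that list, the general pattern of wrapping blocking calls (such as requests) in asyncio executors looks roughly like this. It is an illustrative sketch, not bandersnatch's actual code: `blocking_fetch` is a hypothetical stand-in for a blocking HTTP call, and `asyncio.run` / `get_running_loop` need Python 3.7+, newer than the 3.6.1 minimum above.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def blocking_fetch(url: str) -> str:
    # Hypothetical stand-in for a blocking call such as requests.get(url).
    time.sleep(0.05)
    return f"fetched {url}"


async def fetch_all(urls: List[str], max_workers: int = 4) -> List[str]:
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each blocking call runs in a worker thread; the event loop stays free.
        futures = [loop.run_in_executor(pool, blocking_fetch, u) for u in urls]
        # gather() returns results in the same order as the input URLs.
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["a", "b"])))
```

This keeps an existing requests-based code path while still overlapping network waits; a full move to aiohttp would avoid the threads entirely.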
Cheers all who reported bugs and PRs - Keep them coming!
Regarding the Bloomberg packaging sprint on Oct. 27-28:
Are there ways for people who won't be there in person to participate
(e.g. in addition to the tracker)? I won't be there, but I can
probably dedicate some time to writing and reviewing pip patches.
Also, which PyPA members have signed up to participate?
I've created a website that is likely to be of interest to people here: Wheelodex <https://www.wheelodex.org>, a site for browsing the metadata of wheels on PyPI.
It allows you to find out what projects a wheel depends on, what other projects depend on a given project, what commands & other entry points a wheel defines, what files are in a wheel, etc. You can even search for wheels containing a given module or file, or browse a list of all commands & other entry points defined by wheels. There's also a basic API for getting wheel data as JSON: <https://www.wheelodex.org/json-api/>.
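For readers curious where this data lives, here is a minimal sketch that pulls the same kinds of facts out of a single wheel file. It is not how Wheelodex is implemented; it only follows the wheel format, where a top-level `*.dist-info` directory holds `METADATA` (RFC 822-style headers, with one `Requires-Dist` line per dependency) and an optional INI-style `entry_points.txt`. The function name is hypothetical.

```python
import configparser
import io
import zipfile
from email.parser import Parser


def inspect_wheel(wheel_bytes: bytes) -> dict:
    """Extract name, version, dependencies, entry points and file list."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        names = zf.namelist()
        # The .dist-info directory sits at the top level of the archive.
        dist_info = next(
            n.split("/")[0] for n in names
            if n.count("/") == 1 and n.split("/")[0].endswith(".dist-info")
        )
        msg = Parser().parsestr(zf.read(dist_info + "/METADATA").decode("utf-8"))
        entry_points = {}
        ep_path = dist_info + "/entry_points.txt"
        if ep_path in names:
            cp = configparser.ConfigParser(delimiters=("=",), interpolation=None)
            cp.optionxform = str  # keep entry point names case-sensitive
            cp.read_string(zf.read(ep_path).decode("utf-8"))
            entry_points = {s: dict(cp[s]) for s in cp.sections()}
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_dist": msg.get_all("Requires-Dist") or [],
        "entry_points": entry_points,
        "files": names,
    }
```

The reverse lookups the site offers ("what depends on X", "which wheel provides this command") come from running something like this over many wheels and indexing the results.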
I'm open to suggestions on what else to do with the data. I'm also open to suggestions on how to make the interface look less sucky.
-- John Wodder
I am fairly sure that if you give the PyPA that suggestion, they will just deflate at the thought of the workload. Besides, we already offer private repos for free, in several ways ranging from devpi to python -m SimpleHTTPServer in a specially created directory.
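The "specially created directory" works because the PEP 503 simple repository API is plain static HTML: a root page linking to one page per project, each listing the distribution files. A minimal sketch of generating that layout follows; the function names are hypothetical, and the files themselves are assumed to be copied into each project directory separately. On Python 3 the server module is http.server, so running `python3 -m http.server` in the root directory and passing `--index-url http://localhost:8000/` to pip should be enough for local use.

```python
import html
import re
from pathlib import Path
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """PEP 503 project-name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def render_links(links: List[Tuple[str, str]]) -> str:
    """Render a minimal PEP 503-style page from (href, text) pairs."""
    body = "\n".join(
        f'<a href="{html.escape(href)}">{html.escape(text)}</a><br>'
        for href, text in links
    )
    return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>\n"


def build_simple_index(root: Path, files_by_project: Dict[str, List[str]]) -> None:
    """Write root/index.html and root/<normalized-name>/index.html."""
    root.mkdir(parents=True, exist_ok=True)
    projects = sorted(normalize(p) for p in files_by_project)
    (root / "index.html").write_text(render_links([(p + "/", p) for p in projects]))
    for project, filenames in files_by_project.items():
        project_dir = root / normalize(project)
        project_dir.mkdir(exist_ok=True)
        (project_dir / "index.html").write_text(
            render_links([(fn, fn) for fn in filenames])
        )
```

Normalizing names matters because pip requests the normalized form (for example `zope-interface` for `zope.interface`), so a directory named otherwise would not be found.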
From: Python-ideas <python-ideas-bounces+tritium-list=sdamon.com(a)python.org> On Behalf Of Nick Humrich
Sent: Wednesday, April 4, 2018 12:26 PM
Subject: [Python-ideas] Pypi private repo's
I am sure this has been discussed before, and this might not even be the best place for this discussion, but I just wanted to make sure this has been thought about.
What if pypi.org supported paid private repos, similar to npm?
This could help cover the cost of running PyPI, and hopefully make it better and more reliable, in turn improving the Python community.
If this discussion should happen somewhere else, let me know.
Surprisingly, the manylinux1 spec doesn't seem to include zlib in its list of libraries assumed to be available (are there GNU/Linux systems out there without zlib installed?).
Since I assume several packages have already needed it, is there a recommended way to link zlib into a manylinux1 wheel? Would you recommend static linking against a private copy, or dynamic linking?
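To make the gap concrete: a manylinux1 wheel may only depend dynamically on a fixed set of shared libraries, so any other soname an extension module needs (such as libz.so.1) fails the check. The sketch below hard-codes that set as transcribed from PEP 513 (check the PEP for the authoritative list) and takes the needed sonames as input; in practice they would come from the extension's ELF dynamic section, which tools like auditwheel inspect. The function name is hypothetical.

```python
from typing import Iterable, List

# Library whitelist for the manylinux1 policy, transcribed from PEP 513.
MANYLINUX1_ALLOWED = frozenset({
    "libpanelw.so.5", "libncursesw.so.5", "libgcc_s.so.1", "libstdc++.so.6",
    "libm.so.6", "libdl.so.2", "librt.so.1", "libcrypt.so.1", "libc.so.6",
    "libnsl.so.1", "libutil.so.1", "libpthread.so.0", "libresolv.so.2",
    "libX11.so.6", "libXext.so.6", "libXrender.so.1", "libICE.so.6",
    "libSM.so.6", "libGL.so.1", "libgobject-2.0.so.0",
    "libgthread-2.0.so.0", "libglib-2.0.so.0",
})


def disallowed_libs(needed: Iterable[str]) -> List[str]:
    """Return the needed sonames that a manylinux1 wheel may not assume exist."""
    return sorted(lib for lib in needed if lib not in MANYLINUX1_ALLOWED)
```

Any soname this reports has to be either linked statically or shipped inside the wheel, which is exactly the choice the question above is asking about.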