[Distutils] Malicious packages on PyPI

Donald Stufft donald at stufft.io
Fri Jun 2 13:43:57 EDT 2017

> On Jun 2, 2017, at 12:04 AM, Nick Timkovich <prometheus235 at gmail.com> wrote:
> I suggested on one of those issues to try to auto-blacklist common 404s as that should pose a negligible usability hit. I'd like to start by logging them to collect data, but I'm confused nowadays as to if that should go into pypa/warehouse or pypa/pypi-legacy. How long until warehouse is where most requests go, or do some go there right now, but from which clients...so confuz, plz halp.

The easiest thing to do would probably be to add this to linehaul (pypa/linehaul on GH). That’s the daemon that processes log lines from Fastly. It’d need some way to distinguish between successful downloads and a 404 on /simple/<foo>/ (possibly just have it bind to two ports, one for downloads and one for 404s? I dunno), and it’d need to store the records somewhere (it might make sense to create a second set of BigQuery tables, these ones not public and likely expiring after a period of time). There would need to be a little bit of VCL work to actually get Fastly sending those log streams, but that’s pretty easy once the work to ingest them is done.
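As a rough sketch of the dispatch step, assuming each log record has already been parsed into an HTTP status and a request path (the actual Fastly log format and linehaul's internals aren't described here), a single function could route each record to a "downloads" or "404s" stream. The function name `classify`, the `/packages/` prefix check, and the stream labels are hypothetical; this also swaps the two-port idea for one dispatch function in a single process.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for a project's simple index page, e.g. "/simple/requests/".
SIMPLE_PAGE = re.compile(r"^/simple/(?P<project>[^/]+)/?$")


def classify(status: int, path: str) -> Tuple[str, Optional[str]]:
    """Route one already-parsed log record (hypothetical helper).

    Returns ("download", None) for a successful file download,
    ("simple_404", project) for a 404 on /simple/<project>/,
    and ("ignore", None) for anything else.
    """
    # Assumption: successful file downloads are served from under /packages/.
    if status == 200 and path.startswith("/packages/"):
        return ("download", None)
    match = SIMPLE_PAGE.match(path)
    if status == 404 and match:
        # Record which project name was asked for but does not exist.
        return ("simple_404", match.group("project"))
    return ("ignore", None)
```

For example, `classify(404, "/simple/reqeusts/")` yields `("simple_404", "reqeusts")`, which would be written to the private 404 tables rather than the public download tables.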

That would get us to the point where we can start collecting and storing data. The next step would be to start processing that data to implement a block list, which would require work in both Warehouse and legacy PyPI. In Warehouse, you’d want to implement the job that periodically processes the BigQuery data to generate the block list, and then in both Warehouse and legacy PyPI you’d want to add block list support to the upload/register routines.
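A minimal sketch of that second step, assuming the 404'd project names have already been pulled from the (hypothetical) private BigQuery tables into a plain list: count misses per normalized name, block names that reach a threshold, and reject new registrations or uploads that match. The names `build_block_list` and `check_new_project` and the threshold value are illustrative assumptions, not Warehouse or legacy PyPI APIs. Names are normalized as in PEP 503 so that `Foo_Bar` and `foo-bar` count as the same project.

```python
import re
from collections import Counter
from typing import Iterable, Set


def normalize(name: str) -> str:
    # PEP 503 normalization: runs of "-", "_" and "." become "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def build_block_list(missed_projects: Iterable[str], threshold: int) -> Set[str]:
    """Hypothetical periodic job: block names that 404'd at least `threshold` times."""
    counts = Counter(normalize(p) for p in missed_projects)
    return {name for name, n in counts.items() if n >= threshold}


def check_new_project(name: str, block_list: Set[str]) -> None:
    """Hypothetical hook for the register/upload routines, run before a new project is created."""
    if normalize(name) in block_list:
        raise ValueError(f"The project name {name!r} is not allowed.")


# Usage with made-up data: three spellings of one missing name, one other miss.
blocked = build_block_list(["my_typo.pkg", "My-Typo-Pkg", "my.typo_pkg", "other"], threshold=2)
print(sorted(blocked))  # ['my-typo-pkg']
```

The check only needs to run when a project is first created: an existing project would not have been 404ing in the first place.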

Donald Stufft

