Re: [Python-Dev] SK-CSIRT identified malicious software libraries in the official Python package repository, PyPI

16 Sep 2017

On 16 September 2017 at 07:08, Victor Stinner <victor.stinner@gmail.com> wrote:
...
Benjamin Bach and Hanno Böck are running
https://www.pytosquatting.org/ and registered many projects like
https://pypi.python.org/pypi/urllib2

"In June 2016, Typosquatting programming language package managers
stated that urllib2 had ~4,000 downloads in 2 weeks. The package name
is now squatted by us (the good guys). We take these findings
seriously."

It seems like we need a solution to prevent a project that was removed
because it contained malicious code from being recreated automatically.

The PyPI admins already have the ability to blacklist project names,
so that isn't the hard part of the problem. One of the simplest
mechanisms is for the admins to typosquat the names pre-emptively: as
Jakub notes, that currently requires uploading an actual sdist, but it
doesn't need to be an installable one.

The hard parts are the ones that need human attention, and are
therefore difficult to scale:

1. Noticing typosquatting in the first place
2. Evaluating and responding to notifications of malicious
namesquatting in a timely manner
3. Handling requests to use names that have been reserved by the admins

The first part can already be handled in a distributed fashion -
bandersnatch will give anyone a full mirror of PyPI in a few hours
(depending on their available network bandwidth), and going to
https://pypi.org/simple/ will give you a listing of all the registered
project names as an HTML file.
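
As a rough illustration of how little is needed to get at that listing,
here is a minimal sketch that pulls the project names out of the HTML
served at https://pypi.org/simple/. The helper names (project_names,
fetch_project_names) are made up for this example, and it assumes each
project appears as the text of an <a> element on that page:

```python
from html.parser import HTMLParser
from urllib.request import urlopen

SIMPLE_INDEX_URL = "https://pypi.org/simple/"


class _AnchorTextCollector(HTMLParser):
    """Collect the text content of every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_anchor = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self.names.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current entry.
        if self._in_anchor:
            self.names[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False


def project_names(index_html):
    """Return the project names listed in a simple-index HTML page.

    Assumption: every project is the text of one <a> element.
    """
    collector = _AnchorTextCollector()
    collector.feed(index_html)
    collector.close()
    return [name.strip() for name in collector.names if name.strip()]


def fetch_project_names(url=SIMPLE_INDEX_URL):
    """Download the index and parse it (a large response, needs network)."""
    with urlopen(url) as response:
        return project_names(response.read().decode("utf-8"))
```

Keeping the parsing separate from the download means the same function
can be run against a page saved from a local mirror.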

The second part is tricky, since there aren't currently any consistent
public markers for "benign" blacklisting. For example, if you go to
https://pypi.org/simple/pkg-resources/ (the normalised name for
"pkg_resources") you'll get a response, since that's a registered name
with no uploads (IIRC, it's reserved by Donald). However, if you go to
https://pypi.org/project/pkg-resources/ you get a 404 page, rather
than a notification that the name has been reserved by the PyPI
admins.

So that would probably be a useful enhancement - having "reserved by
the PyPI admins" as an explicit state, so 3rd party researchers can
more readily flag such cases as being as safe as we can reasonably
make them from a security perspective.
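
To make the current situation concrete, here is a sketch of the check a
third-party researcher is left with today. The name normalisation is the
one from PEP 503; the looks_like_bare_registration helper and its three
inputs are hypothetical, and simply encode the pkg-resources observation
above (a response from /simple/ with no files, and a 404 from
/project/):

```python
import re


def normalise(name):
    """Normalise a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_url(name):
    """URL of the per-project page in the simple index."""
    return "https://pypi.org/simple/{}/".format(normalise(name))


def project_url(name):
    """URL of the human-readable project page."""
    return "https://pypi.org/project/{}/".format(normalise(name))


def looks_like_bare_registration(simple_status, file_count, project_status):
    """Hypothetical heuristic: the name is registered but has no uploads.

    simple_status  -- HTTP status returned for simple_url(name)
    file_count     -- number of file links found on that page
    project_status -- HTTP status returned for project_url(name)

    This cannot tell an admin reservation apart from any other
    registration without uploads; it only flags the combination.
    """
    return simple_status == 200 and file_count == 0 and project_status == 404
```

That the heuristic can't say who holds the name, or why, is exactly the
gap an explicit "reserved" state would close.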

For the "review for malice" aspect, notifying the PyPI admins of all
close names isn't a particularly viable option as the default
behaviour, since they're either volunteers, or folks with partial time
grants from their employers. However, something that I think does have
the potential to scale reasonably well is to instead notify the
maintainers of projects with similar names:
https://github.com/pypa/warehouse/issues/2268

That way the initial review is distributed to the maintainers of
targeted packages, and then only the cases that those maintainers
consider to be potentially malicious would need to be escalated to the
PyPI admins.
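
For a sense of what "similar names" could mean in practice, here is a
minimal sketch using difflib from the standard library. This is an
assumption made purely for illustration, not what the Warehouse issue
proposes: the similar_projects function, the 0.85 cutoff and the limit
of five matches are all invented here, and a real implementation would
need a metric tuned against actual typosquatting cases:

```python
import difflib
import re


def normalise(name):
    """Normalise a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def similar_projects(new_name, existing_names, cutoff=0.85, limit=5):
    """Return existing project names that closely resemble new_name.

    Names are compared in normalised form, and a name that normalises
    to the same value as new_name is excluded, since that is the same
    project rather than a lookalike. The cutoff and limit values are
    arbitrary choices for this sketch.
    """
    target = normalise(new_name)
    candidates = {normalise(name) for name in existing_names} - {target}
    return difflib.get_close_matches(
        target, sorted(candidates), n=limit, cutoff=cutoff
    )
```

The maintainers of whatever this returns for a newly registered name
would be the ones to notify; an empty list means nobody needs to be
bothered.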

The final aspect is one of the items considered in PEP 541:
https://www.python.org/dev/peps/pep-0541/#invalid-projects

Getting PEP 541 accepted will likely make it easier to start scaling
the ticket review team for PyPI (with both paid and volunteer
efforts). The PEP is currently with Donald for review as an initial
policy that we can start with, and then iterate on over time.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia