On 16 September 2017 at 07:08, Victor Stinner <victor.stinner@gmail.com> wrote:
Benjamin Bach and Hanno Böck are running https://www.pytosquatting.org/ and registered many projects like https://pypi.python.org/pypi/urllib2
"In June 2016, Typosquatting programming language package managers stated that urllib2 had ~4,000 downloads in 2 weeks. The package name is now squatted by us (the good guys). We take these findings seriously."
It seems like we need a way to prevent a project that was removed for containing malicious code from being recreated automatically.
The PyPI admins have the ability to blacklist project names (with one of the simplest mechanisms being for admins to typosquat them pre-emptively - while, as Jakub notes, that currently requires uploading an actual sdist, it doesn't need to be an installable one), so that isn't the hard part of the problem.

The hard parts are the ones that potentially require people to scale:

1. Noticing typosquatting in the first place
2. Evaluating and responding to notifications of malicious namesquatting in a timely manner
3. Handling requests to use names that have been reserved by the admins

The first part can already be handled in a distributed fashion - bandersnatch will give anyone a full mirror of PyPI in a few hours (depending on their available network bandwidth), and going to https://pypi.org/simple/ will give you a listing of all the registered project names as a HTML file.

The second part is tricky, since there aren't currently any consistent public markers for "benign" blacklisting. For example, if you go to https://pypi.org/simple/pkg-resources/ (the normalised name for "pkg_resources") you'll get a response, since that's a registered name with no uploads (IIRC, it's reserved by Donald). However, if you go to https://pypi.org/project/pkg-resources/ you get a 404 page, rather than a notification that the name has been reserved by the PyPI admins. So that would probably be a useful enhancement - having "reserved by the PyPI admins" as an explicit state, so 3rd party researchers can more readily flag such cases as being as safe as we can reasonably make them from a security perspective.

For the "review for malice" aspect, notifying the PyPI admins of all close names isn't a particularly viable option as the default behaviour, since they're either volunteers, or folks with partial time grants from their employers.
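
To make the "noticing" step concrete, here is a rough sketch of how a third party might flag close names from a saved copy of the simple index. It is only an illustration under some assumptions: that the index page lists each project as an anchor element with the project name as the link text, that names are compared after PEP 503 style normalisation, and that difflib's similarity ratio with an arbitrary 0.8 cutoff is a good enough stand-in for "close". The helper names (normalize, SimpleIndexParser, close_names) are made up for this example, and none of this is what PyPI or Warehouse actually does.

```python
import re
from difflib import get_close_matches
from html.parser import HTMLParser


def normalize(name: str) -> str:
    """Normalise a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


class SimpleIndexParser(HTMLParser):
    """Collect the link text of every <a> element in an index page.

    Assumption: each project appears as <a href="...">name</a>.
    """

    def __init__(self) -> None:
        super().__init__()
        self.names: list[str] = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Only text that sits inside an anchor is treated as a project name.
        if self._in_anchor and data.strip():
            self.names.append(normalize(data.strip()))


def close_names(target: str, index_html: str, cutoff: float = 0.8) -> list[str]:
    """Return registered names that look similar to `target`, best match first.

    The cutoff is an arbitrary illustrative value, not a tuned threshold.
    """
    parser = SimpleIndexParser()
    parser.feed(index_html)
    target = normalize(target)
    # Exclude the target itself so only *other* projects are reported.
    candidates = [name for name in set(parser.names) if name != target]
    return get_close_matches(target, candidates, n=10, cutoff=cutoff)


if __name__ == "__main__":
    # A tiny hand-written stand-in for a saved copy of the index page.
    sample = (
        '<a href="/simple/urllib3/">urllib3</a>'
        '<a href="/simple/urllib2/">urllib2</a>'
        '<a href="/simple/requests/">requests</a>'
    )
    print(close_names("urllib3", sample))
```

Flagging close names like this is cheap for anyone with a copy of the index; reviewing every flagged name centrally is the part that doesn't scale.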
However, something that I think does have the potential to scale reasonably well is to instead notify the maintainers of projects with similar names:

https://github.com/pypa/warehouse/issues/2268

That way the initial review is distributed to the maintainers of targeted packages, and then only the cases that those maintainers consider to be potentially malicious would need to be escalated to the PyPI admins.

The final aspect is one of the items considered in PEP 541:

https://www.python.org/dev/peps/pep-0541/#invalid-projects

Getting PEP 541 accepted will likely make it easier to start scaling the ticket review team for PyPI (with both paid and volunteer efforts), and that's currently with Donald for review as an initial policy that we can start with, and then iterate on over time.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia