[Security-announce] Typo squatting and malicious packages on PyPI

Victor Stinner victor.stinner at gmail.com
Fri Sep 22 05:00:03 EDT 2017


This is an incident report covering the recent takedown of a number of
malicious packages from the Python Package Index (PyPI), as well as
the subsequent pre-emptive reservation of a range of additional
project names by the PyPI administrators.

New dedicated list for security announcements
=============================================

After malicious packages were removed from the Python Package Index
based on a report received by the Python Security Response Team, the
PSRT discussed how to announce the issue, as the PSRT had no official
public channel to specifically communicate Python security
announcements.

To address that gap, a new security-announce at python.org mailing list
has been created. You can now subscribe to the new mailing list here:

    https://mail.python.org/mailman/listinfo/security-announce

This is an announce-only list; discussions are redirected to
security-sig at python.org:

    https://mail.python.org/mailman/listinfo/security-sig

Rather than waiting for the new list to be available, the report and
subsequent package removal were announced on the python-dev mailing
list to start a discussion on how we can prevent further attempts or
make them less effective:

   [Python-Dev] SK-CSIRT identified malicious software libraries in
   the official Python package repository, PyPI
   https://mail.python.org/pipermail/python-dev/2017-September/149569.html


Malicious packages published in June 2017
=========================================

On the 6th of September 2017, the National Security Authority of
Slovakia contacted the Python Security Response Team (PSRT) to report
that the Python Package Index (PyPI) was hosting malicious packages.
The PSRT contacted the PyPI administrators, and all identified packages
were taken down within 70 minutes of the PSRT receiving the report.

Installing these packages sent data (the name and version of the fake
package, the user name of the user installing the package, and the
hostname) to an HTTP server, but also installed the expected module, so
the attack was not easy to notice.

List of the 11 malicious packages:

- acqusition
- apidev-coop
- bzip
- crypt
- django-server
- pwd
- setup-tools
- telnet
- urlib3
- urllib
- xml

See the advisory for more information:
http://www.nbu.gov.sk/skcsirt-sa-20170909-pypi/
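
As an illustration only, the names listed above can be compared with the
distributions installed in a local environment. The sketch below is not
an official PSRT or PyPI tool: the function names are hypothetical, it
assumes Python 3.8 or later (for importlib.metadata), and a match is
only a hint that the environment deserves a closer look.

```python
import re
from importlib import metadata

# Names copied from the list in this report.
MALICIOUS_NAMES = {
    "acqusition", "apidev-coop", "bzip", "crypt", "django-server",
    "pwd", "setup-tools", "telnet", "urlib3", "urllib", "xml",
}


def normalize(name):
    """Normalise a project name following the PEP 503 rule."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_suspect_distributions(installed_names):
    """Return the installed names that match a listed package name."""
    return sorted(
        name for name in installed_names
        if normalize(name) in MALICIOUS_NAMES
    )


def installed_distribution_names():
    """List the names of the distributions visible to this interpreter."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") if dist.metadata else None
        if name:
            names.append(name)
    return names


if __name__ == "__main__":
    for name in find_suspect_distributions(installed_distribution_names()):
        print("suspect distribution installed:", name)
```

Names are normalised before comparison so that spellings such as
"Setup_Tools" and "setup-tools" are treated as the same project.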

Thanks to the National Security Authority of Slovakia for reporting the issue!

Typo Squatting
==============

This incident was not due to a compromise of the PyPI service nor any
third-party project, but instead an instance of the "typo squatting"
concern that inherently arises due to the nature of PyPI as an open
publication platform with deliberately minimal barriers to
participation.

This form of attack is not specific to Python or the Python Package
Index - it arises for any publication platform which does not impose a
formal pre-publication review process on potential software
publishers, and instead expects consumers of the published components
to conduct their own post-publication review (either individually or
collectively).

Examples of other systems impacted by this kind of problem include the
npmjs.com repository for JavaScript projects, the rubygems.org
repository for Ruby projects, and the Domain Name System itself (which
provides the origin of the term:
https://en.wikipedia.org/wiki/Typosquatting ).


Recent History
==============

The parallels between typo squatting domain names and typo squatting
dynamic language package managers were highlighted in June 2016, when
Nikolai Tschacher at the University of Hamburg quantified the
potential scope of the problem for the Python, Ruby, and JavaScript
ecosystems in his undergraduate thesis
(Summary: http://incolumitas.com/2016/06/08/typosquatting-package-managers/;
Full PDF: http://incolumitas.com/data/thesis.pdf )

Subsequently, in May 2017, fate0 published around 26 packages, which
remained online for a period of 5 days. fate0 wrote a blog post
summarising the results of the experiment:
http://blog.fatezero.org/2017/06/01/package-fishing/

In June 2017, 11 malicious packages were published on PyPI and
remained online until the 6th of September (a period of 3 months).
These packages sent basic user data to a server at 121.42.217.44 (TCP
port 8080), and are the main subject of this incident report.

In September 2017, Benjamin Bach and Hanno Böck started a project to
reserve PyPI package names to prevent malicious usage:
https://www.pytosquatting.org/

Their package setup.py sends an HTTP request for statistics collection
purposes (https://github.com/benjaoming/pytosquatting ), and they
report that they blocked around 7500 attempted standard library
package installations over a period of 4 days from September 13th to
16th (as a point of reference, this figure represents around 0.006% of
the ~123 million PyPI package downloads that took place over that
period).
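
The percentage quoted above follows directly from the two reported
figures. A quick check, using the rounded numbers from the paragraph
(the variable names are only illustrative):

```python
blocked_installs = 7500          # attempted installations reported above
total_downloads = 123_000_000    # approximate PyPI downloads in the period

share_percent = blocked_installs / total_downloads * 100
print(f"{share_percent:.4f}%")   # prints 0.0061%
```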

For additional links related to typo squatting and Python package security, see:
https://python-security.readthedocs.io/packages.html#pypi-typo-squatting


Mitigation technique: 3rd party component review
================================================

The primary mitigation technique for these kinds of attacks (both typo
squatting and social engineering) is to rely on 3rd party component
reviewers that are independent of the original publishers.

While this approach does tend to significantly reduce the number of
available Python components to either the hundreds (commercial Python
redistributors and commercially supported Linux distributions) or the
thousands (community Linux distributions), as compared to the tens of
thousands of components available on PyPI, it also substantially
reduces the risks of inadvertently installing a malicious package.


Mitigation technique: blocking package registration
===================================================

The PyPI administrators have historically only had limited tools
available to prohibit the use of particular project names:

* registering the name themselves
* updating a list of prohibited names stored directly in the source code

These mechanisms have now been replaced by a database backed mechanism
which administrators can update directly through the admin interface:
https://github.com/pypa/warehouse/pull/2396

In addition to these explicitly prohibited names, the server now also
dynamically prohibits the use of any standard library package names,
based on the list extracted from the standard library documentation by
the stdlib-list project:
https://github.com/pypa/warehouse/pull/2409

https://github.com/pypa/warehouse/issues/2401 is an open issue to
discuss whether or not we want to make any further changes to the
error messages reported when attempting to either download or register
a project with a prohibited name (the current behaviour is to simply
report a generic 404 error for attempted downloads, and a 403 error
noting that the name is prohibited for attempted uploads).


Mitigation technique: client side typo detection and notification
=================================================================

While the default pip client utility is unlikely to ever implement
typo detection due to the additional dependencies required, the higher
level pipenv client (which also incorporates virtual environment
management) has been enhanced to check for similarities to the top
1000 most popular downloads from PyPI and notify the user if the
package they're installing is similar to, but not the same as, one of
those names:
https://github.com/kennethreitz/pipenv/commit/aeaabf42f16e8167ca67af5ab7a34d864e7b358d
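
A client side check of this kind can be approximated with the standard
library alone. The sketch below is not pipenv's code: the short
POPULAR_PACKAGES list is a hypothetical sample standing in for the top
1000 names, and the function name and similarity cutoff are assumptions
chosen for illustration.

```python
import difflib

# Hypothetical sample; a real client would load the most popular names.
POPULAR_PACKAGES = ["requests", "urllib3", "setuptools", "django", "numpy"]


def similar_popular_names(requested, popular=POPULAR_PACKAGES, cutoff=0.85):
    """Return popular names that resemble, but do not equal, the request."""
    requested = requested.lower()
    if requested in popular:
        return []  # an exact match is not a typo
    return difflib.get_close_matches(requested, popular, n=3, cutoff=cutoff)


if __name__ == "__main__":
    for name in ("urlib3", "setup-tools", "requests"):
        matches = similar_popular_names(name)
        if matches:
            print(f"{name!r} looks similar to {matches}; is that intended?")
```

The cutoff trades false alarms against missed typos, so a real client
would need to tune it against actual package names.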


Potential mitigation technique: server notifications for similar project names
==============================================================================

While not yet implemented, notifying project maintainers (rather than
the PyPI admins) when projects with similar names to their existing
ones are registered offers a potential mechanism for reviewing new
projects for potential typosquatting concerns without overwhelming the
available resources of the Python Software Foundation's infrastructure
management staff and volunteers.

The PyPI admins would then only need to deal with cases where either
new project names are similar to names on the prohibited list, or else a
maintainer of a previously published project has reviewed the new
project and considers it potentially suspicious.

This feature has *not* yet been implemented, and can be discussed
further at https://github.com/pypa/warehouse/issues/2268
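
Purely as a sketch of how such a notification step might select its
recipients, the example below compares a newly registered name with
existing project names by edit distance. Nothing here reflects an actual
Warehouse design: the function names, the distance threshold, and the
maintainer names in the usage example are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def maintainers_to_notify(new_name, existing_projects, max_distance=1):
    """Map each similarly named existing project to its maintainers."""
    new_name = new_name.lower()
    return {
        project: maintainers
        for project, maintainers in existing_projects.items()
        if 0 < edit_distance(new_name, project.lower()) <= max_distance
    }


if __name__ == "__main__":
    # Placeholder data: project name -> list of maintainer user names.
    projects = {"urllib3": ["alice"], "requests": ["bob"]}
    print(maintainers_to_notify("urlib3", projects))
```

Comparing every new name against every existing project is quadratic in
the worst case, so a real service would likely need an index or a
pre-filter before computing distances.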


Mozilla Open Source Support grant
=================================

The Python Package Index is currently undergoing a migration from
the legacy service hosted at https://pypi.python.org to an updated
service hosted at https://pypi.org.

This migration is taking place as the original service was built in a
way that limited the applicability of most modern design & development
techniques (such as test-driven development and continuous
integration).

While the upload features of the legacy service were successfully
switched off in July 2017, a number of other enhancements to the
replacement service are still required before the legacy service can
be shut off, and PyPI development can focus entirely on the new, more
robust, and more contributor friendly implementation.

To that end, the PSF's Packaging Working Group applied for, and in
September 2017 was awarded, a $170k Mozilla Open Source Support
foundational grant.

The scope of this grant covers the design, development, and project
management activities needed to finalise the migration from
pypi.python.org to pypi.org, and thus make the implementation of
additional security enhancements and other features more feasible.


Ongoing sustaining engineering funding
======================================

While the MOSS grant will be incredibly beneficial, the fact remains
that the PyPI service and the related client applications are
noticeably understaffed given their importance as pieces of
infrastructure for some of the world's largest organisations.

Complex migrations that could potentially have been performed in a
matter of months given more focused attention (e.g. migrating to
per-user installations as the default in `pip`) have instead lingered
for years, as the developers involved know that they're going to have
to deal with a lot of users being upset by the change, and there's
only so much of that anyone is prepared to put up with as part of a
volunteer activity.

In-kind donations of online services have been most welcome
(especially the Fastly CDN, without which there is no way the legacy
service would be able to handle the current download volumes), but
they primarily serve to sustain current operations: they don't
typically help to move the ecosystem forward through the addition of
new capabilities or making improvements to default behaviours.

Currently, ongoing PyPI maintenance and operations are largely being
handled by two individuals, Donald Stufft (both on his own time, and
on time granted by his employer, Amazon Web Services), and Ernest W.
Durbin III (entirely on his own time).

They are supported in this effort by the PSF's Infrastructure Manager,
Mark Mangoba, and the PSF Board. The funding received from the MOSS
grant award will also allow Nicole Harris and Sumana Harihareswara to
dedicate additional time to design & project management activities.

Client tools benefit from a broader contributor base (as updating them
doesn't carry the same risk of immediately breaking a key production
service for the community), but even there, the number of paid,
full-time contributors stands at a grand total of zero.

In many ways, this is similar to the situation that existed with the
OpenSSL project prior to 2014, before the major security vulnerability
"Heartbleed" was disclosed, and customers of OpenSSL redistributors
all realised that their assumption that someone was already taking
care of ensuring OpenSSL's sustainability was incorrect. The
industry's collective response to the crisis was the Core
Infrastructure Initiative, a multimillion-dollar project announced by
the Linux Foundation in April 2014 to provide funds to critical
elements of the global information infrastructure.

While the PSF is currently undertaking a membership drive to encourage
sign-ups of new supporting members, and maintains a page for targeted
PyPI-specific donations at https://donate.pypi.io/, these initiatives
are not expected to be sufficient on their own to fully cover the task
of suitably maintaining the shared PyPI service.

Rather, what is likely needed is for Python's larger commercial
redistributors to acknowledge the importance of the Python Package
Index to the developer experience they're offering to their customers,
and determine an appropriate level of active upstream contribution as
part of their own sustaining engineering plans.


Additional Links
================

Discussion on the latest typo squatting issue reported by SK-CSIRT:

* Advisory: http://www.nbu.gov.sk/skcsirt-sa-20170909-pypi/
* Ars Technica:
https://arstechnica.com/information-technology/2017/09/devs-unknowingly-use-malicious-modules-put-into-official-python-repository/
* Python-Dev: https://mail.python.org/pipermail/python-dev/2017-September/149569.html
* Hacker News: https://news.ycombinator.com/item?id=15256121
* LWN: https://lwn.net/Articles/733853/

Links to Python security:

* Python Security: http://python-security.readthedocs.io/
* PyPI security: https://pypi.org/security/


-- Python Security Response Team (PSRT)

