SK-CSIRT identified malicious software libraries in the official Python package repository, PyPI
Hi,
Last week, the National Security Authority of Slovakia contacted the Python Security Response Team (PSRT) to report that the Python Package Index (PyPI) was hosting malicious packages. Installing these packages sends user data to an HTTP server but also installs the expected module, so the attack was not easy to notice.
Advisory: http://www.nbu.gov.sk/skcsirt-sa-20170909-pypi/
Kudos to them for reporting the issue!
This is not a compromise of the PyPI server or of a third-party project, but the "typo squatting" issue, which has been known since at least June 2016 (for PyPI). The issue is not specific to Python: npmjs.com and rubygems.org are vulnerable to the same issue.
For example, malicious packages used the names "urllib" (no 3) and "urlib3" (1 L) instead of "urllib3" (2 L). These packages were downloaded by users, so the attack was effective.
More information on typo squatting and Python package security: https://python-security.readthedocs.io/packages.html#pypi-typo-squatting
The PSRT contacted the PyPI administrators and all identified packages were taken down only 1h10 after the PSRT received the email from the National Security Authority of Slovakia!
The typo squatting issue is known and has been discussed, but no solution has been found yet. See for example this warehouse issue: https://github.com/pypa/warehouse/issues/2151
The consensus seems to be that pip is not responsible for detecting malicious code; that is more the responsibility of PyPI.
The problem is to decide how to detect malicious code and/or prevent typo squatting on PyPI.
The issue was discussed privately on the PSRT list last week. The National Security Authority of Slovakia just published their advisory, and a public discussion started on Hacker News: https://news.ycombinator.com/item?id=15256121
I consider that it's now time to find a solution on the public python-dev mailing list.
Let's try to find a solution!
Can we learn something from The Update Framework (TUF)?
How do JavaScript, Ruby, Perl and other programming languages deal with these security issues in their package managers?
See also my other notes on Python security and the list of known CPython vulnerabilities: https://python-security.readthedocs.io/
Victor
Benjamin Bach and Hanno Böck are running https://www.pytosquatting.org/ and registered many projects like https://pypi.python.org/pypi/urllib2
"In June 2016, Typosquatting programming language package managers stated that urllib2 had ~4,000 downloads in 2 weeks. The package name is now squatted by us (the good guys). We take these findings seriously."
It seems like we need a way to prevent a project that was removed for containing malicious code from being recreated automatically.
pytosquatting.org projects contain a download file: a tarball with a setup.py file. This setup.py raises an exception, but also sends an HTTP request, a "pingback", to their server.
Thank you for reserving the names of standard library modules. But I'm not sure about the HTTP "pingback" part. It can run on CIs, in restricted environments, etc.
Why not just reserve the name without providing any download file? With no download file, users will likely understand their error, no?
Note: I don't think that Benjamin Bach and Hanno Böck are related to the PSRT or the PyPI administrators.
Victor
2017-09-15 22:28 GMT+02:00 Victor Stinner <victor.stinner@gmail.com>:
Hi,
Last week, the National Security Authority of Slovakia contacted the Python Security Response Team (PSRT) to report that the Python Package Index (PyPI) was hosting malicious packages. [...]
* Victor Stinner <victor.stinner@gmail.com>, 2017-09-15, 23:08:
Why not just reserve the name without providing any download file?
Is it possible at the moment? I tried "python setup.py register", but all I got was:
Server response (410): Project pre-registration is no longer required or supported, so continue directly to uploading files.
-- Jakub Wilk
On 16 September 2017 at 07:08, Victor Stinner <victor.stinner@gmail.com> wrote:
Benjamin Bach and Hanno Böck are running https://www.pytosquatting.org/ and registered many projects like https://pypi.python.org/pypi/urllib2
"In June 2016, Typosquatting programming language package managers stated that urllib2 had ~4,000 downloads in 2 weeks. The package name is now squatted by us (the good guys). We take these findings seriously."
It seems like we need a way to prevent a project that was removed for containing malicious code from being recreated automatically.
The PyPI admins have the ability to blacklist project names (with one of the simplest mechanisms being for admins to typosquat them pre-emptively - while, as Jakub notes, that currently requires uploading an actual sdist, it doesn't need to be an installable one), so that isn't the hard part of the problem.
The hard parts are the ones that potentially require people to scale:
1. Noticing typosquatting in the first place
2. Evaluating and responding to notifications of malicious namesquatting in a timely manner
3. Handling requests to use names that have been reserved by the admins
The first part can already be handled in a distributed fashion - bandersnatch will give anyone a full mirror of PyPI in a few hours (depending on their available network bandwidth), and going to https://pypi.org/simple/ will give you a listing of all the registered project names as an HTML file.
The second part is tricky, since there aren't currently any consistent public markers for "benign" blacklisting. For example, if you go to https://pypi.org/simple/pkg-resources/ (the normalised name for "pkg_resources") you'll get a response, since that's a registered name with no uploads (IIRC, it's reserved by Donald). However, if you go to https://pypi.org/project/pkg-resources/ you get a 404 page, rather than a notification that the name has been reserved by the PyPI admins. So that would probably be a useful enhancement - having "reserved by the PyPI admins" as an explicit state, so 3rd party researchers can more readily flag such cases as being as safe as we can reasonably make them from a security perspective.
For the "review for malice" aspect, notifying the PyPI admins of all close names isn't a particularly viable option as the default behaviour, since they're either volunteers, or folks with partial time grants from their employers.
However, something that I think does have the potential to scale reasonably well is to instead notify the maintainers of projects with similar names: https://github.com/pypa/warehouse/issues/2268
That way the initial review is distributed to the maintainers of targeted packages, and then only the cases that those maintainers consider to be potentially malicious would need to be escalated to the PyPI admins.
The final aspect is one of the items considered in PEP 541: https://www.python.org/dev/peps/pep-0541/#invalid-projects
Getting PEP 541 accepted will likely make it easier to start scaling the ticket review team for PyPI (with both paid and volunteer efforts), and that's currently with Donald for review as an initial policy that we can start with, and then iterate on over time.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
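As a rough illustration of the "distributed noticing" point above, the following sketch extracts project names from a simple-index style HTML page and checks whether a normalised name is already registered. The normalisation rule is the one defined in PEP 503; the helper names and the sample HTML are hypothetical stand-ins (the real page at https://pypi.org/simple/ is not fetched here), so treat this as a sketch under those assumptions rather than a description of how PyPI or bandersnatch work internally.

```python
import re
from html.parser import HTMLParser
from typing import Set


def normalise(name: str) -> str:
    """PEP 503 normalisation: runs of '-', '_' and '.' become one '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


class SimpleIndexParser(HTMLParser):
    """Collect the text of every <a> element as a normalised project name."""

    def __init__(self) -> None:
        super().__init__()
        self._in_anchor = False
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Only text inside an anchor is treated as a project name.
        if self._in_anchor and data.strip():
            self.names.add(normalise(data.strip()))


def registered_names(index_html: str) -> Set[str]:
    """Return the set of normalised project names listed in the given HTML."""
    parser = SimpleIndexParser()
    parser.feed(index_html)
    parser.close()
    return parser.names


# Hypothetical excerpt standing in for the real index page.
SAMPLE_INDEX = """<!DOCTYPE html>
<html><body>
<a href="/simple/pkg-resources/">pkg_resources</a>
<a href="/simple/urllib3/">urllib3</a>
</body></html>"""

names = registered_names(SAMPLE_INDEX)
print(normalise("pkg_resources") in names)
```

A third-party researcher could feed the real listing into `registered_names` and compare it against their own watch list; whether a registered name is a benign reservation or a malicious squat still cannot be told from this listing alone, which is the gap Nick describes.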
An idea for typo squatting would be to compute the Levenshtein distance between a new package name and the names of standard library modules and of the top 100 most popular PyPI packages, and to require contacting a moderation team if the name is too close to an existing one. The moderation team would review the email, but also watch the package for 1 month to check that everything seems fine.
It requires having a list of all package names of the standard library, and maintaining an up-to-date list of popular PyPI package names.
It also requires setting up a mailing list and tooling to report the error message to users, and then to give moderators the right to create the package. I'm not sure that it's easy to implement.
Victor
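To make the Levenshtein idea concrete, here is a minimal sketch of such a check. The protected-name list, the distance threshold of 2 and the function names are all assumptions chosen for illustration; nothing here reflects an actual PyPI or warehouse implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical hand-picked list; a real one would hold the standard library
# module names plus the most popular PyPI projects.
PROTECTED_NAMES = ["urllib3", "requests", "setuptools"]
MAX_DISTANCE = 2  # assumed threshold, would need tuning


def needs_moderation(candidate, protected=PROTECTED_NAMES, max_distance=MAX_DISTANCE):
    """Return the protected names that the candidate is close to but not equal to."""
    candidate = candidate.lower()
    return [name for name in protected
            if name != candidate and levenshtein(candidate, name) <= max_distance]


print(needs_moderation("urlib3"))  # the "1 L" typo from the advisory
print(needs_moderation("urllib"))  # the "no 3" variant
```

Both names from the advisory are one edit away from "urllib3", so either would be flagged under this threshold. A threshold this loose would also flag many legitimate short names, which is part of why the moderation workload is the hard part.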
Hi,
FYI I just sent a public advisory for the PyPI typo squatting issue to the new security-announce list:
[Security-announce] Typo squatting and malicious packages on PyPI https://mail.python.org/pipermail/security-announce/2017-September/000000.ht...
Please subscribe to this newly created mailing list to stay tuned! https://mail.python.org/mailman/listinfo/security-announce
Victor
2017-09-15 22:28 GMT+02:00 Victor Stinner <victor.stinner@gmail.com>:
Hi,
Last week, the National Security Authority of Slovakia contacted the Python Security Response Team (PSRT) to report that the Python Package Index (PyPI) was hosting malicious packages. [...]
participants (3)
- Jakub Wilk
- Nick Coghlan
- Victor Stinner