[Python-Dev] PEP 578: Python Runtime Audit Hooks

Fri Mar 29 15:18:17 EDT 2019

On 28/03/2019 23.35, Steve Dower wrote:
> Hi all
> 
> Time is short, but I'm hoping to get PEP 578 (formerly PEP 551) into
> Python 3.8. Here's the current text for review and comment before I
> submit to the Steering Council.
> 
> The formatted text is at https://www.python.org/dev/peps/pep-0578/
> (update just pushed, so give it an hour or so, but it's fundamentally
> the same as what's there)
> 
> No Discourse post, because we don't have a python-dev equivalent there
> yet, so please reply here for this one.
> 
> Implementation is at https://github.com/zooba/cpython/tree/pep-578/ and
> my backport to 3.7 (https://github.com/zooba/cpython/tree/pep-578-3.7/)
> is already getting some real use (though this will not be added to 3.7,
> unless people *really* want it, so the backport is just for reference).
> 
> Cheers,
> Steve
> 
> =====
> 
> PEP: 578
> Title: Python Runtime Audit Hooks
> Version: $Revision$
> Last-Modified: $Date$
> Author: Steve Dower <steve.dower at python.org>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 16-Jun-2018
> Python-Version: 3.8
> Post-History:
> 
> Abstract
> ========
> 
> This PEP describes additions to the Python API and specific behaviors
> for the CPython implementation that make actions taken by the Python
> runtime visible to auditing tools. Visibility into these actions
> provides opportunities for test frameworks, logging frameworks, and
> security tools to monitor and optionally limit actions taken by the
> runtime.
> 
> This PEP proposes adding two APIs to provide insights into a running
> Python application: one for arbitrary events, and another specific to
> the module import system. The APIs are intended to be available in all
> Python implementations, though the specific messages and values used
> are unspecified here to allow implementations the freedom to determine
> how best to provide information to their users. Some examples likely
> to be used in CPython are provided for explanatory purposes.
> 
> See PEP 551 for discussion and recommendations on enhancing the
> security of a Python runtime making use of these auditing APIs.
> 
> Background
> ==========
> 
> Python provides access to a wide range of low-level functionality on
> many common operating systems. While this is incredibly useful for
> "write-once, run-anywhere" scripting, it also makes monitoring of
> software written in Python difficult. Because Python uses native system
> APIs directly, existing monitoring tools either suffer from limited
> context or auditing bypass.
> 
> Limited context occurs when system monitoring can report that an
> action occurred, but cannot explain the sequence of events leading to
> it. For example, network monitoring at the OS level may be able to
> report "listening started on port 5678", but may not be able to
> provide the process ID, command line, parent process, or the local
> state in the program at the point that triggered the action. Firewall
> controls to prevent such an action are similarly limited, typically
> to process names or some global state such as the current user, and
> in any case rarely provide a useful log file correlated with other
> application messages.
> 
> Auditing bypass can occur when the typical system tool used for an
> action would ordinarily report its use, but accessing the APIs via
> Python do not trigger this. For example, invoking "curl" to make HTTP
> requests may be specifically monitored in an audited system, but
> Python's "urlretrieve" function is not.
> 
> Within a long-running Python application, particularly one that
> processes user-provided information such as a web app, there is a risk
> of unexpected behavior. This may be due to bugs in the code, or
> deliberately induced by a malicious user. In both cases, normal
> application logging may be bypassed resulting in no indication that
> anything out of the ordinary has occurred.
> 
> Additionally, and somewhat unique to Python, it is very easy to affect
> the code that is run in an application by manipulating either the
> import system's search path or placing files earlier on the path than
> intended. This is often seen when developers create a script with the
> same name as the module they intend to use - for example, a
> ``random.py`` file that attempts to import the standard library
> ``random`` module.
> 
> This is not sandboxing, as this proposal does not attempt to prevent
> malicious behavior (though it enables some new options to do so).
> See the `Why Not A Sandbox`_ section below for further discussion.
> 
> Overview of Changes
> ===================
> 
> The aim of these changes is to enable both application developers and
> system administrators to integrate Python into their existing
> monitoring systems without dictating how those systems look or behave.
> 
> We propose two API changes to enable this: an Audit Hook and Verified
> Open Hook. Both are available from Python and native code, allowing
> applications and frameworks written in pure Python code to take
> advantage of the extra messages, while also allowing embedders or
> system administrators to deploy builds of Python where auditing is
> always enabled.
> 
> Only CPython is bound to provide the native APIs as described here.
> Other implementations should provide the pure Python APIs, and
> may provide native versions as appropriate for their underlying
> runtimes. Auditing events are likewise considered implementation
> specific, but are bound by normal feature compatibility guarantees.
> 
> Audit Hook
> ----------
> 
> In order to observe actions taken by the runtime (on behalf of the
> caller), an API is required to raise messages from within certain
> operations. These operations are typically deep within the Python
> runtime or standard library, such as dynamic code compilation, module
> imports, DNS resolution, or use of certain modules such as ``ctypes``.
> 
> The following new C APIs allow embedders and CPython implementors to
> send and receive audit hook messages::
> 
>    # Add an auditing hook
>    typedef int (*hook_func)(const char *event, PyObject *args,
>                             void *userData);
>    int PySys_AddAuditHook(hook_func hook, void *userData);
> 
>    # Raise an event with all auditing hooks
>    int PySys_Audit(const char *event, PyObject *args);
> 
>    # Internal API used during Py_Finalize() - not publicly accessible
>    void _Py_ClearAuditHooks(void);
> 
> The new Python APIs for receiving and raising audit hooks are::
> 
>    # Add an auditing hook
>    sys.addaudithook(hook: Callable[[str, tuple]])
> 
>    # Raise an event with all auditing hooks
>    sys.audit(str, *args)
> 
> 
> Hooks are added by calling ``PySys_AddAuditHook()`` from C at any time,
> including before ``Py_Initialize()``, or by calling
> ``sys.addaudithook()`` from Python code. Hooks cannot be removed or
> replaced.
> 
> When events of interest are occurring, code can either call
> ``PySys_Audit()`` from C (while the GIL is held) or ``sys.audit()``. The
> string argument is the name of the event, and the tuple contains
> arguments. A given event name should have a fixed schema for arguments,
> which should be considered a public API (for each x.y version release),
> and thus should only change between feature releases with updated
> documentation.
> 
> For maximum compatibility, events using the same name as an event in
> the reference interpreter CPython should make every attempt to use
> compatible arguments. Including the name or an abbreviation of the
> implementation in implementation-specific event names will also help
> prevent collisions. For example, a ``pypy.jit_invoked`` event is clearly
> distinguised from an ``ipy.jit_invoked`` event.
> 
> When an event is audited, each hook is called in the order it was added
> with the event name and tuple. If any hook returns with an exception
> set, later hooks are ignored and *in general* the Python runtime should
> terminate. This is intentional to allow hook implementations to decide
> how to respond to any particular event. The typical responses will be to
> log the event, abort the operation with an exception, or to immediately
> terminate the process with an operating system exit call.
> 
> When an event is audited but no hooks have been set, the ``audit()``
> function should impose minimal overhead. Ideally, each argument is a
> reference to existing data rather than a value calculated just for the
> auditing call.
> 
> As hooks may be Python objects, they need to be freed during
> ``Py_Finalize()``. To do this, we add an internal API
> ``_Py_ClearAuditHooks()`` that releases any Python hooks and any
> memory held. This is an internal function with no public export, and
> we recommend it raise its own audit event for all current hooks to
> ensure that unexpected calls are observed.
> 
> Below in `Suggested Audit Hook Locations`_, we recommend some important
> operations that should raise audit events.
> 
> Python implementations should document which operations will raise
> audit events, along with the event schema. It is intentional that
> ``sys.addaudithook(print)`` be a trivial way to display all messages.
> 
> Verified Open Hook
> ------------------
> 
> Most operating systems have a mechanism to distinguish between files
> that can be executed and those that can not. For example, this may be an
> execute bit in the permissions field, a verified hash of the file
> contents to detect potential code tampering, or file system path
> restrictions. These are an important security mechanism for preventing
> execution of data or code that is not approved for a given environment.
> Currently, Python has no way to integrate with these when launching
> scripts or importing modules.
> 
> The new public C API for the verified open hook is::
> 
>    # Set the handler
>    typedef PyObject *(*hook_func)(PyObject *path, void *userData)
>    int PyImport_SetOpenForImportHook(hook_func handler, void *userData)
> 
>    # Open a file using the handler
>    PyObject *PyImport_OpenForImport(const char *path)
> 
> The new public Python API for the verified open hook is::
> 
>    # Open a file using the handler
>    importlib.util.open_for_import(path : str) -> io.IOBase
> 
> 
> The ``importlib.util.open_for_import()`` function is a drop-in
> replacement for ``open(str(pathlike), 'rb')``. Its default behaviour is
> to open a file for raw, binary access. To change the behaviour a new
> handler should be set. Handler functions only accept ``str`` arguments.
> The C API ``PyImport_OpenForImport`` function assumes UTF-8 encoding.

[...]

> All import and execution functionality involving code from a file will
> be changed to use ``open_for_import()`` unconditionally. It is important
> to note that calls to ``compile()``, ``exec()`` and ``eval()`` do not go
> through this function - an audit hook that includes the code from these
> calls is the best opportunity to validate code that is read from the
> file. Given the current decoupling between import and execution in
> Python, most imported code will go through both ``open_for_import()``
> and the log hook for ``compile``, and so care should be taken to avoid
> repeating verification steps.
> 
> There is no Python API provided for changing the open hook. To modify
> import behavior from Python code, use the existing functionality
> provided by ``importlib``.

I think that the import hook needs to be extended. It only works for
simple Python files or pyc files. There are at least two other important
scenarios: zipimport and shared libraries.

For example how does the importhook work in regarding of alternative
importers like zipimport? What does the import hook 'see' for an import
from a zipfile?

Shared libraries are trickier. libc doesn't define a way to dlopen()
from a file descriptor. dlopen() takes a file name, but a file name
leaves the audit hook open to a TOCTOU attack.

Christian