On 28/03/2019 23.35, Steve Dower wrote:
Time is short, but I'm hoping to get PEP 578 (formerly PEP 551) into Python 3.8. Here's the current text for review and comment before I submit to the Steering Council.
The formatted text is at https://www.python.org/dev/peps/pep-0578/ (update just pushed, so give it an hour or so, but it's fundamentally the same as what's there)
No Discourse post, because we don't have a python-dev equivalent there yet, so please reply here for this one.
Implementation is at https://github.com/zooba/cpython/tree/pep-578/ and my backport to 3.7 (https://github.com/zooba/cpython/tree/pep-578-3.7/) is already getting some real use (though this will not be added to 3.7, unless people *really* want it, so the backport is just for reference).
PEP: 578
Title: Python Runtime Audit Hooks
Version: $Revision$
Last-Modified: $Date$
Author: Steve Dower email@example.com
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 16-Jun-2018
Python-Version: 3.8
Post-History:
This PEP describes additions to the Python API and specific behaviors for the CPython implementation that make actions taken by the Python runtime visible to auditing tools. Visibility into these actions provides opportunities for test frameworks, logging frameworks, and security tools to monitor and optionally limit actions taken by the runtime.
This PEP proposes adding two APIs to provide insights into a running Python application: one for arbitrary events, and another specific to the module import system. The APIs are intended to be available in all Python implementations, though the specific messages and values used are unspecified here to allow implementations the freedom to determine how best to provide information to their users. Some examples likely to be used in CPython are provided for explanatory purposes.
See PEP 551 for discussion and recommendations on enhancing the security of a Python runtime making use of these auditing APIs.
Python provides access to a wide range of low-level functionality on many common operating systems. While this is incredibly useful for "write-once, run-anywhere" scripting, it also makes monitoring of software written in Python difficult. Because Python uses native system APIs directly, existing monitoring tools suffer from either limited context or auditing bypass.
Limited context occurs when system monitoring can report that an action occurred, but cannot explain the sequence of events leading to it. For example, network monitoring at the OS level may be able to report "listening started on port 5678", but may not be able to provide the process ID, command line, parent process, or the local state in the program at the point that triggered the action. Firewall controls to prevent such an action are similarly limited, typically to process names or some global state such as the current user, and in any case rarely provide a useful log file correlated with other application messages.
Auditing bypass can occur when the typical system tool used for an action would ordinarily report its use, but accessing the same APIs via Python does not trigger this. For example, invoking "curl" to make HTTP requests may be specifically monitored in an audited system, but Python's "urlretrieve" function is not.
Within a long-running Python application, particularly one that processes user-provided information such as a web app, there is a risk of unexpected behavior. This may be due to bugs in the code, or deliberately induced by a malicious user. In both cases, normal application logging may be bypassed resulting in no indication that anything out of the ordinary has occurred.
Additionally, and somewhat unique to Python, it is very easy to affect the code that is run in an application by manipulating either the import system's search path or placing files earlier on the path than intended. This is often seen when developers create a script with the same name as the module they intend to use - for example, a ``random.py`` file that attempts to import the standard library ``random`` module.
This is not sandboxing, as this proposal does not attempt to prevent malicious behavior (though it enables some new options to do so). See the `Why Not A Sandbox`_ section below for further discussion.
Overview of Changes
The aim of these changes is to enable both application developers and system administrators to integrate Python into their existing monitoring systems without dictating how those systems look or behave.
We propose two API changes to enable this: an Audit Hook and Verified Open Hook. Both are available from Python and native code, allowing applications and frameworks written in pure Python code to take advantage of the extra messages, while also allowing embedders or system administrators to deploy builds of Python where auditing is always enabled.
Only CPython is bound to provide the native APIs as described here. Other implementations should provide the pure Python APIs, and may provide native versions as appropriate for their underlying runtimes. Auditing events are likewise considered implementation specific, but are bound by normal feature compatibility guarantees.
In order to observe actions taken by the runtime (on behalf of the caller), an API is required to raise messages from within certain operations. These operations are typically deep within the Python runtime or standard library, such as dynamic code compilation, module imports, DNS resolution, or use of certain modules such as ``ctypes``.
The following new C APIs allow embedders and CPython implementors to send and receive audit hook messages::
    /* Add an auditing hook */
    typedef int (*hook_func)(const char *event, PyObject *args,
                             void *userData);
    int PySys_AddAuditHook(hook_func hook, void *userData);

    /* Raise an event with all auditing hooks */
    int PySys_Audit(const char *event, PyObject *args);

    /* Internal API used during Py_Finalize() - not publicly accessible */
    void _Py_ClearAuditHooks(void);
The new Python APIs for receiving and raising audit hooks are::
    # Add an auditing hook
    sys.addaudithook(hook: Callable[[str, tuple]])

    # Raise an event with all auditing hooks
    sys.audit(str, *args)
Hooks are added by calling ``PySys_AddAuditHook()`` from C at any time, including before ``Py_Initialize()``, or by calling ``sys.addaudithook()`` from Python code. Hooks cannot be removed or replaced.
When events of interest are occurring, code can either call ``PySys_Audit()`` from C (while the GIL is held) or ``sys.audit()``. The string argument is the name of the event, and the tuple contains arguments. A given event name should have a fixed schema for arguments, which should be considered a public API (for each x.y version release), and thus should only change between feature releases with updated documentation.
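As a rough illustration of the two Python APIs described above, the following sketch registers a hook and raises one event. It assumes an interpreter that already provides ``sys.addaudithook()`` and ``sys.audit()`` as proposed here; the event name ``example.transfer`` and its argument schema are made up for the example.

```python
import sys

# (event, args) pairs recorded by the hook below.
seen = []


def record_hook(event, args):
    # A hook receives every event raised in the process, so filter down
    # to the hypothetical "example." events used in this sketch.
    if event.startswith("example."):
        seen.append((event, args))


# Hooks cannot be removed once added, so this stays active until exit.
sys.addaudithook(record_hook)

# Hypothetical event with a fixed schema: (source, destination, amount).
sys.audit("example.transfer", "alice", "bob", 10)

print(seen)
```

The hook receives the event name as a string and the remaining arguments packed into a tuple, which is why a fixed per-event schema matters to hook authors.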
For maximum compatibility, events using the same name as an event in the reference interpreter CPython should make every attempt to use compatible arguments. Including the name or an abbreviation of the implementation in implementation-specific event names will also help prevent collisions. For example, a ``pypy.jit_invoked`` event is clearly distinguished from an ``ipy.jit_invoked`` event.
When an event is audited, each hook is called in the order it was added with the event name and tuple. If any hook returns with an exception set, later hooks are ignored and *in general* the Python runtime should terminate. This is intentional to allow hook implementations to decide how to respond to any particular event. The typical responses will be to log the event, abort the operation with an exception, or to immediately terminate the process with an operating system exit call.
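The "abort the operation with an exception" response might look like the sketch below. It assumes that an exception raised inside a hook propagates to the caller of ``sys.audit()``; the event name ``example.dangerous_operation``, the ``BLOCKED_EVENTS`` set, and the ``dangerous_operation()`` function are all hypothetical.

```python
import sys

# Hypothetical policy: event names that this hook refuses to allow.
BLOCKED_EVENTS = {"example.dangerous_operation"}


def policy_hook(event, args):
    # Raising from the hook aborts the audited operation.
    if event in BLOCKED_EVENTS:
        raise PermissionError(f"audit policy blocked {event}: {args!r}")


sys.addaudithook(policy_hook)


def dangerous_operation(target):
    # Raise the event before doing any work, so a hook can veto it.
    sys.audit("example.dangerous_operation", target)
    return f"operated on {target}"


try:
    dangerous_operation("/etc/passwd")
    outcome = "allowed"
except PermissionError as exc:
    outcome = f"blocked: {exc}"

print(outcome)
```

Raising the event before the work begins is a design choice in this sketch: a veto then leaves no partial side effects behind.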
When an event is audited but no hooks have been set, the ``audit()`` function should impose minimal overhead. Ideally, each argument is a reference to existing data rather than a value calculated just for the auditing call.
As hooks may be Python objects, they need to be freed during ``Py_Finalize()``. To do this, we add an internal API ``_Py_ClearAuditHooks()`` that releases any Python hooks and any memory held. This is an internal function with no public export, and we recommend it raise its own audit event for all current hooks to ensure that unexpected calls are observed.
Below in `Suggested Audit Hook Locations`_, we recommend some important operations that should raise audit events.
Python implementations should document which operations will raise audit events, along with the event schema. It is intentional that ``sys.addaudithook(print)`` be a trivial way to display all messages.
Verified Open Hook
Most operating systems have a mechanism to distinguish between files that can be executed and those that can not. For example, this may be an execute bit in the permissions field, a verified hash of the file contents to detect potential code tampering, or file system path restrictions. These are an important security mechanism for preventing execution of data or code that is not approved for a given environment. Currently, Python has no way to integrate with these when launching scripts or importing modules.
The new public C API for the verified open hook is::
    /* Set the handler */
    typedef PyObject *(*hook_func)(PyObject *path, void *userData);
    int PyImport_SetOpenForImportHook(hook_func handler, void *userData);

    /* Open a file using the handler */
    PyObject *PyImport_OpenForImport(const char *path);
The new public Python API for the verified open hook is::
    # Open a file using the handler
    importlib.util.open_for_import(path : str) -> io.IOBase
The ``importlib.util.open_for_import()`` function is a drop-in replacement for ``open(str(pathlike), 'rb')``. Its default behaviour is to open a file for raw, binary access. To change the behaviour a new handler should be set. Handler functions only accept ``str`` arguments. The C API ``PyImport_OpenForImport`` function assumes UTF-8 encoding.
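To make the handler contract concrete, here is a minimal sketch of the kind of check a handler might perform, written in Python purely for illustration (the proposal only allows handlers to be set from C). The ``verified_open()`` function and the ``APPROVED_DIGESTS`` allowlist are hypothetical and not part of the proposed API.

```python
import hashlib
import io
import os
import tempfile

# Hypothetical allowlist of SHA-256 hex digests of approved source files.
APPROVED_DIGESTS = set()


def verified_open(path: str) -> io.IOBase:
    """Hypothetical handler: return file contents only if the hash is approved."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest not in APPROVED_DIGESTS:
        raise PermissionError(f"{path} is not approved for import")
    # Hand back the exact bytes that were verified, so later changes to
    # the file on disk cannot affect what the caller reads.
    return io.BytesIO(data)


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.py")
    bad = os.path.join(tmp, "bad.py")
    with open(good, "wb") as f:
        f.write(b"x = 1\n")
    with open(bad, "wb") as f:
        f.write(b"x = 2\n")

    # Approve only the contents of good.py.
    APPROVED_DIGESTS.add(hashlib.sha256(b"x = 1\n").hexdigest())

    good_source = verified_open(good).read()
    try:
        verified_open(bad)
        bad_result = "opened"
    except PermissionError:
        bad_result = "rejected"

print(good_source, bad_result)
```

Returning an in-memory stream of the verified bytes, rather than reopening the path, is one way to keep the check and the read tied to the same data.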
All import and execution functionality involving code from a file will be changed to use ``open_for_import()`` unconditionally. It is important to note that calls to ``compile()``, ``exec()`` and ``eval()`` do not go through this function - an audit hook that includes the code from these calls is the best opportunity to validate code that is read from the file. Given the current decoupling between import and execution in Python, most imported code will go through both ``open_for_import()`` and the log hook for ``compile``, and so care should be taken to avoid repeating verification steps.
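A hook that observes dynamically compiled code could be sketched as follows. This assumes the interpreter raises an event named ``compile`` whose first two arguments are the source and the filename, and that the source may arrive as either ``str`` or ``bytes``; the ``<audit-demo>`` filename is a made-up marker used to filter out unrelated compilations.

```python
import sys

# Sources seen by the hook for our demo filename only.
compiled_sources = []


def compile_logger(event, args):
    # Assumed schema for the "compile" event: (source, filename, ...).
    if event != "compile" or len(args) < 2:
        return
    source, filename = args[0], args[1]
    if filename != "<audit-demo>":
        return
    if isinstance(source, bytes):
        source = source.decode("utf-8", "replace")
    compiled_sources.append(source)


sys.addaudithook(compile_logger)

# compile() and exec() do not go through the open hook, so the audit
# hook is the place to inspect this code.
code = compile("answer = 6 * 7", "<audit-demo>", "exec")
namespace = {}
exec(code, namespace)

print(compiled_sources, namespace["answer"])
```

A real verifier would likely remember which sources it has already validated through the open hook, to avoid repeating the work here.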
There is no Python API provided for changing the open hook. To modify import behavior from Python code, use the existing functionality provided by ``importlib``.
I think that the import hook needs to be extended. It only works for simple Python files or pyc files. There are at least two other important scenarios: zipimport and shared libraries.
For example, how does the import hook work with alternative importers like zipimport? What does the import hook 'see' for an import from a zipfile?
Shared libraries are trickier. libc doesn't define a way to dlopen() from a file descriptor. dlopen() takes a file name, but a file name leaves the audit hook open to a TOCTOU attack.