PEP 578: Python Runtime Audit Hooks

Hi all Time is short, but I'm hoping to get PEP 578 (formerly PEP 551) into Python 3.8. Here's the current text for review and comment before I submit to the Steering Council. The formatted text is at https://www.python.org/dev/peps/pep-0578/ (update just pushed, so give it an hour or so, but it's fundamentally the same as what's there) No Discourse post, because we don't have a python-dev equivalent there yet, so please reply here for this one. Implementation is at https://github.com/zooba/cpython/tree/pep-578/ and my backport to 3.7 (https://github.com/zooba/cpython/tree/pep-578-3.7/) is already getting some real use (though this will not be added to 3.7, unless people *really* want it, so the backport is just for reference). Cheers, Steve ===== PEP: 578 Title: Python Runtime Audit Hooks Version: $Revision$ Last-Modified: $Date$ Author: Steve Dower <steve.dower@python.org> Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 16-Jun-2018 Python-Version: 3.8 Post-History: Abstract ======== This PEP describes additions to the Python API and specific behaviors for the CPython implementation that make actions taken by the Python runtime visible to auditing tools. Visibility into these actions provides opportunities for test frameworks, logging frameworks, and security tools to monitor and optionally limit actions taken by the runtime. This PEP proposes adding two APIs to provide insights into a running Python application: one for arbitrary events, and another specific to the module import system. The APIs are intended to be available in all Python implementations, though the specific messages and values used are unspecified here to allow implementations the freedom to determine how best to provide information to their users. Some examples likely to be used in CPython are provided for explanatory purposes. See PEP 551 for discussion and recommendations on enhancing the security of a Python runtime making use of these auditing APIs. Background ========== Python provides access to a wide range of low-level functionality on many common operating systems. While this is incredibly useful for "write-once, run-anywhere" scripting, it also makes monitoring of software written in Python difficult. Because Python uses native system APIs directly, existing monitoring tools either suffer from limited context or auditing bypass. Limited context occurs when system monitoring can report that an action occurred, but cannot explain the sequence of events leading to it. For example, network monitoring at the OS level may be able to report "listening started on port 5678", but may not be able to provide the process ID, command line, parent process, or the local state in the program at the point that triggered the action. Firewall controls to prevent such an action are similarly limited, typically to process names or some global state such as the current user, and in any case rarely provide a useful log file correlated with other application messages. Auditing bypass can occur when the typical system tool used for an action would ordinarily report its use, but accessing the APIs via Python do not trigger this. For example, invoking "curl" to make HTTP requests may be specifically monitored in an audited system, but Python's "urlretrieve" function is not. Within a long-running Python application, particularly one that processes user-provided information such as a web app, there is a risk of unexpected behavior. This may be due to bugs in the code, or deliberately induced by a malicious user. In both cases, normal application logging may be bypassed resulting in no indication that anything out of the ordinary has occurred. Additionally, and somewhat unique to Python, it is very easy to affect the code that is run in an application by manipulating either the import system's search path or placing files earlier on the path than intended. This is often seen when developers create a script with the same name as the module they intend to use - for example, a ``random.py`` file that attempts to import the standard library ``random`` module. This is not sandboxing, as this proposal does not attempt to prevent malicious behavior (though it enables some new options to do so). See the `Why Not A Sandbox`_ section below for further discussion. Overview of Changes =================== The aim of these changes is to enable both application developers and system administrators to integrate Python into their existing monitoring systems without dictating how those systems look or behave. We propose two API changes to enable this: an Audit Hook and Verified Open Hook. Both are available from Python and native code, allowing applications and frameworks written in pure Python code to take advantage of the extra messages, while also allowing embedders or system administrators to deploy builds of Python where auditing is always enabled. Only CPython is bound to provide the native APIs as described here. Other implementations should provide the pure Python APIs, and may provide native versions as appropriate for their underlying runtimes. Auditing events are likewise considered implementation specific, but are bound by normal feature compatibility guarantees. Audit Hook ---------- In order to observe actions taken by the runtime (on behalf of the caller), an API is required to raise messages from within certain operations. These operations are typically deep within the Python runtime or standard library, such as dynamic code compilation, module imports, DNS resolution, or use of certain modules such as ``ctypes``. The following new C APIs allow embedders and CPython implementors to send and receive audit hook messages:: # Add an auditing hook typedef int (*hook_func)(const char *event, PyObject *args, void *userData); int PySys_AddAuditHook(hook_func hook, void *userData); # Raise an event with all auditing hooks int PySys_Audit(const char *event, PyObject *args); # Internal API used during Py_Finalize() - not publicly accessible void _Py_ClearAuditHooks(void); The new Python APIs for receiving and raising audit hooks are:: # Add an auditing hook sys.addaudithook(hook: Callable[[str, tuple]]) # Raise an event with all auditing hooks sys.audit(str, *args) Hooks are added by calling ``PySys_AddAuditHook()`` from C at any time, including before ``Py_Initialize()``, or by calling ``sys.addaudithook()`` from Python code. Hooks cannot be removed or replaced. When events of interest are occurring, code can either call ``PySys_Audit()`` from C (while the GIL is held) or ``sys.audit()``. The string argument is the name of the event, and the tuple contains arguments. A given event name should have a fixed schema for arguments, which should be considered a public API (for each x.y version release), and thus should only change between feature releases with updated documentation. For maximum compatibility, events using the same name as an event in the reference interpreter CPython should make every attempt to use compatible arguments. Including the name or an abbreviation of the implementation in implementation-specific event names will also help prevent collisions. For example, a ``pypy.jit_invoked`` event is clearly distinguised from an ``ipy.jit_invoked`` event. When an event is audited, each hook is called in the order it was added with the event name and tuple. If any hook returns with an exception set, later hooks are ignored and *in general* the Python runtime should terminate. This is intentional to allow hook implementations to decide how to respond to any particular event. The typical responses will be to log the event, abort the operation with an exception, or to immediately terminate the process with an operating system exit call. When an event is audited but no hooks have been set, the ``audit()`` function should impose minimal overhead. Ideally, each argument is a reference to existing data rather than a value calculated just for the auditing call. As hooks may be Python objects, they need to be freed during ``Py_Finalize()``. To do this, we add an internal API ``_Py_ClearAuditHooks()`` that releases any Python hooks and any memory held. This is an internal function with no public export, and we recommend it raise its own audit event for all current hooks to ensure that unexpected calls are observed. Below in `Suggested Audit Hook Locations`_, we recommend some important operations that should raise audit events. Python implementations should document which operations will raise audit events, along with the event schema. It is intentional that ``sys.addaudithook(print)`` be a trivial way to display all messages. Verified Open Hook ------------------ Most operating systems have a mechanism to distinguish between files that can be executed and those that can not. For example, this may be an execute bit in the permissions field, a verified hash of the file contents to detect potential code tampering, or file system path restrictions. These are an important security mechanism for preventing execution of data or code that is not approved for a given environment. Currently, Python has no way to integrate with these when launching scripts or importing modules. The new public C API for the verified open hook is:: # Set the handler typedef PyObject *(*hook_func)(PyObject *path, void *userData) int PyImport_SetOpenForImportHook(hook_func handler, void *userData) # Open a file using the handler PyObject *PyImport_OpenForImport(const char *path) The new public Python API for the verified open hook is:: # Open a file using the handler importlib.util.open_for_import(path : str) -> io.IOBase The ``importlib.util.open_for_import()`` function is a drop-in replacement for ``open(str(pathlike), 'rb')``. Its default behaviour is to open a file for raw, binary access. To change the behaviour a new handler should be set. Handler functions only accept ``str`` arguments. The C API ``PyImport_OpenForImport`` function assumes UTF-8 encoding. A custom handler may be set by calling ``PyImport_SetOpenForImportHook()`` from C at any time, including before ``Py_Initialize()``. However, if a hook has already been set then the call will fail. When ``open_for_import()`` is called with a hook set, the hook will be passed the path and its return value will be returned directly. The returned object should be an open file-like object that supports reading raw bytes. This is explicitly intended to allow a ``BytesIO`` instance if the open handler has already read the entire file into memory. Note that these hooks can import and call the ``_io.open()`` function on CPython without triggering themselves. They can also use ``_io.BytesIO`` to return a compatible result using an in-memory buffer. If the hook determines that the file should not be loaded, it should raise an exception of its choice, as well as performing any other logging. All import and execution functionality involving code from a file will be changed to use ``open_for_import()`` unconditionally. It is important to note that calls to ``compile()``, ``exec()`` and ``eval()`` do not go through this function - an audit hook that includes the code from these calls is the best opportunity to validate code that is read from the file. Given the current decoupling between import and execution in Python, most imported code will go through both ``open_for_import()`` and the log hook for ``compile``, and so care should be taken to avoid repeating verification steps. There is no Python API provided for changing the open hook. To modify import behavior from Python code, use the existing functionality provided by ``importlib``. API Availability ---------------- While all the functions added here are considered public and stable API, the behavior of the functions is implementation specific. Most descriptions here refer to the CPython implementation, and while other implementations should provide the functions, there is no requirement that they behave the same. For example, ``sys.addaudithook()`` and ``sys.audit()`` should exist but may do nothing. This allows code to make calls to ``sys.audit()`` without having to test for existence, but it should not assume that its call will have any effect. (Including existence tests in security-critical code allows another vector to bypass auditing, so it is preferable that the function always exist.) ``importlib.util.open_for_import(path)`` should at a minimum always return ``_io.open(path, 'rb')``. Code using the function should make no further assumptions about what may occur, and implementations other than CPython are not required to let developers override the behavior of this function with a hook. Suggested Audit Hook Locations ============================== The locations and parameters in calls to ``sys.audit()`` or ``PySys_Audit()`` are to be determined by individual Python implementations. This is to allow maximum freedom for implementations to expose the operations that are most relevant to their platform, and to avoid or ignore potentially expensive or noisy events. Table 1 acts as both suggestions of operations that should trigger audit events on all implementations, and examples of event schemas. Table 2 provides further examples that are not required, but are likely to be available in CPython. Refer to the documentation associated with your version of Python to see which operations provide audit events. .. csv-table:: Table 1: Suggested Audit Hooks :header: "API Function", "Event Name", "Arguments", "Rationale" :widths: 2, 2, 3, 6 ``PySys_AddAuditHook``, ``sys.addaudithook``, "", "Detect when new audit hooks are being added. " ``PyImport_SetOpenForImportHook``, ``setopenforimporthook``, "", " Detects any attempt to set the ``open_for_import`` hook. " "``compile``, ``exec``, ``eval``, ``PyAst_CompileString``, ``PyAST_obj2mod``", ``compile``, "``(code, filename_or_none)``", " Detect dynamic code compilation, where ``code`` could be a string or AST. Note that this will be called for regular imports of source code, including those that were opened with ``open_for_import``. " "``exec``, ``eval``, ``run_mod``", ``exec``, "``(code_object,)``", " Detect dynamic execution of code objects. This only occurs for explicit calls, and is not raised for normal function invocation. " ``import``, ``import``, "``(module, filename, sys.path, sys.meta_path, sys.path_hooks)``", "Detect when modules are imported. This is raised before the module name is resolved to a file. All arguments other than the module name may be ``None`` if they are not used or available. " "``open``", ``open``, "``(path, mode, flags)``", "Detect when a file is about to be opened. *path* and *mode* are the usual parameters to ``open`` if available, while *flags* is provided instead of *mode* in some cases. " ``PyEval_SetProfile``, ``sys.setprofile``, "", "Detect when code is injecting trace functions. Because of the implementation, exceptions raised from the hook will abort the operation, but will not be raised in Python code. Note that ``threading.setprofile`` eventually calls this function, so the event will be audited for each thread. " ``PyEval_SetTrace``, ``sys.settrace``, "", "Detect when code is injecting trace functions. Because of the implementation, exceptions raised from the hook will abort the operation, but will not be raised in Python code. Note that ``threading.settrace`` eventually calls this function, so the event will be audited for each thread. " "``_PyObject_GenericSetAttr``, ``check_set_special_type_attr``, ``object_set_class``, ``func_set_code``, ``func_set_[kw]defaults``"," ``object.__setattr__``","``(object, attr, value)``","Detect monkey patching of types and objects. This event is raised for the ``__class__`` attribute and any attribute on ``type`` objects. " "``_PyObject_GenericSetAttr``",``object.__delattr__``,"``(object, attr)``","Detect deletion of object attributes. This event is raised for any attribute on ``type`` objects. " "``Unpickler.find_class``",``pickle.find_class``,"``(module_name, global_name)``","Detect imports and global name lookup when unpickling. " .. csv-table:: Table 2: Potential CPython Audit Hooks :header: "API Function", "Event Name", "Arguments", "Rationale" :widths: 2, 2, 3, 6 ``_PySys_ClearAuditHooks``, ``sys._clearaudithooks``, "", "Notifies hooks they are being cleaned up, mainly in case the event is triggered unexpectedly. This event cannot be aborted. " ``code_new``, ``code.__new__``, "``(bytecode, filename, name)``", " Detect dynamic creation of code objects. This only occurs for direct instantiation, and is not raised for normal compilation. " ``func_new_impl``, ``function.__new__``, "``(code,)``", "Detect dynamic creation of function objects. This only occurs for direct instantiation, and is not raised for normal compilation. " "``_ctypes.dlopen``, ``_ctypes.LoadLibrary``", ``ctypes.dlopen``, " ``(module_or_path,)``", "Detect when native modules are used. " ``_ctypes._FuncPtr``, ``ctypes.dlsym``, "``(lib_object, name)``", " Collect information about specific symbols retrieved from native modules. " ``_ctypes._CData``, ``ctypes.cdata``, "``(ptr_as_int,)``", "Detect when code is accessing arbitrary memory using ``ctypes``. " "``new_mmap_object``",``mmap.__new__``,"``(fileno, map_size, access, offset)``", "Detects creation of mmap objects. On POSIX, access may have been calculated from the ``prot`` and ``flags`` arguments. " ``sys._getframe``, ``sys._getframe``, "``(frame_object,)``", "Detect when code is accessing frames directly. " ``sys._current_frames``, ``sys._current_frames``, "", "Detect when code is accessing frames directly. " "``socket.bind``, ``socket.connect``, ``socket.connect_ex``, ``socket.getaddrinfo``, ``socket.getnameinfo``, ``socket.sendmsg``, ``socket.sendto``", ``socket.address``, "``(address,)``", "Detect access to network resources. The address is unmodified from the original call. " "``member_get``, ``func_get_code``, ``func_get_[kw]defaults`` ",``object.__getattr__``,"``(object, attr)``","Detect access to restricted attributes. This event is raised for any built-in members that are marked as restricted, and members that may allow bypassing imports. " "``urllib.urlopen``",``urllib.Request``,"``(url, data, headers, method)``", "Detects URL requests. " Performance Impact ================== The important performance impact is the case where events are being raised but there are no hooks attached. This is the unavoidable case - once a developer has added audit hooks they have explicitly chosen to trade performance for functionality. Performance impact with hooks added are not of interest here, since this is opt-in functionality. Analysis using the Python Performance Benchmark Suite [1]_ shows no significant impact, with the vast majority of benchmarks showing between 1.05x faster to 1.05x slower. In our opinion, the performance impact of the set of auditing points described in this PEP is negligible. Rejected Ideas ============== Separate module for audit hooks ------------------------------- The proposal is to add a new module for audit hooks, hypothetically ``audit``. This would separate the API and implementation from the ``sys`` module, and allow naming the C functions ``PyAudit_AddHook`` and ``PyAudit_Audit`` rather than the current variations. Any such module would need to be a built-in module that is guaranteed to always be present. The nature of these hooks is that they must be callable without condition, as any conditional imports or calls provide opportunities to intercept and suppress or modify events. Given it is one of the most core modules, the ``sys`` module is somewhat protected against module shadowing attacks. Replacing ``sys`` with a sufficiently functional module that the application can still run is a much more complicated task than replacing a module with only one function of interest. An attacker that has the ability to shadow the ``sys`` module is already capable of running arbitrary code from files, whereas an ``audit`` module could be replaced with a single line in a ``.pth`` file anywhere on the search path:: import sys; sys.modules['audit'] = type('audit', (object,), {'audit': lambda *a: None, 'addhook': lambda *a: None}) Multiple layers of protection already exist for monkey patching attacks against either ``sys`` or ``audit``, but assignments or insertions to ``sys.modules`` are not audited. This idea is rejected because it makes it trivial to suppress all calls to ``audit``. Flag in sys.flags to indicate "audited" mode -------------------------------------------- The proposal is to add a value in ``sys.flags`` to indicate when Python is running in a "secure" or "audited" mode. This would allow applications to detect when some features are enabled or when hooks have been added and modify their behaviour appropriately. Currently, we are not aware of any legitimate reasons for a program to behave differently in the presence of audit hooks. Both application-level APIs ``sys.audit`` and ``importlib.util.open_for_import`` are always present and functional, regardless of whether the regular ``python`` entry point or some alternative entry point is used. Callers cannot determine whether any hooks have been added (except by performing side-channel analysis), nor do they need to. The calls should be fast enough that callers do not need to avoid them, and the program is responsible for ensuring that any added hooks are fast enough to not affect application performance. The argument that this is "security by obscurity" is valid, but irrelevant. Security by obscurity is only an issue when there are no other protective mechanisms; obscurity as the first step in avoiding attack is strongly recommended (see `this article <https://danielmiessler.com/study/security-by-obscurity/>`_ for discussion). This idea is rejected because there are no appropriate reasons for an application to change its behaviour based on whether these APIs are in use. Why Not A Sandbox ================= Sandboxing CPython has been attempted many times in the past, and each past attempt has failed. Fundamentally, the problem is that certain functionality has to be restricted when executing the sandboxed code, but otherwise needs to be available for normal operation of Python. For example, completely removing the ability to compile strings into bytecode also breaks the ability to import modules from source code, and if it is not completely removed then there are too many ways to get access to that functionality indirectly. There is not yet any feasible way to generically determine whether a given operation is "safe" or not. Further information and references available at [2]_. This proposal does not attempt to restrict functionality, but simply exposes the fact that the functionality is being used. Particularly for intrusion scenarios, detection is significantly more important than early prevention (as early prevention will generally drive attackers to use an alternate, less-detectable, approach). The availability of audit hooks alone does not change the attack surface of Python in any way, but they enable defenders to integrate Python into their environment in ways that are currently not possible. Since audit hooks have the ability to safely prevent an operation occuring, this feature does enable the ability to provide some level of sandboxing. In most cases, however, the intention is to enable logging rather than creating a sandbox. Relationship to PEP 551 ======================= This API was originally presented as part of `PEP 551 <https://www.python.org/dev/peps/pep-0551/>`_ Security Transparency in the Python Runtime. For simpler review purposes, and due to the broader applicability of these APIs beyond security, the API design is now presented separately. PEP 551 is an informational PEP discussing how to integrate Python into a secure or audited environment. References ========== .. [1] Python Performance Benchmark Suite `<https://github.com/python/performance>`_ .. [2] Python Security model - Sandbox `<https://python-security.readthedocs.io/security.html#sandbox>`_ Copyright ========= Copyright (c) 2019 by Microsoft Corporation. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).

The implementation can be viewed as a PR at https://github.com/python/cpython/pull/12613 On 28Mar2019 1535, Steve Dower wrote:

Hi, I read quickly the PEP, I'm not sure that I understood it correctly, so here are some early questions more about the usage of the PEP, than its implementation.
I don't understand well the overall security model. If malicious behaviors can still occur, what is the the purpose of auditing? For example, if an audit hook writes events into a local log file, the attacker can easily remove this log file, no? While you say that it's not a sandbox, you designed multiple protections to protect auditing, and the design is very close to a sandbox. Example: "``_PyObject_GenericSetAttr``, ``check_set_special_type_attr``, ``object_set_class``, ``func_set_code``, ``func_set_[kw]defaults``"," ``object.__setattr__``","``(object, attr, value)``","Detect monkey patching of types and objects. This event is raised for the ``__class__`` attribute and any attribute on ``type`` objects. It reminds me Python 2 Bastion module which has simply been removed because it was unsafe: https://docs.python.org/2/library/bastion.html For example, using ctypes, you can access directly the underlying dictionary of a type and modify "private" attributes. It's just an example. I wrote pysandbox in the past, and it was broken by default, as you wrote in the PEP :-) The failure of pysandbox (2013) https://lwn.net/Articles/574215/ If you want to secure Python, run Python in a sandbox, don't try to "add" a sandbox "on top" of Python (I know that it's more complicated in practice). Or use PyPy sandbox which has a very different design. Le jeu. 28 mars 2019 à 23:39, Steve Dower <steve.dower@python.org> a écrit :
Is it possible to implement these features without adding a new API or modifying Python? Short example adding logs to open(): --- import builtins import functools def add_log(name, func): @functools.wraps(func) def wrapper(*args): print(name, args) return func(*args) return wrapper builtins.open = add_log("open", builtins.open) open(__file__).close() ---
In my experience, it doesn't work just because Python has too many functions opening files indirectly or call external C libraries which open files. I vaguely recall an exploit in my pysandbox project which uses the internal code of Python which displays a traceback... to read the content of an arbitrary file on the disk :-( Game over. I would never expect that there are so many ways to read a file in Python... Even when I restricted pysandbox to the bare minimum of the Python language (with no import), multiple exploits have been found. Moreover, at the end, Python just became useful. More generally, there are a lot of codes in Python which allow arbitrary code injection :-( (Most of them are now fixed, hopefully!) I did my best to modify as much functions as possible to implement the PEP 446 "Make newly created file descriptors non-inheritable", but I know that *many* functions call directly open() or fopen() and so create inheritable file descriptors. For example, the Python ssl module takes directly filenames and OpenSSL open directly files. It's just one example. You will never be able to cover all cases. Having a single function which allows to open an arbitrary file without triggering an audit event would defeat the whole purpose of auditing, no? Again, maybe I didn't understand well the overall purpose of the PEP, sorry.
(The Linux kernel uses advance tooling to inject hooks: it has no impact on performances when no hook is used. Machine code of functions is patched to inject a hook. Impressive stuff :-)) Here I expect a small overhead. But the global overhead will be proportional to the number of hooks, no? Maybe it's not significant with the proposed list of events, but it will be more significant with 100 or 1000 events? I'm not saying that it's a blocker issue, I'm just thinking aloud to make sure that I understood correctly :-) Victor -- Night gathers, and now my watch begins. It shall not end until my death.

On 29/03/2019 01.02, Victor Stinner wrote:
An attacker may not have permission to mess with the auditing subsystem. For example an attacker may be able to modify an application like a web server or web application. Audit loggers typically run in a different, more protected context. On modern, hardened operation systems root / Adminstrator aren't all powerful, too. They are typically restricted by additional policies like SELinux. Further more, servers also send auditing data to remote nodes for analysis. Keep in mind that auditing is not primarily about preventing compromises. It's about detecting what, when, who, and how a system was compromised.
The verified open hook is not about sandboxing. It's a mechanism to prevent a class of attacks like directory traversal attacks. On Linux, the open-for-import hook could refuse access to .py and .pyc files that do not have the user.python_code or root.python_code extended file attribute. This verified open hook could have prevent the compromise of wiki.python.org many years ago.
I agree. Don't draw the wrong conclusion from your statement. PEP 578 adds hooks for auditing, which in return can be used to harden and log an application. Unlike secure sandboxing, it doesn't have to be perfect. Alex Gaynor summed this up in his blog post https://alexgaynor.net/2018/jul/20/worst-truism-in-infosec/
This case can be detected during development and QE phase. You simply have to count the amount of open syscalls and compare it to the amount of open auditing events.
The performance impact can be remedied and reduced with a simple check. If there is no audit hook installed, it's just a matter of a pointer deref + JNZ. Christian

Thanks Christian for responding - I endorse and support all your comments. (I'd hoped that by explicitly saying "this is not a sandbox" it would avoid people thinking it was a sandbox, but apparently I would have been better just to avoid the keyword completely...) On 29Mar2019 0324, Christian Heimes wrote:
Yep, the performance case we care about is when there are no hooks attached, since that's the only time a user cannot do anything to improve performance themselves. See the "Performance Impact" section in the PEP. In my implementation the cost is about as low as I can make it - see https://github.com/python/cpython/pull/12613/files#diff-f38879f4833a6b6847e5... (looking at it again I can probably avoid the exception preservation and a few conditionals at the end) Basically, PySys_Audit takes a format string and arguments, rather than making callers eagerly construct the tuple that gets passed to the hook, and only actually allocates when there is a hook to call. There aren't even any Py_INCREF's if there are no hooks. And as Christian says, it's a deref+JNZ. Now, if someone has implemented a hook and that hook has performance issues, yeah things will slow down. In general, the places where we are interested in hooks is where calls are being made into the operating system, so most of them will also involve a few syscalls and the cost of the hook should be minimal in comparison. But there isn't another way to provide the functionality - offloading it to the OS just means the OS is going to suffer the performance penalty, so it really is just moving the blame elsewhere. I dislike playing that game. Cheers, Steve

On Thu, Mar 28, 2019 at 5:03 PM Victor Stinner <vstinner@redhat.com> wrote:
Hi,
I read quickly the PEP
I would like to encourage everyone to read PEPs so that they never feel the need to write those words ever again. ;) PEPs are never decided in less than 24 hours, so there is no rush to read a PEP as quickly as possible in order to reply ASAP. We also have so much volume as it is when discussing PEPs that I think we should be encouraging people to take the time to be informed by reading thoroughly before replying so the back-and-forth is minimized and optimized for impactful discussions (personally, I would love it if we all aimed for one, thorough response/day when discussing PEPs, but that's just me). Otherwise we end up with way more time spent in replies to things that would not have been necessary to ask if we took our time reading. Remember, every email you send is read by tons of other people and so there's a real time commitment you're asking of the world every time you hit that Reply button.

On 28/03/2019 23.35, Steve Dower wrote:
Hi Steve, I wonder if the hooks could be replaced by a more efficient mechanism. These days, Linux, macOS, and most recently Windows [1] support dtrace probes. DTrace is a very powerful and efficient mechanism to trace user-space processes from Kernel space. At least we should consider to add DTrace probes to the auditing framework. Regards, Christian [1] https://techcommunity.microsoft.com/t5/Windows-Kernel-Internals/DTrace-on-Wi...

On 29Mar2019 0334, Christian Heimes wrote:
Calling into those frameworks will still require as much work as these hooks do, and also make it very difficult to do things like auditing unit tests or using pure-Python code when you're not concerned about initialization (e.g. in a long-running web server). So I'm inclined to say that if you want those probes, you can enable them by adding a hook that calls into them? A similar argument is made for using ETW on Windows (which will work on versions of Windows that have been released, unlike DTrace) (and yes, this is a real argument I've already had over this proposal ;) ), so I really think leaving it open-ended and Python-specific is the best approach. (Reading further down the link you provided, it seems DTrace in Windows will only be enabled for essentially-administrators. So that rules it out as a substitute for this proposal in my opinion.) Cheers, Steve

Like in the mktemp thread earlier, I would request a threat model (what use cases are supposed to be protected (in this case, by reporting rather than preventing) and from what threats) -- in the discussion, and eventually, in the PEP. Without one, any claims and talks about whether something would be an effective security measure are pointless -- 'cuz you would never know if you accounted for everything and would not even have the definition of that "everything". On 29.03.2019 1:35, Steve Dower wrote:
-- Regards, Ivan

PEP 551 (referenced from this one) contains information about using these hooks for security purposes, along with other approaches to minimize the risk of having Python in your production environments. Threat models have to be designed by the user; we can't predict what it looks like for the incredibly diverse user base we have. This PEP is explicitly only about the API changes. Cheers, Steve On 29Mar2019 1044, Ivan Pozdeev via Python-Dev wrote:

On 28/03/2019 23.35, Steve Dower wrote:
[...]
I think that the import hook needs to be extended. It only works for simple Python files or pyc files. There are at least two other important scenarios: zipimport and shared libraries. For example how does the importhook work in regarding of alternative importers like zipimport? What does the import hook 'see' for an import from a zipfile? Shared libraries are trickier. libc doesn't define a way to dlopen() from a file descriptor. dlopen() takes a file name, but a file name leaves the audit hook open to a TOCTOU attack. Christian

On 29Mar2019 1218, Christian Heimes wrote:
Yes, good point. I think opening the zip file with open_for_import() is the right place to do it, as this operation relates to opening the file on disk rather than files within it.
For Windows, at least, the operating system can run its own validation on native modules (if you're using a feature like DeviceGuard, for example), so the hook likely isn't necessary for those purposes. I believe some configurations of Linux allow this as well? But there's likely no better option here than a combination of good ACLs and checking by filename, which at least lets you whitelist the files you know you want to allow. Similarly for the zip file - if you trust a particular file and trust your ACLs, checking by filename is fine. That said, specific audit events for "I'm about to open this zip/dlopen this file for import" are very easy to add. (The PEP proposes many examples, but is not trying to be exhaustive. If accepted, we should feel free to add new events as we identify places where they matter.) Aside: an important aspect of this per-file approach to execution is that the idea is generally to *enable* the files you trust, rather than disable the files that are bad. So the detection routines are typically "does this match a known hash" or "is this in a secure location", which for a carefully deployed system are already known values, rather than trying to figure out whether a file might do a bad thing. If you can't validate the files in your deployment match the ones you thought you were deploying, you are so far from needing this that it doesn't even matter, but most of the deployments I work with are *at least* this well controlled. Cheers, Steve

On Sat, 30 Mar 2019 at 06:14, Steve Dower <steve.dower@python.org> wrote:
I like this PEP in principle, but the specific "open_for_import" name bothers me a lot, as it implies that "importing" is the only situation where a file will be opened for code execution. Accordingly, I think proceeding with that name will increase the risk of unintentional audit bypasses, as folks (both within the core development team and outside it) use the regular open function for data that they then pass to exec(), or for some other purpose that nevertheless allows arbitrary code execution (e.g. the shelve module loading pickle files from disk). If this part of the API were lower down the stack (e.g. "_io.open_for_code_execution") then I think it would make more sense - APIs like tokenize.open(), runpy.run_path(), PyRun_SimpleFile(), shelve, etc, could use that, without having to introduce a dependency on importlib to get access to the functionality. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 30Mar.2019 0747, Nick Coghlan wrote:
It was called "open_for_exec" at one point, though I forget exactly why we changed it. But I have no problem with moving it. Something like this? PyImport_OpenForImport -> PyIO_OpenForExec PyImport_SetOpenForImportHook -> PyIO_SetOpenForExecHook importlib.util.open_for_import -> _io.open_for_exec Or more in line with Nick's suggestion: PyImport_OpenForImport -> PyIO_OpenExecutableCode PyImport_SetOpenForImportHook -> PyIO_SetOpenExecutableCodeHook importlib.util.open_for_import -> _io.open_executable_code I dropped "For", but I don't really care that much about the name. I'd be okay dropping either "executable" or "code" as well - I don't really have a good sense of which will make people more likely to use this correctly. Cheers, Steve

On 30Mar2019 0913, Steve Dower wrote:
Looking at what we already have, perhaps putting it under "PyFile_OpenForExecute" would make the most sense? We don't currently have any public "PyIO" types or functions. Bikeshedding now, but as I'm the only one really participating in it, I think it's allowed :) Cheers, Steve

On 29/03/2019 21.10, Steve Dower wrote:
+1
The Linux effort is called Integrity Measurement Architecture (IMA) and Linux Extended Verification Module (EVM). I have no practical experience with IMA yet. I don't like the fact that the PEP requires users to learn and use an additional layer to handle native code. Although we cannot provide a fully secure hook for native code, we could at least try to provide a best effort hook and document the limitations. A bit more information would make the verified open function more useful, too. PyObject *PyImport_OpenForExecution( const char *path, const char *intent, int flags, PyObject *context ) - Path is an absolute (!) file path. The PEP doesn't specify if the file name is relative or absolute. IMO it should be always absolute. - The new intent argument lets the caller pass information how it intents to use the file, e.g. pythoncode, zipimport, nativecode (for loading a shared library/DLL), ctypes, ... This allows the verify hook to react on the intent and provide different verifications for e.g. Python code and native modules. - The flags argument is for additional flags, e.g. return an opened file or None, open the file in text or binary mode, ... - Context is an optional Python object from the caller's context. For the import system, it could be the loader instance. Examples: PyImport_OpenForImport( '/usr/lib64/python3.7/__pycache__/os.cpython-37.pyc', 'bytecode', PY_IMPORT_OPENFILE, SourceFileLoader_instance ) -> fileobject PyImport_OpenForImport( '/lib64/libc.so.6', 'ctypes', PY_IMPORT_NONE, NULL ) -> None
Absolutely! On Linux, trust settings could be stored in extended file attributes. Linux has multiple namespaces for extended attributes. User attributes can be modified by every process that has write permission to an inode. The security namespace is only writable for processes with CAP_SYS_ADMIN. A secure loader could check for presence of 'user.org.python.bytecode' attribute or compare the content of the file to hashsum in 'security.org.python.bytecode.sha256'. Christian

On 31Mar2019 0538, Christian Heimes wrote:
So instead they need to learn a significantly more complicated API? :) (I was very happy to be able to say "it's the same as open(p, 'rb')").
Yeah, this is fair enough. I'll add it as a requirement.
I had an intent argument at one point and the feedback I got (from teams who wanted to implement it) is that they wouldn't trust it anyway :) In each case there should be associated audit events for tracking the intent (and interrupting at that point if it doesn't like the intended action), but for the simple case of "let me open this specific file" it doesn't really add much. And it almost certainly shouldn't impact decision making.
- The flags argument is for additional flags, e.g. return an opened file or None, open the file in text or binary mode, ...
This just makes it harder for the hook implementer - now you have to allow encoding/errors arguments and probably more. And as mentioned above, there should be an audit event showing the intent before this call, and a hook can reject it at that point (rather than verify without actually returning the verified content).
- Context is an optional Python object from the caller's context. For the import system, it could be the loader instance.
I think the audit event covers this, unless you have some way of using this context in mind that I can't think of? Cheers, Steve

Sorry, I forgot to reply. Do you think it would make sense to split the PEP into two PEPs? The auditing hook and import opener hook are related, but distinct improvements. The auditing part looks solid and ready now. The import opener may need some more refinement. I would also like to get feedback from some Linux Kernel security engineers first. On 01/04/2019 18.31, Steve Dower wrote:
There is no need to trust the intent flag that much. I would like to have a way to further narrow down the scope for an open call. This would allow the caller to tell the hook "I want to open something that should be a shared library suitable for ctypes". It would allow tighter control. Audit events are useful and powerful. But I don't want to put too much burden on the auditing framwork. I prefer to have checks that prevent operations rather than allow operations and audit them.
I retract this part of my proposal. With O_MAYEXEC it's better to always open the file, but then use the file's FD to retrieve the actual file name for dlopen(). That approach allows the Kernel to verify DAC permissions, prevents memfd_create() hacks through readlink, and simplifies the hook. * Linux: readlink("/proc/self/fd/%i") * macOS: fcntl F_GETPATH * Windows: GetFileInformationByHandleEx
To be honest I don't have a good use case yet. I just like the idea to have a way to pass some custom thing into an API and now who called an API. You seem to like it, too. Your hook has a void *userData, but it's not passed into the Python function. :) int PyImport_SetOpenForImportHook(hook_func handler, void *userData) Christian

On 16Apr2019 0532, Christian Heimes wrote:
That will make three PEPs... The only question for the security engineers is "how much context do you need from the calling process", as that's the only thing that will affect the API. It doesn't have to have any implementation right now. And so far, all the context that's been proposed is "may be executed", which is already implied in the open_code() call. I haven't heard any more requests than "give us the filename and let us return the open (and exclusive) handle/descriptor", so this feels like YAGNI.
But those don't go through open(), they'll go through dlopen(), right? It's already a totally different code path from "open and read arbitrary bytes".
Right, and this is the default position for security defenders (to try and block things) ;) Auditing has been found to be a working balance
There is no Python function for the (now named) open_code hook. It can only be set as a C function by an embedder, and that's when the userData is provided. Nothing to do with each individual call - just one value per CPython runtime. (Similarly with the audit hook, but for the Python hooks you can pass a closure or a method - in C you need a separate pointer for this.) Cheers, Steve

I don't like adding more Python callback from low level. Python runtime is very complicated already, especially __del__, shutdown process, and multi threading. Python callback from low level is source of very difficult bugs always. Additionally, if we used the PEP for logging complex application, the log will be unreliable. For example: 1. Want to open file A in C code, call callback. 2. In the callback, "A is opened" is logged. 3. In the same callback, import may be happened and logged. 4. In the same callback, other thread may be run and any thing can be logged. 5. Many many other things happens and callback is called. 6. Then, open the file A. In this example, logged event ordering and timing is very different from real event ordering and timing. I prefer low level tool to trace low level thing, although it lacks some application context. Maybe, DTrace will become more important tool. https://techcommunity.microsoft.com/t5/Windows-Kernel-Internals/DTrace-on-Wi... Regards, -- Inada Naoki <songofacandy@gmail.com>

On 29Mar.2019 2020, Inada Naoki wrote:
Asynchronous callbacks, sure. These are all synchronous and always as part of a Python operation while the GIL is held (I think putting them immediately after parameter validation is the best place, but am hesitant to make that a hard rule).
Additionally, if we used the PEP for logging complex application, the log will be unreliable.
I think "then don't do that" is a valid response here. There are plenty of ways to use Python features incorrectly and cause problems (I've diagnosed many of them :) ). This doesn't make that any more or less true.
Yep. If you implement a bad hook then you get bad results. But at least the user can fix that themselves (since they caused it themselves). I've already used these hooks for a number of useful purposes, and it isn't difficult to write one that works just fine.
See my other reply about this. DTrace on Windows is a very long way from being suitable for production environments. Cheers, Steve

On 28/03/2019 23.35, Steve Dower wrote:
Hi Steve, (memory dump before I go to bed) Steve Grubb from Red Hat security pointed me to some interesting things [1]. For instance there is some work on a new O_MAYEXEC flag for open(). Steve came to similar conclusions like we, e.g. streaming code from stdin is insecure. I think it would be also beneficial to have auditing events for the import system to track when sys.path or import loaders are changed. Christian [1] https://marc.info/?l=linux-fsdevel&m=155535414414626&w=2

On 15Apr2019 1344, Christian Heimes wrote:
Thanks for the pointer! Using this for open_code() by default on platforms that support it might be a good opportunity in the future. But I'm glad I'm not the only one who thinks this is the right approach :)
I think it would be also beneficial to have auditing events for the import system to track when sys.path or import loaders are changed.
Already in there (kind of... the "import" events include the contents of the sys properties that are about to be used to resolve it - since these are plain-old lists, and can be easily reassigned, passing them through here allows you to add a check if you really want it but otherwise not pay the cost of replacing the sys module with a special implementation and its attributes with special lists). Cheers, Steve

On 15/04/2019 23.17, Steve Dower wrote:
Here is the original patch on LWN with some links to presentations: https://lwn.net/Articles/774676/ The approach has one downside: The current user must have DAC executable permission for a file in order to open a file with O_MAYEXEC. That means we have to +x all Python files and PYC files, not just the files that are designed as entry points.
Yeah, it's complicated :/ Steve Grubb mentioned remote importers or hacks like mem_fd + dlopen() from /proc/self/fd as attack vectors. Mitigations and audit systems like IMA Appraisal only work properly if code has to hit the disk first. If an attacker or user can perform the equivalent of PROT_EXEC | PROT_WRITE, then IMA won't be able to 'see' the malicious code. https://www.tutorialspoint.com/How-to-use-remote-python-modules https://github.com/operatorequals/httpimport https://0x00sec.org/t/pure-python-in-memory-so-loading-without-shm/6453 https://github.com/nullbites/SnakeEater/blob/master/SnakeEater2.py Christian

The implementation can be viewed as a PR at https://github.com/python/cpython/pull/12613 On 28Mar2019 1535, Steve Dower wrote:

Hi, I read quickly the PEP, I'm not sure that I understood it correctly, so here are some early questions more about the usage of the PEP, than its implementation.
I don't understand well the overall security model. If malicious behaviors can still occur, what is the the purpose of auditing? For example, if an audit hook writes events into a local log file, the attacker can easily remove this log file, no? While you say that it's not a sandbox, you designed multiple protections to protect auditing, and the design is very close to a sandbox. Example: "``_PyObject_GenericSetAttr``, ``check_set_special_type_attr``, ``object_set_class``, ``func_set_code``, ``func_set_[kw]defaults``"," ``object.__setattr__``","``(object, attr, value)``","Detect monkey patching of types and objects. This event is raised for the ``__class__`` attribute and any attribute on ``type`` objects. It reminds me Python 2 Bastion module which has simply been removed because it was unsafe: https://docs.python.org/2/library/bastion.html For example, using ctypes, you can access directly the underlying dictionary of a type and modify "private" attributes. It's just an example. I wrote pysandbox in the past, and it was broken by default, as you wrote in the PEP :-) The failure of pysandbox (2013) https://lwn.net/Articles/574215/ If you want to secure Python, run Python in a sandbox, don't try to "add" a sandbox "on top" of Python (I know that it's more complicated in practice). Or use PyPy sandbox which has a very different design. Le jeu. 28 mars 2019 à 23:39, Steve Dower <steve.dower@python.org> a écrit :
Is it possible to implement these features without adding a new API or modifying Python? Short example adding logs to open(): --- import builtins import functools def add_log(name, func): @functools.wraps(func) def wrapper(*args): print(name, args) return func(*args) return wrapper builtins.open = add_log("open", builtins.open) open(__file__).close() ---
In my experience, it doesn't work just because Python has too many functions opening files indirectly or call external C libraries which open files. I vaguely recall an exploit in my pysandbox project which uses the internal code of Python which displays a traceback... to read the content of an arbitrary file on the disk :-( Game over. I would never expect that there are so many ways to read a file in Python... Even when I restricted pysandbox to the bare minimum of the Python language (with no import), multiple exploits have been found. Moreover, at the end, Python just became useful. More generally, there are a lot of codes in Python which allow arbitrary code injection :-( (Most of them are now fixed, hopefully!) I did my best to modify as much functions as possible to implement the PEP 446 "Make newly created file descriptors non-inheritable", but I know that *many* functions call directly open() or fopen() and so create inheritable file descriptors. For example, the Python ssl module takes directly filenames and OpenSSL open directly files. It's just one example. You will never be able to cover all cases. Having a single function which allows to open an arbitrary file without triggering an audit event would defeat the whole purpose of auditing, no? Again, maybe I didn't understand well the overall purpose of the PEP, sorry.
(The Linux kernel uses advance tooling to inject hooks: it has no impact on performances when no hook is used. Machine code of functions is patched to inject a hook. Impressive stuff :-)) Here I expect a small overhead. But the global overhead will be proportional to the number of hooks, no? Maybe it's not significant with the proposed list of events, but it will be more significant with 100 or 1000 events? I'm not saying that it's a blocker issue, I'm just thinking aloud to make sure that I understood correctly :-) Victor -- Night gathers, and now my watch begins. It shall not end until my death.

On 29/03/2019 01.02, Victor Stinner wrote:
An attacker may not have permission to mess with the auditing subsystem. For example an attacker may be able to modify an application like a web server or web application. Audit loggers typically run in a different, more protected context. On modern, hardened operation systems root / Adminstrator aren't all powerful, too. They are typically restricted by additional policies like SELinux. Further more, servers also send auditing data to remote nodes for analysis. Keep in mind that auditing is not primarily about preventing compromises. It's about detecting what, when, who, and how a system was compromised.
The verified open hook is not about sandboxing. It's a mechanism to prevent a class of attacks like directory traversal attacks. On Linux, the open-for-import hook could refuse access to .py and .pyc files that do not have the user.python_code or root.python_code extended file attribute. This verified open hook could have prevent the compromise of wiki.python.org many years ago.
I agree. Don't draw the wrong conclusion from your statement. PEP 578 adds hooks for auditing, which in return can be used to harden and log an application. Unlike secure sandboxing, it doesn't have to be perfect. Alex Gaynor summed this up in his blog post https://alexgaynor.net/2018/jul/20/worst-truism-in-infosec/
This case can be detected during development and QE phase. You simply have to count the amount of open syscalls and compare it to the amount of open auditing events.
The performance impact can be remedied and reduced with a simple check. If there is no audit hook installed, it's just a matter of a pointer deref + JNZ. Christian

Thanks Christian for responding - I endorse and support all your comments. (I'd hoped that by explicitly saying "this is not a sandbox" it would avoid people thinking it was a sandbox, but apparently I would have been better just to avoid the keyword completely...) On 29Mar2019 0324, Christian Heimes wrote:
Yep, the performance case we care about is when there are no hooks attached, since that's the only time a user cannot do anything to improve performance themselves. See the "Performance Impact" section in the PEP. In my implementation the cost is about as low as I can make it - see https://github.com/python/cpython/pull/12613/files#diff-f38879f4833a6b6847e5... (looking at it again I can probably avoid the exception preservation and a few conditionals at the end) Basically, PySys_Audit takes a format string and arguments, rather than making callers eagerly construct the tuple that gets passed to the hook, and only actually allocates when there is a hook to call. There aren't even any Py_INCREF's if there are no hooks. And as Christian says, it's a deref+JNZ. Now, if someone has implemented a hook and that hook has performance issues, yeah things will slow down. In general, the places where we are interested in hooks is where calls are being made into the operating system, so most of them will also involve a few syscalls and the cost of the hook should be minimal in comparison. But there isn't another way to provide the functionality - offloading it to the OS just means the OS is going to suffer the performance penalty, so it really is just moving the blame elsewhere. I dislike playing that game. Cheers, Steve

On Thu, Mar 28, 2019 at 5:03 PM Victor Stinner <vstinner@redhat.com> wrote:
Hi,
I read quickly the PEP
I would like to encourage everyone to read PEPs so that they never feel the need to write those words ever again. ;) PEPs are never decided in less than 24 hours, so there is no rush to read a PEP as quickly as possible in order to reply ASAP. We also have so much volume as it is when discussing PEPs that I think we should be encouraging people to take the time to be informed by reading thoroughly before replying so the back-and-forth is minimized and optimized for impactful discussions (personally, I would love it if we all aimed for one, thorough response/day when discussing PEPs, but that's just me). Otherwise we end up with way more time spent in replies to things that would not have been necessary to ask if we took our time reading. Remember, every email you send is read by tons of other people and so there's a real time commitment you're asking of the world every time you hit that Reply button.

On 28/03/2019 23.35, Steve Dower wrote:
Hi Steve, I wonder if the hooks could be replaced by a more efficient mechanism. These days, Linux, macOS, and most recently Windows [1] support dtrace probes. DTrace is a very powerful and efficient mechanism to trace user-space processes from Kernel space. At least we should consider to add DTrace probes to the auditing framework. Regards, Christian [1] https://techcommunity.microsoft.com/t5/Windows-Kernel-Internals/DTrace-on-Wi...

On 29Mar2019 0334, Christian Heimes wrote:
Calling into those frameworks will still require as much work as these hooks do, and also make it very difficult to do things like auditing unit tests or using pure-Python code when you're not concerned about initialization (e.g. in a long-running web server). So I'm inclined to say that if you want those probes, you can enable them by adding a hook that calls into them? A similar argument is made for using ETW on Windows (which will work on versions of Windows that have been released, unlike DTrace) (and yes, this is a real argument I've already had over this proposal ;) ), so I really think leaving it open-ended and Python-specific is the best approach. (Reading further down the link you provided, it seems DTrace in Windows will only be enabled for essentially-administrators. So that rules it out as a substitute for this proposal in my opinion.) Cheers, Steve

Like in the mktemp thread earlier, I would request a threat model (what use cases are supposed to be protected (in this case, by reporting rather than preventing) and from what threats) -- in the discussion, and eventually, in the PEP. Without one, any claims and talks about whether something would be an effective security measure are pointless -- 'cuz you would never know if you accounted for everything and would not even have the definition of that "everything". On 29.03.2019 1:35, Steve Dower wrote:
-- Regards, Ivan

PEP 551 (referenced from this one) contains information about using these hooks for security purposes, along with other approaches to minimize the risk of having Python in your production environments. Threat models have to be designed by the user; we can't predict what it looks like for the incredibly diverse user base we have. This PEP is explicitly only about the API changes. Cheers, Steve On 29Mar2019 1044, Ivan Pozdeev via Python-Dev wrote:

On 28/03/2019 23.35, Steve Dower wrote:
[...]
I think that the import hook needs to be extended. It only works for simple Python files or pyc files. There are at least two other important scenarios: zipimport and shared libraries. For example how does the importhook work in regarding of alternative importers like zipimport? What does the import hook 'see' for an import from a zipfile? Shared libraries are trickier. libc doesn't define a way to dlopen() from a file descriptor. dlopen() takes a file name, but a file name leaves the audit hook open to a TOCTOU attack. Christian

On 29Mar2019 1218, Christian Heimes wrote:
Yes, good point. I think opening the zip file with open_for_import() is the right place to do it, as this operation relates to opening the file on disk rather than files within it.
For Windows, at least, the operating system can run its own validation on native modules (if you're using a feature like DeviceGuard, for example), so the hook likely isn't necessary for those purposes. I believe some configurations of Linux allow this as well? But there's likely no better option here than a combination of good ACLs and checking by filename, which at least lets you whitelist the files you know you want to allow. Similarly for the zip file - if you trust a particular file and trust your ACLs, checking by filename is fine. That said, specific audit events for "I'm about to open this zip/dlopen this file for import" are very easy to add. (The PEP proposes many examples, but is not trying to be exhaustive. If accepted, we should feel free to add new events as we identify places where they matter.) Aside: an important aspect of this per-file approach to execution is that the idea is generally to *enable* the files you trust, rather than disable the files that are bad. So the detection routines are typically "does this match a known hash" or "is this in a secure location", which for a carefully deployed system are already known values, rather than trying to figure out whether a file might do a bad thing. If you can't validate the files in your deployment match the ones you thought you were deploying, you are so far from needing this that it doesn't even matter, but most of the deployments I work with are *at least* this well controlled. Cheers, Steve

On Sat, 30 Mar 2019 at 06:14, Steve Dower <steve.dower@python.org> wrote:
I like this PEP in principle, but the specific "open_for_import" name bothers me a lot, as it implies that "importing" is the only situation where a file will be opened for code execution. Accordingly, I think proceeding with that name will increase the risk of unintentional audit bypasses, as folks (both within the core development team and outside it) use the regular open function for data that they then pass to exec(), or for some other purpose that nevertheless allows arbitrary code execution (e.g. the shelve module loading pickle files from disk). If this part of the API were lower down the stack (e.g. "_io.open_for_code_execution") then I think it would make more sense - APIs like tokenize.open(), runpy.run_path(), PyRun_SimpleFile(), shelve, etc, could use that, without having to introduce a dependency on importlib to get access to the functionality. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 30Mar.2019 0747, Nick Coghlan wrote:
It was called "open_for_exec" at one point, though I forget exactly why we changed it. But I have no problem with moving it. Something like this? PyImport_OpenForImport -> PyIO_OpenForExec PyImport_SetOpenForImportHook -> PyIO_SetOpenForExecHook importlib.util.open_for_import -> _io.open_for_exec Or more in line with Nick's suggestion: PyImport_OpenForImport -> PyIO_OpenExecutableCode PyImport_SetOpenForImportHook -> PyIO_SetOpenExecutableCodeHook importlib.util.open_for_import -> _io.open_executable_code I dropped "For", but I don't really care that much about the name. I'd be okay dropping either "executable" or "code" as well - I don't really have a good sense of which will make people more likely to use this correctly. Cheers, Steve

On 30Mar2019 0913, Steve Dower wrote:
Looking at what we already have, perhaps putting it under "PyFile_OpenForExecute" would make the most sense? We don't currently have any public "PyIO" types or functions. Bikeshedding now, but as I'm the only one really participating in it, I think it's allowed :) Cheers, Steve

On 29/03/2019 21.10, Steve Dower wrote:
+1
The Linux effort is called Integrity Measurement Architecture (IMA) and Linux Extended Verification Module (EVM). I have no practical experience with IMA yet. I don't like the fact that the PEP requires users to learn and use an additional layer to handle native code. Although we cannot provide a fully secure hook for native code, we could at least try to provide a best effort hook and document the limitations. A bit more information would make the verified open function more useful, too. PyObject *PyImport_OpenForExecution( const char *path, const char *intent, int flags, PyObject *context ) - Path is an absolute (!) file path. The PEP doesn't specify if the file name is relative or absolute. IMO it should be always absolute. - The new intent argument lets the caller pass information how it intents to use the file, e.g. pythoncode, zipimport, nativecode (for loading a shared library/DLL), ctypes, ... This allows the verify hook to react on the intent and provide different verifications for e.g. Python code and native modules. - The flags argument is for additional flags, e.g. return an opened file or None, open the file in text or binary mode, ... - Context is an optional Python object from the caller's context. For the import system, it could be the loader instance. Examples: PyImport_OpenForImport( '/usr/lib64/python3.7/__pycache__/os.cpython-37.pyc', 'bytecode', PY_IMPORT_OPENFILE, SourceFileLoader_instance ) -> fileobject PyImport_OpenForImport( '/lib64/libc.so.6', 'ctypes', PY_IMPORT_NONE, NULL ) -> None
Absolutely! On Linux, trust settings could be stored in extended file attributes. Linux has multiple namespaces for extended attributes. User attributes can be modified by every process that has write permission to an inode. The security namespace is only writable for processes with CAP_SYS_ADMIN. A secure loader could check for presence of 'user.org.python.bytecode' attribute or compare the content of the file to hashsum in 'security.org.python.bytecode.sha256'. Christian

On 31Mar2019 0538, Christian Heimes wrote:
So instead they need to learn a significantly more complicated API? :) (I was very happy to be able to say "it's the same as open(p, 'rb')").
Yeah, this is fair enough. I'll add it as a requirement.
I had an intent argument at one point and the feedback I got (from teams who wanted to implement it) is that they wouldn't trust it anyway :) In each case there should be associated audit events for tracking the intent (and interrupting at that point if it doesn't like the intended action), but for the simple case of "let me open this specific file" it doesn't really add much. And it almost certainly shouldn't impact decision making.
- The flags argument is for additional flags, e.g. return an opened file or None, open the file in text or binary mode, ...
This just makes it harder for the hook implementer - now you have to allow encoding/errors arguments and probably more. And as mentioned above, there should be an audit event showing the intent before this call, and a hook can reject it at that point (rather than verify without actually returning the verified content).
- Context is an optional Python object from the caller's context. For the import system, it could be the loader instance.
I think the audit event covers this, unless you have some way of using this context in mind that I can't think of? Cheers, Steve

Sorry, I forgot to reply. Do you think it would make sense to split the PEP into two PEPs? The auditing hook and import opener hook are related, but distinct improvements. The auditing part looks solid and ready now. The import opener may need some more refinement. I would also like to get feedback from some Linux Kernel security engineers first. On 01/04/2019 18.31, Steve Dower wrote:
There is no need to trust the intent flag that much. I would like to have a way to further narrow down the scope for an open call. This would allow the caller to tell the hook "I want to open something that should be a shared library suitable for ctypes". It would allow tighter control. Audit events are useful and powerful. But I don't want to put too much burden on the auditing framwork. I prefer to have checks that prevent operations rather than allow operations and audit them.
I retract this part of my proposal. With O_MAYEXEC it's better to always open the file, but then use the file's FD to retrieve the actual file name for dlopen(). That approach allows the Kernel to verify DAC permissions, prevents memfd_create() hacks through readlink, and simplifies the hook. * Linux: readlink("/proc/self/fd/%i") * macOS: fcntl F_GETPATH * Windows: GetFileInformationByHandleEx
To be honest I don't have a good use case yet. I just like the idea to have a way to pass some custom thing into an API and now who called an API. You seem to like it, too. Your hook has a void *userData, but it's not passed into the Python function. :) int PyImport_SetOpenForImportHook(hook_func handler, void *userData) Christian

On 16Apr2019 0532, Christian Heimes wrote:
That will make three PEPs... The only question for the security engineers is "how much context do you need from the calling process", as that's the only thing that will affect the API. It doesn't have to have any implementation right now. And so far, all the context that's been proposed is "may be executed", which is already implied in the open_code() call. I haven't heard any more requests than "give us the filename and let us return the open (and exclusive) handle/descriptor", so this feels like YAGNI.
But those don't go through open(), they'll go through dlopen(), right? It's already a totally different code path from "open and read arbitrary bytes".
Right, and this is the default position for security defenders (to try and block things) ;) Auditing has been found to be a working balance
There is no Python function for the (now named) open_code hook. It can only be set as a C function by an embedder, and that's when the userData is provided. Nothing to do with each individual call - just one value per CPython runtime. (Similarly with the audit hook, but for the Python hooks you can pass a closure or a method - in C you need a separate pointer for this.) Cheers, Steve

I don't like adding more Python callback from low level. Python runtime is very complicated already, especially __del__, shutdown process, and multi threading. Python callback from low level is source of very difficult bugs always. Additionally, if we used the PEP for logging complex application, the log will be unreliable. For example: 1. Want to open file A in C code, call callback. 2. In the callback, "A is opened" is logged. 3. In the same callback, import may be happened and logged. 4. In the same callback, other thread may be run and any thing can be logged. 5. Many many other things happens and callback is called. 6. Then, open the file A. In this example, logged event ordering and timing is very different from real event ordering and timing. I prefer low level tool to trace low level thing, although it lacks some application context. Maybe, DTrace will become more important tool. https://techcommunity.microsoft.com/t5/Windows-Kernel-Internals/DTrace-on-Wi... Regards, -- Inada Naoki <songofacandy@gmail.com>

On 29Mar.2019 2020, Inada Naoki wrote:
Asynchronous callbacks, sure. These are all synchronous and always as part of a Python operation while the GIL is held (I think putting them immediately after parameter validation is the best place, but am hesitant to make that a hard rule).
Additionally, if we used the PEP for logging complex application, the log will be unreliable.
I think "then don't do that" is a valid response here. There are plenty of ways to use Python features incorrectly and cause problems (I've diagnosed many of them :) ). This doesn't make that any more or less true.
Yep. If you implement a bad hook then you get bad results. But at least the user can fix that themselves (since they caused it themselves). I've already used these hooks for a number of useful purposes, and it isn't difficult to write one that works just fine.
See my other reply about this. DTrace on Windows is a very long way from being suitable for production environments. Cheers, Steve

On 28/03/2019 23.35, Steve Dower wrote:
Hi Steve, (memory dump before I go to bed) Steve Grubb from Red Hat security pointed me to some interesting things [1]. For instance there is some work on a new O_MAYEXEC flag for open(). Steve came to similar conclusions like we, e.g. streaming code from stdin is insecure. I think it would be also beneficial to have auditing events for the import system to track when sys.path or import loaders are changed. Christian [1] https://marc.info/?l=linux-fsdevel&m=155535414414626&w=2

On 15Apr2019 1344, Christian Heimes wrote:
Thanks for the pointer! Using this for open_code() by default on platforms that support it might be a good opportunity in the future. But I'm glad I'm not the only one who thinks this is the right approach :)
I think it would be also beneficial to have auditing events for the import system to track when sys.path or import loaders are changed.
Already in there (kind of... the "import" events include the contents of the sys properties that are about to be used to resolve it - since these are plain-old lists, and can be easily reassigned, passing them through here allows you to add a check if you really want it but otherwise not pay the cost of replacing the sys module with a special implementation and its attributes with special lists). Cheers, Steve

On 15/04/2019 23.17, Steve Dower wrote:
Here is the original patch on LWN with some links to presentations: https://lwn.net/Articles/774676/ The approach has one downside: The current user must have DAC executable permission for a file in order to open a file with O_MAYEXEC. That means we have to +x all Python files and PYC files, not just the files that are designed as entry points.
Yeah, it's complicated :/ Steve Grubb mentioned remote importers or hacks like mem_fd + dlopen() from /proc/self/fd as attack vectors. Mitigations and audit systems like IMA Appraisal only work properly if code has to hit the disk first. If an attacker or user can perform the equivalent of PROT_EXEC | PROT_WRITE, then IMA won't be able to 'see' the malicious code. https://www.tutorialspoint.com/How-to-use-remote-python-modules https://github.com/operatorequals/httpimport https://0x00sec.org/t/pure-python-in-memory-so-loading-without-shm/6453 https://github.com/nullbites/SnakeEater/blob/master/SnakeEater2.py Christian
participants (7)
-
Brett Cannon
-
Christian Heimes
-
Inada Naoki
-
Ivan Pozdeev
-
Nick Coghlan
-
Steve Dower
-
Victor Stinner