“__pysource__” file layout for installed modules

Hello,

Here's a proposal to fix several niggles we found when distributing Python libraries in Fedora. What do you think? Do you face similar issues in other distros?

You can also discuss at:
https://discuss.python.org/t/pysource-file-layout-for-installed-modules/1459...

Abstract
========

For modules loaded directly from bytecode cache (``*.pyc``) files, Python will look for corresponding source in a ``__pysource__`` directory.

The existing ability to load modules from ``*.pyc`` files *only* is unchanged, but conceptually it becomes a special case of a “pyc-first” file layout.

Motivation
==========

Most pure Python code is installed as a source file (``*.py``), combined with a bytecode cache file (``__pycache__/*.pyc``), which is created/updated ahead of time or on demand.

This layout is designed for rapid iteration. Each time a module is imported, Python assumes the source might have changed: if a bytecode cache is present, Python normally checks whether it still corresponds to the source.

:pep:`552` introduced an “unchecked” mode, in which this check is skipped. However, this causes updates to the source to be silently ignored, possibly confusing users who aren't aware of this rarely used mode.

The remaining checking modes have their own disadvantages. Even in the best case (the cache is present and fresh), both require Python to access at least two files (the source and the cache). Further:

* In the timestamp-based mode, the source file's last-modification time is used as part of the cache key, causing issues with reproducible builds as described in :pep:`552`.
* In the hash-based mode, the entire source file is read and hashed. This is potentially a slow operation. [XXX data needed.]

Another way to install Python modules is to not install the source, and use the ``*.pyc`` file directly in place of the ``*.py`` file (removing the Python version tag from the filename and moving the file out of the ``__pycache__`` directory).
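
The source-less layout just described can be tried out with a released Python. The following is a minimal sketch, not part of the proposal; the module name ``demo_sourceless`` is made up for illustration. It compiles a module to an untagged ``*.pyc`` file, deletes the source, and imports the result.

```python
import importlib
import pathlib
import py_compile
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    source = root / "demo_sourceless.py"  # hypothetical module name
    source.write_text("def answer():\n    return 42\n")

    # Write the bytecode where the source would normally live:
    # no version tag in the file name, no __pycache__ directory.
    py_compile.compile(
        str(source), cfile=str(root / "demo_sourceless.pyc"), doraise=True
    )
    source.unlink()  # ship the bytecode only

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("demo_sourceless")
    finally:
        sys.path.remove(tmp)

    loader = module.__spec__.loader
    loader_name = type(loader).__name__
    result = module.answer()
    # The stand-alone bytecode loader has no source to offer.
    source_text = loader.get_source("demo_sourceless")
```

The module imports and runs, but ``get_source`` returns ``None``, so tracebacks and ``inspect`` have nothing to display.
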
This layout has two main issues:

* The Python version tag is not used, meaning that modules using this layout are only usable by a specific version, and
* the source is not available, making it hard to debug (tracebacks and the ``inspect`` module don't show code; the file is unreadable to the debugging human).

The first issue is usually not relevant, as most installations are tightly tied to a specific interpreter. [XXX any examples where this isn't the case?]

This PEP proposes to solve the second issue by allowing installers to distribute the source file alongside the file with the bytecode.

Rationale
=========

The new file layout is optimized for “installed libraries”: third-party libraries installed on a user's system. This can include the Python standard library.

We assume that these files will most likely not be edited after installation. Python will only consult the bytecode file (``*.pyc``) when loading a module, and not check whether a ``*.py`` file was edited.

We assume that retrieving a module's source is useful, but it is not a performance-sensitive operation. It is used when displaying tracebacks or debugging. This makes it more palatable for distributors to use the resource-intensive “checked hash” bytecode files and enjoy their benefits (explained in :pep:`552`).

On the other hand, we believe that Python should remain “hackable”: if a source file is available, it should be possible to modify it and use the result -- for example, to add a few ``print`` calls to a library for some quick-and-dirty debugging (in a throwaway virtual environment, of course), or even to explore the standard library by breaking it.

The proposed file layout makes this relatively straightforward: when the source (``*.py``) file is moved out of the ``__pysource__`` directory, Python will ignore the bytecode file and load the source instead, producing a cache in ``__pycache__``. (This is the existing behavior when both a ``*.py`` and ``*.pyc`` are present for a given name.)
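
The precedence rule this workflow relies on can be checked with a released Python. The sketch below is illustrative only: released interpreters give ``__pysource__`` no special meaning, so the directory here is just a place to park the source, and the module name ``demo_hackable`` is made up.

```python
import importlib
import pathlib
import py_compile
import shutil
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    aux_dir = root / "__pysource__"
    aux_dir.mkdir()
    aux_source = aux_dir / "demo_hackable.py"  # hypothetical module name
    aux_source.write_text("VALUE = 'from bytecode'\n")
    # Bytecode sits next to the package contents, source in the subdirectory.
    py_compile.compile(
        str(aux_source), cfile=str(root / "demo_hackable.pyc"), doraise=True
    )

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("demo_hackable")
        before = (module.VALUE, type(module.__spec__.loader).__name__)

        # "Hack" the library: move the source next to the bytecode, edit it.
        edited = root / "demo_hackable.py"
        shutil.move(str(aux_source), str(edited))
        edited.write_text("VALUE = 'from edited source'\n")

        # Forget the loaded module and the cached directory listing,
        # then import again.
        del sys.modules["demo_hackable"]
        importlib.invalidate_caches()
        module = importlib.import_module("demo_hackable")
        after = (module.VALUE, type(module.__spec__.loader).__name__)
    finally:
        sys.path.remove(tmp)
```

On the second import the ``*.py`` file wins over the stand-alone ``*.pyc``, so the edit takes effect without touching the bytecode file.
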
We hope that users who'd like to do this, but aren't familiar with the proposed mechanics, will notice the extra directory, search the Web for ``__pysource__`` and find relevant instructions.

The proposed layout makes it easy to omit the source files, which will be useful in resource-constrained environments (e.g. minimal Linux containers). Omitting them should not affect non-debug functionality. Adding the sources to an installation that omits them involves only creating directories and copying source files to the right places, which is relatively easy even for non-Python-specific tools (like Linux package managers).

This PEP does not propose that any particular distributor or installer (including Python's build system) should immediately switch to the new layout. The PEP will be implemented when ``importlib`` supports reading the layout and stdlib tools like ``py_compile`` can generate it. Switching to it should be a separate decision -- although one that might not need a PEP.

Specification
=============

``importlib.machinery.SourcelessFileLoader``, the loader that handles stand-alone ``*.pyc`` files, will be renamed to ``BytecodeFileLoader``. The old name will remain as an alias for the foreseeable future, with no ``DeprecationWarning``. However, third-party linters and code-quality tools are encouraged to treat the old name as suboptimal.

The ``get_source_filename`` method of ``BytecodeFileLoader`` will be changed to return the expected location of an auxiliary source file, e.g. ``dir/__pysource__/module.py`` for ``dir/module.pyc``.

The ``get_source`` method of ``BytecodeFileLoader`` will check if the auxiliary source file corresponds to the bytecode file (as returned by ``get_filename``).

.. note::

   This check is done at the time of the call. There is no check that the source file corresponds to an in-memory module loaded by the ``BytecodeFileLoader``. For example, if both ``*.pyc`` and ``*.py`` are changed after a module is loaded, tracebacks will show lines of the updated source, which might not correspond to the running code. The same “gotcha” applies to current handling of ``*.py`` files.

The ``py_compile`` and ``compileall`` modules will gain arguments and CLI options for compiling to the new layout. [XXX: This needs fleshing out. The original source needs to be moved. Need to ensure that compilation is still idempotent.]

Implications
------------

The following should follow naturally [XXX verify this!] from the changes above, but will be tested separately.

``inspect.getsource``, ``inspect.getsourcefile``, ``inspect.getsourcelines``, and the ``python -m inspect`` CLI will retrieve source for modules using the new layout (if the ``__pysource__/*.py`` file is available and current).

Tracebacks will show source lines for modules using the new layout (if the ``__pysource__/*.py`` file is available and current).

Backwards Compatibility
=======================

The proposal is backwards compatible.

However, once an installer (including Python's build process) switches to the new layout, tools that are not prepared for it may stop working. This affects tools like IDEs, debuggers, API doc generators, etc. if they either don't use ``importlib`` or ``inspect``, or use these modules from a different version of Python than the code they are handling.

Even in that case, the failure -- not being able to retrieve source code for a third-party module -- is usually a quality-of-life issue rather than a serious flaw.

Security Implications
=====================

None known. The proposal adds source code information to modules that can already be loaded and executed.

How to Teach This
=================

This change does not affect code that users write directly. Most teaching materials can stay unchanged.

Authors of existing installer tools should read this PEP. Authors of future installer tools should read documentation that will be added.
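
For installer authors, the layout from the Specification can already be produced by hand with existing tools. The sketch below is an assumption-laden stand-in for the ``py_compile`` options this PEP leaves unspecified: ``pysource_path`` and ``install_pyc_first`` are hypothetical helper names, and only the ``dir/module.pyc`` to ``dir/__pysource__/module.py`` mapping is taken from the text above.

```python
import pathlib
import py_compile
import shutil
import tempfile


def pysource_path(pyc_path):
    """Hypothetical helper: dir/module.pyc -> dir/__pysource__/module.py."""
    pyc_path = pathlib.Path(pyc_path)
    return pyc_path.parent / "__pysource__" / (pyc_path.stem + ".py")


def install_pyc_first(source, target_dir):
    """Hypothetical helper: install one module in the proposed layout."""
    source = pathlib.Path(source)
    pyc_path = pathlib.Path(target_dir) / (source.stem + ".pyc")
    aux_source = pysource_path(pyc_path)
    aux_source.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, aux_source)
    # Checked-hash bytecode (PEP 552) keeps the output independent of mtimes.
    py_compile.compile(
        str(aux_source),
        cfile=str(pyc_path),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    return pyc_path, aux_source


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "build").mkdir()
    (root / "site").mkdir()
    src = root / "build" / "module.py"
    src.write_text("X = 1\n")

    pyc_path, aux_source = install_pyc_first(src, root / "site")
    # List the installed files relative to the target directory.
    layout = sorted(
        p.relative_to(root / "site").as_posix()
        for p in (root / "site").rglob("*")
        if p.is_file()
    )
```

Stripping sources from such an installation then amounts to deleting the ``__pysource__`` directories, and restoring them to copying the files back.
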
Searching for the ``__pysource__`` directory name in Python's documentation should yield relevant documentation. We hope that people exploring the libraries installed on their system will naturally reach relevant docs by searching for ``__pysource__``.

Reference Implementation
========================

https://github.com/encukou/cpython/tree/pysource

Rejected Ideas
==============

Nothing yet.

Open Issues
===========

See XXX's above.
participants (1)
-
encukou@gmail.com