[Import-SIG] PEP for the removal of PYO files

Brett Cannon bcannon at gmail.com
Fri Feb 27 19:06:33 CET 2015


On Fri, Feb 27, 2015 at 1:02 PM Guido van Rossum <guido at python.org> wrote:

> I'm in a good mood today and I think this is a great idea!
>

Does that mean if you were in a bad mood this would be a bad idea? ;)


> That's not to say that I'm accepting it as-is (I haven't read it fully)
> but I expect that there are very few downsides and it won't break much.
>

There is a section in the PEP discussing backwards-compatibility. Basically
the potential breakage seems fairly minimal to me.


> (There's of course always going to be someone who always uses -O and
> somehow depends on the existence of .pyo files, but they should have seen
> it coming with __pycache__ and the new version-specific extensions. :-)
>

Yep! PEP 3147 makes this much easier to do without breaking the world.

-Brett


>
> On Fri, Feb 27, 2015 at 9:06 AM, Brett Cannon <bcannon at gmail.com> wrote:
>
>> Here is my proposed PEP to drop .pyo files from Python. Thanks to Barry's
>> work in PEP 3147 this really shouldn't have much impact on users' code
>> (then again, bytecode files are basically an implementation detail so it
>> should hardly impact anyone directly).
>>
>> One thing I would appreciate is if people have more motivation for this.
>> While the maintainer of importlib in me wants to see this happen, the core
>> developer in me thinks the arguments are a little weak. So if people can
>> provide more reasons why this is a good thing that would be appreciated.
>>
>>
>> PEP: 487
>> Title: Elimination of PYO files
>> Version: $Revision$
>> Last-Modified: $Date$
>> Author: Brett Cannon <brett at python.org>
>> Status: Draft
>> Type: Standards Track
>> Content-Type: text/x-rst
>> Created: 20-Feb-2015
>> Post-History:
>>
>> Abstract
>> ========
>>
>> This PEP proposes eliminating the concept of PYO files from Python.
>> To continue the support of the separation of bytecode files based on
>> their optimization level, this PEP proposes extending the PYC file
>> name to include the optimization level in bytecode repository
>> directory (i.e., the ``__pycache__`` directory).
>>
>>
>> Rationale
>> =========
>>
>> As of today, bytecode files come in two flavours: PYC and PYO. A PYC
>> file is the bytecode file generated and read from when no
>> optimization level is specified at interpreter startup (i.e., ``-O``
>> is not specified). A PYO file represents the bytecode file that is
>> read/written when **any** optimization level is specified (i.e., when
>> ``-O`` is specified, including ``-OO``). This means that while PYC
>> files clearly delineate the optimization level used when they were
>> generated -- namely no optimizations beyond the peepholer -- the same
>> is not true for PYO files. Put in terms of optimization levels and
>> the file extension:
>>
>>   - 0: ``.pyc``
>>   - 1 (``-O``): ``.pyo``
>>   - 2 (``-OO``): ``.pyo``
>>
>> The reuse of the ``.pyo`` file extension for both level 1 and 2
>> optimizations means that there is no clear way to tell what
>> optimization level was used to generate the bytecode file. In terms
>> of reading PYO files, this can lead to an interpreter using a mixture
>> of optimization levels with its code if the user was not careful to
>> make sure all PYO files were generated using the same optimization
>> level (typically done by blindly deleting all PYO files and then
>> using the ``compileall`` module to compile all-new PYO files [1]_).
>> This issue is only compounded when people optimize Python code beyond
>> what the interpreter natively supports, e.g., using the astoptimizer
>> project [2]_.
>>
>> In terms of writing PYO files, the need to delete all PYO files
>> every time one either changes the optimization level one wants to use
>> or is unsure of what optimization was used the last time PYO files
>> were generated leads to unnecessary file churn.
>>
>> As for distributing bytecode-only modules, having to distribute both
>> ``.pyc`` and ``.pyo`` files is unnecessary for the common use-case
>> of code obfuscation and smaller file deployments.
>>
>>
>> Proposal
>> ========
>>
>> To eliminate the ambiguity that PYO files present, this PEP proposes
>> eliminating the concept of PYO files and their accompanying ``.pyo``
>> file extension. To allow for the optimization level to be unambiguous
>> as well as to avoid having to regenerate optimized bytecode files
>> needlessly in the ``__pycache__`` directory, the optimization level
>> used to generate a PYC file will be incorporated into the bytecode
>> file name. Currently bytecode file names are created by
>> ``importlib.util.cache_from_source()``, approximately using the
>> following expression defined by PEP 3147 [3]_, [4]_, [5]_::
>>
>>     '{name}.{cache_tag}.pyc'.format(name=module_name,
>>                                     cache_tag=sys.implementation.cache_tag)
>>
>> This PEP proposes to change the expression to::
>>
>>     '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
>>             name=module_name,
>>             cache_tag=sys.implementation.cache_tag,
>>             optimization=str(sys.flags.optimize))
>>
>> The "opt-" prefix was chosen so as to provide a visual separator
>> from the cache tag. The placement of the optimization level after
>> the cache tag was chosen to preserve lexicographic sort order of
>> bytecode file names based on module name and cache tag which will
>> not vary for a single interpreter. The "opt-" prefix was chosen over
>> "o" so as to be somewhat self-documenting. The "opt-" prefix was
>> chosen over "O" so as to not have any confusion with "0" while being
>> so close to the interpreter version number.
>>
>> A period was chosen over a hyphen as a separator so as to distinguish
>> clearly that the optimization level is not part of the interpreter
>> version as specified by the cache tag. It also lends itself to the use
>> of the period in the file name to delineate semantically different
>> concepts.
>>
>> For example, the bytecode file name of ``importlib.cpython-35.pyc``
>> would become ``importlib.cpython-35.opt-0.pyc``. If ``-OO`` had been
>> passed to the interpreter then instead of
>> ``importlib.cpython-35.pyo`` the file name would be
>> ``importlib.cpython-35.opt-2.pyc``.
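
A minimal sketch of the proposed naming expression, for anyone who wants to try it out. The helper name ``proposed_bytecode_name`` and its ``cache_tag`` parameter are hypothetical and only exist for illustration; this follows the expression in the draft above, not any released ``importlib`` behaviour.

```python
import sys


def proposed_bytecode_name(module_name, optimization=None, cache_tag=None):
    """Build a bytecode file name using the expression proposed in the draft."""
    if cache_tag is None:
        # Default to the running interpreter's cache tag, as the draft does.
        cache_tag = sys.implementation.cache_tag
    if optimization is None:
        # Default to the interpreter's current optimization level.
        optimization = sys.flags.optimize
    return '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
        name=module_name,
        cache_tag=cache_tag,
        optimization=str(optimization))


# The two examples from the draft, with the cache tag pinned to cpython-35.
print(proposed_bytecode_name('importlib', optimization=0, cache_tag='cpython-35'))
print(proposed_bytecode_name('importlib', optimization=2, cache_tag='cpython-35'))
```

With the cache tag pinned as above, the two calls reproduce the ``opt-0`` and ``opt-2`` file names given in the example.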
>>
>>
>> Implementation
>> ==============
>>
>> importlib
>> ---------
>>
>> As ``importlib.util.cache_from_source()`` is the API that exposes
>> bytecode file paths as well as being directly used by importlib, it
>> requires the most critical change. As of Python 3.4, the function's
>> signature is::
>>
>>   importlib.util.cache_from_source(path, debug_override=None)
>>
>> This PEP proposes changing the signature in Python 3.5 to::
>>
>>   importlib.util.cache_from_source(path, debug_override=None, *,
>>                                    optimization=None)
>>
>> The introduced ``optimization`` keyword-only parameter will control
>> what optimization level is specified in the file name. If the
>> argument is ``None`` then the current optimization level of the
>> interpreter will be assumed. Any argument given for ``optimization``
>> will be passed to ``str()`` and must have ``str.isalnum()`` be true,
>> else ``ValueError`` will be raised (this prevents invalid characters
>> being used in the file name). It is expected that beyond Python's own
>> 0-2 optimization levels, third-party code will use a hash of
>> optimization names to specify the optimization level, e.g.
>> ``hashlib.sha256(','.join(['dead code elimination', 'constant
>> folding']).encode('utf-8')).hexdigest()``.
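
A rough sketch of the validation rule and the hashing idea described above. Both function names are hypothetical, and the UTF-8 encoding step is an assumption (``hashlib.sha256()`` needs bytes on Python 3); the draft does not say how the real implementation would be structured.

```python
import hashlib


def normalize_optimization(optimization):
    """Convert the argument with str() and reject non-alphanumeric results."""
    text = str(optimization)
    if not text.isalnum():
        # Keeps characters such as '/', '.' or '-' out of the file name.
        raise ValueError('{!r} is not alphanumeric'.format(text))
    return text


def optimization_hash(names):
    """Hash a list of optimization names into a file-name-safe tag."""
    joined = ','.join(names)
    # Assumption: encode as UTF-8 before hashing.
    return hashlib.sha256(joined.encode('utf-8')).hexdigest()


tag = optimization_hash(['dead code elimination', 'constant folding'])
print(normalize_optimization(2))
print(normalize_optimization(tag) == tag)
```

A hex digest contains only the characters 0-9 and a-f, so it always passes the ``str.isalnum()`` check. The tag depends on the order of the names, so third-party code would presumably want to agree on an ordering.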
>>
>> The ``debug_override`` parameter will be deprecated. As the parameter
>> expects a boolean, the integer value of the boolean will be used as
>> if it had been provided as the argument to ``optimization`` (a
>> ``None`` argument will mean the same as for ``optimization``). A
>> deprecation warning will be raised when ``debug_override`` is given a
>> value other than ``None``, but there are no plans for the complete
>> removal of the parameter at this time (but removal will be no later
>> than Python 4).
>>
>> The various module attributes for importlib.machinery which relate to
>> bytecode file suffixes will be updated [7]_. The
>> ``DEBUG_BYTECODE_SUFFIXES`` and ``OPTIMIZED_BYTECODE_SUFFIXES`` will
>> both be documented as deprecated and set to the same value as
>> ``BYTECODE_SUFFIXES`` (removal of ``DEBUG_BYTECODE_SUFFIXES`` and
>> ``OPTIMIZED_BYTECODE_SUFFIXES`` is not currently planned, but will be
>> no later than Python 4).
>>
>> The various finders and loaders will also be updated as necessary,
>> but updating the previously mentioned parts of importlib should be all
>> that is required.
>>
>>
>> Rest of the standard library
>> ----------------------------
>>
>> The various functions exposed by the ``py_compile`` and
>> ``compileall`` modules will be updated as necessary to make sure
>> they follow the new bytecode file name semantics [6]_, [1]_.
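
For a feel of what this means on disk, here is a small sketch that compiles one source file at each optimization level using the existing ``optimize`` parameter of ``py_compile.compile()``. The helper name ``compile_at_levels`` is hypothetical, and the file names printed depend on the interpreter running the code, so they may not match the format in this draft exactly.

```python
import os
import py_compile
import tempfile


def compile_at_levels(source_path, levels=(0, 1, 2)):
    """Compile one source file per optimization level; map level -> bytecode path."""
    return {level: py_compile.compile(source_path, doraise=True, optimize=level)
            for level in levels}


with tempfile.TemporaryDirectory() as directory:
    source = os.path.join(directory, 'spam.py')
    with open(source, 'w') as file:
        file.write('x = 1\n')
    # Print the bytecode file name chosen for each optimization level.
    for level, path in sorted(compile_at_levels(source).items()):
        print(level, os.path.basename(path))
```

On an interpreter that separates bytecode by optimization level, the three files can coexist in ``__pycache__`` without overwriting one another.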
>>
>>
>> Compatibility Considerations
>> ============================
>>
>> Any code directly manipulating bytecode files from Python 3.2 on
>> will need to consider the impact of this change on their code (prior
>> to Python 3.2 -- including all of Python 2 -- there was no
>> ``__pycache__`` which already necessitates bifurcating bytecode file
>> handling support). If code was setting the ``debug_override``
>> argument to ``importlib.util.cache_from_source()`` then care will be
>> needed if they want the path to a bytecode file with an optimization
>> level of 2. Otherwise only code **not** using
>> ``importlib.util.cache_from_source()`` will need updating.
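
As a sketch of what "using ``cache_from_source()``" could look like for code that needs a bytecode path: the wrapper below asks importlib for the path instead of building the file name by hand, and falls back to the old call on interpreters that lack the ``optimization`` keyword. The name ``bytecode_path`` is hypothetical, and the exact file name returned is whatever the running interpreter decides.

```python
import importlib.util


def bytecode_path(source_path, optimization=None):
    """Return the bytecode path for source_path, letting importlib name the file."""
    try:
        return importlib.util.cache_from_source(source_path,
                                                optimization=optimization)
    except TypeError:
        # Assumption: an older interpreter without the keyword-only
        # 'optimization' parameter; fall back to its default behaviour.
        return importlib.util.cache_from_source(source_path)


print(bytecode_path('spam.py', optimization=2))
```

Code written this way should not need to change if the file name format is tweaked later.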
>>
>> As for people who distribute bytecode-only modules, they will have
>> to choose which optimization level they want their bytecode files to
>> be since distributing a ``.pyo`` file with a ``.pyc`` file will no
>> longer be of any use. Since people typically only distribute bytecode
>> files for code obfuscation purposes or smaller distribution size,
>> only having to distribute a single ``.pyc`` should actually be
>> beneficial to these use-cases.
>>
>>
>> Rejected Ideas
>> ==============
>>
>> N/A
>>
>>
>> Open Issues
>> ===========
>>
>> Formatting of the optimization level in the file name
>> -----------------------------------------------------
>>
>> Using the "opt-" prefix and placing the optimization level between
>> the cache tag and file extension is not critical. Other options which
>> were considered are:
>>
>> * ``importlib.cpython-35.o0.pyc``
>> * ``importlib.cpython-35.O0.pyc``
>> * ``importlib.cpython-35.0.pyc``
>> * ``importlib.cpython-35-O0.pyc``
>> * ``importlib.O0.cpython-35.pyc``
>> * ``importlib.o0.cpython-35.pyc``
>> * ``importlib.0.cpython-35.pyc``
>>
>> These were initially rejected because they would change the
>> sort order of bytecode files, could be ambiguous with the cache tag,
>> or were not self-documenting enough.
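
The sort-order point can be checked with a quick sketch. It assumes a ``__pycache__`` shared by two interpreters (cache tags ``cpython-34`` and ``cpython-35``, chosen only for illustration) and compares the proposed format against one of the alternatives that puts the level before the cache tag. The helper names are hypothetical.

```python
from itertools import product

CACHE_TAGS = ('cpython-34', 'cpython-35')
LEVELS = (0, 1, 2)


def sorted_names(template):
    """Generate every tag/level combination for a template and sort the names."""
    return sorted(template.format(tag=tag, opt=opt)
                  for tag, opt in product(CACHE_TAGS, LEVELS))


def tags_in_order(file_names):
    """Report which cache tag each file name carries, in listing order."""
    return [next(tag for tag in CACHE_TAGS if tag in name)
            for name in file_names]


proposed = sorted_names('importlib.{tag}.opt-{opt}.pyc')
alternative = sorted_names('importlib.{opt}.{tag}.pyc')

print(tags_in_order(proposed))     # files grouped by interpreter
print(tags_in_order(alternative))  # interpreters interleaved by level
```

With the level after the cache tag, a directory listing keeps each interpreter's files together; with the level first, the interpreters alternate.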
>>
>>
>> References
>> ==========
>>
>> .. [1] The compileall module
>>    (https://docs.python.org/3/library/compileall.html#module-compileall)
>>
>> .. [2] The astoptimizer project
>>    (https://pypi.python.org/pypi/astoptimizer)
>>
>> .. [3] ``importlib.util.cache_from_source()``
>>    (https://docs.python.org/3.5/library/importlib.html#importlib.util.cache_from_source)
>>
>> .. [4] Implementation of ``importlib.util.cache_from_source()`` from
>>    CPython 3.4.3rc1
>>    (https://hg.python.org/cpython/file/038297948389/Lib/importlib/_bootstrap.py#l437)
>>
>> .. [5] PEP 3147, PYC Repository Directories, Warsaw
>>    (http://www.python.org/dev/peps/pep-3147)
>>
>> .. [6] The py_compile module
>>    (https://docs.python.org/3/library/py_compile.html#module-py_compile)
>>
>> .. [7] The importlib.machinery module
>>    (https://docs.python.org/3/library/importlib.html#module-importlib.machinery)
>>
>>
>> Copyright
>> =========
>>
>> This document has been placed in the public domain.
>>
>>
>> ..
>>    Local Variables:
>>    mode: indented-text
>>    indent-tabs-mode: nil
>>    sentence-end-double-space: t
>>    fill-column: 70
>>    coding: utf-8
>>    End:
>>
>>
>> _______________________________________________
>> Import-SIG mailing list
>> Import-SIG at python.org
>> https://mail.python.org/mailman/listinfo/import-sig
>>
>>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>