[Python-Dev] Final call for PEP 488: eliminating PYO files
Brett Cannon
bcannon at gmail.com
Sat Mar 21 14:34:19 CET 2015
Thanks! PEP 488 is now marked as accepted. I expect I will have PEP 488
implemented before the PyCon sprints are over (work will be tracked in
http://bugs.python.org/issue23731).
On Fri, Mar 20, 2015 at 8:06 PM Guido van Rossum <guido at python.org> wrote:
> Awesome, that's what I was hoping. Accepted! Congrats and thank you very
> much for writing the PEP and guiding the discussion.
>
> On Fri, Mar 20, 2015 at 4:00 PM, Brett Cannon <bcannon at gmail.com> wrote:
>
>>
>>
>> On Fri, Mar 20, 2015 at 4:41 PM Guido van Rossum <guido at python.org>
>> wrote:
>>
>>> I am willing to be the BDFL for this PEP. I have tried to skim the
>>> recent discussion (only python-dev) and I don't see much remaining
>>> controversy. HOWEVER... The PEP is not clear (or at least too subtle) about
>>> the actual name for optimization level 0. If I have foo.py, and I compile
>>> it three times with three different optimization levels (no optimization;
>>> -O; -OO), and then I look in __pycache__, would I see this:
>>>
>>> # (1)
>>> foo.cpython-35.pyc
>>> foo.cpython-35.opt-1.pyc
>>> foo.cpython-35.opt-2.pyc
>>>
>>> Or would I see this?
>>>
>>> # (2)
>>> foo.cpython-35.opt-0.pyc
>>> foo.cpython-35.opt-1.pyc
>>> foo.cpython-35.opt-2.pyc
>>>
>>
>> #1
>>
>>
>>>
>>> Your lead-in ("I have decided to have the default case of no
>>> optimization levels mean that the .pyc file name will have *no* optimization
>>> level specified in the name and thus be just as it is today.") makes me
>>> think I should expect (1), but I can't actually pinpoint where the language
>>> of the PEP says this.
>>>
>>
>> It was meant to be explained by "When no optimization level is
>> specified, the pre-PEP ``.pyc`` file name will be used (i.e., no change
>> in file name
>> semantics)", but obviously it's a bit too subtle. I just updated the PEP
>> with an explicit list of bytecode file name examples based on no -O, -O,
>> and -OO.
>>
>> -Brett
>>
>>
>>>
>>>
>>> On Fri, Mar 20, 2015 at 11:34 AM, Brett Cannon <bcannon at gmail.com>
>>> wrote:
>>>
>>>> I have decided to have the default case of no optimization levels mean
>>>> that the .pyc file name will have *no* optimization level specified in
>>>> the name and thus be just as it is today. I made this decision due to
>>>> potential backwards-compatibility issues -- although I expect them to be
>>>> minutes -- and to not force other implementations like PyPy to have some
>>>> bogus value set since they don't have .pyo files to begin with (PyPy
>>>> actually uses bytecode for -O and don't bother with -OO since PyPy already
>>>> uses a bunch of memory when running).
>>>>
>>>> Since this closes out the last open issue, I need either a BDFL
>>>> decision or a BDFAP to be assigned to make a decision. Guido?
>>>>
>>>> ======================================
>>>>
>>>> PEP: 488
>>>> Title: Elimination of PYO files
>>>> Version: $Revision$
>>>> Last-Modified: $Date$
>>>> Author: Brett Cannon <brett at python.org>
>>>> Status: Draft
>>>> Type: Standards Track
>>>> Content-Type: text/x-rst
>>>> Created: 20-Feb-2015
>>>> Post-History:
>>>> 2015-03-06
>>>> 2015-03-13
>>>> 2015-03-20
>>>>
>>>> Abstract
>>>> ========
>>>>
>>>> This PEP proposes eliminating the concept of PYO files from Python.
>>>> To continue the support of the separation of bytecode files based on
>>>> their optimization level, this PEP proposes extending the PYC file
>>>> name to include the optimization level in the bytecode repository
>>>> directory when it's called for (i.e., the ``__pycache__`` directory).
>>>>
>>>>
>>>> Rationale
>>>> =========
>>>>
>>>> As of today, bytecode files come in two flavours: PYC and PYO. A PYC
>>>> file is the bytecode file generated and read from when no
>>>> optimization level is specified at interpreter startup (i.e., ``-O``
>>>> is not specified). A PYO file represents the bytecode file that is
>>>> read/written when **any** optimization level is specified (i.e., when
>>>> ``-O`` **or** ``-OO`` is specified). This means that while PYC
>>>> files clearly delineate the optimization level used when they were
>>>> generated -- namely no optimizations beyond the peepholer -- the same
>>>> is not true for PYO files. To put this in terms of optimization
>>>> levels and the file extension:
>>>>
>>>> - 0: ``.pyc``
>>>> - 1 (``-O``): ``.pyo``
>>>> - 2 (``-OO``): ``.pyo``
>>>>
>>>> The reuse of the ``.pyo`` file extension for both level 1 and 2
>>>> optimizations means that there is no clear way to tell what
>>>> optimization level was used to generate the bytecode file. In terms
>>>> of reading PYO files, this can lead to an interpreter using a mixture
>>>> of optimization levels with its code if the user was not careful to
>>>> make sure all PYO files were generated using the same optimization
>>>> level (typically done by blindly deleting all PYO files and then
>>>> using the `compileall` module to compile all-new PYO files [1]_).
>>>> This issue is only compounded when people optimize Python code beyond
>>>> what the interpreter natively supports, e.g., using the astoptimizer
>>>> project [2]_.
>>>>
>>>> In terms of writing PYO files, the need to delete all PYO files
>>>> every time one either changes the optimization level they want to use
>>>> or are unsure of what optimization was used the last time PYO files
>>>> were generated leads to unnecessary file churn. The change proposed
>>>> by this PEP also allows for **all** optimization levels to be
>>>> pre-compiled for bytecode files ahead of time, something that is
>>>> currently impossible thanks to the reuse of the ``.pyo`` file
>>>> extension for multiple optimization levels.
>>>>
>>>> As for distributing bytecode-only modules, having to distribute both
>>>> ``.pyc`` and ``.pyo`` files is unnecessary for the common use-case
>>>> of code obfuscation and smaller file deployments. This means that
>>>> bytecode-only modules will only load from their non-optimized
>>>> ``.pyc`` file name.
>>>>
>>>>
>>>> Proposal
>>>> ========
>>>>
>>>> To eliminate the ambiguity that PYO files present, this PEP proposes
>>>> eliminating the concept of PYO files and their accompanying ``.pyo``
>>>> file extension. To allow for the optimization level to be unambiguous
>>>> as well as to avoid having to regenerate optimized bytecode files
>>>> needlessly in the `__pycache__` directory, the optimization level
>>>> used to generate the bytecode file will be incorporated into the
>>>> bytecode file name. When no optimization level is specified, the
>>>> pre-PEP ``.pyc`` file name will be used (i.e., no change in file name
>>>> semantics). This increases backwards-compatibility while also being
>>>> more understanding of Python implementations which have no use for
>>>> optimization levels (e.g., PyPy[10]_).
>>>>
>>>> Currently bytecode file names are created by
>>>> ``importlib.util.cache_from_source()``, approximately using the
>>>> following expression defined by PEP 3147 [3]_, [4]_, [5]_::
>>>>
>>>> '{name}.{cache_tag}.pyc'.format(name=module_name,
>>>>
>>>> cache_tag=sys.implementation.cache_tag)
>>>>
>>>> This PEP proposes to change the expression when an optimization
>>>> level is specified to::
>>>>
>>>> '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
>>>> name=module_name,
>>>> cache_tag=sys.implementation.cache_tag,
>>>> optimization=str(sys.flags.optimize))
>>>>
>>>> The "opt-" prefix was chosen so as to provide a visual separator
>>>> from the cache tag. The placement of the optimization level after
>>>> the cache tag was chosen to preserve lexicographic sort order of
>>>> bytecode file names based on module name and cache tag which will
>>>> not vary for a single interpreter. The "opt-" prefix was chosen over
>>>> "o" so as to be somewhat self-documenting. The "opt-" prefix was
>>>> chosen over "O" so as to not have any confusion in case "0" was the
>>>> leading prefix of the optimization level.
>>>>
>>>> A period was chosen over a hyphen as a separator so as to distinguish
>>>> clearly that the optimization level is not part of the interpreter
>>>> version as specified by the cache tag. It also lends to the use of
>>>> the period in the file name to delineate semantically different
>>>> concepts.
>>>>
>>>> For example, if ``-OO`` had been passed to the interpreter then instead
>>>> of ``importlib.cpython-35.pyo`` the file name would be
>>>> ``importlib.cpython-35.opt-2.pyc``.
>>>>
>>>> It should be noted that this change in no way affects the performance
>>>> of import. Since the import system looks for a single bytecode file
>>>> based on the optimization level of the interpreter already and
>>>> generates a new bytecode file if it doesn't exist, the introduction
>>>> of potentially more bytecode files in the ``__pycache__`` directory
>>>> has no effect in terms of stat calls. The interpreter will continue
>>>> to look for only a single bytecode file based on the optimization
>>>> level and thus no increase in stat calls will occur.
>>>>
>>>> The only potentially negative result of this PEP is the probable
>>>> increase in the number of ``.pyc`` files and thus increase in storage
>>>> use. But for platforms where this is an issue,
>>>> ``sys.dont_write_bytecode`` exists to turn off bytecode generation so
>>>> that it can be controlled offline.
>>>>
>>>>
>>>> Implementation
>>>> ==============
>>>>
>>>> importlib
>>>> ---------
>>>>
>>>> As ``importlib.util.cache_from_source()`` is the API that exposes
>>>> bytecode file paths as well as being directly used by importlib, it
>>>> requires the most critical change. As of Python 3.4, the function's
>>>> signature is::
>>>>
>>>> importlib.util.cache_from_source(path, debug_override=None)
>>>>
>>>> This PEP proposes changing the signature in Python 3.5 to::
>>>>
>>>> importlib.util.cache_from_source(path, debug_override=None, *,
>>>> optimization=None)
>>>>
>>>> The introduced ``optimization`` keyword-only parameter will control
>>>> what optimization level is specified in the file name. If the
>>>> argument is ``None`` then the current optimization level of the
>>>> interpreter will be assumed (including no optimization). Any argument
>>>> given for ``optimization`` will be passed to ``str()`` and must have
>>>> ``str.isalnum()`` be true, else ``ValueError`` will be raised (this
>>>> prevents invalid characters being used in the file name). If the
>>>> empty string is passed in for ``optimization`` then the addition of
>>>> the optimization will be suppressed, reverting to the file name
>>>> format which predates this PEP.
>>>>
>>>> It is expected that beyond Python's own two optimization levels,
>>>> third-party code will use a hash of optimization names to specify the
>>>> optimization level, e.g.
>>>> ``hashlib.sha256(','.join(['no dead code', 'const
>>>> folding'])).hexdigest()``.
>>>> While this might lead to long file names, it is assumed that most
>>>> users never look at the contents of the __pycache__ directory and so
>>>> this won't be an issue.
>>>>
>>>> The ``debug_override`` parameter will be deprecated. As the parameter
>>>> expects a boolean, the integer value of the boolean will be used as
>>>> if it had been provided as the argument to ``optimization`` (a
>>>> ``None`` argument will mean the same as for ``optimization``). A
>>>> deprecation warning will be raised when ``debug_override`` is given a
>>>> value other than ``None``, but there are no plans for the complete
>>>> removal of the parameter at this time (but removal will be no later
>>>> than Python 4).
>>>>
>>>> The various module attributes for importlib.machinery which relate to
>>>> bytecode file suffixes will be updated [7]_. The
>>>> ``DEBUG_BYTECODE_SUFFIXES`` and ``OPTIMIZED_BYTECODE_SUFFIXES`` will
>>>> both be documented as deprecated and set to the same value as
>>>> ``BYTECODE_SUFFIXES`` (removal of ``DEBUG_BYTECODE_SUFFIXES`` and
>>>> ``OPTIMIZED_BYTECODE_SUFFIXES`` is not currently planned, but will be
>>>> not later than Python 4).
>>>>
>>>> All various finders and loaders will also be updated as necessary,
>>>> but updating the previous mentioned parts of importlib should be all
>>>> that is required.
>>>>
>>>>
>>>> Rest of the standard library
>>>> ----------------------------
>>>>
>>>> The various functions exposed by the ``py_compile`` and
>>>> ``compileall`` functions will be updated as necessary to make sure
>>>> they follow the new bytecode file name semantics [6]_, [1]_. The CLI
>>>> for the ``compileall`` module will not be directly affected (the
>>>> ``-b`` flag will be implicit as it will no longer generate ``.pyo``
>>>> files when ``-O`` is specified).
>>>>
>>>>
>>>> Compatibility Considerations
>>>> ============================
>>>>
>>>> Any code directly manipulating bytecode files from Python 3.2 on
>>>> will need to consider the impact of this change on their code (prior
>>>> to Python 3.2 -- including all of Python 2 -- there was no
>>>> __pycache__ which already necessitates bifurcating bytecode file
>>>> handling support). If code was setting the ``debug_override``
>>>> argument to ``importlib.util.cache_from_source()`` then care will be
>>>> needed if they want the path to a bytecode file with an optimization
>>>> level of 2. Otherwise only code **not** using
>>>> ``importlib.util.cache_from_source()`` will need updating.
>>>>
>>>> As for people who distribute bytecode-only modules (i.e., use a
>>>> bytecode file instead of a source file), they will have to choose
>>>> which optimization level they want their bytecode files to be since
>>>> distributing a ``.pyo`` file with a ``.pyc`` file will no longer be
>>>> of any use. Since people typically only distribute bytecode files for
>>>> code obfuscation purposes or smaller distribution size then only
>>>> having to distribute a single ``.pyc`` should actually be beneficial
>>>> to these use-cases. And since the magic number for bytecode files
>>>> changed in Python 3.5 to support PEP 465 there is no need to support
>>>> pre-existing ``.pyo`` files [8]_.
>>>>
>>>>
>>>> Rejected Ideas
>>>> ==============
>>>>
>>>> Completely dropping optimization levels from CPython
>>>> ----------------------------------------------------
>>>>
>>>> Some have suggested that instead of accommodating the various
>>>> optimization levels in CPython, we should instead drop them
>>>> entirely. The argument is that significant performance gains would
>>>> occur from runtime optimizations through something like a JIT and not
>>>> through pre-execution bytecode optimizations.
>>>>
>>>> This idea is rejected for this PEP as that ignores the fact that
>>>> there are people who do find the pre-existing optimization levels for
>>>> CPython useful. It also assumes that no other Python interpreter
>>>> would find what this PEP proposes useful.
>>>>
>>>>
>>>> Alternative formatting of the optimization level in the file name
>>>> -----------------------------------------------------------------
>>>>
>>>> Using the "opt-" prefix and placing the optimization level between
>>>> the cache tag and file extension is not critical. All options which
>>>> have been considered are:
>>>>
>>>> * ``importlib.cpython-35.opt-1.pyc``
>>>> * ``importlib.cpython-35.opt1.pyc``
>>>> * ``importlib.cpython-35.o1.pyc``
>>>> * ``importlib.cpython-35.O1.pyc``
>>>> * ``importlib.cpython-35.1.pyc``
>>>> * ``importlib.cpython-35-O1.pyc``
>>>> * ``importlib.O1.cpython-35.pyc``
>>>> * ``importlib.o1.cpython-35.pyc``
>>>> * ``importlib.1.cpython-35.pyc``
>>>>
>>>> These were initially rejected either because they would change the
>>>> sort order of bytecode files, possible ambiguity with the cache tag,
>>>> or were not self-documenting enough. An informal poll was taken and
>>>> people clearly preferred the formatting proposed by the PEP [9]_.
>>>> Since this topic is non-technical and of personal choice, the issue
>>>> is considered solved.
>>>>
>>>>
>>>> Embedding the optimization level in the bytecode metadata
>>>> ---------------------------------------------------------
>>>>
>>>> Some have suggested that rather than embedding the optimization level
>>>> of bytecode in the file name that it be included in the file's
>>>> metadata instead. This would mean every interpreter had a single copy
>>>> of bytecode at any time. Changing the optimization level would thus
>>>> require rewriting the bytecode, but there would also only be a single
>>>> file to care about.
>>>>
>>>> This has been rejected due to the fact that Python is often installed
>>>> as a root-level application and thus modifying the bytecode file for
>>>> modules in the standard library are always possible. In this
>>>> situation integrators would need to guess at what a reasonable
>>>> optimization level was for users for any/all situations. By
>>>> allowing multiple optimization levels to co-exist simultaneously it
>>>> frees integrators from having to guess what users want and allows
>>>> users to utilize the optimization level they want.
>>>>
>>>>
>>>> References
>>>> ==========
>>>>
>>>> .. [1] The compileall module
>>>> (https://docs.python.org/3/library/compileall.html#module-compileall
>>>> )
>>>>
>>>> .. [2] The astoptimizer project
>>>> (https://pypi.python.org/pypi/astoptimizer)
>>>>
>>>> .. [3] ``importlib.util.cache_from_source()``
>>>> (
>>>> https://docs.python.org/3.5/library/importlib.html#importlib.util.cache_from_source
>>>> )
>>>>
>>>> .. [4] Implementation of ``importlib.util.cache_from_source()`` from
>>>> CPython 3.4.3rc1
>>>> (
>>>> https://hg.python.org/cpython/file/038297948389/Lib/importlib/_bootstrap.py#l437
>>>> )
>>>>
>>>> .. [5] PEP 3147, PYC Repository Directories, Warsaw
>>>> (http://www.python.org/dev/peps/pep-3147)
>>>>
>>>> .. [6] The py_compile module
>>>> (https://docs.python.org/3/library/compileall.html#module-compileall
>>>> )
>>>>
>>>> .. [7] The importlib.machinery module
>>>> (
>>>> https://docs.python.org/3/library/importlib.html#module-importlib.machinery
>>>> )
>>>>
>>>> .. [8] ``importlib.util.MAGIC_NUMBER``
>>>> (
>>>> https://docs.python.org/3/library/importlib.html#importlib.util.MAGIC_NUMBER
>>>> )
>>>>
>>>> .. [9] Informal poll of file name format options on Google+
>>>> (https://plus.google.com/u/0/+BrettCannon/posts/fZynLNwHWGm)
>>>>
>>>> .. [10] The PyPy Project
>>>> (http://pypy.org/)
>>>>
>>>>
>>>> Copyright
>>>> =========
>>>>
>>>> This document has been placed in the public domain.
>>>>
>>>>
>>>> ..
>>>> Local Variables:
>>>> mode: indented-text
>>>> indent-tabs-mode: nil
>>>> sentence-end-double-space: t
>>>> fill-column: 70
>>>> coding: utf-8
>>>> End:
>>>>
>>>>
>>>> _______________________________________________
>>>> Python-Dev mailing list
>>>> Python-Dev at python.org
>>>> https://mail.python.org/mailman/listinfo/python-dev
>>>> Unsubscribe:
>>>> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>>>>
>>>>
>>>
>>>
>>> --
>>> --Guido van Rossum (python.org/~guido)
>>>
>>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20150321/f26d8a0f/attachment-0001.html>
More information about the Python-Dev
mailing list