Final call for PEP 488: eliminating PYO files
I am willing to be the BDFL for this PEP. I have tried to skim the recent discussion (only python-dev) and I don't see much remaining controversy. HOWEVER... The PEP is not clear (or at least too subtle) about the actual name for optimization level 0. If I have foo.py, and I compile it three times with three different optimization levels (no optimization; -O; -OO), and then I look in __pycache__, would I see this:

    # (1)
    foo.cpython-35.pyc
    foo.cpython-35.opt-1.pyc
    foo.cpython-35.opt-2.pyc

Or would I see this?

    # (2)
    foo.cpython-35.opt-0.pyc
    foo.cpython-35.opt-1.pyc
    foo.cpython-35.opt-2.pyc

Your lead-in ("I have decided to have the default case of no optimization levels mean that the .pyc file name will have *no* optimization level specified in the name and thus be just as it is today.") makes me think I should expect (1), but I can't actually pinpoint where the language of the PEP says this.

On Fri, Mar 20, 2015 at 11:34 AM, Brett Cannon <bcannon@gmail.com> wrote:
I have decided to have the default case of no optimization levels mean that the .pyc file name will have *no* optimization level specified in the name and thus be just as it is today. I made this decision due to potential backwards-compatibility issues -- although I expect them to be minor -- and to not force other implementations like PyPy to have some bogus value set since they don't have .pyo files to begin with (PyPy actually uses bytecode for -O and doesn't bother with -OO since PyPy already uses a bunch of memory when running).
Since this closes out the last open issue, I need either a BDFL decision or a BDFAP to be assigned to make a decision. Guido?
======================================
PEP: 488
Title: Elimination of PYO files
Version: $Revision$
Last-Modified: $Date$
Author: Brett Cannon <brett@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 20-Feb-2015
Post-History:
    2015-03-06
    2015-03-13
    2015-03-20
Abstract
========
This PEP proposes eliminating the concept of PYO files from Python. To continue the support of the separation of bytecode files based on their optimization level, this PEP proposes extending the PYC file name to include the optimization level in the bytecode repository directory when it's called for (i.e., the ``__pycache__`` directory).
Rationale
=========
As of today, bytecode files come in two flavours: PYC and PYO. A PYC file is the bytecode file generated and read from when no optimization level is specified at interpreter startup (i.e., ``-O`` is not specified). A PYO file represents the bytecode file that is read/written when **any** optimization level is specified (i.e., when ``-O`` **or** ``-OO`` is specified). This means that while PYC files clearly delineate the optimization level used when they were generated -- namely no optimizations beyond the peepholer -- the same is not true for PYO files. To put this in terms of optimization levels and the file extension:
- 0: ``.pyc``
- 1 (``-O``): ``.pyo``
- 2 (``-OO``): ``.pyo``
The reuse of the ``.pyo`` file extension for both level 1 and 2 optimizations means that there is no clear way to tell what optimization level was used to generate the bytecode file. In terms of reading PYO files, this can lead to an interpreter using a mixture of optimization levels with its code if the user was not careful to make sure all PYO files were generated using the same optimization level (typically done by blindly deleting all PYO files and then using the ``compileall`` module to compile all-new PYO files [1]_). This issue is only compounded when people optimize Python code beyond what the interpreter natively supports, e.g., using the astoptimizer project [2]_.
In terms of writing PYO files, the need to delete all PYO files every time one either changes the optimization level to use or is unsure of what optimization level was used the last time PYO files were generated leads to unnecessary file churn. The change proposed by this PEP also allows for **all** optimization levels to be pre-compiled for bytecode files ahead of time, something that is currently impossible because the ``.pyo`` file extension is reused for multiple optimization levels.
As for distributing bytecode-only modules, having to distribute both ``.pyc`` and ``.pyo`` files is unnecessary for the common use-case of code obfuscation and smaller file deployments. This means that bytecode-only modules will only load from their non-optimized ``.pyc`` file name.
Proposal
========
To eliminate the ambiguity that PYO files present, this PEP proposes eliminating the concept of PYO files and their accompanying ``.pyo`` file extension. To allow for the optimization level to be unambiguous as well as to avoid having to regenerate optimized bytecode files needlessly in the ``__pycache__`` directory, the optimization level used to generate the bytecode file will be incorporated into the bytecode file name. When no optimization level is specified, the pre-PEP ``.pyc`` file name will be used (i.e., no change in file name semantics). This increases backwards-compatibility while also being more understanding of Python implementations which have no use for optimization levels (e.g., PyPy [10]_).
Currently bytecode file names are created by ``importlib.util.cache_from_source()``, approximately using the following expression defined by PEP 3147 [3]_, [4]_, [5]_::
    '{name}.{cache_tag}.pyc'.format(name=module_name,
                                    cache_tag=sys.implementation.cache_tag)
This PEP proposes to change the expression when an optimization level is specified to::
    '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
            name=module_name,
            cache_tag=sys.implementation.cache_tag,
            optimization=str(sys.flags.optimize))
The "opt-" prefix was chosen so as to provide a visual separator from the cache tag. The placement of the optimization level after the cache tag was chosen to preserve lexicographic sort order of bytecode file names based on module name and cache tag which will not vary for a single interpreter. The "opt-" prefix was chosen over "o" so as to be somewhat self-documenting. The "opt-" prefix was chosen over "O" so as to not have any confusion in case "0" was the leading prefix of the optimization level.
A period was chosen over a hyphen as a separator so as to distinguish clearly that the optimization level is not part of the interpreter version as specified by the cache tag. It also fits the existing use of the period in the file name to delineate semantically different concepts.
For example, if ``-OO`` had been passed to the interpreter then instead of ``importlib.cpython-35.pyo`` the file name would be ``importlib.cpython-35.opt-2.pyc``.
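The naming rule can be sketched as a small pure function. This is only an illustration of the two expressions above, not the importlib implementation: ``bytecode_file_name`` and its ``optimize_level`` parameter are hypothetical names, and treating level 0 as "no optimization level specified" is an assumption mirroring ``sys.flags.optimize``.

```python
def bytecode_file_name(module_name, cache_tag, optimize_level=0):
    """Return the bytecode file name for a module (illustrative helper)."""
    if optimize_level == 0:
        # No optimization level specified: keep the pre-PEP file name.
        return '{name}.{cache_tag}.pyc'.format(
            name=module_name, cache_tag=cache_tag)
    # Otherwise the level goes between the cache tag and the extension.
    return '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
        name=module_name, cache_tag=cache_tag,
        optimization=str(optimize_level))


# The cache tag is passed in explicitly so the result does not depend on
# the running interpreter.
names = [bytecode_file_name('foo', 'cpython-35', level) for level in (0, 1, 2)]
```

Compiling ``foo.py`` without ``-O``, with ``-O``, and with ``-OO`` would under this sketch leave three distinct files side by side in ``__pycache__``.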
It should be noted that this change in no way affects the performance of import. The import system already looks for a single bytecode file based on the optimization level of the interpreter and generates a new bytecode file if it doesn't exist. It will continue to do so, which means the introduction of potentially more bytecode files in the ``__pycache__`` directory causes no increase in stat calls.
The only potentially negative result of this PEP is the probable increase in the number of ``.pyc`` files and thus increase in storage use. But for platforms where this is an issue, ``sys.dont_write_bytecode`` exists to turn off bytecode generation so that it can be controlled offline.
Implementation
==============

importlib
---------
As ``importlib.util.cache_from_source()`` is the API that exposes bytecode file paths as well as being directly used by importlib, it requires the most critical change. As of Python 3.4, the function's signature is::
    importlib.util.cache_from_source(path, debug_override=None)
This PEP proposes changing the signature in Python 3.5 to::
    importlib.util.cache_from_source(path, debug_override=None, *, optimization=None)
The introduced ``optimization`` keyword-only parameter will control what optimization level is specified in the file name. If the argument is ``None`` then the current optimization level of the interpreter will be assumed (including no optimization). Any argument given for ``optimization`` will be passed to ``str()`` and must have ``str.isalnum()`` be true, else ``ValueError`` will be raised (this prevents invalid characters being used in the file name). If the empty string is passed in for ``optimization`` then the addition of the optimization will be suppressed, reverting to the file name format which predates this PEP.
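A short sketch of how the proposed parameter behaves, assuming an interpreter in which the signature above is available (Python 3.5 or later). The source path ``foo.py`` is a placeholder and does not need to exist, since the function only computes a path.

```python
import importlib.util

# Empty string: the optimization part is suppressed (pre-PEP file name).
default_path = importlib.util.cache_from_source('foo.py', optimization='')

# Any other value is passed to str() and embedded after an "opt-" prefix.
level_two_path = importlib.util.cache_from_source('foo.py', optimization=2)

# Values that are not alphanumeric after str() are rejected.
try:
    importlib.util.cache_from_source('foo.py', optimization='not-alnum')
except ValueError:
    rejected = True
else:
    rejected = False
```

The exact directory and cache tag in the returned paths depend on the running interpreter, so only the suffixes are meaningful here.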
It is expected that beyond Python's own two optimization levels, third-party code will use a hash of optimization names to specify the optimization level, e.g. ``hashlib.sha256(','.join(['no dead code', 'const folding']).encode('utf-8')).hexdigest()``. While this might lead to long file names, it is assumed that most users never look at the contents of the ``__pycache__`` directory and so this won't be an issue.
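As a rough sketch of the hashing idea, the helper below derives a tag from a list of optimization names. ``optimization_tag`` is a hypothetical name, and the names in the list are the examples from the text above. The joined string is encoded first because ``hashlib`` operates on bytes.

```python
import hashlib


def optimization_tag(optimization_names):
    """Derive a file-name-safe tag from optimization names (illustrative)."""
    joined = ','.join(optimization_names)
    return hashlib.sha256(joined.encode('utf-8')).hexdigest()


tag = optimization_tag(['no dead code', 'const folding'])
```

A hex digest consists only of digits and the letters ``a`` to ``f``, so it satisfies the ``str.isalnum()`` requirement. The tag depends on the order of the names, so third-party code following this approach would presumably want to fix or sort that order.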
The ``debug_override`` parameter will be deprecated. As the parameter expects a boolean, the integer value of the boolean will be used as if it had been provided as the argument to ``optimization`` (a ``None`` argument will mean the same as for ``optimization``). A deprecation warning will be raised when ``debug_override`` is given a value other than ``None``, but there are no plans for the complete removal of the parameter at this time (but removal will be no later than Python 4).
The various module attributes of ``importlib.machinery`` which relate to bytecode file suffixes will be updated [7]_. ``DEBUG_BYTECODE_SUFFIXES`` and ``OPTIMIZED_BYTECODE_SUFFIXES`` will both be documented as deprecated and set to the same value as ``BYTECODE_SUFFIXES`` (their removal is not currently planned, but will be no later than Python 4).
The various finders and loaders will also be updated as necessary, but updating the previously mentioned parts of importlib should be all that is required.
Rest of the standard library
----------------------------
The various functions exposed by the ``py_compile`` and ``compileall`` modules will be updated as necessary to make sure they follow the new bytecode file name semantics [6]_, [1]_. The CLI for the ``compileall`` module will not be directly affected (the ``-b`` flag will be implicit as it will no longer generate ``.pyo`` files when ``-O`` is specified).
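To illustrate the resulting file names, the sketch below compiles one throwaway source file at all three optimization levels and lists what lands in ``__pycache__``. It assumes an interpreter that implements this PEP (Python 3.5 or later) and that no alternative cache location is configured. ``compile_all_levels`` is a hypothetical helper name.

```python
import pathlib
import py_compile
import tempfile


def compile_all_levels(source_text='x = 1\n'):
    """Compile a temporary foo.py at levels 0, 1 and 2; return file names."""
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / 'foo.py'
        source.write_text(source_text)
        for level in (0, 1, 2):
            # optimize= selects the level regardless of how the running
            # interpreter itself was started.
            py_compile.compile(str(source), optimize=level, doraise=True)
        cache_dir = pathlib.Path(tmp) / '__pycache__'
        return sorted(path.name for path in cache_dir.iterdir())


names = compile_all_levels()
```

All three files coexist, which is what makes pre-compiling every optimization level ahead of time possible.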
Compatibility Considerations
============================
Any code directly manipulating bytecode files from Python 3.2 on will need to consider the impact of this change (prior to Python 3.2 -- including all of Python 2 -- there was no ``__pycache__``, which already necessitates bifurcating bytecode file handling support). If code was setting the ``debug_override`` argument to ``importlib.util.cache_from_source()`` then care will be needed if the path to a bytecode file with an optimization level of 2 is wanted. Otherwise only code **not** using ``importlib.util.cache_from_source()`` will need updating.
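Code that inspects bytecode file names by hand would need to recognize the optional ``opt-`` component. The following is a minimal sketch of such parsing, assuming names of the form proposed in this PEP. ``split_bytecode_name`` is a hypothetical helper, not part of importlib.

```python
def split_bytecode_name(file_name):
    """Split a bytecode file name into (module, cache_tag, optimization).

    The optimization is '' when the name carries no "opt-" component.
    """
    stem, dot, extension = file_name.rpartition('.')
    if not dot or extension != 'pyc':
        raise ValueError('not a .pyc file name: {!r}'.format(file_name))
    parts = stem.split('.')
    optimization = ''
    if len(parts) >= 3 and parts[-1].startswith('opt-'):
        # Strip the "opt-" prefix to recover the optimization level.
        optimization = parts.pop()[len('opt-'):]
    if len(parts) < 2:
        raise ValueError('no cache tag in file name: {!r}'.format(file_name))
    cache_tag = parts.pop()
    return '.'.join(parts), cache_tag, optimization


optimized = split_bytecode_name('foo.cpython-35.opt-2.pyc')
unoptimized = split_bytecode_name('foo.cpython-35.pyc')
```

Where possible, asking ``importlib.util.cache_from_source()`` for the path avoids this kind of hand-written parsing altogether.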
As for people who distribute bytecode-only modules (i.e., use a bytecode file instead of a source file), they will have to choose which optimization level they want their bytecode files to be since distributing a ``.pyo`` file with a ``.pyc`` file will no longer be of any use. Since people typically only distribute bytecode files for code obfuscation purposes or smaller distribution size, only having to distribute a single ``.pyc`` should actually be beneficial to these use-cases. And since the magic number for bytecode files changed in Python 3.5 to support PEP 465, there is no need to support pre-existing ``.pyo`` files [8]_.
Rejected Ideas
==============

Completely dropping optimization levels from CPython
----------------------------------------------------
Some have suggested that instead of accommodating the various optimization levels in CPython, we should instead drop them entirely. The argument is that significant performance gains would occur from runtime optimizations through something like a JIT and not through pre-execution bytecode optimizations.
This idea is rejected for this PEP as that ignores the fact that there are people who do find the pre-existing optimization levels for CPython useful. It also assumes that no other Python interpreter would find what this PEP proposes useful.
Alternative formatting of the optimization level in the file name
-----------------------------------------------------------------
Using the "opt-" prefix and placing the optimization level between the cache tag and file extension is not critical. All options which have been considered are:
* ``importlib.cpython-35.opt-1.pyc``
* ``importlib.cpython-35.opt1.pyc``
* ``importlib.cpython-35.o1.pyc``
* ``importlib.cpython-35.O1.pyc``
* ``importlib.cpython-35.1.pyc``
* ``importlib.cpython-35-O1.pyc``
* ``importlib.O1.cpython-35.pyc``
* ``importlib.o1.cpython-35.pyc``
* ``importlib.1.cpython-35.pyc``
The alternatives were initially rejected because they would change the sort order of bytecode files, were possibly ambiguous with the cache tag, or were not self-documenting enough. An informal poll was taken and people clearly preferred the formatting proposed by the PEP [9]_. Since this topic is non-technical and of personal choice, the issue is considered solved.
Embedding the optimization level in the bytecode metadata
---------------------------------------------------------
Some have suggested that rather than embedding the optimization level of bytecode in the file name that it be included in the file's metadata instead. This would mean every interpreter had a single copy of bytecode at any time. Changing the optimization level would thus require rewriting the bytecode, but there would also only be a single file to care about.
This has been rejected due to the fact that Python is often installed as a root-level application and thus modifying the bytecode file for modules in the standard library is not always possible. In this situation integrators would need to guess at what a reasonable optimization level was for users for any/all situations. Allowing multiple optimization levels to co-exist simultaneously frees integrators from having to guess what users want and allows users to utilize the optimization level they want.
References
==========
.. [1] The compileall module (https://docs.python.org/3/library/compileall.html#module-compileall)
.. [2] The astoptimizer project (https://pypi.python.org/pypi/astoptimizer)
.. [3] ``importlib.util.cache_from_source()`` ( https://docs.python.org/3.5/library/importlib.html#importlib.util.cache_from... )
.. [4] Implementation of ``importlib.util.cache_from_source()`` from CPython 3.4.3rc1 ( https://hg.python.org/cpython/file/038297948389/Lib/importlib/_bootstrap.py#... )
.. [5] PEP 3147, PYC Repository Directories, Warsaw (http://www.python.org/dev/peps/pep-3147)
.. [6] The py_compile module (https://docs.python.org/3/library/py_compile.html#module-py_compile)
.. [7] The importlib.machinery module ( https://docs.python.org/3/library/importlib.html#module-importlib.machinery )
.. [8] ``importlib.util.MAGIC_NUMBER`` ( https://docs.python.org/3/library/importlib.html#importlib.util.MAGIC_NUMBER )
.. [9] Informal poll of file name format options on Google+ (https://plus.google.com/u/0/+BrettCannon/posts/fZynLNwHWGm)
.. [10] The PyPy Project (http://pypy.org/)
Copyright
=========
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
--
--Guido van Rossum (python.org/~guido)
On Fri, Mar 20, 2015 at 4:41 PM Guido van Rossum <guido@python.org> wrote:
I am willing to be the BDFL for this PEP. I have tried to skim the recent discussion (only python-dev) and I don't see much remaining controversy. HOWEVER... The PEP is not clear (or at least too subtle) about the actual name for optimization level 0. If I have foo.py, and I compile it three times with three different optimization levels (no optimization; -O; -OO), and then I look in __pycache__, would I see this:
    # (1)
    foo.cpython-35.pyc
    foo.cpython-35.opt-1.pyc
    foo.cpython-35.opt-2.pyc
Or would I see this?
    # (2)
    foo.cpython-35.opt-0.pyc
    foo.cpython-35.opt-1.pyc
    foo.cpython-35.opt-2.pyc
#1
Your lead-in ("I have decided to have the default case of no optimization levels mean that the .pyc file name will have *no* optimization level specified in the name and thus be just as it is today.") makes me think I should expect (1), but I can't actually pinpoint where the language of the PEP says this.
It was meant to be explained by "When no optimization level is specified, the pre-PEP ``.pyc`` file name will be used (i.e., no change in file name semantics)", but obviously it's a bit too subtle. I just updated the PEP with an explicit list of bytecode file name examples based on no -O, -O, and -OO.

-Brett
Awesome, that's what I was hoping. Accepted! Congrats and thank you very much for writing the PEP and guiding the discussion.

--Guido van Rossum (python.org/~guido)
Thanks! PEP 488 is now marked as accepted. I expect I will have PEP 488 implemented before the PyCon sprints are over (work will be tracked in http://bugs.python.org/issue23731). On Fri, Mar 20, 2015 at 8:06 PM Guido van Rossum <guido@python.org> wrote:
Awesome, that's what I was hoping. Accepted! Congrats and thank you very much for writing the PEP and guiding the discussion.
On Fri, Mar 20, 2015 at 4:00 PM, Brett Cannon <bcannon@gmail.com> wrote:
On Fri, Mar 20, 2015 at 4:41 PM Guido van Rossum <guido@python.org> wrote:
I am willing to be the BDFL for this PEP. I have tried to skim the recent discussion (only python-dev) and I don't see much remaining controversy. HOWEVER... The PEP is not clear (or at least too subtle) about the actual name for optimization level 0. If I have foo.py, and I compile it three times with three different optimization levels (no optimization; -O; -OO), and then I look in __pycache__, would I see this:
# (1)
foo.cpython-35.pyc
foo.cpython-35.opt-1.pyc
foo.cpython-35.opt-2.pyc
Or would I see this?
# (2)
foo.cpython-35.opt-0.pyc
foo.cpython-35.opt-1.pyc
foo.cpython-35.opt-2.pyc
#1
Your lead-in ("I have decided to have the default case of no optimization levels mean that the .pyc file name will have *no* optimization level specified in the name and thus be just as it is today.") makes me think I should expect (1), but I can't actually pinpoint where the language of the PEP says this.
It was meant to be explained by "When no optimization level is specified, the pre-PEP ``.pyc`` file name will be used (i.e., no change in file name semantics)", but obviously it's a bit too subtle. I just updated the PEP with an explicit list of bytecode file name examples based on no -O, -O, and -OO.
-Brett
On Fri, Mar 20, 2015 at 11:34 AM, Brett Cannon <bcannon@gmail.com> wrote:
I have decided to have the default case of no optimization levels mean that the .pyc file name will have *no* optimization level specified in the name and thus be just as it is today. I made this decision due to potential backwards-compatibility issues -- although I expect them to be minor -- and to not force other implementations like PyPy to have some bogus value set since they don't have .pyo files to begin with (PyPy actually uses bytecode for -O and doesn't bother with -OO since PyPy already uses a bunch of memory when running).
Since this closes out the last open issue, I need either a BDFL decision or a BDFAP to be assigned to make a decision. Guido?
======================================
PEP: 488
Title: Elimination of PYO files
Version: $Revision$
Last-Modified: $Date$
Author: Brett Cannon <brett@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 20-Feb-2015
Post-History:
    2015-03-06
    2015-03-13
    2015-03-20
Abstract
========
This PEP proposes eliminating the concept of PYO files from Python. To continue the support of the separation of bytecode files based on their optimization level, this PEP proposes extending the PYC file name in the bytecode repository directory (i.e., the ``__pycache__`` directory) to include the optimization level when one is in effect.
Rationale
=========
As of today, bytecode files come in two flavours: PYC and PYO. A PYC file is the bytecode file generated and read from when no optimization level is specified at interpreter startup (i.e., ``-O`` is not specified). A PYO file represents the bytecode file that is read/written when **any** optimization level is specified (i.e., when ``-O`` **or** ``-OO`` is specified). This means that while PYC files clearly delineate the optimization level used when they were generated -- namely no optimizations beyond the peepholer -- the same is not true for PYO files. To put this in terms of optimization levels and the file extension:
- 0: ``.pyc``
- 1 (``-O``): ``.pyo``
- 2 (``-OO``): ``.pyo``
The reuse of the ``.pyo`` file extension for both level 1 and 2 optimizations means that there is no clear way to tell what optimization level was used to generate the bytecode file. In terms of reading PYO files, this can lead to an interpreter using a mixture of optimization levels with its code if the user was not careful to make sure all PYO files were generated using the same optimization level (typically done by blindly deleting all PYO files and then using the ``compileall`` module to compile all-new PYO files [1]_). This issue is only compounded when people optimize Python code beyond what the interpreter natively supports, e.g., using the astoptimizer project [2]_.
In terms of writing PYO files, the need to delete all PYO files every time one either changes the optimization level one wants to use or is unsure of what optimization was used the last time PYO files were generated leads to unnecessary file churn. The change proposed by this PEP also allows for **all** optimization levels to be pre-compiled for bytecode files ahead of time, something that is currently impossible thanks to the reuse of the ``.pyo`` file extension for multiple optimization levels.
As for distributing bytecode-only modules, having to distribute both ``.pyc`` and ``.pyo`` files is unnecessary for the common use-case of code obfuscation and smaller file deployments. This means that bytecode-only modules will only load from their non-optimized ``.pyc`` file name.
Proposal
========
To eliminate the ambiguity that PYO files present, this PEP proposes eliminating the concept of PYO files and their accompanying ``.pyo`` file extension. To allow for the optimization level to be unambiguous as well as to avoid having to regenerate optimized bytecode files needlessly in the ``__pycache__`` directory, the optimization level used to generate the bytecode file will be incorporated into the bytecode file name. When no optimization level is specified, the pre-PEP ``.pyc`` file name will be used (i.e., no change in file name semantics). This increases backwards-compatibility while also being more understanding of Python implementations which have no use for optimization levels (e.g., PyPy [10]_).
Currently bytecode file names are created by ``importlib.util.cache_from_source()``, approximately using the following expression defined by PEP 3147 [3]_, [4]_, [5]_::
    '{name}.{cache_tag}.pyc'.format(name=module_name,
                                    cache_tag=sys.implementation.cache_tag)
This PEP proposes to change the expression when an optimization level is specified to::
    '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
            name=module_name,
            cache_tag=sys.implementation.cache_tag,
            optimization=str(sys.flags.optimize))
The "opt-" prefix was chosen so as to provide a visual separator from the cache tag. The placement of the optimization level after the cache tag was chosen to preserve lexicographic sort order of bytecode file names based on module name and cache tag which will not vary for a single interpreter. The "opt-" prefix was chosen over "o" so as to be somewhat self-documenting. The "opt-" prefix was chosen over "O" so as to not have any confusion in case "0" was the leading prefix of the optimization level.
A period was chosen over a hyphen as a separator so as to distinguish clearly that the optimization level is not part of the interpreter version as specified by the cache tag. It is also consistent with using the period in the file name to delineate semantically different concepts.
For example, if ``-OO`` had been passed to the interpreter then instead of ``importlib.cpython-35.pyo`` the file name would be ``importlib.cpython-35.opt-2.pyc``.
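The two expressions above can be combined into one small function. This is only a sketch: ``bytecode_file_name`` is a hypothetical helper, not part of importlib, and it assumes that level 0 and the empty string both mean "no optimization level specified".

```python
def bytecode_file_name(module_name, cache_tag, optimization):
    """Return the bytecode file name for the given optimization level.

    Hypothetical helper that mirrors the two format expressions in the
    PEP; it is not an importlib API.
    """
    level = str(optimization)
    if level in ('', '0'):
        # Assumption: no optimization keeps the pre-PEP file name.
        return '{name}.{cache_tag}.pyc'.format(
            name=module_name, cache_tag=cache_tag)
    return '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
        name=module_name, cache_tag=cache_tag, optimization=level)


# File names for no optimization, -O and -OO with a fixed cache tag.
names = [bytecode_file_name('importlib', 'cpython-35', level)
         for level in (0, 1, 2)]
```

With the cache tag fixed to ``cpython-35``, the three names differ only in the optional ``opt-`` component, so they can sit side by side in ``__pycache__``.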
It should be noted that this change in no way affects the performance of import. Since the import system looks for a single bytecode file based on the optimization level of the interpreter already and generates a new bytecode file if it doesn't exist, the introduction of potentially more bytecode files in the ``__pycache__`` directory has no effect in terms of stat calls. The interpreter will continue to look for only a single bytecode file based on the optimization level and thus no increase in stat calls will occur.
The only potentially negative result of this PEP is the probable increase in the number of ``.pyc`` files and thus increase in storage use. But for platforms where this is an issue, ``sys.dont_write_bytecode`` exists to turn off bytecode generation so that it can be controlled offline.
Implementation
==============
importlib
---------
As ``importlib.util.cache_from_source()`` is the API that exposes bytecode file paths as well as being directly used by importlib, it requires the most critical change. As of Python 3.4, the function's signature is::
    importlib.util.cache_from_source(path, debug_override=None)
This PEP proposes changing the signature in Python 3.5 to::
    importlib.util.cache_from_source(path, debug_override=None, *, optimization=None)
The introduced ``optimization`` keyword-only parameter will control what optimization level is specified in the file name. If the argument is ``None`` then the current optimization level of the interpreter will be assumed (including no optimization). Any argument given for ``optimization`` will be passed to ``str()`` and must have ``str.isalnum()`` be true, else ``ValueError`` will be raised (this prevents invalid characters being used in the file name). If the empty string is passed in for ``optimization`` then the addition of the optimization will be suppressed, reverting to the file name format which predates this PEP.
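As a rough illustration of the rules in the paragraph above, the following sketch calls ``importlib.util.cache_from_source()`` on an interpreter that implements this PEP (Python 3.5 or later). ``cache_name`` is a hypothetical wrapper used only to strip the directory part, and ``foo.py`` is a made-up path that does not need to exist.

```python
import importlib.util
import os


def cache_name(source_path, optimization):
    """Return only the file name part of the bytecode path (hypothetical helper)."""
    return os.path.basename(
        importlib.util.cache_from_source(source_path,
                                         optimization=optimization))


# The empty string suppresses the optimization part of the name.
default_name = cache_name('foo.py', '')
# Any other value is passed through str() and appended as "opt-<value>".
level_one = cache_name('foo.py', 1)
level_two = cache_name('foo.py', 2)

# A value that is not alphanumeric is expected to be rejected.
try:
    cache_name('foo.py', 'not-alnum')
except ValueError:
    rejected = True
else:
    rejected = False
```

The cache tag in the resulting names depends on the running interpreter, so the sketch avoids hard-coding it.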
It is expected that beyond Python's own two optimization levels, third-party code will use a hash of optimization names to specify the optimization level, e.g. ``hashlib.sha256(','.join(['no dead code', 'const folding']).encode('utf-8')).hexdigest()``. While this might lead to long file names, it is assumed that most users never look at the contents of the ``__pycache__`` directory and so this won't be an issue.
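A minimal sketch of the hashing idea, assuming Python 3.5 or later: the joined names must be encoded to bytes before hashing, and a hex digest is always alphanumeric, so it satisfies the ``str.isalnum()`` requirement. ``optimization_tag`` is a hypothetical helper, and the optimization names are the ones used as an example in the text.

```python
import hashlib
import importlib.util
import os


def optimization_tag(optimization_names):
    """Hash a list of optimization names into an alphanumeric tag.

    Hypothetical helper; third-party tools are free to pick another scheme.
    """
    joined = ','.join(optimization_names)
    # hashlib works on bytes, so the text is encoded first.
    return hashlib.sha256(joined.encode('utf-8')).hexdigest()


tag = optimization_tag(['no dead code', 'const folding'])
path = importlib.util.cache_from_source('foo.py', optimization=tag)
file_name = os.path.basename(path)
```

The resulting file name carries the 64-character digest after ``opt-``, which is the "long file name" trade-off mentioned above.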
The ``debug_override`` parameter will be deprecated. As the parameter expects a boolean, the integer value of the boolean will be used as if it had been provided as the argument to ``optimization`` (a ``None`` argument will mean the same as for ``optimization``). A deprecation warning will be raised when ``debug_override`` is given a value other than ``None``, but there are no plans for the complete removal of the parameter at this time (but removal will be no later than Python 4).
The various module attributes for importlib.machinery which relate to bytecode file suffixes will be updated [7]_. The ``DEBUG_BYTECODE_SUFFIXES`` and ``OPTIMIZED_BYTECODE_SUFFIXES`` will both be documented as deprecated and set to the same value as ``BYTECODE_SUFFIXES`` (removal of ``DEBUG_BYTECODE_SUFFIXES`` and ``OPTIMIZED_BYTECODE_SUFFIXES`` is not currently planned, but will be no later than Python 4).
The various finders and loaders will also be updated as necessary, but updating the previously mentioned parts of importlib should be all that is required.
Rest of the standard library
----------------------------
The various functions exposed by the ``py_compile`` and ``compileall`` modules will be updated as necessary to make sure they follow the new bytecode file name semantics [6]_, [1]_. The CLI for the ``compileall`` module will not be directly affected (the ``-b`` flag will be implicit as it will no longer generate ``.pyo`` files when ``-O`` is specified).
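To see the resulting file names on an interpreter that implements this PEP (Python 3.5 or later), one can byte-compile a throwaway module at each level with ``py_compile.compile()`` and its ``optimize`` argument. This is a sketch only: ``compile_at_all_levels`` is a hypothetical helper and ``foo.py`` is a temporary file created just for the demonstration.

```python
import os
import py_compile
import tempfile


def compile_at_all_levels(source_path):
    """Byte-compile source_path at levels 0, 1 and 2.

    Hypothetical helper; returns the file names of the bytecode files
    that py_compile reports having written.
    """
    return [
        os.path.basename(
            py_compile.compile(source_path, optimize=level, doraise=True))
        for level in (0, 1, 2)
    ]


with tempfile.TemporaryDirectory() as directory:
    source_path = os.path.join(directory, 'foo.py')
    with open(source_path, 'w') as source_file:
        source_file.write('"""Docstring."""\nassert True\n')
    compiled_names = compile_at_all_levels(source_path)
```

All three bytecode files exist at the same time, which is the pre-compilation scenario the Rationale describes as impossible with ``.pyo`` files.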
Compatibility Considerations
============================
Any code directly manipulating bytecode files from Python 3.2 on will need to consider the impact of this change on their code (prior to Python 3.2 -- including all of Python 2 -- there was no ``__pycache__`` which already necessitates bifurcating bytecode file handling support). If code was setting the ``debug_override`` argument to ``importlib.util.cache_from_source()`` then care will be needed if they want the path to a bytecode file with an optimization level of 2. Otherwise only code **not** using ``importlib.util.cache_from_source()`` will need updating.
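For code that inspects ``__pycache__`` by hand, the update could look something like the sketch below. ``optimization_from_file_name`` is a hypothetical helper, and it assumes that neither the module name nor the cache tag contains a period.

```python
import os


def optimization_from_file_name(file_name):
    """Return the optimization part of a bytecode file name.

    Hypothetical helper: returns '' when the name has no "opt-" part
    and raises ValueError for names it does not recognize.
    """
    stem, extension = os.path.splitext(file_name)
    if extension != '.pyc':
        raise ValueError('not a .pyc file: {!r}'.format(file_name))
    parts = stem.split('.')
    if len(parts) == 2:
        # <name>.<cache_tag>.pyc: the pre-PEP format, no optimization.
        return ''
    if len(parts) == 3 and parts[2].startswith('opt-'):
        level = parts[2][len('opt-'):]
        if level.isalnum():
            return level
    raise ValueError('unexpected file name: {!r}'.format(file_name))


plain = optimization_from_file_name('foo.cpython-35.pyc')
level_two = optimization_from_file_name('foo.cpython-35.opt-2.pyc')
```

A ``.pyo`` name is rejected by this helper, which matches the PEP's position that such files no longer need to be supported.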
As for people who distribute bytecode-only modules (i.e., use a bytecode file instead of a source file), they will have to choose which optimization level they want their bytecode files to be since distributing a ``.pyo`` file with a ``.pyc`` file will no longer be of any use. Since people typically only distribute bytecode files for code obfuscation purposes or smaller distribution size then only having to distribute a single ``.pyc`` should actually be beneficial to these use-cases. And since the magic number for bytecode files changed in Python 3.5 to support PEP 465 there is no need to support pre-existing ``.pyo`` files [8]_.
Rejected Ideas
==============
Completely dropping optimization levels from CPython
----------------------------------------------------
Some have suggested that instead of accommodating the various optimization levels in CPython, we should instead drop them entirely. The argument is that significant performance gains would occur from runtime optimizations through something like a JIT and not through pre-execution bytecode optimizations.
This idea is rejected for this PEP as that ignores the fact that there are people who do find the pre-existing optimization levels for CPython useful. It also assumes that no other Python interpreter would find what this PEP proposes useful.
Alternative formatting of the optimization level in the file name
-----------------------------------------------------------------
Using the "opt-" prefix and placing the optimization level between the cache tag and file extension is not critical. All options which have been considered are:
* ``importlib.cpython-35.opt-1.pyc``
* ``importlib.cpython-35.opt1.pyc``
* ``importlib.cpython-35.o1.pyc``
* ``importlib.cpython-35.O1.pyc``
* ``importlib.cpython-35.1.pyc``
* ``importlib.cpython-35-O1.pyc``
* ``importlib.O1.cpython-35.pyc``
* ``importlib.o1.cpython-35.pyc``
* ``importlib.1.cpython-35.pyc``
The alternatives were initially rejected because they would change the sort order of bytecode files, were potentially ambiguous with the cache tag, or were not self-documenting enough. An informal poll was taken and people clearly preferred the formatting proposed by the PEP [9]_. Since this topic is non-technical and of personal choice, the issue is considered solved.
Embedding the optimization level in the bytecode metadata
---------------------------------------------------------
Some have suggested that rather than embedding the optimization level of bytecode in the file name that it be included in the file's metadata instead. This would mean every interpreter had a single copy of bytecode at any time. Changing the optimization level would thus require rewriting the bytecode, but there would also only be a single file to care about.
This has been rejected due to the fact that Python is often installed as a root-level application and thus modifying the bytecode file for modules in the standard library is not always possible. In this situation integrators would need to guess at what a reasonable optimization level was for users for any/all situations. By allowing multiple optimization levels to co-exist simultaneously it frees integrators from having to guess what users want and allows users to utilize the optimization level they want.
References
==========
.. [1] The compileall module (https://docs.python.org/3/library/compileall.html#module-compileall)
.. [2] The astoptimizer project (https://pypi.python.org/pypi/astoptimizer)
.. [3] ``importlib.util.cache_from_source()`` ( https://docs.python.org/3.5/library/importlib.html#importlib.util.cache_from... )
.. [4] Implementation of ``importlib.util.cache_from_source()`` from CPython 3.4.3rc1 ( https://hg.python.org/cpython/file/038297948389/Lib/importlib/_bootstrap.py#... )
.. [5] PEP 3147, PYC Repository Directories, Warsaw (http://www.python.org/dev/peps/pep-3147)
.. [6] The py_compile module (https://docs.python.org/3/library/py_compile.html#module-py_compile)
.. [7] The importlib.machinery module ( https://docs.python.org/3/library/importlib.html#module-importlib.machinery )
.. [8] ``importlib.util.MAGIC_NUMBER`` ( https://docs.python.org/3/library/importlib.html#importlib.util.MAGIC_NUMBER )
.. [9] Informal poll of file name format options on Google+ (https://plus.google.com/u/0/+BrettCannon/posts/fZynLNwHWGm)
.. [10] The PyPy Project (http://pypy.org/)
Copyright
=========
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
-- --Guido van Rossum (python.org/~guido)
On Mar 21, 2015 7:44 AM, "Brett Cannon" <bcannon@gmail.com> wrote:
Thanks! PEP 488 is now marked as accepted. I expect I will have PEP 488 implemented before the PyCon sprints are over (work will be tracked in http://bugs.python.org/issue23731).
On Fri, Mar 20, 2015 at 8:06 PM Guido van Rossum <guido@python.org> wrote:
Awesome, that's what I was hoping. Accepted! Congrats and thank you very much for writing the PEP and guiding the discussion.
Congratulations Brett! This is a welcome change. I'll be sure to give you a review.
-eric