Merge .pyc into .pyo--store multiple code objects in one file?

A .pyc file is made up of these three elements:

    4 byte magic number
    4 byte timestamp
    marshaled code object

A .pyo file is the same except the code object has been optimized. I ask you: why gunk up the filesystem with two files when one would do?

I propose we change the pyc file so it can contain multiple code objects. Or, indeed, multiple arbitrary objects. The .pyc file could become a general-purpose cache for data relevant to the .py file. For example, perhaps the Unladen Swallow guys could cache post-JITted code. Or the wordcode-based interpreter could cache its wordcode equivalent.

Lots of implementation choices suggest themselves. Here's the cheapest approach that seems suitable:

    4 byte magic number
    4 byte timestamp
    marshaled array of pairs of ints, alternating "id of object" with "relative offset of object from the end of this marshaled array"
    marshaled code object 1
    marshaled code object 2
    ...

If you look for your cached object inside and it's not there, you compute your object, then rewrite the file, adding your id and offset to the end of the array. (Your offset will always be the former size of the file.) If the timestamp has changed, blow away all objects and start over with an empty array.

If it'd be too disruptive to change .pyc/.pyo files this way, would switching to a new file extension be better?

I suspect this probably isn't actually a good idea,

/larry/
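The layout proposed above can be sketched in Python. This is only an illustration under assumptions that are not in the original message: the function names (`pack_cache`, `read_entries`, `get_or_add`), the little-endian header encoding, and the decision to rebuild the whole byte string when an entry is added are all hypothetical, and the offsets are measured from the end of the marshaled index array as the proposal describes.

```python
import io
import marshal
import struct

# Hypothetical header: 4-byte magic number plus 4-byte source timestamp.
HEADER = struct.Struct("<4sI")


def pack_cache(magic, mtime, entries):
    """Serialize (object_id, object) pairs into the proposed layout."""
    index, blobs, offset = [], [], 0
    for obj_id, obj in entries:
        blob = marshal.dumps(obj)
        index += [obj_id, offset]  # alternating id, relative offset
        blobs.append(blob)
        offset += len(blob)
    return HEADER.pack(magic, mtime) + marshal.dumps(index) + b"".join(blobs)


def read_entries(data, magic, mtime):
    """Return cached (object_id, object) pairs, or [] if the cache is stale."""
    if len(data) < HEADER.size:
        return []
    if HEADER.unpack_from(data) != (magic, mtime):
        return []  # magic or timestamp changed: start over
    stream = io.BytesIO(data)
    stream.seek(HEADER.size)
    index = marshal.load(stream)
    base = stream.tell()  # offsets are relative to the end of the index
    return [
        (obj_id, marshal.loads(data[base + offset:]))
        for obj_id, offset in zip(index[::2], index[1::2])
    ]


def get_or_add(data, magic, mtime, obj_id, compute):
    """Look up obj_id; on a miss, compute it and return the rewritten cache."""
    entries = read_entries(data, magic, mtime)
    for found_id, obj in entries:
        if found_id == obj_id:
            return obj, data
    obj = compute()
    entries.append((obj_id, obj))
    return obj, pack_cache(magic, mtime, entries)


# Usage: id 1 holds ordinary bytecode, id 2 stands in for some other cached form.
cache = pack_cache(b"demo", 123, [(1, compile("x = 40 + 2", "<demo>", "exec"))])
extra, cache = get_or_add(cache, b"demo", 123, 2, lambda: ("wordcode", [1, 2, 3]))
```

Rebuilding the whole byte string keeps the sketch short; a real implementation would more likely append the new object in place, as the message suggests.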

I'm a dope--this idea has already been batted around and discarded, resulting in the PYD directory PEP. Sorry I missed it when it happened. Ignore me, /larry/

Larry Hastings <larry@...> writes:
I think we should dump the lie about "optimized" bytecode when the only optimization is that we strip some docstrings, disable asserts and set __debug__ to False. There should be only one possible bytecode file (XXX.pyc), and we could provide a "strip" tool (and/or corresponding function in the compileall module) for people for whom minimizing bytecode file size is important. Also, it would be interesting to know who bothers to use "python -O" (or "-OO"). I know I never use it. Antoine.
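For readers who want to see what the flag actually changes, here is a minimal sketch. It assumes a current Python 3, where the built-in `compile()` accepts an `optimize` argument (0 behaves like plain `python`, 1 like `python -O`); that argument postdates this thread, and the helper name `probe` is made up for the illustration.

```python
SOURCE = '''
def check(value, trace):
    assert value >= 0, "value must be non-negative"
    if __debug__:
        trace.append(value)
    return value
'''


def probe(optimize):
    """Return (asserts_active, debug_block_ran) for one optimization level."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    check = namespace["check"]

    # Does the "if __debug__:" block still run?
    trace = []
    check(1, trace)
    debug_block_ran = trace == [1]

    # Does the assert statement still fire on bad input?
    try:
        check(-1, [])
        asserts_active = False
    except AssertionError:
        asserts_active = True
    return asserts_active, debug_block_ran


print(probe(0))  # like plain "python"
print(probe(1))  # like "python -O"
```

At level 0 both the assert and the `if __debug__:` block are live; at level 1 both are compiled away, which is the whole of the "optimization" being discussed.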

My team uses the optimized flag when building 'binaries' via py2exe before distributing. In the end it really is security through obscurity, but something is always better than nothing. On a different note, I don't think asserts belong in production. I fully concede that there is no real consensus either way on this belief so I think it's important to have the option so people can run in the mode which they prefer. -Mark On Tue, Feb 2, 2010 at 10:16 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:

On Tue, Feb 2, 2010 at 07:16, Antoine Pitrou <solipsis@pitrou.net> wrote:
I think the hope has always been that the peepholer would be extended to do some tweaks that would only be reasonable under a -O flag. Obviously this has not happened and who knows if it ever will. But if PEP 3147 catches on this should become less of an issue.
This would also require a flag for distutils so that, when you install a package for production rather than debug use, you byte-compile to the level you want. But I think the compileall/strip/distutils solution would be enough to cover all major cases.
Also, it would be interesting to know who bothers to use "python -O" (or "-OO"). I know I never use it.
Do any other languages do it this way with separate files? Or do they tend toward not even having the option and the few that do use a strip tool? I honestly can't think of any languages off the top of my head where the -O flag is even actively considered by everyone beyond C/C++. -Brett

On Tue, Feb 2, 2010 at 12:49 PM, Brett Cannon <brett@python.org> wrote:
Unladen Swallow has a number of optimizations in mind that tweak corner cases of Python semantics, which we'd like to hide behind a compiler flag so that you have to explicitly ask for them. We haven't yet implemented these optimizations, since they will likely be controversial and require discussion. Feel free to remove the -O flag in the meantime; it can be added back later. Collin Winter

On Tue, Feb 2, 2010 at 1:35 PM, Collin Winter <collinw@gmail.com> wrote:
Anything new will need its own flag to enable/disable anyways (if it is insufficient to leave it to being done at runtime on a per module basis via a sys.xxx() call) as people already rely on today's exact behavior of -O. We should never equate disabling assert statements (a change to the actual program logic) with actual optimization. -gps

We use -O where I work in performance critical places. It also better than halves the startup time, which some users are sensitive to. -- Zachary Burns (407)590-4814 Aim - Zac256FL Production Engineer (Digital Overlord) Zindagi Games

On Mon, Feb 15, 2010 at 2:10 PM, Zac Burns <zac256@gmail.com> wrote:
On second thought, what I just said about halving the startup time isn't true - it's probably something else that we are doing when also applying -O. Sorry about that. It does contribute to better startup time though... just not half. -- Zachary Burns

I wonder what people have against the -O flag. You are probably missing an important use case: when writing larger applications you typically add lots of debug sections to the code. Using the -O flag, this code doesn't only get disabled, it gets removed and that can lead to a serious optimization, hence the name of the flag. -- Marc-Andre Lemburg, eGenix.com (Feb 15 2010)

Antoine Pitrou wrote:
The fact that asserts and if __debug__ blocks get skipped under -O strikes me as more than adequate reason to consider optimised and non-optimised bytecode as different things. Sure, we don't mess with things as extensively as C/C++ do with their more aggressive optimisation settings, but completely skipping all assert statements and some if statements isn't something we can enable by default (and can't necessarily do with an after-the-fact stripping tool, since the removal of entire statements can affect the construction of the symbol table). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wednesday, 3 February 2010 at 06:58 +1000, Nick Coghlan wrote:
Well, do people rely on this nowadays? With unit testing becoming very common (and recommended) practice, I'm not sure what sprinkling asserts in debug mode really brings.

On Tue, Feb 2, 2010 at 13:05, Antoine Pitrou <solipsis@pitrou.net> wrote:
We might not rely on them, but I am sure there are those who prefer them on top of TDD (or instead of). Removing them would be somewhat forcing a programming methodology on users which I think we should avoid. -Brett

Brett Cannon wrote:
As Curt said, they're a great form of executable comment, unit tests or no unit tests. Cheers, Nick.

On Tue, Feb 2, 2010 at 9:42 PM, Brett Cannon <brett@python.org> wrote:
FWIW, I'm one of 'those': I find there are assert uses that can't be covered as neatly or directly with unit tests. They're great for documenting loop invariants in complicated algorithms, for example:

    def xgcd(b, c):
        """Extended Euclidean algorithm: return integers g, x and y such
        that b*x + c*y == g and g is a greatest common divisor of b and c."""
        g1, g2 = b, c
        x1, x2 = 1, 0
        y1, y2 = 0, 1
        while g2:
            assert x1 * b + y1 * c == g1 and x2 * b + y2 * c == g2
            q = g1 // g2
            g1, g2 = g2, g1 - q*g2
            x1, x2 = x2, x1 - q*x2
            y1, y2 = y2, y1 - q*y2
        return g1, x1, y1

(Not that the above counts as a complicated algorithm, of course.)

Mark

On Feb 2, 3:16 pm, Antoine Pitrou <solip...@pitrou.net> wrote:
We run -O in production so that __debug__ is set to False. But I'd much rather have only .pyc files and a strip tool + a flag (maybe still -O) to switch between __debug__ being True or False (extra bonus points for the strip tool being able to strip out "if __debug__:" blocks too). The reason I'd rather have only .pyc files is that now, if you only have .pyo files and don't start with -O, the modules are not found. This is inconsistent as well: inside a zipfile the .pyo files are used both when using -O and when not using that option. It's only an inconvenience, but still, it's something a customer doesn't know if I quickly ask for the output of some "python -c 'something'" and forget to tell them to use -O. Regards Floris -- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org

Antoine Pitrou wrote:
If there was an easy way to get info from python on the status and location of the bytecode files it uses, it might not be so bad.
Then having .pyc also be used for optimized byte code could work. On another note, it always seemed a bit backwards to me that the -OO wasn't the default output and if it were, we would have options to run in debug mode which would include 'docstrings', 'if debug ...' statements, and 'assert ...' statements, for testing purposes. <shrug> Ron
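As a point of comparison, later Python 3 releases do offer a way to ask where bytecode for a source file would live. The sketch below assumes Python 3.5 or newer, where `importlib.util.cache_from_source()` takes an `optimization` argument; none of this existed when the thread was written, and the helper name `bytecode_paths` and the path `pkg/spam.py` are made up for the illustration.

```python
import importlib.util


def bytecode_paths(source_path):
    """Map each optimization level to the bytecode path Python would use."""
    return {
        level: importlib.util.cache_from_source(source_path, optimization=level)
        for level in ("", 1, 2)  # "" means no optimization
    }


for level, path in bytecode_paths("pkg/spam.py").items():
    print(repr(level), path)
```

On CPython the three paths share a `__pycache__` directory and differ only in an `.opt-1` or `.opt-2` tag before the `.pyc` suffix, so one extension serves all optimization levels.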

On 03.02.2010 01:11, Ron Adam wrote:
Removal of docstrings should be separate from the rest, though. Quite a few programs would stop working properly without docstrings, with the docstring being used as a kind of easily introspectable annotation to a class or function. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
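The kind of breakage described here can be shown with a small sketch. It assumes a current Python 3 whose built-in `compile()` accepts `optimize` (2 behaves like `python -OO`); the command function `cmd_status`, its "name: summary" docstring convention, and the helper `build_help` are all invented for the example.

```python
SOURCE = '''
def cmd_status():
    """status: show the current state"""
    return "ok"
'''


def build_help(optimize):
    """Build a help table from the docstring, or None if it was stripped."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    doc = namespace["cmd_status"].__doc__
    if doc is None:
        return None  # docstring removed at this optimization level
    name, _, summary = doc.partition(":")
    return {name.strip(): summary.strip()}


print(build_help(0))  # docstring available
print(build_help(1))  # asserts stripped, docstring still available
print(build_help(2))  # docstring stripped
```

A program that relies on the docstring works at levels 0 and 1 and silently loses its help table at level 2, which is why docstring removal is worth keeping separate from the assert and `__debug__` handling.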

participants (14)
- Antoine Pitrou
- Brett Cannon
- Collin Winter
- Curt Hagenlocher
- Floris Bruynooghe
- Georg Brandl
- Gregory P. Smith
- Larry Hastings
- M.-A. Lemburg
- Mark Dickinson
- Mark Roddy
- Nick Coghlan
- Ron Adam
- Zac Burns