Patch to have distutils generate PYC *and* PYO files

Attached you'll find a patch against the current CVS tree which allows distutils to include PYC as well as PYO files.
I haven't dug deep into the option system yet, but it would be nice if there were also an option that lets the packager set the optimization level she wants to use for the PYO files.
Currently level 1 is used. Level 2 would also eliminate doc-strings, which can sometimes be useful to reduce package size or to hide interface information.
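To make the difference between the two levels concrete, here is a small sketch using the built-in compile() of a modern Python 3 interpreter (not the 2000-era tools discussed in this thread). Level 1 drops assert statements; level 2 additionally drops doc-strings. The module source and the inspect_level name are illustrative only.

```python
# Sketch: what optimization levels 1 and 2 remove, via the built-in compile()
SOURCE = '''
def greet(name):
    """Return a greeting for name."""
    assert isinstance(name, str), "name must be a string"
    return "Hello, " + name
'''

def inspect_level(optimize):
    """Compile SOURCE at the given level; report (asserts kept?, docstring kept?)."""
    namespace = {}
    exec(compile(SOURCE, "<greet>", "exec", optimize=optimize), namespace)
    greet = namespace["greet"]
    try:
        greet(42)  # 42 is not a str, so the assert fires if it was kept
    except AssertionError:
        asserts_kept = True
    except TypeError:  # assert was removed; "Hello, " + 42 fails instead
        asserts_kept = False
    return asserts_kept, greet.__doc__ is not None

for level in (0, 1, 2):
    print(level, inspect_level(level))
# 0 (True, True)
# 1 (False, True)
# 2 (False, False)
```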

On 20 September 2000, M.-A. Lemburg said:
Attached you'll find a patch against the current CVS tree which allows distutils to include PYC as well as PYO files.
Looks like it should work, but I really don't like the idea of spawning an interpreter for *every* *single* *file* we want to compile. This sounds very expensive.
Doesn't look like it would be too hard to rework your 'compile()' function to work on a bunch of files at once, and the "install_lib" command already sorta-kinda works that way. The trouble is generating a single Python command that compiles an arbitrary number of scripts -- it's not impossible, just hairy.
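The batched alternative amounts to compiling every file at every level inside one interpreter. A minimal sketch, assuming a modern Python 3 where py_compile and importlib.util exist and optimized bytecode uses the PEP 488 names (.opt-1.pyc, .opt-2.pyc) instead of .pyo; byte_compile_all is a hypothetical name, not a Distutils function.

```python
import importlib.util
import py_compile

def byte_compile_all(paths, levels=(0, 1, 2)):
    """Hypothetical helper: compile every source file at every level in-process.

    Returns the list of bytecode files written. Output names follow the
    __pycache__ layout chosen by importlib.util.cache_from_source().
    """
    written = []
    for path in paths:
        for level in levels:
            # optimization="" means the plain .pyc; 1 or 2 give .opt-1/.opt-2.pyc
            tag = "" if level == 0 else level
            cfile = importlib.util.cache_from_source(path, optimization=tag)
            py_compile.compile(path, cfile=cfile, optimize=level, doraise=True)
            written.append(cfile)
    return written
```

No interpreter is spawned per file: the cost of starting Python is paid once, however many modules the distribution contains.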
Maybe it's time to reconsider when we do byte-compilation. The reasons for doing it at install time were:
- so the .py filename encoded in the .pyc file is correct
- to make built distributions smaller
But the former is a red herring, since it turns out that we do pseudo-installations to temporary directories to create RPM and wininst built distributions, and will probably do the same for other smart bdist formats.
And the latter seems to be a lower priority than making what happens at install time as simple as possible. Most people (myself included) seem to be more concerned that an RPM come with "batteries included" (i.e. .pyc and .pyo files) than that it be as small as possible.
Byte-compiling at build time would also (I think) make it easier to distribute *only* bytecode, for those poor unenlightened souls who somehow think that keeping their source code to themselves will make the world a better place. (Obviously I disagree with those people, but I don't want to bar them from using the Distutils.)
If we byte-compile at build time, then using the standard compileall module/script is a no-brainer: the build directory is purely the domain of this particular module distribution, so if we blindly walk over it compiling everything, that's just fine. (Doing that in the real install directory would be quite rude, which is why the current byte-compilation code doesn't use compileall.py.)
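For comparison, blindly compiling a private build tree with the standard compileall module takes only a few lines. A sketch assuming a modern Python 3 compileall, whose compile_dir() accepts an optimize argument and returns a true value only if every file compiled; compile_build_tree is a hypothetical helper name.

```python
import compileall

def compile_build_tree(build_dir, levels=(0, 1, 2)):
    """Hypothetical helper: byte-compile everything under build_dir at each level.

    Safe only because build_dir belongs to this one module distribution.
    Returns True only if every file compiled successfully at every level.
    """
    ok = True
    for level in levels:
        # compile_dir() recurses into subdirectories and skips non-.py files
        ok = bool(compileall.compile_dir(build_dir, quiet=1, optimize=level)) and ok
    return ok
```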
I haven't dug deep into the option system yet, but it would be nice if there were also an option that lets the packager set the optimization level she wants to use for the PYO files.
I think the best way to do that would be to support something like
    build --optimize      # .pyc and .pyo (level 1)
    build --optimize=2    # .pyc and .pyo (level 2)
Unfortunately, getopt (and therefore fancy_getopt) doesn't support default option values. ;-( So this would have to be something else, alas.
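What Greg describes is an option whose value may be omitted, which getopt cannot express. For illustration only, here is the same interface in argparse, a standard-library module added long after this thread; it is not something fancy_getopt offers, and the parser below is hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(prog="build")
# nargs="?" makes the value optional:
#   (absent)       -> default=0  (no optimized bytecode)
#   --optimize     -> const=1    (.pyc and .pyo, level 1)
#   --optimize=2   -> 2          (.pyc and .pyo, level 2)
parser.add_argument("--optimize", type=int, nargs="?", const=1, default=0,
                    choices=(0, 1, 2))
```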
I think this is a frill compared to figuring out when to compile and how to get both .pyc and .pyo in there, though.
Greg

Greg Ward wrote:
On 20 September 2000, M.-A. Lemburg said:
Attached you'll find a patch against the current CVS tree which allows distutils to include PYC as well as PYO files.
Looks like it should work, but I really don't like the idea of spawning an interpreter for *every* *single* *file* we want to compile. This sounds very expensive.
It's not as expensive as it sounds... on my machine I can't even notice a difference.
Doesn't look like it would be too hard to rework your 'compile()' function to work on a bunch of files at once, and the "install_lib" command already sorta-kinda works that way. The trouble is generating a single Python command that compiles an arbitrary number of scripts -- it's not impossible, just hairy.
Right. It should be possible by providing a helper in distutils which is then used instead of py_compile.
Maybe it's time to reconsider when we do byte-compilation. The reasons for doing it at install time were:
- so the .py filename encoded in the .pyc file is correct
- to make built distributions smaller
But the former is a red herring, since it turns out that we do pseudo-installations to temporary directories to create RPM and wininst built distributions, and will probably do the same for other smart bdist formats.
And the latter seems to be a lower priority than making what happens at install time as simple as possible. Most people (myself included) seem to be more concerned that an RPM come with "batteries included" (i.e. .pyc and .pyo files) than that it be as small as possible.
Right. I don't care about the RPM size at all: bandwidth is there, hard disks are cheap. Size is not as much an argument anymore as it was some years ago.
Byte-compiling at build time would also (I think) make it easier to distribute *only* bytecode, for those poor unenlightened souls who somehow think that keeping their source code to themselves will make the world a better place. (Obviously I disagree with those people, but I don't want to bar them from using the Distutils.)
Ah, you are forgetting about the few of us who have to make a living by writing commercial closed source software, e.g. my apps are only shipped with PYO files with doc-strings stripped.
Most of the underlying helpers are Open Source though, i.e. the mx Extensions were written for just this purpose -- not only to get a warm fuzzy feeling from community feedback ;-)
If we byte-compile at build time, then using the standard compileall module/script is a no-brainer: the build directory is purely the domain of this particular module distribution, so if we blindly walk over it compiling everything, that's just fine. (Doing that in the real install directory would be quite rude, which is why the current byte-compilation code doesn't use compileall.py.)
Hmm, the existing logic works just fine -- why change it?
I haven't dug deep into the option system yet, but it would be nice if there were also an option that lets the packager set the optimization level she wants to use for the PYO files.
I think the best way to do that would be to support something like
    build --optimize      # .pyc and .pyo (level 1)
    build --optimize=2    # .pyc and .pyo (level 2)
Unfortunately, getopt (and therefore fancy_getopt) doesn't support default option values. ;-( So this would have to be something else, alas.
But it does support -O and -OO -- which is what Python itself uses. Wouldn't that be an option?
If not, I'd also be satisfied with --optimize=0|1|2 without default value.
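Both of Marc-André's alternatives are easy to express with a parser that can count repeated flags. A hypothetical sketch using argparse (again far newer than this thread); parse_build_options and its option wiring are illustrative, not the real Distutils build options.

```python
import argparse

def parse_build_options(argv):
    """Hypothetical: accept -O/-OO like the interpreter, or an explicit --optimize=N."""
    parser = argparse.ArgumentParser(prog="build")
    # -O counts occurrences: -O -> 1, -OO (or -O -O) -> 2
    parser.add_argument("-O", dest="optimize", action="count", default=0)
    # explicit form that always requires a value (no default-value problem)
    parser.add_argument("--optimize", dest="optimize", type=int, choices=(0, 1, 2))
    args = parser.parse_args(argv)
    return min(args.optimize, 2)  # -OOO and beyond still mean level 2
```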
I think this is a frill compared to figuring out when to compile and how to get both .pyc and .pyo in there, though.
Not necessarily... some people might not want to get doc-strings into production code.

On Fri, 22 Sep 2000, M.-A. Lemburg wrote:
Greg Ward wrote:
Looks like it should work, but I really don't like the idea of spawning an interpreter for *every* *single* *file* we want to compile. This sounds very expensive.
It's not as expensive as it sounds... on my machine I can't even notice a difference.
I suggest that this is probably a result of the interpreter remaining in the buffer cache. On some of the newer drives with large on-board memory caches, you probably wouldn't even see the disk light blink.
All machines not being equal, though, it would be very expensive on (even slightly) older hardware. If there's another way, it's probably better.
$.02 Mark

On 22 September 2000, M.-A. Lemburg said:
It's not as expensive as it sounds... on my machine I can't even notice a difference.
I tried it on my machine: spawning a new interpreter to byte-compile every file in the Distutils is nearly a 100% penalty: 13 sec vs a little under 7 sec. So I think it's worth the trouble.
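A comparison like Greg's can be reproduced roughly as follows. A sketch assuming a modern Python 3, where `python -m py_compile FILE` serves as the per-file command; the two function names are hypothetical. As Mark notes, the ratio will vary by machine and by how much of the interpreter the OS has cached.

```python
import py_compile
import subprocess
import sys
import time

def time_spawn_per_file(paths):
    """Byte-compile each file in a fresh interpreter, one process per file."""
    start = time.perf_counter()
    for path in paths:
        subprocess.run([sys.executable, "-m", "py_compile", path], check=True)
    return time.perf_counter() - start

def time_in_process(paths):
    """Byte-compile all files inside the current interpreter."""
    start = time.perf_counter()
    for path in paths:
        py_compile.compile(path, doraise=True)  # always recompiles; no mtime check
    return time.perf_counter() - start
```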
Right. I don't care about the RPM size at all: bandwidth is there, hard disks are cheap. Size is not as much an argument anymore as it was some years ago.
I mostly agree, but turning off byte-compilation should be an option. (Certainly not the default!)
Byte-compiling at build time would also (I think) make it easier to distribute *only* bytecode, for those poor unenlightened souls who somehow think that keeping their source code to themselves will make the world a better place. (Obviously I disagree with those people, but I don't want to bar them from using the Distutils.)
Ah, you are forgetting about the few of us who have to make a living by writing commercial closed source software, e.g. my apps are only shipped with PYO files with doc-strings stripped.
No, I am proposing to change the situation to accommodate people who only distribute byte-code. Right now, all built distributions always include source, because they must pseudo-install the source in order to compile it. If we do compilation at build-time, that should make it easier to distribute only byte-code. If you want to distribute only byte-code, you'd have to sneak some code into the appropriate bdist commands to delete source files after pseudo-installation but before constructing the archive.
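The "delete source files after pseudo-installation" step could look like the sketch below. It assumes modern Python 3 import rules, under which sourceless bytecode is only imported when the .pyc sits where the .py would have been (not in __pycache__); strip_sources is a hypothetical helper, not a real bdist hook. Passing optimize=2 also strips doc-strings, as in Marc-André's PYO-only shipping.

```python
import os
import py_compile

def strip_sources(staging_dir, optimize=-1):
    """Hypothetical: replace each .py under staging_dir with a sourceless .pyc.

    Meant to run after pseudo-installation and before the archive is built.
    The .pyc is written next to the source, since that is the only place
    Python looks for sourceless bytecode.
    """
    replaced = []
    for root, _dirs, files in os.walk(staging_dir):
        for name in files:
            if name.endswith(".py"):
                src = os.path.join(root, name)
                py_compile.compile(src, cfile=src + "c", optimize=optimize,
                                   doraise=True)
                os.remove(src)  # ship bytecode only
                replaced.append(src + "c")
    return replaced
```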
I still think software wants to be free, but I'm not going to shove that philosophy down anyone's throat. ;-)
Most of the underlying helpers are Open Source though, i.e. the mx Extensions were written for just this purpose -- not only to get a warm fuzzy feeling from community feedback ;-)
If we byte-compile at build time, then using the standard compileall module/script is a no-brainer: the build directory is purely the domain of this particular module distribution, so if we blindly walk over it compiling everything, that's just fine. (Doing that in the real install directory would be quite rude, which is why the current byte-compilation code doesn't use compileall.py.)
Hmm, the existing logic works just fine -- why change it?
I can see a couple reasons to change compilation to build time:
* it just seems right: installation should be limited to copying files around, possibly setting modes, and nothing more. More complicated stuff, including compilation, should be done at *build* time... that's why it's called "build" time!
* should make it easier to support closed source distributions
And one good reason not to change it (apart from inertia):
* makes it trickier to put the right source filename into the .pyc file (not that that's being done currently, as you can see from a traceback in code installed by a Distutils-generated RPM file; the .pyc files refer to /var/tmp/.../usr/lib/python1.5/... rather than /usr/lib/python1.5/...)
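The filename problem goes away if the compiler lets you override the name recorded in the bytecode. Modern py_compile.compile() takes a dfile argument for exactly this (compileall.compile_dir() has the directory-level ddir). A sketch assuming Python 3.7+ (for the 16-byte .pyc header); the helper names and the install path in the test are hypothetical.

```python
import marshal
import py_compile

def compile_with_final_path(src, cfile, final_path):
    """Compile src from the staging tree, but record final_path in the bytecode.

    final_path is where the module will live after the real installation,
    so tracebacks point there rather than at /var/tmp/... .
    """
    py_compile.compile(src, cfile=cfile, dfile=final_path, doraise=True)

def recorded_filename(cfile):
    """Return the source filename stored in a .pyc (Python 3.7+ header layout)."""
    with open(cfile, "rb") as f:
        f.read(16)  # magic number, flags, and mtime/size or hash fields
        code = marshal.load(f)
    return code.co_filename
```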
Am I missing anything?
Greg

Maybe it's time to reconsider when we do byte-compilation. The reasons for doing it at install time were:
- so the .py filename encoded in the .pyc file is correct
- to make built distributions smaller
But the former is a red herring, since it turns out that we do pseudo-installations to temporary directories to create RPM and wininst built distributions, and will probably do the same for other smart bdist formats.
Not exactly true: wininst switches off compilation in the install_lib command, which runs at build (pseudo-installation) time:
    install_lib.compile = 0
    install_lib.optimize = 0
and compiles at install time to both pyc and pyo, which is correct IMO, because we now have the right filenames in the compiled files.
Of course, since the windows installer embeds python, I have a backdoor into the interpreter :-)
Thomas
Participants (4): Greg Ward, M.-A. Lemburg, Mark W. Alexander, Thomas Heller