
Hi all --

based on and inspired by recent patches from Marc-Andre Lemburg and Rene Liebscher, I've finally started tackling the byte-compilation problem in earnest. Here's the approach I'm taking:

  * new function 'byte_compile()' in distutils.util: this is the all-singing, all-dancing wrapper around py_compile that will do all the real work

  * reduce the 'bytecompile()' method in the install_lib command to a simple wrapper around 'util.byte_compile()', that does the Right Thing with respect to optimization and claimed source filename written to the .py{c,o} file

  * add similar functionality to the build_py command, so that you may optionally do byte-compilation at build time rather than install time.

The first two steps are done and checked in, except that install_lib's 'bytecompile()' method doesn't yet take advantage of the fancy features in the new 'byte_compile()' -- it doesn't rewrite filenames or do optimization.

The default will continue to be doing compilation at install time rather than build time. I'm still leaning towards build-time compilation, but it's too late in the Distutils 1.0 release cycle to change things like this. However, I want to have the *option* to do compilation at build time, so people can experiment with it, see if it works, figure out what other features are needed so it really works, etc.

The idea is that developers could put settings in their setup.cfg that control when to do byte-compilation; I suspect developers who want to distribute closed-source modules will have to do build-time compilation. Probably the "install" command will need some sort of "don't install source" option, or maybe the build command should have a "blow away source after compiling it" option.

Here's my 'byte_compile()' function: as usual, it works for me. Please review it, and if you're following CVS, try it out. (Should be enough to install any module distribution containing pure Python modules.)
------------------------------------------------------------------------
def byte_compile (py_files,
                  optimize=0, force=0,
                  prefix=None, base_dir=None,
                  verbose=1, dry_run=0,
                  direct=None):
    """Byte-compile a collection of Python source files to either
    .pyc or .pyo files in the same directory.  'optimize' must be
    one of the following:
      0 - don't optimize (generate .pyc)
      1 - normal optimization (like "python -O")
      2 - extra optimization (like "python -OO")
    If 'force' is true, all files are recompiled regardless of
    timestamps.

    The source filename encoded in each bytecode file defaults to the
    filenames listed in 'py_files'; you can modify these with 'prefix' and
    'base_dir'.  'prefix' is a string that will be stripped off of each
    source filename, and 'base_dir' is a directory name that will be
    prepended (after 'prefix' is stripped).  You can supply either or both
    (or neither) of 'prefix' and 'base_dir', as you wish.

    If 'verbose' is true, prints out a report of each file.  If 'dry_run'
    is true, doesn't actually do anything that would affect the filesystem.

    Byte-compilation is either done directly in this interpreter process
    with the standard py_compile module, or indirectly by writing a
    temporary script and executing it.  Normally, you should let
    'byte_compile()' figure out whether to use direct compilation or not
    (see the source for details).  The 'direct' flag is used by the script
    generated in indirect mode; unless you know what you're doing, leave
    it set to None.
    """

    # First, if the caller didn't force us into direct or indirect mode,
    # figure out which mode we should be in.  We take a conservative
    # approach: choose direct mode *only* if the current interpreter is
    # in debug mode and optimize is 0.  If we're not in debug mode (-O
    # or -OO), we don't know which level of optimization this
    # interpreter is running with, so we can't do direct
    # byte-compilation and be certain that it's the right thing.  Thus,
    # always compile indirectly if the current interpreter is in either
    # optimize mode, or if either optimization level was requested by
    # the caller.
    if direct is None:
        direct = (__debug__ and optimize == 0)

    # "Indirect" byte-compilation: write a temporary script and then
    # run it with the appropriate flags.
    if not direct:
        from tempfile import mktemp
        script_name = mktemp(".py")
        if verbose:
            print "writing byte-compilation script '%s'" % script_name
        if not dry_run:
            script = open(script_name, "w")

            script.write("""\
from distutils.util import byte_compile
files = [
""")
            script.write(string.join(map(repr, py_files), ",\n") + "]\n")
            script.write("""
byte_compile(files, optimize=%s, force=%s,
             prefix=%s, base_dir=%s,
             verbose=%s, dry_run=0,
             direct=1)
""" % (`optimize`, `force`, `prefix`, `base_dir`, `verbose`))

            script.close()

        cmd = [sys.executable, script_name]
        if optimize == 1:
            cmd.insert(1, "-O")
        elif optimize == 2:
            cmd.insert(1, "-OO")
        spawn(cmd, verbose=verbose, dry_run=dry_run)

    # "Direct" byte-compilation: use the py_compile module to compile
    # right here, right now.  Note that the script generated in indirect
    # mode simply calls 'byte_compile()' in direct mode, a weird sort of
    # cross-process recursion.  Hey, it works!
    else:
        from py_compile import compile

        for file in py_files:
            if file[-3:] != ".py":
                raise ValueError, \
                      "invalid filename: %s doesn't end with '.py'" % `file`

            # Terminology from the py_compile module:
            #   cfile - byte-compiled file
            #   dfile - purported source filename (same as 'file' by default)
            cfile = file + (__debug__ and "c" or "o")
            dfile = file
            if prefix:
                if file[:len(prefix)] != prefix:
                    raise ValueError, \
                          ("invalid prefix: filename %s doesn't start with %s"
                           % (`file`, `prefix`))
                dfile = dfile[len(prefix):]
            if base_dir:
                dfile = os.path.join(base_dir, dfile)

            cfile_base = os.path.basename(cfile)
            if direct:
                if force or newer(file, cfile):
                    if verbose:
                        print "byte-compiling %s to %s" % (file, cfile_base)
                    if not dry_run:
                        compile(file, cfile, dfile)
                else:
                    if verbose:
                        print "skipping byte-compilation of %s to %s" % \
                              (file, cfile_base)
------------------------------------------------------------------------

-- 
Greg Ward
gward@python.net
http://starship.python.net/~gward/
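The 'prefix' / 'base_dir' rewriting described in the docstring above can be isolated into a few lines. What follows is only a sketch in present-day Python 3, not the Python 1.5/2.0 of this thread, and the helper name 'purported_source_name' is made up for illustration; it is not part of distutils.

```python
import os


def purported_source_name(path, prefix=None, base_dir=None):
    """Return the filename to record in the bytecode file ('dfile').

    Hypothetical helper: mirrors the prefix-stripping and base_dir
    prepending that 'byte_compile()' does inline.
    """
    if not path.endswith(".py"):
        raise ValueError("invalid filename: %r doesn't end with '.py'" % path)
    dfile = path
    if prefix:
        if not path.startswith(prefix):
            raise ValueError(
                "invalid prefix: filename %r doesn't start with %r"
                % (path, prefix))
        # Strip the build-tree prefix off the front of the filename.
        dfile = dfile[len(prefix):]
    if base_dir:
        # Prepend the directory the module will eventually live in.
        dfile = os.path.join(base_dir, dfile)
    return dfile


if __name__ == "__main__":
    print(purported_source_name("build/lib/foo/bar.py",
                                prefix="build/lib/",
                                base_dir="/usr/lib/python/site-packages"))
```

One thing the sketch makes visible: if 'prefix' is given without its trailing separator, the stripped name starts with a separator, and os.path.join() then discards 'base_dir' entirely because the second component is absolute. Passing the prefix with the separator included avoids that.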

Greg Ward wrote:
The latter is not very useful, IMHO. I will definitely need the "compile at build time and don't argue about not finding the sources at install time" option ;-)
Looks OK, except that I would pass the Python filenames through os.path.abspath() before writing any externally run scripts... both to work around possible security problems and to make sure the shell finds the right files.
-- 
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

[I propose a way to deal with closed-source distribution]
[Marc-Andre comments]
The latter option -- "blow away source after compiling it" -- was proposed because it would be easier to implement than "don't install source". The "install_*" commands generally boil down to a recursive copy from the build tree to some installation director(y|ies). The less work done there, the better, as figuring out the installation dirs is hard enough. (Hence my preference for compiling at build-time.) If we do things this way:

  * copy source into build/lib
  * byte-compile source
  * blow it away
  * install it

then the install commands remain fairly simple. If not, we have to do something to exclude .py files from the "install" recursive copy.
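The middle two steps (byte-compile, then blow away the source) might look roughly like this. It is a sketch in present-day Python 3 using the standard py_compile and pathlib modules, not anything that exists in Distutils; the function name 'compile_and_strip' and the 'installed_root' parameter are invented for illustration.

```python
import pathlib
import py_compile


def compile_and_strip(build_lib, installed_root=None):
    """Byte-compile every .py file under build_lib, then delete the source.

    Hypothetical helper.  If 'installed_root' is given, it is used as the
    base of the source filename recorded in each .pyc (the 'dfile'), so
    tracebacks point at the eventual install location rather than build/.
    """
    build_lib = pathlib.Path(build_lib)
    compiled = []
    # Materialize the list first, since we delete files as we go.
    for source in sorted(build_lib.rglob("*.py")):
        cfile = source.with_suffix(".pyc")  # foo.py -> foo.pyc, same directory
        rel = source.relative_to(build_lib)
        if installed_root:
            dfile = str(pathlib.Path(installed_root) / rel)
        else:
            dfile = str(source)
        py_compile.compile(str(source), cfile=str(cfile), dfile=dfile,
                           doraise=True)
        source.unlink()  # "blow it away"
        compiled.append(cfile)
    return compiled
```

After this runs, the build tree holds only bytecode files, so a plain recursive copy at install time never sees a .py file and needs no exclusion logic.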
Hmmm, that's probably wise.

[ ... far too much time passes ... ]

No wait, it breaks the "dfile" argument in certain circumstances, so that .pyc files created at build time have the wrong source filename encoded in them. Bother. This will be tricky to fix, so I'm gonna punt on it.

        Greg
-- 
Greg Ward
gward@python.net
http://starship.python.net/~gward/

Greg Ward wrote:
Uhm, I only wanted to prevent the install command from producing errors in case it cannot find the .py files to install (it should suffice just being able to copy the .pyc/o files). Basically, the .py files should be in the source archive and be used to build the binaries. The binaries should then optionally only include the .pyc/o files and the install command or RPM shouldn't care much about not finding .py files...
Ok... but please make sure that the temporary script uses an absolute name and that the name of that file cannot be guessed. Otherwise, you'd open up a /tmp security problem here which could be used to trick distutils into executing code which wasn't generated by it.

Perhaps you should pipe the program text to a Python interpreter instead... this would be the most secure option.

We really need a sys.set/getoptimization() API in 2.1...

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
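The "pipe the program text to a Python interpreter" idea can be sketched as follows. This is present-day Python 3 with the standard subprocess module (not available at the time of this thread), and the function name 'run_in_child' is hypothetical; it only illustrates the suggestion and is not how distutils does it.

```python
import subprocess
import sys


def run_in_child(program_text, optimize=0):
    """Run program_text in a fresh interpreter, feeding it on stdin.

    Hypothetical helper: no temporary file is ever written, so there is
    no filename in /tmp for anyone to guess or replace.
    """
    cmd = [sys.executable]
    if optimize == 1:
        cmd.append("-O")
    elif optimize == 2:
        cmd.append("-OO")
    cmd.append("-")  # "-" tells the interpreter to read the program from stdin
    return subprocess.run(cmd, input=program_text, text=True,
                          capture_output=True, check=True)


if __name__ == "__main__":
    # The child reports whether it is running with optimization enabled.
    result = run_in_child("print(__debug__)\n", optimize=1)
    print(result.stdout.strip())
```

If a script file is still wanted, tempfile.mkstemp() creates the file atomically under an unpredictable name, which addresses the same concern. Later Python versions also expose the running interpreter's optimization level as sys.flags.optimize and accept an 'optimize' argument in py_compile.compile(), which is close to the API wished for above.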

On 03 October 2000, M.-A. Lemburg said:
OK -- *poof!* -- it works. Time machine. It's *always* been like that; the "install" command just installs whatever it finds in the various subdirectories of build/. If you delete the .py files in build, and then install *without building* again, then you have a no-source-code installation.

Example, from my distutils development dir:

    rm -rf /usr/lib/python1.5/site-packages/distutils
    python setup.py clean -a
    python setup.py build_py --compile build
    rm `find build/ -name '*.py'`
    python setup.py install --skip-build

That works: I now have a closed source Distutils installation. If I provoke a traceback (by setting DISTUTILS_DEBUG=1 and providing a bad directory), I get:

    Traceback (innermost last):
      File "setup.py", line 55, in ?
        ['Src/arrayfnsmodule.c'])
      File "./distutils/core.py", line 138, in setup
      File "./distutils/dist.py", line 829, in run_commands
      File "./distutils/dist.py", line 849, in run_command
      File "./distutils/command/build.py", line 106, in run
      File "/usr/lib/python1.5/cmd.py", line 328, in run_command
      File "./distutils/dist.py", line 849, in run_command
      File "./distutils/command/build_py.py", line 104, in run
      File "./distutils/command/build_py.py", line 371, in build_packages
      File "./distutils/command/build_py.py", line 333, in build_module
      File "/usr/lib/python1.5/cmd.py", line 358, in mkpath
      File "./distutils/dir_util.py", line 80, in mkpath
    distutils.errors.DistutilsFileError: could not create '/foo': Permission denied

The filenames in the .pyc files are less than helpful (they're relative to build/lib in my development tree), but who cares? there're no source files installed here anyways!

Is it obvious? Not really. Does it work? You bet!

Could you sit down right now and churn out a closed-source RPM? I don't think so -- bdist_rpm doesn't let you supply arbitrary options to the "install" or "build" commands, and I don't think it should. You'd have to extend bdist_rpm to do this.

Is this the right way to do closed source distributions? Well, to do it cleanly would require a "--remove-source" option to "build_py", and then support from bdist_{rpm,wininst,...}. Not impossible by any stretch.

Who else out there wants this functionality? Should the ability to distribute closed-source module distributions be standard? Or should people who want it be required to extend bdist_{rpm,wininst,...} if they need it?

trying-so-hard-not-to-say-"greedy closed source bastards!"-because-you-know-I-don't-really-mean-it,

        Greg
-- 
Greg Ward
gward@python.net
http://starship.python.net/~gward/
