Jim Ahlstrom wrote:
> I got a little time to think about this over the weekend
> and propose this design. It is a way to package *.pyc files
> into a single-file portable archive for distribution with
> commercial, CGI (like Marc is doing) or otherwise simplified
> distributions. It is dumb-stupid-simple, my personal favorite and a
> requirement for commercial software. This is not concerned with an
> "installer", which is a separate problem. Few of these ideas are
> mine.
Sigh. Most of this is in my installer.
> 1. There is a Python library file format (PYL) which can
> hold *.pyc files. The format is:
> a.pyc, b.pyc, marshalled dict, offset of dict, magic number.
Right (except I put magic before offset). I've got 2. One is just like
this, except everything is zlib-ed. There's another where you've got
arbitrary chunks of bytes
chunk1, chunk2, ... table-of-contents, magic, offset
where table-of-contents is fairly easily read and written in either
Python or C, and contains additional information (like whether the
chunk needs compressing / decompressing, and a filename...).
> The a.pyc, b.pyc,... are the bytes of the *.pyc files including the
> eight byte header. The dict has keys "name", values are the seek
> offset of the start of the *.pyc files. The "offset of dict" is the
> seek offset for finding the dict. The magic number identifies the
> file as a PYL file.
>
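That trailer layout can be sketched in a few lines of modern Python. The magic value and the little-endian trailer packing here are my assumptions; the email does not fix either:

```python
import marshal
import os
import struct

PYL_MAGIC = 0x50594C31  # illustrative only; the proposal leaves the magic unspecified

def write_pyl(out_path, pyc_paths):
    """Pack .pyc files (header included) followed by a marshalled
    {name: offset} dict, the dict's seek offset, and the magic number."""
    toc = {}
    with open(out_path, "wb") as out:
        for path in pyc_paths:
            name = os.path.basename(path).rsplit(".", 1)[0]
            toc[name] = out.tell()              # seek offset of this .pyc
            with open(path, "rb") as f:
                out.write(f.read())
        dict_offset = out.tell()
        out.write(marshal.dumps(toc))
        out.write(struct.pack("<qI", dict_offset, PYL_MAGIC))

def read_toc(path):
    """Read the archive backwards: the magic and dict offset sit at the end."""
    with open(path, "rb") as f:
        f.seek(-12, 2)                          # 8-byte offset + 4-byte magic
        dict_offset, magic = struct.unpack("<qI", f.read(12))
        if magic != PYL_MAGIC:
            raise ValueError("not a PYL file")
        f.seek(dict_offset)
        return marshal.loads(f.read())          # marshal ignores trailing bytes
```

Because everything is located by seeking from the end of the file, the archive can also survive being appended to a binary, provided the stored offsets are rebased against the archive's start.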
> The "names" can be normal names such as "os", or dotted names
> such as "foo.bar". Although initially devoted to *.pyc files,
> we note that it is possible to include other resources with
> non-module names such as "more*data".
My first kind is confined to compiled python. The 2nd can have
anything.
> A PYL file is portable to any computer, Unix, Windows, etc.
Yup. Both kinds.
> Compression is not included, and should be done at the
> "installer" level if desired.
Actually, because the .pyz file is always open and you don't have to
stat anything, I find that it's faster even with decompression than
the normal import stuff.
> 2. The PYL file always has the same directory and file name
> as the binary module containing import.c. but with a .pyl ending:
>
> Python Binary PYL File Name
> /usr/local/bin/python /usr/local/bin/python.pyl
> /my/so/dir/python.so /my/so/dir/python.pyl
> C:/python/python15.dll C:/python/python15.pyl
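The naming rule in the table is just "same path, swap the extension for .pyl". A trivial sketch (the helper name is mine):

```python
import os

def pyl_path_for(binary_path):
    """Derive the PYL file name from the interpreter binary's path by
    replacing the extension (if any) with .pyl."""
    root, _ext = os.path.splitext(binary_path)
    return root + ".pyl"
```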
You put anything (packages or modules) into one of my .pyz's, and they
just look like they're on your PYTHONPATH.
> These Python binary names are available as sys.executable (the main)
> and sys.dllfullpath (the shared library or DLL).
Not sure what you're getting at. You can find the name of the .pyz by
asking the importer (an attribute stuck onto the imported module).
> 3) Since the PYL file can be efficiently read backwards, it
> can, if desired, be appended to the Python binary itself:
> cat python15.pyl >> python15.dll
Never tried it on the dll. I do it with the .exe.
> 4) The PYL file is created with the Python program makepyl.py
> and no C compiler is necessary.
Right. It's called archivebuilder. Pass it module names, package
names, directory names...
> 5) There is a new optional built-in module "importer" which may be
> included by editing Setup. It is imported in Py_Initialize() after
> "sys" is set up, but before any other imports such as "exceptions".
> It is not an error if it is absent. If present, it replaces
> __import__ just like imputils.py does. The replacement importer
> searches for PYL files in three places: as a stand-alone .pyl file,
> appended to the current sys.executable, and appended to the current
> sys.dllfullpath (name of DLL).
> Note that importer can use multiple PYL files. Importer is able to
> import the modules exceptions, site and sitecustomize, and probably
> most other modules. Importer has methods to print out the names of
> PYL modules available. You could still override importer using
> sitecustomize and imputils if desired, in which case it may be
> convenient to use importer's methods.
>
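A minimal sketch of what such a replacement importer could look like. This is my guess at the mechanics, not Jim's code; note that modern .pyc files carry a 16-byte header rather than the 8-byte one of Python 1.5:

```python
import builtins
import marshal
import sys
import types

class PylImporter:
    """Resolve imports from a PYL archive's {name: offset} dict before
    falling back to the normal machinery (sketch; all names are mine)."""

    PYC_HEADER = 16   # 8 bytes in Python 1.5, 16 in modern CPython

    def __init__(self, toc, archive_path):
        self._toc = toc
        self._path = archive_path
        self._fallback = builtins.__import__

    def __call__(self, name, *args, **kwargs):
        if name in sys.modules:
            return sys.modules[name]
        if name in self._toc:
            return self._load(name)
        return self._fallback(name, *args, **kwargs)

    def _load(self, name):
        with open(self._path, "rb") as f:
            f.seek(self._toc[name] + self.PYC_HEADER)  # skip the .pyc header
            code = marshal.load(f)
        mod = types.ModuleType(name)
        sys.modules[name] = mod          # register before exec, as usual
        exec(code, mod.__dict__)
        return mod
```

Installing it is then `builtins.__import__ = PylImporter(toc, path)`, which is exactly the imputil-style hook the email describes.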
> 6) Alternative to (5): Modules exceptions, site, sitecustomize and
> imputils are frozen (using frozen modules) into the interpreter
> binary, and sitecustomize boots imputils. Thereafter the Python
> logic in sitecustomize and imputils implements the logic of (5).
> Sitecustomize has methods to print out the names of PYL modules
> available.
I don't muck with core python at all. That means you need exceptions
and site, with site using imputil to load the .pyzs. I'm hoping
imputil attains sanctified status, so this happens in only one step.
> 7) The Python main program is enhanced to start "__main__" as
> the main program if it can be imported, in which case all command
> line arguments are sent to __main__. This enables you to create a
> Python program "myprog", and start it with the command "myprog -a
> arg1 ..." just like a normal program.
Yup. For Windows, I include a bunch of different .exes (all from one
source) which are just python.exe / pythonw.exe with some added smarts
about archives (the 2nd kind). Also not linked against python's import
lib, so they don't have to have python.dll in place before they start.
> 8) The Python main can start any module "foo" (which may be in a
> PYL file) as the main program in response to a new command line
> argument. This enables you to ship multiple main programs in the PYL
> file.
Not really necessary to build that in. Your __main__ script can do it.
> 9) The current frozenmain.c is eliminated and the enhanced main is
> used instead. This (I hope) results in a net decrease in code.
A full python/Lib .pyz occupies less than 500K. Incidentally, I
packed up your demo05.py (from wpy). For some reason, on my NT
system, I end up using the Tk version of wpy. At any rate, with all
the Tcl/Tk, it still comes out to about 1.1Meg. Runs on a system
without any Python / Tcl / Tk, but not perfectly - you seem to do some
funny things in wpy.
- Gordon
I think I finally understand the work Greg, Gordon and the distutils
sig folks have done with building the Python library into the
Python binary. But I am still having problems. I still think
we may need to be able to have multiple frozen module arrays.
When Python starts up (Py_Initialize()) it:
  - Performs an import of exceptions.py.
  - Calls initsigs(). (The implication is that someday
    something_else.py may be imported here.)
  - Calls initmain().
  - Performs an import of site.py.
Greg's (wonderful) imputil.py must be "turned on" in site.py. So
the conclusion is that the default Python import logic can not
be replaced until after it has been used to find exceptions.py and
site.py. This seems to be unfortunate. It would be nice
to replace it for all modules.
It would be nice if all this were identical on Unix and Windows,
and that no C compiler were required. And that Python versions
could be changed by replacing python15.dll/.so. And that having
a frozen "__main__" still worked...
Here are a few ideas:
1) Use freeze to include exceptions.py, site.py and imputil.py
in the binary. Frozen modules are always (???) found before
identical modules on PYTHONPATH/Registry. There is no
chicken-and-egg problem because site.py has already been
hacked to turn on imputil.py and to provide for custom
imports.
This works now. But there can be only one frozen module array,
thus my suggestion to allow multiple arrays. OTOH, maybe this
is OK since imports are now customized. But we are taking away
the freeze feature including frozen "__main__".
2) Declare that imputil.py is part of the distribution, and hack
Py_Initialize() to turn it on before anything is imported.
I am not sure how to solve the chicken-and-egg problem.
Perhaps sys.executable could be used as the initial search
path in Py_Initialize(), and the regular path be used iff
sys.executable fails. At least on Windows 95 and later,
sys.executable is the highly reliable path to the Python
interpreter binary. I am not sure how reliable it is on Unix.
The idea is that site.py (etc.) is always in the same directory
as the binary.
For this, we need to guarantee that *.py in the directory of
sys.executable is always found first. This is not currently
the case.
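As a user-level approximation of that guarantee (the real fix would have to live in Py_Initialize(), before any import happens), one can force the binary's directory to the front of the search path:

```python
import os
import sys

# Ensure *.py next to the interpreter binary is found before anything
# else on sys.path.
exe_dir = os.path.dirname(sys.executable)
if sys.path[:1] != [exe_dir]:
    sys.path.insert(0, exe_dir)
```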
3) Somehow have the user create special built-in modules which
actually contain the *.pyc code, and hack Py_Initialize() to
load and initialize them first.
4) Use pickle to create the magic file "python.py0" which contains
site.py etc. in a standard format, I guess a dictionary.
Hack Py_Initialize() to load its modules if it is located in
the directory of sys.executable. So the start-up Python code
lives in "python.py0" with its binary. Maybe we could
automagically load python.py1, 2, ... too thus providing
a hook for the user to add frozen code without a compiler.
sys.executable must be bullet-proof. Does the executable
mean python.exe or python15.dll?? Both??
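Idea (4) amounts to a pickled {module name: source text} dictionary sitting next to the binary. A sketch (the python.py0 file name is from the proposal; the helper code is mine):

```python
import pickle

def build_py0(out_path, sources):
    """Write 'python.py0': a pickled dict mapping module names to their
    source text (site.py and friends)."""
    with open(out_path, "wb") as f:
        pickle.dump(sources, f)

def load_py0(path):
    """Read the dict back; Py_Initialize() would then exec each entry."""
    with open(path, "rb") as f:
        return pickle.load(f)
```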
Another problem is that tools/Freeze/*.py seems to require one of
the frozen modules to be named "__main__" and thus be executed on
start up. This means we can't put all this into python15.dll. It
seems that (on Windows) exceptions.py and imputils.py should go into
python15.dll, and not into python.exe. Site.py might go into either
python15.dll or python.exe.
FWIW, I am currently using three frozen module arrays. Site.py,
exceptions.py and the Python library go in python15.dll. My
WPY modules go in python.exe, and the application main modules
go in another DLL. This enables replacing python15.dll to update
Python. But I am not that happy with this scheme.
Jim Ahlstrom
Hi all --
as promised earlier in the week, I have completed the beginning steps
along the road to a 'build_ext' command that will work under Unix. The
legions of eager developers who watch every movement on the
distutils-checkins list will know that I've just added two modules,
ccompiler and unixccompiler, which provide the classes CCompiler and
UnixCCompiler.
The basic idea is this: CCompiler defines the interface to a generic
C/C++ compiler, and UnixCCompiler implements an interface to the
traditional Unix "cc -Dmacro -Iincludedir -Umacro -c foo.c -o foo.o"
compiler invocation (and -l/-L linker invocation). So far all it does
is generate and print out command lines, but that's enough to convince
me that it works on my Linux/gcc system, i.e. it generates the command
lines I intended it to generate, and gets the right
preprocessor/compiler/linker flags from Python's Makefile (so that the
basic compile/link steps are essentially the same as would be done by a
Makefile.pre.in-generated Makefile).
Please, take a look at the code. To encourage this, you'll find the
bulk of ccompiler.py below: it's mostly comments and docstrings, since
after all it mostly exists to define an interface. It's crucial that
this interface be capable of what we need to build Python extensions on
Unix, Windows, and Mac OS, and I'm relying on you folks to tell me what
needs to be added to support Windows and Mac OS. (Well, if I missed
something for Unix, be sure to tell me about that too. That's less
likely, though, and I'll have a clue what you're talking about when you
politely inform me of my errors.)
In particular: is this interface sufficient to handle Windows .def
files? Is it enough for the weird case of using Oracle's libraries that
Greg Stein mentioned (or other libraries with hairy interdependencies)?
What Mac C compiler is supported, and is there a way to drive it
programmatically? (Ie. is this even *possible* on the Mac?)
Oh, if you're looking for some example code: see test/test_cc.py. Gives
whatever CCompiler class applies on your platform a run for its money.
(Currently only works when os.name == 'posix', since so far only
UnixCCompiler is implemented.)
Anyways, here's that hunk of ccompiler.py:
class CCompiler:
"""Abstract base class to define the interface that must be implemented
by real compiler abstraction classes. Might have some use as a
place for shared code, but it's not yet clear what code can be
shared between compiler abstraction models for different platforms.
The basic idea behind a compiler abstraction class is that each
instance can be used for all the compile/link steps in building
a single project. Thus, attributes common to all of those compile
and link steps -- include directories, macros to define, libraries
to link against, etc. -- are attributes of the compiler instance.
To allow for variability in how individual files are treated,
most (all?) of those attributes may be varied on a per-compilation
or per-link basis."""
# XXX things not handled by this compiler abstraction model:
# * client can't provide additional options for a compiler,
# e.g. warning, optimization, debugging flags. Perhaps this
# should be the domain of concrete compiler abstraction classes
# (UnixCCompiler, MSVCCompiler, etc.) -- or perhaps the base
# class should have methods for the common ones.
# * can't put output files (object files, libraries, whatever)
# into a separate directory from their inputs. Should this be
# handled by an 'output_dir' attribute of the whole object, or a
# parameter to the compile/link_* methods, or both?
# * can't completely override the include or library search
# path, ie. no "cc -I -Idir1 -Idir2" or "cc -L -Ldir1 -Ldir2".
# I'm not sure how widely supported this is even by POSIX
# compilers, much less on other platforms. And I'm even less
# sure how useful it is; probably for cross-compiling, but I
# have no intention of supporting that.
# * can't do really freaky things with the library list/library
# dirs, e.g. "-Ldir1 -lfoo -Ldir2 -lfoo" to link against
# different versions of libfoo.a in different locations. I
# think this is useless without the ability to null out the
# library search path anyways.
# * don't deal with verbose and dry-run flags -- probably a
# CCompiler object should just drag them around the way the
# Distribution object does (either that or we have to drag
# around a Distribution object, which is what Command objects
# do... but might be kind of annoying)
[...]
# -- Bookkeeping methods -------------------------------------------
def define_macro (self, name, value=None):
"""Define a preprocessor macro for all compilations driven by
this compiler object. The optional parameter 'value' should be
a string; if it is not supplied, then the macro will be defined
without an explicit value and the exact outcome depends on the
compiler used (XXX true? does ANSI say anything about this?)"""
def undefine_macro (self, name):
"""Undefine a preprocessor macro for all compilations driven by
this compiler object. If the same macro is defined by
'define_macro()' and undefined by 'undefine_macro()' the last
call takes precedence (including multiple redefinitions or
undefinitions). If the macro is redefined/undefined on a
per-compilation basis (ie. in the call to 'compile()'), then
that takes precedence."""
def add_include_dir (self, dir):
"""Add 'dir' to the list of directories that will be searched
for header files. The compiler is instructed to search
directories in the order in which they are supplied by
successive calls to 'add_include_dir()'."""
def set_include_dirs (self, dirs):
"""Set the list of directories that will be searched to 'dirs'
(a list of strings). Overrides any preceding calls to
'add_include_dir()'; subsequent calls to 'add_include_dir()'
add to the list passed to 'set_include_dirs()'. This does
not affect any list of standard include directories that
the compiler may search by default."""
def add_library (self, libname):
"""Add 'libname' to the list of libraries that will be included
in all links driven by this compiler object. Note that
'libname' should *not* be the name of a file containing a
library, but the name of the library itself: the actual filename
will be inferred by the linker, the compiler, or the compiler
abstraction class (depending on the platform).
The linker will be instructed to link against libraries in the
order they were supplied to 'add_library()' and/or
'set_libraries()'. It is perfectly valid to duplicate library
names; the linker will be instructed to link against libraries
as many times as they are mentioned."""
def set_libraries (self, libnames):
"""Set the list of libraries to be included in all links driven
by this compiler object to 'libnames' (a list of strings).
This does not affect any standard system libraries that the
linker may include by default."""
def add_library_dir (self, dir):
"""Add 'dir' to the list of directories that will be searched for
libraries specified to 'add_library()' and 'set_libraries()'.
The linker will be instructed to search for libraries in the
order they are supplied to 'add_library_dir()' and/or
'set_library_dirs()'."""
def set_library_dirs (self, dirs):
"""Set the list of library search directories to 'dirs' (a list
of strings). This does not affect any standard library
search path that the linker may search by default."""
def add_link_object (self, object):
"""Add 'object' to the list of object files (or analogues, such
as explicitly named library files or the output of "resource
compilers") to be included in every link driven by this
compiler object."""
def set_link_objects (self, objects):
"""Set the list of object files (or analogues) to be included
in every link to 'objects'. This does not affect any
standard object files that the linker may include by default
(such as system libraries)."""
# -- Worker methods ------------------------------------------------
# (must be implemented by subclasses)
def compile (self,
sources,
macros=None,
includes=None):
"""Compile one or more C/C++ source files. 'sources' must be
a list of strings, each one the name of a C/C++ source
file. Return a list of the object filenames generated
(one for each source filename in 'sources').
'macros', if given, must be a list of macro definitions. A
macro definition is either a (name, value) 2-tuple or a (name,)
1-tuple. The former defines a macro; if the value is None, the
macro is defined without an explicit value. The 1-tuple case
undefines a macro. Later definitions/redefinitions/
undefinitions take precedence.
'includes', if given, must be a list of strings, the directories
to add to the default include file search path for this
compilation only."""
pass
# XXX this is kind of useless without 'link_binary()' or
# 'link_executable()' or something -- or maybe 'link_static_lib()'
# should not exist at all, and we just have 'link_binary()'?
def link_static_lib (self,
objects,
output_libname,
libraries=None,
library_dirs=None):
"""Link a bunch of stuff together to create a static library
file. The "bunch of stuff" consists of the list of object
files supplied as 'objects', the extra object files supplied
to 'add_link_object()' and/or 'set_link_objects()', the
libraries supplied to 'add_library()' and/or
'set_libraries()', and the libraries supplied as 'libraries'
(if any).
'output_libname' should be a library name, not a filename;
the filename will be inferred from the library name.
'library_dirs', if supplied, should be a list of additional
directories to search on top of the system default and those
supplied to 'add_library_dir()' and/or 'set_library_dirs()'."""
pass
# XXX what's better/more consistent/more universally understood
# terminology: "shared library" or "dynamic library"?
def link_shared_lib (self,
objects,
output_libname,
libraries=None,
library_dirs=None):
"""Link a bunch of stuff together to create a shared library
file. Has the same effect as 'link_static_lib()' except
that the filename inferred from 'output_libname' will most
likely be different, and the type of file generated will
almost certainly be different."""
pass
def link_shared_object (self,
objects,
output_filename,
libraries=None,
library_dirs=None):
"""Link a bunch of stuff together to create a shared object
file. Much like 'link_shared_lib()', except the output
filename is explicitly supplied as 'output_filename'."""
pass
# class CCompiler
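As a concrete example of the 'macros' convention in compile() above -- (name, value) defines, (name, None) defines bare, (name,) undefines -- here is roughly how a Unix compiler class would turn the list into flags (my sketch, not the checked-in UnixCCompiler code):

```python
def gen_macro_flags(macros):
    """Map CCompiler-style macro definitions to Unix cc command-line flags."""
    flags = []
    for macro in macros:
        if len(macro) == 1:                 # (name,) -> undefine
            flags.append("-U" + macro[0])
        elif macro[1] is None:              # (name, None) -> bare define
            flags.append("-D" + macro[0])
        else:                               # (name, value) -> define with value
            flags.append("-D%s=%s" % macro)
    return flags
```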
Hope you enjoyed that as much as I did. ;-)
Greg
--
Greg Ward - software developer gward(a)cnri.reston.va.us
Corporation for National Research Initiatives
1895 Preston White Drive voice: +1-703-620-8990
Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913