Re: [Python-Dev] Re: [Distutils] Questions about distutils strategy

Dec. 9, 1999

      ...
[Guido]
...
...
It's not Mark's fault, it's Microsoft's fault.  If you don't do
things the way MS wants you to, experienced Windows users will
gripe, misunderstand what you do, etc.
[Tim] 
Something just occurred to me:  MS's guidelines aren't arbitrary,
they actually have very good reasons.  In the case of putting all
an app's crucial info in the Registry, it's the only way to allow
a site administrator to set policy and site options remotely (an
admin can fiddle other machines' registries remotely).  This
works very well indeed when there's only "one copy" of an app on
a machine (or at most one copy "per user").
[Gordon]
And actually, the business about separate subtrees for the 
machine's configuration and the user's configuration is pretty 
clever. MS doesn't explain it well, and it gets misused, but 
when done right, it's a lot simpler than the maze of .xxxrc files 
you sometimes find in other OSes.

I agree.  And I am guilty of not even trying to find MS' explanation -- I
just looked in the registry at what other apps did and tried to mimic
that (plus what Mark had already done), without really knowing what I
was doing.  I now know a little better -- see the end of this message.
...
In my Linux version, I went to the heart of the matter - 
getpath.c. It occurs to me that getpath.c might do better to 
follow a normal bootstrap process - ie,  create the absolute 
minimal sys.path required to go to the next step. Then the 
rest of what goes on in getpath.c could be written in Python. 
Maybe that Python code needs to get frozen in (to prevent 
bozos from destroying an installation by stepping on 
getpath.py), but it would make it a lot easier to create 
independent installations, and also reduce the variations 
between platforms at the C level. (Then again, I've never heard 
of anyone stepping on exceptions.py.)

Yes, this is exactly what was proposed in the thread on the Big Import
Rewrite.
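
A rough sketch of the bootstrap idea, assuming the C code hands over
nothing but the location of the executable and leaves the rest of the
path computation to Python.  The function name and the directory layout
are made up for illustration; this is not what getpath.c actually does.

```python
import posixpath


def compute_search_path(executable, version="1.5", pathmod=posixpath):
    """Derive a module search path from the interpreter's location.

    Assumed (hypothetical) layout: <prefix>/bin/python for the executable
    and <prefix>/lib/python<version> for the library.
    """
    bindir = pathmod.dirname(executable)
    prefix = pathmod.dirname(bindir)
    libdir = pathmod.join(prefix, "lib", "python" + version)
    return [
        libdir,
        pathmod.join(libdir, "lib-dynload"),
        pathmod.join(libdir, "site-packages"),
    ]


for entry in compute_search_path("/opt/app/python/bin/python"):
    print(entry)
```

Because the path depends only on where the executable lives, two
installations in different directories would not see each other's
libraries.
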
...
If some registry manipulation primitives were exposed (say, 
through ntpath) that would mean that Windows developers 
could (if they wanted) play by the MS rules with at least the 
option of not stepping on each other.

That's a good idea.  These functions are already available through
Mark's win32api extension -- much of which will eventually (I hope
before 1.6 is out!) become part of the core distribution.

In the meantime, I've been thinking a bit more about how Python
should be using the Windows registry.  (It's clear to me that Python
should use the registry -- those who disagree can go build their own
Python distribution.)

The basic ideas of Python's current registry usage are sound: there's
a resource built into the DLL which is part of the key into the
registry used for all information.

The problem lies in which key is used.  All versions of Python 1.5.x
(1.5, 1.5.1, 1.5.2) use the same key!  This is a main cause of
trouble, because it means that different versions cannot peacefully
live together even if the user installs them into different
directories -- they will all use the registry keys of the last version
installed.  This, in turn, means that someone who writes a Python
application that has a dependency on a particular Python version (and
which application worth distributing doesn't :-) cannot trust that if
a Python installation is present, it is the right one.  But they also
cannot simply bundle the standard installer for the correct Python
version with their program, because its installation would overwrite
an existing Python installation, thus breaking some *other* Python apps
that the user might already have installed.

(There's a solution for app builders who are willing to do a lot of
work -- you can change the registry key resource in the DLL.  For
example, Alice comes with its own version of Python 1.5.1 and it uses
"1.5.1-alice" as its registry key.  The Alice installer installs
Python in a subdirectory of the Alice installation directory and
points the 1.5.1-alice registry entries there.  The problem is that
this is a lot of work for the average app builder.)
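
To make the collision concrete, here is a small sketch that uses a
plain dictionary as a stand-in for the registry.  The key prefix and the
"1.5" tag shared by all 1.5.x releases are assumptions for illustration,
as are the directory names; only "1.5.1-alice" comes from the example
above.

```python
def registry_key(version_tag):
    """Build the key under which one installation stores its settings.

    The "Software\\Python\\PythonCore" prefix is an assumption.
    """
    return "Software\\Python\\PythonCore\\" + version_tag


def shared_tag(full_version):
    """Model the current behaviour: every 1.5.x release maps to one tag."""
    return ".".join(full_version.split(".")[:2])


# One shared key: the last version installed wins.
shared = {}
for version, directory in [("1.5.1", "C:\\Python151"),
                           ("1.5.2", "C:\\Python152")]:
    shared[registry_key(shared_tag(version))] = directory

# One key per exact version (or per app, as Alice does): nothing is lost.
separate = {}
for tag, directory in [("1.5.1", "C:\\Python151"),
                       ("1.5.2", "C:\\Python152"),
                       ("1.5.1-alice", "C:\\Alice\\Python")]:
    separate[registry_key(tag)] = directory

print(len(shared), "entry with a shared key")
print(len(separate), "entries with version-specific keys")
```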

I thought a bit about how VB solves this.  I think that when you wrap
up a VB app, all the support code (mostly a big DLL) is wrapped
with it.  When the user runs the installer, the DLL is installed
(probably in the WINDOWS directory).  If a user installs several VB
apps built with the same VB version, they all attempt to install the
exact same DLL; of course the installers notice this and optimize it
away, keeping a reference count.  (Ignoring for now the fact that
those reference counts don't always work!)  If an app built with a
different VB version is installed, it has a DLL with a different name,
and that is installed separately.  Other support files, I presume, are
dealt with in much the same way.  Voila, there's the theory.
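
The reference counting described above could be modelled roughly as
follows.  This is a toy model, not how any real installer is
implemented; the class and the DLL names are hypothetical.

```python
class SharedFiles:
    """Toy model of installer reference counting for shared support DLLs."""

    def __init__(self):
        # DLL file name -> number of installed apps that use it
        self.refcounts = {}

    def install(self, dll_name):
        """Register one more user; return True if the file must be copied."""
        count = self.refcounts.get(dll_name, 0)
        self.refcounts[dll_name] = count + 1
        return count == 0

    def uninstall(self, dll_name):
        """Drop one user; return True if the file can now be deleted."""
        count = self.refcounts[dll_name] - 1
        if count == 0:
            del self.refcounts[dll_name]
            return True
        self.refcounts[dll_name] = count
        return False


shared = SharedFiles()
# Two apps built with the same runtime version, one with a newer version.
copied = [shared.install(name)
          for name in ("runtime5.dll", "runtime5.dll", "runtime6.dll")]
print(copied)
```

The version number in the file name is what keeps the two runtimes
apart; the count only decides when a file is copied or removed.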

How can we do something similar for Python?

An app written in Python should need to install only three or four
files:

- a driver EXE to start the app
- a copy of the Python DLL
- the Python library in an archive
- the app code in an archive

The latter two could be combined into a single archive, but I propose
that we use two archives so that the DLL and the Python library
archive can be shared between installations of independent Python apps
as long as they use the exact same Python version and don't need
additional 3rd party packages.  (I believe that Jim A's proposal
combines the archives with the EXE and the DLL, reducing the number of
files to two.  That's fine too.)
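
A minimal sketch of the two-archive layout, with the driver reduced to
a few lines that put both archives on the module search path.  It relies
on the zip archive import support found in later Python versions, which
is used here only as a stand-in for whatever archive format is chosen;
the archive and module names are hypothetical, and the archives hold
source files rather than PYC files to keep the example short.

```python
import os
import sys
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
library_archive = os.path.join(workdir, "pylib.zip")  # shared between apps
app_archive = os.path.join(workdir, "app.zip")        # specific to one app

with zipfile.ZipFile(library_archive, "w") as zf:
    zf.writestr("sharedlib_demo.py",
                "def greet(name):\n    return 'hello ' + name\n")

with zipfile.ZipFile(app_archive, "w") as zf:
    zf.writestr("app_demo.py",
                "import sharedlib_demo\n\n"
                "def main():\n"
                "    return sharedlib_demo.greet('world')\n")

# What the driver would do: search the app archive first, then the library.
sys.path[:0] = [app_archive, library_archive]

import app_demo

print(app_demo.main())
```

A second app using the same Python version would ship its own app
archive and point at the same library archive.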

Is there a use for the registry here at all?  Maybe not.  (I notice
that VB seems to have a single registry entry, pointing to a DLL; all
other VB files also seem to live there.)

Complications:

- Some apps may need a custom extension module, which has to be
  installed as a PYD file.  So it seems that there needs to be a
  directory per app, and perhaps per version of the app (if the app
  distributor cares).

- Some apps need other, non-pyc files (e.g. data tables or help
  files); it would be handy if these could be stored in the archives as
  well.

- Some standard extension modules are in their own PYD files; these
  also need to be installed.  They aren't typically marked with a
  version, so perhaps a path directory per version of Python (if not per
  installed app) is wise.

- How to distribute an app that needs 3rd party stuff, e.g. Tcl/Tk, or
  PIL, or NumPy?  Their Python code can easily be wrapped up in another
  archive with a standard name incorporating a version number; but the
  required PYD and DLL files are a separate story.  (E.g. for Tkinter,
  you need _tkinter.pyd which links against tcl80.dll.)  Basically the
  same solution as for standard PYD files can work; the needed DLL files
  can be installed either systemwide (if they have a reliable version
  number in their name, like tcl80.dll) or in the per-app or per-package
  directory (like NumPy).

- Presumably, the archives will contain PYC files only.  This means
  that tracebacks will not show source code, only line numbers.  For Jim
  A, this is probably exactly what he wants (if the user gets a
  traceback, his "robust app" has miserably failed, and he takes it in
  pride that this doesn't happen).  But for some others, access to the
  sources could be essential.

  For example, I might want to distribute IDLE using this mechanism;
  users of IDLE who are curious about the standard library (or about
  IDLE itself) should be able to open the source for an arbitrary module
  (and maybe even edit it, although that's not a priority and perhaps
  should even be discouraged).  Library source access is an important
  feature of the IDLE debugger as well.

  A way out for IDLE is to install a classic distribution of the Python
  library sources, into the filesystem at an IDLE specific location.
  Other apps, with only the need for source code in tracebacks, might
  choose to have the PY files in the archives sitting next to the PYC
  files, and somehow the traceback mechanism should be accessing the
  archive to get a hold of the source.
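
Two of the complications above, data files in the archive and source
lookup for tracebacks, can be sketched with the zipfile and zipimport
modules of later Python versions.  Again this is only a stand-in for the
archive mechanism under discussion, and the file and module names are
hypothetical.

```python
import os
import tempfile
import zipfile
import zipimport

archive = os.path.join(tempfile.mkdtemp(), "app_with_source.zip")
source_text = "def fail():\n    raise ValueError('boom')\n"

with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("failing_demo.py", source_text)  # source kept in the archive
    zf.writestr("data/table.txt", "1 2 3\n")     # a non-code data file

# Reading a data table straight out of the archive.
with zipfile.ZipFile(archive) as zf:
    table = zf.read("data/table.txt").decode("ascii")

# Fetching module source from the archive, as a traceback printer would
# have to do in order to show the offending line.
importer = zipimport.zipimporter(archive)
recovered = importer.get_source("failing_demo")
line_2 = recovered.splitlines()[1]

print(table.split())
print(line_2.strip())
```

If the archive held only the compiled file, get_source would return
None and the traceback would be limited to line numbers.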

And yes, I realize that Jim A's latest offering solves most of these
problems to a large extent -- well done.  (Jim, would you care to
comment on the issues that you don't address?  Will you address them
in a future version?)

Final notes:

There are two different problems here.  One is how to distribute
Python apps robustly to end users who don't particularly care about
Python.  This is Jim A's problem (and he has a solution that works for
him).  In general the solutions here try to isolate the installed app
from other Python installations.  I'm proposing that at least the DLL
and the Python library archive can probably be shared between apps
without reducing robustness if we keep track more carefully of version
numbers.

The other problem is how to distribute packages of Python and
extension modules for use by Python users.  These typically need to
drop into some existing Python installation.  This is Paul Dubois'
problem with NumPy (amongst others) and is the current focus of the
Distutils SIG.

However I believe that there could be a lot of common infrastructure
that would help us create better solutions for both problems.  For
package distribution, common infrastructure (a.k.a. standards) is
essential.  For app distribution, common infrastructure isn't so
important (since the solutions strive for total isolation, there's no
problem if different apps use different solutions).  However, this changes when
app creators want to distribute robust self-sufficient apps that use
3rd party packages -- then the 3rd party packages must allow being
packaged up using the app distribution creator of choice.

Solving this compound problem (creating package distributions that can
be redistributed easily as part of robust Python app distributions)
should be an important goal for the infrastructure we're building
here.  The Big Import Rewrite ought to add this to its list of
objectives if it isn't already on it.  My guess is that the solution
for this compound problem will increase the dependency of app
distribution tools on the package distribution infrastructure; which
to me seems like a Good Thing because it would lead to more code
sharing.

--Guido van Rossum (home page: http://www.python.org/~guido/)