
I think you can get 90% of where you want to be with something much simpler. And the simpler implementation will be useful in the 100% solution, so it is not wasted time.
Agreed, but I'm not sure that it addresses the problems that started this thread. I can't really tell, since the message starting the thread just requested imputil, without saying which parts of it were needed. A followup claimed that imputil was a fine prototype but too slow for real work. I inferred that flexibility was requested. But maybe that was projection since that was on my own list. (I'm happy with the performance and find manipulating zip or jar files clumsy, so I'm not too concerned about all the nice things you can *do* with that flexibility. :-)
How about if we just design a Python archive file format; provide code in the core (in Python or C) to import from it; provide a Python program to create archive files; and provide a Standard Directory to put archives in so they can be found quickly. For extensibility and control, we add functions to the imp module. Detailed comments follow:
These tools go well beyond just an archive file format, but hopefully a file format will help. Greg and Gordon should be able to control the format so it meets their needs. We need a standard format.
I think the standard format should be a subclass of zip or jar (which is itself a subclass of zip). We have already written (at CNRI, as yet unreleased) the necessary Python tools to manipulate zip archives; moreover 3rd party tools are abundantly available, both on Unix and on Windows (as well as in Java). Zip files also lend themselves to self-extracting archives and similar things, because the file index is at the end, so I think that Greg & Gordon should be happy.
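To make the zip suggestion concrete, here is a minimal sketch of the two halves discussed above: a tool that writes modules into an archive, and code that imports from it. It assumes the zipfile module from today's standard library, not the unreleased CNRI tools; the function names and the archive layout (one name.py member per module) are hypothetical, chosen only for illustration.

```python
import os
import tempfile
import types
import zipfile


def build_archive(path, sources):
    """Write each {module_name: source_text} entry into a zip archive as name.py."""
    with zipfile.ZipFile(path, "w") as archive:
        for name, text in sources.items():
            archive.writestr(name + ".py", text)


def load_module(path, name):
    """Read name.py out of the archive, compile it, and return a module object."""
    with zipfile.ZipFile(path) as archive:
        source = archive.read(name + ".py").decode("utf-8")
    module = types.ModuleType(name)
    # The pseudo-filename only labels tracebacks; nothing is read from it.
    code = compile(source, path + "/" + name + ".py", "exec")
    exec(code, module.__dict__)
    return module


def demo():
    """Round-trip a tiny hypothetical module through a temporary archive."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pylib.zip")
        build_archive(path, {"hello": "def greet():\n    return 'hi'\n"})
        return load_module(path, "hello").greet()
```

This ignores packages, bytecode caching, and sys.modules bookkeeping, all of which a real importer would have to handle.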
I don't like sys.path at all. It is currently part of the problem.
Eh? That's the first time I've heard anything bad about it. Maybe that's because you live on Windows -- on Unix, search paths are ubiquitous.
I suggest that archive files MUST be put into a known directory.
Why? Maybe this works on Windows; on Unix this is asking for trouble because it prevents users from augmenting the installation provided by the sysadmin. Even on newer Windows versions, users without admin perms may not be allowed to add files to that privileged directory.
On Windows this is the directory of the executable, sys.executable. On Unix this is $PREFIX plus version, namely "%s/lib/python%s/" % (sys.prefix, sys.version[0:3]). Other platforms can have different rules.
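The rule above can be sketched as a small function. This is only an illustration: the name archive_dir and its parameters are hypothetical, and it builds the version string from sys.version_info rather than sys.version[0:3], since slicing the first three characters yields "3.1" on interpreters with a two-digit minor version.

```python
import os
import sys


def archive_dir(platform=sys.platform, prefix=sys.prefix,
                executable=sys.executable, version=None):
    """Return the directory where archives would be looked up (hypothetical rule)."""
    if version is None:
        # "major.minor", e.g. "1.6"; safe for two-digit minor versions too.
        version = "%d.%d" % sys.version_info[:2]
    if platform.startswith("win"):
        # Windows: the directory that holds the interpreter executable.
        return os.path.dirname(executable)
    # Unix: $PREFIX plus version, as in the formula quoted above.
    return "%s/lib/python%s/" % (prefix, version)
```

Other platforms would need their own branch; the sketch simply falls through to the Unix rule for anything that is not Windows.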
We should also have the ability to append archive files to the executable or a shared library assuming the OS allows this (Windows and Linux do allow it). This is the first location searched, nails the archive to the interpreter, insulates us from an erroneous sys.path, and enables single-file Python programs.
OK for the executable. I'm not sure what the point is of appending an archive to the shared library? Anyway, does it matter (on Windows) if you add it to python16.dll or to python.exe?
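Because the zip index sits at the end of the file, an archive tacked onto arbitrary leading bytes stays readable. A minimal sketch, assuming the standard zipfile module and using a made-up byte string in place of a real executable; the function names are hypothetical:

```python
import io
import zipfile


def append_archive(stub, sources):
    """Return stub bytes followed by a zip archive of {member_name: text}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, text in sources.items():
            archive.writestr(name, text)
    return stub + buf.getvalue()


def read_member(blob, name):
    """Open the combined bytes as a zip file and return one member's contents."""
    # zipfile locates the end-of-archive record first, then compensates
    # for whatever data precedes the archive proper.
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return archive.read(name)
```

Whether a given OS loader tolerates trailing data on an executable or shared library is a separate question that this sketch does not address.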
We don't need compression. The whole ./Lib is 1.2 Meg, and if we compress it to zero we save a Meg. Irrelevant. Installers provide compression anyway so when Python programs are shipped, they will be compressed then.
Problems are that Python does not ship with compression, we will have to add it, we will have to support it and its current method of compression forever, and it adds complexity.
OK, OK. I think most zip tools have a way to turn off the compression. (Anyway, it's a matter of more I/O time vs. more CPU time; hardware for both is getting better faster than we can tweak the code :-)
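As an illustration of turning compression off, the standard zipfile module (assumed here; the thread predates it) has a ZIP_STORED mode that writes members verbatim. The helper names below are hypothetical:

```python
import io
import zipfile


def stored_archive(sources):
    """Build an in-memory zip of {member_name: text} with compression disabled."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, text in sources.items():
            archive.writestr(name, text)
    return buf.getvalue()


def is_uncompressed(blob):
    """Check that every member is stored as-is, with equal raw and packed sizes."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return all(
            info.compress_type == zipfile.ZIP_STORED
            and info.compress_size == info.file_size
            for info in archive.infolist()
        )
```

An archive written this way needs no decompression code on the reading side, which is the property the no-compression argument above is after.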
Sigh, this proposal does not provide for this. It seems like a job for imputil. But if the file format and import code are available from the imp module, they can be used as part of the solution.
Well, the question is really if we want flexibility or archive files. I care more about the flexibility. If we get a clear vote for archive files, I see no problem with implementing that first.
If the Python library is available as an archive, I think startup will be greatly improved anyway.
Really? I know about all the system calls it makes, but I don't really see much of a delay -- I have a prompt in well under 0.1 second.

--Guido van Rossum (home page: http://www.python.org/~guido/)