Python 1.6 status

Greg Stein recently reminded me that he was holding off on 1.6 patches because he was under the impression that I wasn't accepting them yet. The situation is rather more complicated than that. There are a great many things that need to be done, and for many of them I'd be most happy to receive patches! For other things, however, I'm still in the requirements analysis phase, and patches might be premature (e.g., I want to redesign the import mechanisms, and while I like some of the prototypes that have been posted, I'm not ready to commit to any specific implementation).

How do you know for which things I'm ready for patches? Ask me. I've tried to make lists before, and there are probably some hints in the TODO FAQ wizard as well as in the "requests" section of the Python Bugs List.

Greg also suggested that I might receive more patches if I opened up the CVS tree for checkins by certain valued contributors. On the one hand I'm reluctant to do that (I feel I have a pretty good track record of checking in patches that are mailed to me, assuming I agree with them); on the other hand there may be something to be said for it, because it gives contributors more of a sense of belonging to the inner core.

Of course, checkin privileges don't mean you can check in anything you like -- as in the Apache world, changes must be discussed and approved by the group, and I would like to have a veto. However, once a change is approved, it's much easier if the contributor can check the code in without having to go through me all the time. A drawback may be that some people will make very forceful requests for checkin privileges, only to never use them, just as there are some members of python-dev who have never contributed. I definitely want to limit the number of privileged contributors to a very small number (e.g. 10-15).
One additional detail is the legal side -- contributors will have to sign some kind of legal document similar to the current (wetsign.html) release form, but governing all future contributions. I'll have to discuss this with CNRI's legal team.

Greg, I understand you have checkin privileges for Apache. What is the procedure there for handing out those privileges? What is the procedure for using them? (E.g. if you made a bogus change to a part of Apache you're not supposed to work on, what happens?)

I'm hoping for several kinds of responses to this email:

- uncontroversial patches
- questions about whether specific issues are sufficiently settled to start coding a patch
- discussion threads opening up some issues that haven't been settled yet (like the current, very productive, thread on i18n)
- posts summarizing issues that were settled long ago, requesting reverification that the issue is still settled
- suggestions for new issues that maybe ought to be settled in 1.6
- requests for checkin privileges, preferably with a specific issue or area of expertise for which the requester will take responsibility

--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum writes:
I'm hoping for several kinds of responses to this email:
My list of things to do for 1.6 is:

- Translate re.py to C and switch to the latest PCRE 2 codebase (mostly done; perhaps ready for public review in a week or so).

- Go through the O'Reilly POSIX book and draw up a list of POSIX functions that aren't available in the posix module. This was sparked by Greg Ward showing me a Perl daemonize() function he'd written, and I realized that some of the functions it used weren't available in Python at all. (setsid() was one of them, I think.)

- A while back I got approval to add the mmapfile module to the core. The outstanding issue there is that the constructor has a different interface on Unix and Windows platforms.

  On Windows:
      mm = mmapfile.mmapfile("filename", "tag name", <mapsize>)

  On Unix, it looks like the mmap() function:
      mm = mmapfile.mmapfile(<filedesc>, <mapsize>,
                             <flags> (like MAP_SHARED),
                             <prot> (like PROT_READ, PROT_READWRITE))

  Can we reconcile these interfaces, have two different function names, or what?
- suggestions for new issues that maybe ought to be settled in 1.6
Perhaps we should figure out what new capabilities, if any, should be added in 1.6. Fred has mentioned weak references, and there are other possibilities such as ExtensionClass. -- A.M. Kuchling http://starship.python.net/crew/amk/ Society, my dear, is like salt water, good to swim in but hard to swallow. -- Arthur Stringer, _The Silver Poppy_

[Guido]
I'm specifically requesting not to have checkin privileges. So there. I see two problems:

1. When patches go thru you, you at least eyeball them. This catches bugs and design errors early.

2. For a multi-platform app, few people have adequate resources for testing; e.g., I can test under an obsolete version of Win95, and NT if I have to, but that's it. You may not actually do better testing than that, but having patches go thru you allows me the comfort of believing you do <wink>.

I'm specifically requesting not to have checkin privileges. So there.
I will force nobody to use checkin privileges. However I see that for some contributors, checkin privileges will save me and them time.
I will still eyeball them -- only after the fact. Since checkins are pretty public, being slapped on the wrist for a bad checkin is a pretty big embarrassment, so few contributors will check in buggy code more than once. Moreover, there will be more eyeballs.
I expect that the same mechanisms will apply. I have access to Solaris, Linux and Windows (NT + 98) but it's actually a lot easier to check portability after things have been checked in. And again, there will be more testers. --Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido]
I will force nobody to use checkin privileges.
That almost went without saying <wink>.
However I see that for some contributors, checkin privileges will save me and them time.
Then it's Good! Provided it doesn't hurt language stability. I agree that changing the system to mail out diffs addresses what I was worried about there.

Fredrik Lundh wrote:
But please don't add the current version as default importer... its strategy is way too slow for real-life apps (yes, I've tested this: imports typically take twice as long as with the builtin importer). I'd opt for an import manager which provides a useful API for import hooks to register themselves with. What we really need is not yet another complete reimplementation of what the builtin importer does, but rather a more detailed exposure of the various import aspects: finding modules and loading modules.

-- Marc-Andre Lemburg
______________________________________________________________________
Y2000: 43 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

Marc-Andre wrote:
I think imputil's emulation of the builtin importer is more of a demonstration than a serious implementation. As for speed, it depends on the test.
I'd opt for an import manager which provides a useful API for import hooks to register themselves with.
I think that rather than blindly chaining themselves together, there should be a simple-minded manager. This could let the programmer prioritize them.
The first clause I sort of agree with - the current implementation is a fine implementation of a filesystem directory based importer. I strongly disagree with the second clause. The current import hooks are just such a detailed exposure; and they are incomprehensible and unmanageable. I guess you want to tweak the "finding" part of the builtin import mechanism. But that's no reason to ask all importers to break themselves up into "find" and "load" pieces. It's a reason to ask that the standard importer be, in some sense, "subclassable" (i.e., expose hooks, or perhaps be an extension class like thingie). - Gordon
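The "simple-minded manager" idea above can be made concrete. A sketch, where the ImportManager and register names are made up for illustration rather than being a proposed API:

```python
# Sketch of a prioritized import-hook manager: hooks register with the
# manager instead of blindly chaining onto one another.

class ImportManager:
    def __init__(self):
        self._hooks = []                    # (priority, hook) pairs

    def register(self, hook, priority=50):
        # Lower priority number means "asked earlier".
        self._hooks.append((priority, hook))
        self._hooks.sort(key=lambda pair: pair[0])

    def find_module(self, name):
        # First hook that claims the name wins; None means "not mine".
        for _priority, hook in self._hooks:
            result = hook(name)
            if result is not None:
                return result
        raise ImportError("no hook handles %r" % name)
```

A hook that only wants to tweak the "finding" step could then register itself ahead of the standard importer instead of wrapping it.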

Gordon McMillan wrote:
IMHO the current import mechanism is good for developers who must work on the library code in the directory tree, but a disaster for sysadmins who must distribute Python applications either internally to a number of machines or commercially.

What we need is a standard Python library file like a Java "jar" file. Imputil can support this in 130 lines of Python. I have also written one in C. I like the imputil approach, but if we want to add a library importer to import.c, I volunteer to write it.

I don't want to just add more complicated and unmanageable hooks which people will all use in different ways, which will just add to the confusion. It is easy to install packages by just making them into a library file and throwing it into a directory. So why aren't we doing it?

Jim Ahlstrom

[imputil and friends] "James C. Ahlstrom" wrote:
Perhaps we ought to rethink the strategy under a different light: what are the real requirements we have for Python imports? Perhaps the outcome is only the addition of, say, one or two features, and those can probably easily be added to the builtin system... then we can just forget about the whole import hook dilemma for quite a while (AFAIK, this is how we got packages into the core -- people weren't happy with the import hook).

Well, just an idea... I have other threads to follow :-)

-- Marc-Andre Lemburg
______________________________________________________________________
Y2000: 43 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

Gordon McMillan wrote:
Agreed. I like some of imputil's features, but I think the API needs to be redesigned.
Indeed. (A list of importers has been suggested, to replace the list of directories currently used.)
Based on how many people have successfully written import hooks, I have to agree. :-(
Agreed. Subclassing is a good way towards flexibility. And Jim Ahlstrom writes:
Unfortunately, you're right. :-(
Please volunteer to design or at least review the grand architecture -- see below.
You're so right!
It is easy to install packages by just making them into a library file and throwing it into a directory. So why aren't we doing it?
Rhetorical question. :-)

So here's a challenge: redesign the import API from scratch. Let me start with some requirements.

Compatibility issues:
---------------------
- the core API may be incompatible, as long as compatibility layers can be provided in pure Python
- support for rexec functionality
- support for freeze functionality
- load .py/.pyc/.pyo files and shared libraries from files
- support for packages
- sys.path and sys.modules should still exist; sys.path might have a slightly different meaning
- $PYTHONPATH and $PYTHONHOME should still be supported

(I wouldn't mind a splitting up of importdl.c into several platform-specific files, one of which is chosen by the configure script; but that's a bit of a separate issue.)

New features:
-------------
- Integrated support for Greg Ward's distribution utilities (i.e. a module prepared by the distutil tools should install painlessly)

- Good support for prospective authors of "all-in-one" packaging tools like Gordon McMillan's win32 installer or /F's squish. (But I *don't* require backwards compatibility for existing tools.)

- Standard import from zip or jar files, in two ways:

  (1) an entry on sys.path can be a zip/jar file instead of a directory; its contents will be searched for modules or packages

  (2) a file in a directory that's on sys.path can be a zip/jar file; its contents will be considered as a package (note that this is different from (1)!)

  I don't particularly care about supporting all zip compression schemes; if Java gets away with only supporting gzip compression in jar files, so can we.

- Easy ways to subclass or augment the import mechanism along different dimensions. For example, while none of the following features should be part of the core implementation, it should be easy to add any or all:

  - support for a new compression scheme to the zip importer
  - support for a new archive format, e.g. tar
  - a hook to import from URLs or other data sources (e.g. a "module server" imported in CORBA) (this needn't be supported through $PYTHONPATH though)
  - a hook that imports from compressed .py or .pyc/.pyo files
  - a hook to auto-generate .py files from other filename extensions (as currently implemented by ILU)
  - a cache for file locations in directories/archives, to improve startup time
  - a completely different source of imported modules, e.g. for an embedded system or PalmOS (which has no traditional filesystem)

- Note that different kinds of hooks should (ideally, and within reason) properly combine, as follows: if I write a hook to recognize .spam files and automatically translate them into .py files, and you write a hook to support a new archive format, then if both hooks are installed together, it should be possible to find a .spam file in an archive and do the right thing, without any extra action. Right?

- It should be possible to write hooks in C/C++ as well as Python

- Applications embedding Python may supply their own implementations, default search path, etc., but don't have to if they want to piggyback on an existing Python installation (even though the latter is fraught with risk, it's cheaper and easier to understand).

Implementation:
---------------
- There must clearly be some code in C that can import certain essential modules (to solve the chicken-or-egg problem), but I don't mind if the majority of the implementation is written in Python. Using Python makes it easy to subclass.

- In order to support importing from zip/jar files using compression, we'd at least need the zlib extension module and hence libz itself, which may not be available everywhere.

- I suppose that the bootstrap is solved using a mechanism very similar to what freeze currently uses (other solutions seem to be platform dependent).

- I also want to still support importing *everything* from the filesystem, if only for development. (It's hard enough to deal with the fact that exceptions.py is needed during Py_Initialize(); I want to be able to hack on the import code written in Python without having to rebuild the executable all the time.)

Let's first complete the requirements gathering. Are these requirements reasonable? Will they make an implementation too complex? Am I missing anything?

Finally, to what extent does this impact the desire for dealing differently with the Python bytecode compiler (e.g. supporting optimizers written in Python)? And does it affect the desire to implement the read-eval-print loop (the >>> prompt) in Python?

--Guido van Rossum (home page: http://www.python.org/~guido/)
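For a concrete picture of requirement (1) -- a zip file as an entry on sys.path -- here is a sketch that assumes an interpreter whose importer understands archive entries on the path (later Python versions ship exactly this as the zipimport mechanism); the archive and module names are invented:

```python
# Requirement (1) in miniature: build a tiny archive holding one module,
# put the archive itself on sys.path, and import from it.
import os
import sys
import tempfile
import zipfile

archive = os.path.join(tempfile.mkdtemp(), "demo_archive.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_mod.py", "GREETING = 'hello from the archive'\n")

sys.path.insert(0, archive)   # the zip file acts as a search root
import zipped_mod
print(zipped_mod.GREETING)    # hello from the archive
```

Requirement (2) would differ in that the archive sits *in* a directory on sys.path and its contents are treated as a package.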

Guido van Rossum wrote:
Let's first complete the requirements gathering.
Yes.
I think you can get 90% of where you want to be with something much simpler. And the simpler implementation will be useful in the 100% solution, so it is not wasted time.

How about if we just design a Python archive file format; provide code in the core (in Python or C) to import from it; provide a Python program to create archive files; and provide a Standard Directory to put archives in so they can be found quickly? For extensibility and control, we add functions to the imp module. Detailed comments follow:
Easily met by keeping the current C code.
These tools go well beyond just an archive file format, but hopefully a file format will help. Greg and Gordon should be able to control the format so it meets their needs. We need a standard format.
I don't like sys.path at all. It is currently part of the problem. I suggest that archive files MUST be put into a known directory. On Windows this is the directory of the executable, sys.executable. On Unix it is $PREFIX plus the version, namely "%s/lib/python%s/" % (sys.prefix, sys.version[0:3]). Other platforms can have different rules.

We should also have the ability to append archive files to the executable or a shared library, assuming the OS allows this (Windows and Linux do). This is the first location searched; it nails the archive to the interpreter, insulates us from an erroneous sys.path, and enables single-file Python programs.
We don't need compression. The whole ./Lib is 1.2 Meg, and if we compress it to zero we save a Meg. Irrelevant. Installers provide compression anyway so when Python programs are shipped, they will be compressed then. Problems are that Python does not ship with compression, we will have to add it, we will have to support it and its current method of compression forever, and it adds complexity.
Sigh, this proposal does not provide for this. It seems like a job for imputil. But if the file format and import code is available from the imp module, it can be used as part of the solution.
- support for a new compression scheme to the zip importer
I guess compression should be easy to add if Python ships with a compression module.
- a cache for file locations in directories/archives, to improve startup time
If the Python library is available as an archive, I think startup will be greatly improved anyway.
Yes.
That's a good reason to omit compression. At least for now.
Yes, except that we need to be careful to preserve the freeze feature for users. We don't want to take it over.
Yes, we need a function in imp to turn archives off: import imp imp.archiveEnable(0)
I don't think it impacts these at all. Jim Ahlstrom

Agreed, but I'm not sure that it addresses the problems that started this thread. I can't really tell, since the message starting the thread just requested imputil, without saying which parts of it were needed. A followup claimed that imputil was a fine prototype but too slow for real work. I inferred that flexibility was requested. But maybe that was projection since that was on my own list. (I'm happy with the performance and find manipulating zip or jar files clumsy, so I'm not too concerned about all the nice things you can *do* with that flexibility. :-)
I think the standard format should be a subclass of zip or jar (which is itself a subclass of zip). We have already written (at CNRI, as yet unreleased) the necessary Python tools to manipulate zip archives; moreover 3rd party tools are abundantly available, both on Unix and on Windows (as well as in Java). Zip files also lend themselves to self-extracting archives and similar things, because the file index is at the end, so I think that Greg & Gordon should be happy.
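The "index is at the end" property is what makes self-extracting archives workable: a zip with arbitrary data prepended is still a readable archive, because readers locate the central directory from the file's tail. A small sketch with the zipfile module (the stub bytes are invented):

```python
# Prepend an arbitrary "executable stub" to a zip; the archive
# remains readable because the index lives at the end of the file.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "payload")

stub_plus_zip = b"#!/fake/self-extract-stub\n" + buf.getvalue()
with zipfile.ZipFile(io.BytesIO(stub_plus_zip)) as zf:
    print(zf.read("hello.txt"))   # b'payload'
```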
I don't like sys.path at all. It is currently part of the problem.
Eh? That's the first thing I hear something bad about it. Maybe that's because you live on Windows -- on Unix, search paths are ubiquitous.
I suggest that archive files MUST be put into a known directory.
Why? Maybe this works on Windows; on Unix this is asking for trouble because it prevents users from augmenting the installation provided by the sysadmin. Even on newer Windows versions, users without admin perms may not be allowed to add files to that privileged directory.
OK for the executable. I'm not sure what the point is of appending an archive to the shared library? Anyway, does it matter (on Windows) if you add it to python16.dll or to python.exe?
OK, OK. I think most zip tools have a way to turn off the compression. (Anyway, it's a matter of more I/O time vs. more CPU time; hardware for both is getting better faster than we can tweak the code :-)
Well, the question is really if we want flexibility or archive files. I care more about the flexibility. If we get a clear vote for archive files, I see no problem with implementing that first.
If the Python library is available as an archive, I think startup will be greatly improved anyway.
Really? I know about all the system calls it makes, but I don't really see much of a delay -- I have a prompt in well under 0.1 second. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Think about multiple packages in multiple zip files. The zip files store file directories. That means we would need a sys.zippath to search the zip files. I don't want another PYTHONPATH phenomenon. Greg Stein and I once discussed this (and Gordon I think). They argued that the directories should be flattened. That is, think of all directories which can be reached on PYTHONPATH. Throw away all initial paths. The resultant archive has *.pyc at the top level, as well as package directories only. The search path is "." in every archive file. No directory information is stored, only module names, some with dots.
On Windows, just print sys.path. It is junk. A commercial distribution has to "just work", and it fails if a second installation (by someone else) changes PYTHONPATH to suit their app. I am trying to get to "just works", no excuses, no complications.
It works on Windows because programs install themselves in their own subdirectories, and can put files there instead of /windows/system32. This holds true for Windows 2000 also. A Unix-style installation to /windows/system32 would (may?) require "administrator" privilege. On Unix you are right. I didn't think of that because I am the Unix sysadmin here, so I can put things where I want. The Windows solution doesn't fit with Unix, because executables go in a ./bin directory and putting library files there is a no-no. Hmmmm... This needs more thought. Anyone else have ideas??
The point of using python16.dll is to append the Python library to it, and append to python.exe (or use files) for everything else. That way, the 1.6 interpreter is linked to the 1.6 Lib, upgrading to 1.7 means replacing only one file, and there is no wasted storage in multiple Lib's. I am thinking of multiple Python programs in different directories. But maybe you are right. On Windows, if python.exe can be put in /windows/system32 then it really doesn't matter.
Well, if Python now has its own compression that is built in and comes with it, then that is different. Maybe compression is OK.
I don't like flexibility, I like standardization and simplicity. Flexibility just encourages users to do the wrong thing. Everyone vote please. I don't have a solid feeling about what people want, only what they don't like.
So do I. I guess I was just echoing someone else's complaint. JimA

[Guido]
No problem (I created my own formats for relatively minor reasons). [JimA]
What if sys.path looked like:

  [DirImporter('.'), ZlibImporter('c:/python/stdlib.pyz'), ...]
While I do flat archives (no dots, but that's a different story), there's no reason the archive couldn't be structured. Flat archives are definitely simpler. [JimA]
  Py_Initialize();
  PyRun_SimpleString("import sys; del sys.path[1:]");

Yeah, there's a hole there. Fixable if you could do a little pre-Py_Initialize twiddling.
I suggest that archive files MUST be put into a known directory.
No way. Hard code a directory? Overwrite someone else's Python "standalone"? Write to a C: partition that is deliberately sized to hold nothing but Windows? Make network installations impossible?
There's nothing Unix-style about installing to /Windows/system32. 'Course *they* have symbolic links that actually work...
The official Windows solution is stuff in the registry about app paths and such. Putting the DLLs in the exe's directory is a workaround which works and is more manageable than the official solution.
That's a handy trick on Windows, but it's got nothing to do with Python.
I've noticed that the people who think there should only be one way to do things never agree on what it is.
Everyone vote please. I don't have a solid feeling about what people want, only what they don't like.
Flexibility. You can put Christian's favorite Einstein quote here too.
Install some stuff. Deinstall some of it. Repeat (mixing up the order) until your registry and hard drive are shattered into tiny little fragments. It doesn't take long (there's lots of stuff a defragmenter can't touch once it's there). - Gordon

Gordon McMillan wrote:
Well, that changes the current meaning of sys.path.
Ooops. I didn't mean a known directory you couldn't change, but I did mean a directory you shouldn't change. But you are right: the directory should be configurable. I would still like to see a highly encouraged directory, though I don't yet have a good design for this.

Anyone have ideas on an official way to find library files? I think a Python library file is a Good Thing, but it is not useful if the archive can't be found. I am thinking of a busy sysadmin with someone nagging him/her to install Python. The sysadmin doesn't want another headache. What if Python becomes popular and users want it on Unix and PCs? More work! There should be a standard way to do this that just works and is dumb-stupid-simple. This is a Python promotion issue. Yes, everyone here can make sys.path work, but that is not the point.
I agree completely.
It also works on Linux. I don't know about other systems.
Flexibility. You can put Christian's favorite Einstein quote here too.
I hope we can still have ease of use with all this flexibility. As I said, we need to promote Python. Jim Ahlstrom

Guido van Rossum wrote:
import d # from directory a/b/c/
Since you were asking: I would like functionality equivalent to my latest import patch -- a slightly different lookup scheme for module imports inside packages -- to become a core feature. If it becomes a core feature, I promise to never again start threads about relative imports :-)

Here's the summary again:

"""
[The patch] changes the default import mechanism to work like this:

  try a.b.c.d
  try a.b.d
  try a.d
  try d
  fail

instead of just doing the current two-level lookup:

  try a.b.c.d
  try d
  fail

As a result, relative imports referring to higher-level packages work out of the box without any ugly underscores in the import name. Plus the whole scheme is pretty simple to explain and straightforward.
"""

You can find the patch attached to the message "Walking up the package hierarchy" in the python-dev mailing list archive.

-- Marc-Andre Lemburg
______________________________________________________________________
Y2000: 42 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
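The lookup order in that summary is easy to state as a generator of candidate fully-qualified names. A sketch, not the patch itself:

```python
# For "import d" inside package a.b.c, yield the candidate names
# in the order the patch tries them: walk up the package hierarchy,
# then fall back to the top level.

def candidate_names(package, name):
    parts = package.split(".")
    while parts:
        yield ".".join(parts + [name])
        parts.pop()
    yield name

print(list(candidate_names("a.b.c", "d")))
# ['a.b.c.d', 'a.b.d', 'a.d', 'd']
```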

On Thu, 18 Nov 1999, Guido van Rossum wrote:
In what ways? It sounds like you've applied some thought. Do you have any concrete ideas yet, or "just a feeling" :-) I'm working through some changes from JimA right now, and would welcome other suggestions. I think there may be some outstanding stuff from MAL, but I'm not sure (Marc?)
... So here's a challenge: redesign the import API from scratch.
I would suggest starting with imputil and altering as necessary. I'll use that viewpoint below.
Which APIs are you referring to? The "imp" module? The C functions? The __import__ and reload builtins? I'm guessing some of imp, the two builtins, and only one or two C functions.
- support for rexec functionality
No problem. I can think of a number of ways to do this.
- support for freeze functionality
No problem. A function in "imp" must be exposed to Python to support this within the imputil framework.
- load .py/.pyc/.pyo files and shared libraries from files
No problem. Again, a function is needed for platform-specific loading of shared libraries.
- support for packages
No problem. Demo's in current imputil.
- sys.path and sys.modules should still exist; sys.path might have a slightly different meaning
I would suggest that both retain their *exact* meaning. We introduce sys.importers -- a list of importers to check, in sequence. The first importer on that list uses sys.path to look for and load modules. The second importer loads builtins and frozen code (i.e. modules not on sys.path). Users can insert/append new importers or alter sys.path as before. sys.modules continues to record name:module mappings.
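A sketch of the proposed loop over sys.importers; the list itself and the import_module() method are names assumed here for illustration:

```python
# Sketch: __import__ replaced by a walk over sys.importers. Each
# importer returns a module or None; sys.modules keeps its exact
# meaning as the name:module cache.
import sys

sys.importers = []        # would be populated at startup in this scheme

def hooked_import(name):
    if name in sys.modules:
        return sys.modules[name]
    for importer in sys.importers:
        module = importer.import_module(name)
        if module is not None:
            sys.modules[name] = module
            return module
    raise ImportError(name)
```

The first entry on the list would be the importer that consults sys.path; the second would handle builtins and frozen code.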
- $PYTHONPATH and $PYTHONHOME should still be supported
No problem.
Easy enough. The standard importer can select the appropriate platform-specific module/function to perform the load. i.e. these can move to Modules/ and be split into a module-per-platform.
I don't know the specific requirements/functionality that would be required here (does Greg? :-), but I can't imagine any problem with this.
Um. *No* problem. :-)
While this could easily be done, I might argue against it. Old apps/modules that process sys.path might get confused. If compatibility is not an issue, then "No problem." An alternative would be an Importer instance added to sys.importers that is configured for a specific archive (in other words, don't add the zip file to sys.path, add ZipImporter(file) to sys.importers). Another alternative is an Importer that looks at a "sys.py_archives" list. Or an Importer that has a py_archives instance attribute.
No problem. This will slow things down, as a stat() for *.zip and/or *.jar must be done, in addition to *.py, *.pyc, and *.pyo.
I presume we would support whatever zlib gives us, and no more.
Presuming ZipImporter is a class (derived from Importer), then this ability is wholly dependent upon the author of ZipImporter providing the hook. The Importer class is already designed for subclassing (and its interface is very narrow, which means delegation is also *very* easy; see imputil.FuncImporter).
- support for a new archive format, e.g. tar
A cakewalk. Gordon, JimA, and myself each have archive formats. :-)
No problem at all.
- a hook that imports from compressed .py or .pyc/.pyo files
No problem at all.
- a hook to auto-generate .py files from other filename extensions (as currently implemented by ILU)
No problem at all.
- a cache for file locations in directories/archives, to improve startup time
No problem at all.
- a completely different source of imported modules, e.g. for an embedded system or PalmOS (which has no traditional filesystem)
No problem at all. In each of the above cases, the Importer.get_code() method just needs to grab the byte codes from the XYZ data source. That data source can be compressed, across a network, on-the-fly generated, or whatever. Each importer can certainly create a cache based on its concept of "location". In some cases, that would be a mapping from module name to filesystem path, or to a URL, or to a compiled-in, frozen module.
Ack. Very, very difficult. The imputil scheme combines the concept of locating/loading into one step. There is only one "hook" in the imputil system. Its semantic is "map this name to a code/module object and return it; if you don't have it, then return None."

Your compositing example is based on the capabilities of the find-then-load paradigm of the existing "ihooks.py". One module finds something (foo.spam) and the other module loads it (by generating a .py).

All is not lost, however. I can easily envision the get_code() hook as allowing any kind of return type. If it isn't a code or module object, then another hook is called to transform it. [ actually, I'd design it similarly: a *series* of hooks would be called until somebody transforms the foo.spam into a code/module object. ]

The compositing would be limited only by the (Python-based) Importer classes. For example, my ZipImporter might expect to zip up .pyc files *only*. Obviously, you would want to alter this to support zipping any file, then use the suffix to determine what to do at unzip time.
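That secondary-hook idea can be sketched: if an importer hands back something that isn't a module yet, run it through registered transformers until one produces a module. All names here (resolve, transformers, source_to_module) are invented for illustration:

```python
# Sketch of a transform chain: keep calling hooks until some hook
# turns the intermediate value into a real module object.
import sys

transformers = []          # each: (name, value) -> new value, or None

def resolve(name, value):
    while not isinstance(value, type(sys)):   # type(sys) is the module type
        for hook in transformers:
            out = hook(name, value)
            if out is not None:
                value = out
                break
        else:
            raise ImportError("no transformer accepts %r" % (value,))
    return value

def source_to_module(name, value):
    # Final step: raw source text -> module. A ".spam -> source" hook
    # would be just another transformer registered ahead of this one.
    if isinstance(value, str):
        mod = type(sys)(name)
        exec(compile(value, "<generated>", "exec"), mod.__dict__)
        return mod
    return None

transformers.append(source_to_module)
```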
- It should be possible to write hooks in C/C++ as well as Python
Use FuncImporter to delegate to an extension module. This is one of the benefits of imputil's single/narrow interface.
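The narrow interface makes that delegation nearly trivial: an importer that just forwards the one hook to any callable (which could live in an extension module). A minimal sketch modeled on the description of imputil's FuncImporter, with a dict standing in for a module object:

```python
# Sketch of function-delegating importer; names are illustrative.

class FuncImporter:
    """Wrap a bare callable in the single-hook importer interface."""
    def __init__(self, func):
        self.func = func                    # name -> module-ish or None

    def import_module(self, name):
        return self.func(name)

# The delegation target could be a C extension's function; here, Python.
def my_hook(name):
    if name == "answer":
        return {"value": 42}                # stand-in for a module object
    return None

imp_obj = FuncImporter(my_hook)
print(imp_obj.import_module("answer"))      # {'value': 42}
```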
An application would have full control over the contents of sys.importers. For a restricted execution app, it might install an Importer that loads files from *one* directory only which is configured from a specific Win32 Registry entry. That importer could also refuse to load shared modules. The BuiltinImporter would still be present (although the app would certainly omit all but the necessary builtins from the build). Frozen modules could be excluded.
I posited once before that the cost of import is mostly I/O rather than CPU, so using Python should not be an issue. MAL demonstrated that a good design for the Importer classes is also required. Based on this, I'm a *strong* advocate of moving as much as possible into Python (to get Python's ease-of-coding with little relative cost).

The (core) C code should be able to search a path for a module and import it. It does not require dynamic loading or packages. This will be used to import exceptions.py, then imputil.py, then site.py. The platform-specific module that performs dynamic loading must be a statically linked module (in Modules/ ... it doesn't have to be in the Python/ directory). site.py can complete the bootstrap by setting up sys.importers with the appropriate Importer instances (this is where an application can define its own policy). sys.path was initially set by the import.c bootstrap code (from the compiled-in path and environment variables).

Note that imputil.py would not install any hooks when it is loaded. That is up to site.py. This implies the core C code will import a total of three modules using its builtin system. After that, the imputil mechanism would be importing everything (site.py would .install() an Importer which then takes over the __import__ hook).

Further note that the "import" Python statement could be simplified to use only the hook. However, this would require the core importer to inject some module names into the imputil module's namespace (since it couldn't use an import statement until a hook was installed). While this simplification is "neat", it complicates the run-time system (the import statement is broken until a hook is installed). Therefore, the core C code must also support importing builtins. "sys" and "imp" are needed by imputil to bootstrap.

The core importer should not need to deal with dynamic-load modules. To support frozen apps, the core importer would need to support loading the three modules as frozen modules.
The builtin/frozen importing would be exposed thru "imp" for use by imputil for future imports. imputil would load and use the (builtin) platform-specific module to do dynamic-load imports.
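The bootstrap policy Greg describes might look roughly like this in site.py. This is a minimal sketch only: the Importer/BuiltinImporter/PathImporter class names, the get_code() signature, and the install() entry point are assumptions based on this thread, not the actual imputil code.

```python
import sys

class Importer:
    """Base class: subclasses override get_code() to find modules."""
    def get_code(self, parent, modname, fqname):
        raise NotImplementedError

class BuiltinImporter(Importer):
    """Would consult the builtin/frozen tables via the "imp" module."""
    def get_code(self, parent, modname, fqname):
        return None  # stubbed out for this sketch

class PathImporter(Importer):
    """Would scan a path list for .py/.pyc and dynamic-load modules."""
    def __init__(self, path):
        self.path = path
    def get_code(self, parent, modname, fqname):
        return None  # stubbed out for this sketch

def install():
    # site.py completes the bootstrap: this is where an application
    # defines its own import policy, by choosing and ordering importers.
    sys.importers = [BuiltinImporter(), PathImporter(sys.path)]

install()
```

An application wanting a different policy would simply build a different sys.importers list here, before any hook-based imports happen.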
Yes. I don't see this as a requirement, though. We wouldn't start to use these by default, would we? Or insist on zlib being present? I see this as more along the lines of "we have provided a standardized Importer to do this, *provided* you have zlib support."
The bootstrap that I outlined above could be done in C code. The import code would be stripped down dramatically because you'll drop package support and dynamic loading. Alternatively, you could probably do the path-scanning in Python and freeze that into the interpreter. Personally, I don't like this idea as it would not buy you much at all (it would still need to return to C for accessing a number of scanning functions and module importing funcs).
My outline above does not freeze anything. Everything resides in the filesystem. The C code merely needs a path-scanning loop and functions to import .py*, builtin, and frozen types of modules. If somebody nukes their imputil.py or site.py, then they return to Python 1.4 behavior where the core interpreter uses a path for importing (i.e. no packages). They lose dynamically-loaded module support.
I'm not a fan of the composition requirement, since it requires a change to semantics that I believe are very useful and very clean. However, I outlined a possible, clean solution to do that (a secondary set of hooks for transforming get_code() return values). The requirements are otherwise reasonable to me, as I see that they can all be readily solved (i.e. they aren't burdensome).

While this email may be long, I do not believe the resulting system would be complex. From the user-visible side of things, nothing would be changed. sys.path is still present and operates as before. They *do* have new functionality they can grow into, though (sys.importers). The underlying C code is simplified, and the platform-specific dynamic-load stuff can be distributed to distinct modules, as needed (e.g. BeOS/dynloadmodule.c and PC/dynloadmodule.c).
If the three startup files require byte-compilation, then you could have some issues (i.e. the byte-compiler must be present). Once you hit site.py, you have a "full" environment and can easily detect and import a read-eval-print loop module (i.e. why return to Python? just start things up right there). site.py can also install new optimizers as desired, a new Python-based parser or compiler, or whatever... If Python is built without a parser or compiler (I hope that's an option!), then the three startup modules would simply be frozen into the executable. Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein wrote:
We should retain the current order. I think it is: first builtin, next frozen, next sys.path. I really think frozen modules should be loaded in preference to sys.path. After all, they are compiled in.
Users can insert/append new importers or alter sys.path as before.
I agree with Greg that sys.path should remain as it is. A list of importers can add the extra functionality. Users will probably want to adjust the order of the list.
Yes, I agree. And I think the main() should be written in Python. Lots of Python should be written in Python.
But these can be frozen in (as you mention below). I dislike depending on sys.path to load essential modules. If they are not frozen in, then we need a command line argument to specify their path, with sys.path used otherwise. Jim Ahlstrom

Here's the promised response to Greg's response to my wishlist.
I actually think that the way the PVM (Python VM) calls the importer ought to be changed. Assigning to __builtin__.__import__ is a crock. The API for __import__ is a crock.
I'm guessing some of imp, the two builtins, and only one or two C functions.
All of those.
- support for rexec functionality
No problem. I can think of a number of ways to do this.
Agreed, I think that imputil can do this.
Agreed. It currently exports init_frozen() which is about the right functionality.
Is it useful to expose the platform differences? The current imp.load_dynamic() should suffice.
This is looking like the redesign I was looking for. (Note that imputil's current chaining is not good since it's impossible to remove or reorder importers, which I think is a required feature; an explicit list would solve this.) Actually, the order is the other way around, but by now you should know that. It makes sense to have separate ones for builtin and frozen modules -- these have nothing in common.

There's another issue, which isn't directly addressed by imputil, although with clever use of inheritance it might be doable. I'd like more support for this however. Quite orthogonally to the issue of having separate importers, I might want to recognize new extensions. Take the example of the ILU folks. They want to be able to drop a file "foo.isl" in any directory on sys.path and have the ILU stubber automatically run if you try to import foo (the client stubs) or foo__skel (the server skeleton). This doesn't fit in the sys.importers strategy, because they want to be able to drop their .isl files in any directory along sys.path. (Or, more likely, they want to have control over where on sys.path the directory/directories with .isl files are placed.) This requires an ugly modification to the _fs_import() function. (Which should have been a method, by the way, to make overriding it in a subclass of PathImporter easier!)

I've been thinking here along the lines of a strategy where the standard importer (the one that walks sys.path) has a set of hooks that define various things it could look for, e.g. .py files, .pyc files, .so or .dll files. This list of hooks could be changed to support looking for .isl files.

There's an old, subtle issue that could be solved through this as well: whether or not a .pyc file without a .py file should be accepted. Long ago (in Python 0.9.8) a .pyc file alone would never be loaded. This was changed at the request of a small but vocal minority of Python developers who wanted to distribute .pyc files without .py files.
It has occasionally caused frustration because sometimes developers move .py files around but forget to remove the .pyc files, and then the .pyc file is silently picked up if it occurs on sys.path earlier than where the .py was moved to. Having a set of hooks for various extensions would make it possible to have a default where lone .pyc files are ignored, but where one can insert a .pyc importer in the list of hooks that does the right thing here. (Of course, it may be possible that this whole feature of lone .pyc files should be replaced, since the same need is easily taken care of by zip importers.)

I also want to support (Jim A notwithstanding :-) a feature whereby different things besides directories can live on sys.path, as long as they are strings -- these could be added from the PYTHONPATH env variable. Every piece of code that I've ever seen that uses sys.path doesn't care if a directory named in sys.path doesn't exist -- it may try to stat various files in it, which also don't exist, and as far as it is concerned that is just an indication that the requested module doesn't live there. Again, we would have to dissect imputil to support various hooks that deal with different kinds of entities in sys.path. The default hook list would consist of a single item that interprets the name as a directory name; other hooks could support zip files or URLs. Jack's "magic cookies" could also be supported nicely through such a mechanism.
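The "hook list for sys.path entries" idea above can be sketched as follows. Each hook inspects a path entry and, if it recognizes it, returns a handler for it; an entry no hook claims is silently ignored, matching how code today tolerates nonexistent directories. All names here (directory_hook, zip_hook, handler_for) are illustrative assumptions, not proposed API.

```python
import os

def directory_hook(entry):
    # Default hook: treat the entry as a directory name. A nonexistent
    # directory is simply skipped, as with today's sys.path.
    if os.path.isdir(entry):
        return ('dir', entry)
    return None

def zip_hook(entry):
    # An added hook: recognize zip archives placed on sys.path.
    if entry.endswith('.zip') and os.path.isfile(entry):
        return ('zip', entry)
    return None

# Ordered list -- users could insert a URL hook, a "magic cookie"
# hook, etc. without touching the others.
path_entry_hooks = [zip_hook, directory_hook]

def handler_for(entry):
    for hook in path_entry_hooks:
        handler = hook(entry)
        if handler is not None:
            return handler
    return None  # no hook claims this entry: ignore it, don't raise
```

A real importer would cache the handler per entry rather than re-running the hooks on every import.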
Users can insert/append new importers or alter sys.path as before.
sys.modules continues to record name:module mappings.
Yes. Note that the interpretation of __file__ could be problematic. To what value do you set __file__ for a module loaded from a zip archive?
Again: what's the advantage of exposing the platform specificity?
Probably more support is required from the other end: once it's common for modules to be imported from zip files, the distutil code needs to support the creation and installation of such zip files. Also, there is a need for the install phase of distutil to communicate the location of the zip file to the Python installation.
:-)
Note that this is what I mention above for distutil support.
While this could easily be done, I might argue against it. Old apps/modules that process sys.path might get confused.
Above I argued that this shouldn't be a problem.
This would be harder for distutil: where does Python get the initial list of importers?
Another alternative is an Importer that looks at a "sys.py_archives" list. Or an Importer that has a py_archives instance attribute.
OK, but again distutil needs to be able to add to this list when it installs a package. (Note that package deinstallation should also be supported!) (Of course I don't require this to affect Python processes that are already running; but it should be possible to easily change the default search path for all newly started instances of a given Python installation.)
Fine, this is where the caching comes in handy.
That's it. :-)
Agreed. But since we're likely going to provide this as a standard feature, we must ensure that it provides this hook.
But maybe it's *too* narrow; some of the hooks I suggest above seem to require extra interfaces -- at least in some of the subclasses of the Importer base class. Note: I looked at the doc string for get_code() and I don't understand what the difference is between the modname and fqname arguments. If I write "import foo.bar", what are modname and fqname? Why are both present? Also, while you claim that the API is narrow, the multiple return values (also the different types for the second item) make it complicated.
See above -- I think this should be more integrated with sys.path than you are thinking of. The more I think about it, the more I see that the problem is that for you, the importer that uses sys.path is a final subclass of Importer (i.e. it is itself not further subclassed). Several of the hooks I want seem to require additional hooks in the PathImporter rather than new importers.
See above for sys.path integration remark.
Actually, I take most of this back. Importers that deal with new extension types often have to go through a file system to transform their data to .py files, and this is just too complicated. However it would be still nice if there was code sharing between the code that looks for .py and .pyc files in a zip archive and the code that does the same in a filesystem. Hm, maybe even that shouldn't be necessary, the zip file probably should contain only .pyc files... (Unrelated remark: I should really try to release the set of modules we've written here at CNRI to deal with zip files. Unfortunately zip files are hairy and so is our code.)
That's fine. I actually don't recall where the find-then-load API came from, I think it may be an artefact of the original implementation strategy. It is currently used as follows: we try to see if there's a .pyc and then we try to see if there's a .py; if both exist we compare the timestamps etc. to choose which one. But that's still a red herring.
I still don't understand why ihooks.py had to be so complicated. I guess I just had much less of an understanding of the issues. (It was also partly a compromise with an alternative design by Ken Manheimer, who basically forced me to support packages, originally through ni.py.)
OK. This could be a feature of a subclass of Importer.
Maybe not so great, since it sounds like the C code can't benefit from any of the infrastructure that imputil offers. I'm not sure about this one though.
This is one of the benefits of imputil's single/narrow interface.
Plus its vague specs? :-)
Actually there's little reason to exclude frozen modules or any .py/.pyc modules -- by definition, bytecode can't be dangerous. It's the builtins and extensions that need to be censored. We currently do this by subclassing ihooks, where we mask the test for builtins with a comparison to a predefined list of names.
Agreed. However, how do you explain the slowdown (from 9 to 13 seconds, I recall)? Are you a lousy coder? :-)
It does, however, need to import builtin modules. imputil currently imports imp, sys, strop, __builtin__, struct and marshal; note that struct can easily be a dynamically loadable module, and so could strop in theory. (Note that strop will be unnecessary in 1.6 if you use string methods.) I don't think that this chicken-or-egg problem is particularly problematic though.
See earlier comments.
I think that algorithm (currently in getpath.c / getpathp.c) might also be moved to Python code -- imported frozen. Sadly, rebuilding with a new version of a frozen module might be more complicated than rebuilding with a new version of a C module, but writing and maintaining this code in Python would be *sooooooo* much easier that I think it's worth it.
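Moving the getpath.c logic into (frozen) Python might look roughly like this: walk up from the executable looking for a "landmark" file that marks an installation prefix. The landmark choice and the guess_prefix name are assumptions for illustration; the real getpath.c logic has more cases (argv[0] resolution, PYTHONHOME, symlinks).

```python
import os

# A file whose presence marks a Python installation prefix.
LANDMARK = os.path.join('lib', 'os.py')

def guess_prefix(executable, isfile=os.path.isfile):
    # Start at the executable's directory and walk toward the root.
    # "isfile" is parameterized only to make the sketch testable.
    d = os.path.dirname(os.path.abspath(executable))
    while True:
        if isfile(os.path.join(d, LANDMARK)):
            return d
        parent = os.path.dirname(d)
        if parent == d:    # reached the filesystem root
            return None    # caller falls back to a compiled-in default
        d = parent
```

The win here is exactly what Guido describes: this logic is far easier to read and maintain than the equivalent C.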
(Three not counting the builtin modules.)
Same chicken-or-egg. We can be pragmatic. For a developer, I'd like a bit of robustness (all this makes it rather hard to debug a broken imputil, and that's a fair amount of code!).
Same question. Since that all has to be coded in C anyway, why not?
To support frozen apps, the core importer would need to support loading the three modules as frozen modules.
I'd like to see a description of how someone like Jim A would build a single-file application using the new mechanism. This could completely replace freeze. (Freeze currently requires a C compiler; that's bad.)
Sure.
Agreed. Zlib support is easy to get, but there are probably platforms where it's not. (E.g. maybe the Mac? I suppose that on the Mac, there would be some importer classes to import from a resource fork.)
Not the dynamic loading. But yes the package support.
Good. Though I think there's also a need for freezing everything. And when we go the route of the zip archive, the zip archive handling code needs to be somewhere -- frozen seems to be a reasonable choice.
But if the path guessing is also done by site.py (as I propose) the path will probably be wrong. A warning should be printed.
As you may see from my responses, I'm a big fan of having several different sets of hooks. I do withdraw the composition requirement though.
Another chicken-or-egg. No biggie.
You mean "why return to C?" I agree. It would be cool if somehow IDLE and Pythonwin would also be bootstrapped using the same mechanisms. (This would also solve the question "which interactive environment am I using?" that some modules and apps want to see answered, because they need to do things differently when run under IDLE, for example.)
More power to hooks! --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum writes:
Not the case -- I know you've looked at some of my code in the KOE that ensures only real directories are on the path, and each is only there once (pathhack.py). Given that sys.path is often too long and includes duplicate entries in a large system (often one entry with and one without a trailing / for a given directory), it is useful to be able to distinguish between things that should be interpretable as paths and things that aren't. It should not be hard to declare that "cookies" or whatever have some special form, like "<cookie>".
It doesn't help that that code just plain stinks. I maintain that no one here understands the whole of it. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives

"FLD" == Fred L Drake, <fdrake@acm.org> writes:
FLD> It doesn't help that that code just plain stinks. I maintain FLD> that no one here understands the whole of it. I'm all for improving the code and getting it out. The real problem is that interfaces have been glommed on for every new use of a Zip file. (You want to read one off a socket and extract files before you've got the whole thing? No problem! Add a new class.) We need to figure out the common patterns for using the archives and write a new set of interfaces to support that. Jeremy

[Jeremy, on our Zip code]
If we gave you the code we currently have, would someone else in this forum be willing to redesign it? Eventually it would become part of the Python distribution. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote: [...]
Note that the interpretation of __file__ could be problematic. To what value do you set __file__ for a module loaded from a zip archive?
Makefiles use "archive(entry)" (this also supports nesting if needed). [...]
This may be off-topic, but has anyone considered what it would take to load shared libs out of an archive? One way is to extract on-the-fly to a temporary area. A refinement is to leave extracted files there as cache, and perhaps even to extract to a file with a name derived from its MD5 digest (this way multiple users and even Python installations can share the cache). Would it be useful to define a "standard" area? -- Jean-Claude
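Jean-Claude's extract-and-cache refinement could be sketched like this: write the shared lib out to a file named after its MD5 digest, so identical copies are shared between users and installations, and repeated extraction is skipped. The cache location and function name are illustrative assumptions; a real version would also need locking and cleanup policy.

```python
import hashlib
import os
import tempfile

# Hypothetical shared cache area; a "standard" location would need
# to be agreed on (and made safe for multi-user systems).
CACHE_DIR = os.path.join(tempfile.gettempdir(), 'py-shlib-cache')

def extract_to_cache(data):
    """Extract shared-lib bytes from an archive to a digest-named file."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    digest = hashlib.md5(data).hexdigest()
    target = os.path.join(CACHE_DIR, digest + '.so')
    if not os.path.exists(target):    # already extracted: reuse it
        with open(target, 'wb') as f:
            f.write(data)
    return target    # this path is what gets handed to the OS loader
```

The digest-based name is what makes the cache shareable: the same library bytes always land at the same path, regardless of which archive they came from.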

Jean-Claude Wippler <jcw@equi4.com> wrote:
This may be off-topic, but has anyone considered what it would take to load shared libs out of an archive?
well, we do that in a number of applications. (lazy installers are really cool... if you've installed works, you've seen some weird stuff -- for example, when the application starts the first time, it's loading everything from inside the installer. the rest of the installation is done from within the application itself, using archives in the installation executable) I think things like this are better left for the application designers, though... </F>

Jean-Claude Wippler wrote:
I discovered the hard way this entry is not optional. I just used the archive file name for __file__.
IMHO putting shared libs in an archive is a bad idea because the OS can not use them there. They must be extracted as you say. But then storage is wasted by using space in the archive and the external file. Deleting them after use wastes time. Better to leave them out of the archive and provide for them in the installer. IMHO the archive is a basic simple feature, and people make installers on top of that. Archives shouldn't try to do it all. JimA

James C. Ahlstrom <jim@interet.com> wrote:
have you tried it? if not, why do you think you should be allowed to forbid others from doing it?

in "the inmates are running the asylum", alan cooper points out that the *major* reason people all over the world love web applications is that there are no bloody installers. and here you are advocating that we all should be forced to use installers, when python makes it trivial to write self-installing apps. double-argh!

(on the other hand, why do I complain? all pythonworks customers are going to be able to do all this anyway...).

<rant size="major">

frankly, this "design by committee" (or is it "design by people who've never even been close to implementing something because they thought it was too hard, and thus think they're qualified to argue against those of us who didn't even realize that it was a hard problem"?) trend I've been seeing in all kinds of python forums makes me sooooo sad. the more of this I see (distutils-sig, doc-sig, here, c.l.python), the sadder I get, and the more I sympathise with John Skaller who's defining his own python-like universe...

if someone needs me, I'll be down in the pub having a beer with the mad scientist, the shiny eff-bot, and mr. nitpicker. if we're not there, you'll find us in the lab, working on new string matching facilities for 1.6, SOAP [1], tkinter replacements for the masses, and whatever else we can come up with... see you!

</rant>

1) http://www.newsalert.com/bin/story?StoryId=Coenz0bWbu0znmdKXqq

Fredrik Lundh wrote:
Huh ? Two points: 1. How can you be sure that people haven't tried implementing their ideas and for various reasons have come to some conclusion about those ideas ? 2. Would you seriously disqualify people from joining a discussion by the simple arguement that they have not implemented anything yet ? Just take the Unicode discussion as example: it was very lively and resulted in a decent proposal which is now subject to further investigation by the implementors ;-) Many people have joined in even though they did not and/or will not implement anything. Still, their arguments were very useful to show up weaknesses in the proposal. Now, let's rather have a beer in the pub around the corner than go on ranting about :-). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 27 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

Fredrik Lundh wrote:
[snip]
<rant size="major">
frankly, this "design by committee"...
[snip]
... see you!
</rant>
C'mon /F, it's a battle of ideas and that's the way it works before filtering the good ones from the bad ones, then focusing on the appropriate implementation. I'm in sync with the discussion, although I haven't posted my partial notes on it due to lack of time. But let me say that overall, this discussion is a good thing and the more opinions we get, the better. BTW, you just _can't_ leave like this and start playing solitaire at the bar, first, because we need beer too and it's unlikely that you'll find a bar we don't know already, and second, because it was you who revived this discussion with 1 word, repeated 3 times:
Thus, with no visible argumentation (so don't shoot at others when they argue instead of you), and with this one word, you pushed Guido to the extreme of suggesting a complete redesign of the import machinery from scratch, based on a "Grand Architecture" :-). Right? -- Right! This is a fact, and a fair amount of the credit goes entirely to you!

Since then, however, I haven't really seen your arguments, and I believe that nobody here got exactly your point. I, for one, may well argue against imputil as being just another brick on top of the grand mess. But because I haven't made the time to write up my notes properly, I don't dare to express a partial opinion, nor blame those who argue well or badly in the meantime, while I'm silent. So, why are you showing us your back when you clearly have something to say, but, like me, haven't made the time to say it?

Please don't waste my time with emotional rants ;-). Everybody here tries to contribute according to their knowledge, experience and availability. Later, -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

Fredrik Lundh wrote:
James C. Ahlstrom <jim@interet.com> wrote:
IMHO putting shared libs in an archive is a bad idea because the OS
Dear Fredrik, I thought the point of Python-Dev was to propose designs and get feedback, right? Well, I got feedback :-). OK, I agree to alter my archive format so it provides the ability to store shared libs and not just *.pyd. I will add the string length and if needed a flag indicating the name is a shared lib. Now the details:
have you tried it? if not, why do you think you should be allowed to forbid others from doing it?
Yes I have tried it, and I am currently on my fourth version of an archive format which is based on formats by Greg Stein and Gordon McMillan. I hope it meets with the favor of the Grand Inquisition, and becomes the standard format. But maybe it won't. Oh well.
I am not forcing anyone to do anything, only proposing that shared libs are best handled directly by imputil and not the class within imputil which handles archive files. It is just a geeky design issue, nothing more. JimA

[Guido] big snip
I just left it alone (ie, as it was when I picked up the .pyc). Turns out OK, because then when the end user files a bug report, the developer can track it down.
As I recall: import foo.bar -> get_code(None, 'foo', 'foo') # returns foo -> get_code(<self>, 'bar', 'foo.bar')
Why are both present?
I think so the importer can choose between being tree structured or flat.
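The calling convention Gordon describes can be made concrete with a toy driver. For "import foo.bar", get_code() is called once per dotted component, with the previous result passed back as the parent. The ToyImporter and toy_import names are illustrative; the real imputil driver does much more (sys.modules checks, error handling).

```python
calls = []

class ToyImporter:
    def get_code(self, parent, modname, fqname):
        # Record the arguments so the convention is visible.
        calls.append((parent, modname, fqname))
        return self  # pretend the component was found; acts as "parent"

def toy_import(dotted, importer):
    # Simplified driver: walk the dotted name one component at a time.
    parent, fqname = None, ''
    for modname in dotted.split('.'):
        fqname = modname if not fqname else fqname + '.' + modname
        parent = importer.get_code(parent, modname, fqname)

imp_obj = ToyImporter()
toy_import('foo.bar', imp_obj)
# calls[0] == (None, 'foo', 'foo')
# calls[1] == (imp_obj, 'bar', 'foo.bar')
```

A flat importer can ignore parent and look up fqname directly; a tree-structured one uses parent plus modname. That's the point of passing both.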
I have something working for Linux now. I froze exceptions.py. I hacked getpath.c so prefix = exec_prefix = executable's directory and the starting path is [prefix]. Although I did it differently, you could regard imputil.py and archive.py as frozen, too. (On WIndows it's somewhat different, because the result uses the stock python15.dll.) This somewhat oversimplifies; and I haven't really thought out all the ways people might try to use sym links. I'm inclined to think the starting path should contain both the executable's real directory and the sym link's directory.
.... I do withdraw the composition requirement though.
Hooray! - Gordon

On Thu, 2 Dec 1999, Guido van Rossum wrote:
Something like sys.set_import_hook()? The other alternative that I see would be to have the C code scan sys.importers, assuming each are callable objects, and call them with the appropriate params (e.g. module name). Of course, to move this scanning into Python would require something like sys.set_import_hook() unless Python looks for a hard-coded module and entrypoint.
We can provide Python code to provide compatibility for "imp" and the two hooks. Nothing we can do to the C code, though. I'm not sure what the import API looks like from C, and whether they could all stay. A brief glance looks like most could stay. [ removing any would change Python's API version, which might be "okay" ]
This comes up several times throughout this message, and in some off-list mail Guido and I have exchanged. Namely, "should dynamic loading be part of the core, or performed via a module?"

I would rather see it become a module, rather than inside the core (despite the fact that the module would have to be compiled into the interpreter). I believe this provides more flexibility for people looking to replace/augment/update/fix dynamic loading on various architectures. Rather than changing the core, a person can just drop in another module. The isolation between the core and modules is nicer, aesthetically, to me. The modules would also be exposing Just Another Importer Function, rather than a specialized API in the builtin imp module.

Also note that it is easier to keep a module *out* of a Python-based application than it is to yank functions out of the core of Python. Frozen apps, embedded apps, etc. could easily leave out dynamic loading.

Are there strict advantages? Not any that I can think of right now (beyond a bit of ease-of-use mentioned above). It just feels better to me.
The chaining is an aspect of the current, singular import hook that Python uses. In the past, I've suggested the installation of a "manager" that maintains a list. sys.importers is similar in practice. Note that this Manager would be present with the sys.set_import_hook() scheme, while the Manager is implied if the core scans sys.importers.
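The "Manager maintaining a list" idea sketches out roughly like this: one object owns the ordered importer list (so importers can be removed or reordered, unlike hard chaining) and consults sys.modules first. The ImportManager name is an assumption, and returning a module straight from get_code() is a simplification of the real protocol.

```python
import sys

class ImportManager:
    def __init__(self, importers):
        self.importers = importers   # ordered, and mutable by user code

    def import_module(self, fqname):
        # sys.modules continues to record name:module mappings.
        if fqname in sys.modules:
            return sys.modules[fqname]
        for imp in self.importers:
            module = imp.get_code(None, fqname, fqname)
            if module is not None:
                sys.modules[fqname] = module
                return module
        raise ImportError(fqname)
```

Whether the C core scans sys.importers itself or calls into a Manager like this via a single hook is exactly the sys.set_import_hook() question above.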
Yes, JimA pointed this out. The latest imputil has corrected this. I combined the builtin and frozen Importers because they were just so similar. I didn't want to iterate over two Importers when a single one sufficed quite well. *shrug* Could go either way, really.
Correct: while imputil doesn't address this, the standard/default Importer classes *definitely* can.
I yanked that code out of the DirectoryImporter so that the PathImporter could use it. I could see a reorg that creates a FileSystemImporter that defines the method, and the other two just subclass from that.
Agreed. It should be easy to have a mapping of extension to handler. One issue: should there be an ordering to the extensions? Exercise for the reader to alter the data structures...
I think, "too bad for them." :-) Having just a .pyc is a very nice feature. But how can you tell whether it was meant to be a plain .pyc or a mis-ordered one? To truly resolve that, you would need to scan the whole path, looking for a .py. However, maybe somebody put the .pyc there on purpose, to override the .py!

--- begin slightly-off-topic ---

Here is a neat little Bash script that allows you to use a .pyc as a CGI (to avoid parse overhead). Normally, you can't just drop a .pyc into the cgi-bin directory because the OS doesn't know how to execute it. Not a problem, I say... just append your .pyc to the following Bash script and execute! :-)

#!/bin/bash
exec - 3< $0 ; exec python -c 'import os,marshal ; f = os.fdopen(3, "rb") ; f.readline() ; f.readline() ; f.seek(8, 1) ; _c = marshal.load(f) ; del os, marshal, f ; exec _c' $@

(the script should be two lines; and no... you can't use readlines(2))

The above script will preserve stdin, stdout, and stderr. If the caller also uses 3< ... well, that got overridden :-) The script doesn't work on Windows for two reasons, though: 1) Bash, 2) the "rb" mode followed by readline(). Detailed info at the bottom of http://www.lyra.org/greg/python/

--- end of off-topic ---
Maybe. I'd still like to see plain .pyc files, but I know I can work around any change you might make here :-) (i.e. whatever you'd like to do... go for it)
I'm not in favor of this, but it is more-than-doable. Again: your discretion...
Specifically, the PathImporter would get "dissected" :-). No problem.
You don't (certainly in a way that is nice/compatible for modules that refer to it). This is why I don't like __file__ and __path__. They just don't make sense in archives or frozen code. Python code that relies on them will create problems when that code is placed into different packaging mechanisms.
See above.
I'm quite confident that something can be designed that would satisfy the needs here. Something akin to .pth files that a zip importer could read.
For most code, no, but as Fred mentioned (and I surmise), there are things out there assuming that sys.path contains strings which specify directories. Sure, we can do this (your discretion), but my feeling is to avoid it.
Default is just the two: BuiltinImporter and PathImporter. Adding ZipImporters (or anything else) at startup is TBD, but shouldn't pose a problem.
IFF caching is enabled for the particular platform and installation.
Correct -- the *subclasses*. I still maintain the imputil design of a single hook (get_code) is Right. I'll make a swipe at PathImporter in the next few weeks to add the capability for new extensions.
Gordon detailed this in another note... Yes, the multiple return values make it a bit more complicated, but I can't think of any reasonable alternatives. A bit more doc should do the trick, I'd guess.
Correct -- I've currently designed/implemented PathImporter as "final". I don't foresee a problem turning it into something that can be hooked at run-time, or subclassed at code-time. A detailing of the features needed would be handy:

* allow alternative file suffixes, with functions or subclasses to map the file into a code/module object.
Gordon replies to this... All of the archives that myself, Gordon, and JimA have been using only store .pyc files. I don't see much code sharing between the filesystem and archive import code.
That would be my preference, rather than loading more into the Importer base class itself.
There isn't any infrastructure that needs to be accessed. get_code() is the call-point, and there is no mechanism provided to the callee to call back into the imputil system.
This is one of the benefits of imputil's single/narrow interface.
Plus its vague specs? :-)
Ouch. I thought I was actually doing quite a bit better than normal with that long doc-string on get_code :-(
True. My concern is an invader misusing one "type" of module for another. For example, let's say you've provided a selection of modules each exporting function FOO, and the user can configure which module to use. Can they do damage if some unrelated, frozen module also exports FOO? Minor issue, anyhow. All the functionality is there.
Heh :-) I have not spent *any* time working on optimization. Currently, each Importer in the chain redoes some work of the prior Importer. A bit of restructuring would split the common work out to a Manager, which then calls a method in the Importer (and passes all the computed work). Of course, a bit of profiling wouldn't hurt either. Some of the "imp" interfaces could possibly be refined to better support the BuiltinImporter or the dynamic load features. The question is still valid, though -- at the moment, I can't explain it because I haven't looked into it.
Note: after writing this, I realized there is really no need for the core to do the imputil import. site.py can easily do that.
It does, however, need to import builtin modules. imputil currently
Correct.
I knew about strop, but imputil would be harder to use today if it relied on the string methods. So... I've delayed that change. The struct module is used in a couple teeny cases, dealing with constructing a network-order, 4-byte, binary integer value. It would be easy enough to just do that with a bit of Python code instead.
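The struct usage Greg describes is indeed a few lines of plain Python. A minimal sketch (the function names are mine, not imputil's) showing both the struct form and the bootstrap-friendly pure-Python replacement:

```python
import struct

def pack_net_long(value):
    # network-order (big-endian), 4-byte binary integer via struct
    return struct.pack(">L", value)

def pack_net_long_pure(value):
    # the same value built with plain arithmetic, so the importer
    # would not depend on the struct module at bootstrap time
    return bytes([(value >> shift) & 0xFF for shift in (24, 16, 8, 0)])
```

Both produce identical four-byte strings, so the struct dependency really is easy to drop.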
I don't think that this chicken-or-egg problem is particularly problematic though.
Right. In my ideal world, the core couldn't do a dynamic load, so that would need to be considered within the bootstrap process.
I think we can find a better way to freeze modules and to use them. Especially for the cases where we have specific "core" functions implemented in Python (e.g. freezing parsers, compilers, and/or the read-eval loop). I don't foresee the build process becoming more complicated. If we nuke "makesetup" in favor of a Python script, then we could create a stub Python executable which runs the build script, which writes the Setup file and the getpath*.c file(s).
Correct, although I'll modify my statement to "two plus the builtins".
True. I threw that out as an alternative, and then presented the counter argument :-)
It simplifies the core's import code to not deal with that stuff at all.
The portable mechanism for freezing will always need a compiler. Platform specific mechanisms (e.g. append to the .EXE, or use the linker to create a new ELF section) can optimize the freeze process in different ways. I don't have a design in my head for the freeze issues -- I've been considering that the mechanism would remain about the same. However, I can easily see that different platforms may want to use different freeze processes... hmm...
Exactly. And importer classes to load from Win32 resources (modifying a .EXE's resources post-link is cleaner than the append solution).
Sure.
All right. Doesn't Python already print a warning if it can't find site.py?
Yes. However, I've only recognized one so far. Propose more... I'm confident we can update the PathImporter design to accommodate (and retain the underlying imputil paradigm).
I do withdraw the composition requirement though.
:-)
Heh. Yah, that's what I meant :-)
Haven't thought on this. Should be doable, I'd think.
:-) You betcha! I believe my next order of business:

* update PathImporter with the file-extension hook
* dynload C code reorg, per the other email
* create new-model site.py and trash import.c
* review freeze mechanisms and process
* design mechanism for frozen core functionality (e.g. getpath*.c) (coding and building design)
* shift core functions to Python, using above design

I'll just plow ahead, but also recognize that any/all may change. I.e., I'll build examples/finals/prototypes and Guido can pick/choose/reimplement/etc as needed. I'm out next week, but should start on the above items by the end of the month (will probably do another mod_dav release in there somewhere). Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg, Great response. I think we know where we each stand. Please go ahead with a new design. (That's trust, not carte blanche.) Just one thought: the more I think about it, the less I like sys.importers: functionality which is implemented through sys.importers must necessarily be placed either in front of all of sys.path or after it. While this is helpful for "canned" apps that want *everything* to be imported from a fixed archive, I think that for regular Python installations sys.path should remain the point of attack. In particular, installing a new package (e.g. PIL) should affect sys.path, regardless of the way of delivery of the modules (shared libs, .py files, .pyc files, or a zip archive). I'm not too worried about code that inspects sys.path and expects certain invariants; that code is most likely interfering with the import mechanism so should be revisited anyway. On the lone .pyc issue: I'd like to see this disappear when using the filesystem, I see no use for it there if we support .pyc files in zip archives. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Fri, 3 Dec 1999, Guido van Rossum wrote:
Accepted gratefully. Thx.
Okay. I'll design with respect to this model. To be explicit/clear and to be sure I'm hearing you right: sys.path may contain Importer instances. Given the name FOO, the system will step through sys.path looking for the first occurrence of FOO (looking in a directory or delegating). FOO may be found with any number of (configurable) file extensions, which are ordered (e.g. ".so" before ".py" before ".isl").
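A minimal sketch of that rule as stated above (get_code() follows the imputil discussion; the function name and the EXTENSIONS list are illustrative, and ".isl" is hypothetical):

```python
import os

EXTENSIONS = [".so", ".py", ".isl"]   # ordered: first match wins

def find_module(name, search_path):
    # walk a sys.path-like list in order; the first occurrence of
    # the name, under any configured extension, wins
    for entry in search_path:
        if isinstance(entry, str):
            # a directory: try each extension in its configured order
            for ext in EXTENSIONS:
                candidate = os.path.join(entry, name + ext)
                if os.path.isfile(candidate):
                    return candidate
        else:
            # an Importer instance: delegate the lookup to it
            result = entry.get_code(name)
            if result is not None:
                return result
    return None
```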
The Benevolent Dictator has spoken. So be it. :-)
No problem. This actually creates a simplification in the system, as I'm seeing it now. I'm also seeing opportunities for a code reorg which may work towards MAL's issues with performance. I hope to have something in two or three weeks. I also hope people can be patient :-), but I certainly wouldn't mind seeing some alternative code! Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein wrote:
On Fri, 3 Dec 1999, Guido van Rossum wrote:
This is basically a gripe about this design spec. So if the answer turns out to be "we need this functionality so shut up" then just say that and don't flame me. This spec is painful. Suppose sys.path has 10 elements, and there are six file extensions. Then the simple algorithm is slow:

    for path in sys.path:                # Yikes, may not be a string!
        for ext in file_extensions:
            name = "%s.%s" % (module_name, ext)
            full_path = os.path.join(path, name)
            if os.path.isfile(full_path):
                ...                      # Process file here

And sys.path can contain class instances which only makes things slower. You could do a readdir() and cache the results, but maybe that would be slower. A better algorithm might be faster, but a lot more complicated.

In the context of archive files, it is also painful. It prevents you from saving a single dictionary of module names. Instead you must have len(sys.path) dictionaries. You could try to save in the archive information about whether (say) a foo.dll was present in the file system, but the list of extensions is extensible.

The above problem only exists to support equally-named modules; that is, to support a run-time choice of whether to load foo.pyc, foo.dll, foo.isl, etc. I claim (without having written it) that the fastest algorithm to solve the unique-name case is much faster than the fastest algorithm to solve the choose-among-equal-names case. Do we really need to support the equal-name case [Jim runs for cover...]? If so, how about inventing a new way to support it. Maybe if equal names exist, these must be pre-loaded from a known location? JimA

On Sat, 4 Dec 1999, James C. Ahlstrom wrote:
This is the algorithm that Python uses today, and my standard Importers follow.
And sys.path can contain class instances which only makes things slower.
IMO, we don't know this, or whether it is significant.
Who knows. BUT: the import process is now in Python -- it makes it *much* easier to run these experiments. We could not really do this when the import process is "hard-coded" in C code.
I am not following this. What/where is the "single dictionary of module names" ? Are you referring to a cache? Or is this about building an archive? An archive would look just like we have now: map a name to a module. It would not need multiple dictionaries.
I don't understand what the problem is. I don't see one. We are still mapping a name to a module. sys.path defines a precedence. Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein wrote:
Agreed.
Agreed.
Agreed.
The "single dictionary of names" is in the single archive importer instance and has nothing to do with creating the archive. It is currently programmed this way.

Suppose the user specifies by name 12 archive files to be searched. That is, the user hacks site.py to add archive names to the importer. The "single dictionary" means that the archive importer takes the 12 dictionaries in the 12 files and merges them together into one dictionary in order to speed up the search for a name. The good news is you can always just call the archive importer to get a module. The bad news is you can't do that for each entry on sys.path because there is no necessary identity between archive files and sys.path. The user specified the archive files by name, and they may or may not be on sys.path, and the user may or may not have specified them in the same order as sys.path even if they are.

Suppose archive files must lie on sys.path and are processed in order. Then to find them you must know their name. But IMHO you want to avoid doing a readdir() on each element of sys.path and looking for files *.pyl.

Suppose archive file names in general are the known name "lib.pyl" for the Python library, plus the names "package.pyl" where "package" can be the name of a Python package as a single archive file. Then if the user tries to import foo, imputil will search along sys.path looking for foo.pyc, foo.pyl, etc. If it finds foo.pyl, the archive importer will add it to its list of known archive files. But it must not add it to its single dictionary, because that would destroy the information about its position along sys.path. Instead, it must keep a separate dictionary for each element of sys.path and search the separate dictionaries under control of imputil. That is, get_code() needs a new argument for the element of sys.path being searched. Alternatively, you could create a new importer instance for each archive file found, but then you still have multiple dictionaries.
They are in the multiple instances. All this is needed only to support import of identically named modules. If there are none, there is no problem because sys.path is being used only to find modules, not to disambiguate them. See also my separate reply to your other post which discusses this same issue. JimA

On Mon, 6 Dec 1999, James C. Ahlstrom wrote:
Ah. There is the problem. In Guido's suggestion for the "next path of inquiry" :-), there is no "single dictionary of names". Instead, you have Importer instances as items in sys.path. Each instance maintains its dictionary, and they are not (necessarily) combined. If we were to combine them, then we would need to maintain the ordering requirements implied by sys.path. However, this would be problematic if sys.path changed -- we would have to detect the situation and rebuild a merged dict.
The importer must be inserted into sys.path to establish a precedence. If the user wants to add 12 libraries... fine. But *all* of those modules will fall under a precedence defined by the Importer's position on sys.path.
I do not believe that we will arbitrarily locate and open library files. They must be specified explicitly.
If the user installs ".pyl" as a recognized extension (i.e. installs into the PathImporter), then the above scenario is possible. In my in-head-design, I had not imagined any state being retained for extension-recognizer hooks. Of course, state can be retained simply by using a bound-method for the hook function. get_code() would not need to change. The foo.pyl would be consulted at the appropriate time based on where it is found in sys.path. Note that file-extension hooks would definitely have a complete path to the target file. Those are not Importers, however (although they will closely follow the get_code() hook since the extension is called from get_code).

No need to worry about this: just don't merge the caches. Compared to the hundreds of failed open() calls that are done now, it's no big deal to do 12 failed Python dictionary lookups instead of one. --Guido van Rossum (home page: http://www.python.org/~guido/)
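Guido's point in code form (a sketch; the class and function names are invented, get_code() follows the thread's convention): keep one dictionary per archive and probe them in order, never merging.

```python
class ArchiveImporter:
    # one importer per archive file; its dictionary is never merged
    # with any other archive's dictionary
    def __init__(self, contents):
        self.contents = contents          # {module_name: code_object}

    def get_code(self, name):
        return self.contents.get(name)    # a failed dict lookup is cheap

def find(name, importers):
    # precedence comes purely from list order, exactly like sys.path
    for imp in importers:
        code = imp.get_code(name)
        if code is not None:
            return code
    return None
```

Twelve failed dictionary probes per miss, versus the hundreds of failed open() calls the current importer makes, is indeed no big deal.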

On Tue, 7 Dec 1999, Guido van Rossum wrote:
Have no fear... I wasn't planning on this... complicates too much stuff for too little gain. Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein <gstein@lyra.org> wrote:
so the "sys.path contains importers (or strings)" strategy is now officially sanctioned? cool!!! (a quick look in our code base says that this will cause some trouble, unless os.path.isdir() is modified to reject non-strings... after all, if it's not a string, it cannot be a valid directory path, so this does make some sense ;-) another aside: can we have a standard mechanism for listing the contents of a given archive, please? we have a lot of "path scanning" stuff (PIL and PST, among others), and it would be great if things didn't break down if you stuff it all in an archive. something like:

    for path in sys.path:
        if os.path.isdir(path):
            files = os.listdir(path)
        else:
            try:
                files = path.listdir()
            except AttributeError:
                files = None
        if files is None:
            ... # no idea what's in here
        else:
            ... # path provides (at least) these modules

would be really useful. and yes, it shouldn't have to be mentioned, since squeeze has done it since early 1997, but archive importers should provide a standard way to include non-module resources in the archive, and a standard way to access such resources as ordinary python streams. e.g:

    file = path.open(name, "rb")

or something... </F>
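A sys.path entry honoring both of the protocols proposed above might look like this (a sketch: listdir() and open() are the proposed names, everything else is invented for illustration):

```python
import io

class ArchivePath:
    # hypothetical non-string sys.path entry; since it isn't a
    # string, os.path.isdir() correctly rejects it
    def __init__(self, members):
        self.members = members            # {name: bytes}

    def listdir(self):
        # enumerate contents, so "path scanning" tools keep working
        return sorted(self.members)

    def open(self, name, mode="rb"):
        # expose a non-module resource as an ordinary stream
        return io.BytesIO(self.members[name])
```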

[Guido]
Good idea. Question: we have keyword import, __import__, imp and PyImport_*. Which of those (if any) define the "core API"? [rexec, freeze: yes]
- load .py/.pyc/.pyo files and shared libraries from files
Shared libraries? Might that not involve some rather shady platform-specific magic? If it can be kept kosher, I'm all for it; but I'd say no if it involved, um, undocumented features.
support for packages
Absolutely. I'll just comment that the concept of package.__path__ is also affected by the next point.
If sys.path changes meaning, should not $PYTHONPATH also?
I assume that this is mostly a matter of $PYTHONPATH and other path manipulation mechanisms?
I guess you've forgotten: I'm that *really* tall guy <wink>.
I don't mind this, but it depends on whether sys.path changes meaning.
But it's affected by the same considerations (eg, do we start with filesystem names and wrap them in importers, or do we just start with importer instances / specifications for importer instances).
I think this is a matter of what zip compression is officially blessed. I don't mind if it's none; providing / creating zipped versions for platforms that support it is nearly trivial.
Which begs the question of the meaning of sys.path; and if it's still filesystem names, how do you get one of these in there?
A bit of discussion: I've got 2 kinds of archives. One can contain anything & is much like a zip (and probably should be a zip). The other contains only compressed .pyc or .pyo. The latter keys contents by logical name, not filesystem name. No extensions, and when a package is imported, the code object returned is the __init__ code object, (vs returning None and letting the import mechanism come back and ask for package.__init__). When you're building an archive, you have to go thru the .py / .pyc / .pyo / is it a package / maybe compile logic anyway. Why not get it all over with, so that at runtime there's no choices to be made. Which means (for this kind of archive) that including somebody's .spam in your archive isn't a matter of a hook, but a matter of adding to the archive's build smarts.
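The logical-name keying Gordon describes could be computed at archive-build time roughly like this (a sketch with invented names; a package directory keys its __init__ code directly, so the importer never has to come back for package.__init__):

```python
import os

def logical_entries(root):
    # map logical module names to source files for an archive build
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        prefix = "" if rel == "." else rel.replace(os.sep, ".") + "."
        for fn in filenames:
            if not fn.endswith(".py"):
                continue
            mod = fn[:-3]
            if mod == "__init__":
                if prefix:                      # key the package itself
                    entries[prefix[:-1]] = os.path.join(dirpath, fn)
            else:
                entries[prefix + mod] = os.path.join(dirpath, fn)
    return entries
```

All the .py/.pyc/is-it-a-package decisions happen here, once, so the runtime importer has no choices left to make, which is exactly Gordon's point.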
A way of tweaking that which will become sys.path before Py_Initialize would be *most* welcome.
There are other possibilities here, but I have only half-formulated ideas at the moment. The critical part for embedding is to be able to *completely* control all path related logic.
I'll summarize as follows: 1) What "sys.path" means (and how it's construction can be manipulated) is critical. 2) See 1.
I can assure you that code.py runs fine out of an archive :-). - Gordon

On Tue, 16 Nov 1999, Guido van Rossum wrote:
Somebody proposes that a person be added to the list of people with checkin privileges. If nobody else in the group vetoes that, then they're in (their system doesn't require continual participation by each member, so it can only operate at a veto level, rather than unanimous assent). It is basically determined on the basis of merit -- has the person been active (on the Apache developer's mailing list) and has the person contributed something significant? Further, by providing commit access, will they further the goals of Apache? And, of course, does their temperament seem to fit in with the other group members? I can make any change that I'd like. However, there are about 20 other people who can easily revert or alter my changes if they're bogus. There are no programmatic restrictions.... You could say it is based on mutual respect and a social contract of behavior. Large changes should be discussed before committing to CVS. Bug fixes, doc enhancements, minor functional improvements, etc, all follow a commit-then-review process. I just check the thing in. Others see the diff (emailed to the checkins mailing list (this is different from Python-checkins which only says what files are changed, rather than providing the diff)) and can comment on the change, make their own changes, etc. To be concrete: I added the Expat code that now appears in Apache 1.3.9. Before doing so, I queried the group. There were some issues that I dealt with before finally committing Expat to the CVS repository. On another occasion, I added a new API to Apache; again, I proposed it first, got an "all OK" and committed it. I've done a couple bug fixes which I just checked in. [ "all OK" means three +1 votes and no vetoes. everybody has veto ability (but the responsibility to explain why and to remove their veto when their concerns are addressed). ] On many occasions, I've reviewed the diffs that were posted to the checkins list, and made comments back to the author.
I've caught a few problems this way. For Apache 2.0, even large changes are commit-then-review at this point. At some point, it will switch over to review-then-commit and the project will start moving towards stabilization/release. (bug fixes and stuff will always remain commit-then-review) I'll note that the process works very well given that diffs are emailed. I doubt that it would be effective if people had to fetch CVS diffs themselves. Your note also implies "areas of ownership". This doesn't really exist within Apache. There aren't even "primary authors" or things like that. I have the ability/rights to change any portions: from the low-level networking, to the documentation, to the server-side include processing. Of course, if I'm going to make a big change, then I'll be posting a patch for review first, and whoever has worked in that area in the past may/will/should comment. Cheers, -g -- Greg Stein, http://www.lyra.org/

This makes sense, but I have one concern: if somebody who isn't liked very much (say a capable hacker who is a real troublemaker) asks for privileges, would people veto this? I'd be reluctant to go on record as veto'ing a particular person. (E.g. there are a few troublemakers in c.l.py, and I would never want them to join python-dev let alone give them commit privileges, but I'm not sure if I would want to discuss this on a publicly archived mailing list -- or even on a privately archived mailing list, given that the number of members might be in the hundreds. [...stuff I like...]
That's a great idea; I'll see if we can do that to our checkin email, regardless of whether we hand out commit privileges.
But that's Apache, which is explicitly run as a collective. In Python, I definitely want to have ownership of certain sections of the code. But I agree that this doesn't need to be formalized by access control lists; the social process you describe sounds like it will work just fine. --Guido van Rossum (home page: http://www.python.org/~guido/)

[Greg]
[Guido]
It seems that a key point in Greg's description is that people don't propose *themselves* for checkin. They have to talk someone else into proposing them. That should keep Endang out of the running for a few years <wink>. After that, I care more about their code than their personalities. If the stuff they check in is good, fine; if it's not, lock 'em out for direct cause.
I'd be reluctant to go on record as veto'ing a particular person.
Secret Ballot run off a web page -- although not so secret you can't see who voted for what <wink>.

There was a suggestion to start augmenting the checkin emails to include the diffs of the checkin. This would let you keep a current snapshot of the tree without having to do a direct `cvs update'. I think I can add this without a ton of pain. It would not be optional however, and the emails would get larger (and some checkins could be very large). There's also the question of whether to generate unified or context diffs. Personally, I find context diffs easier to read; unified diffs are smaller but not by enough to really matter. So here's an informal poll. If you don't care either way, you don't need to respond. Otherwise please just respond to me and not to the list. 1. Would you like to start receiving diffs in the checkin messages? 2. If you answer `yes' to #1 above, would you prefer unified or context diffs? -Barry

On Fri, 19 Nov 1999, Barry A. Warsaw wrote:
I've been using diffs-in-checkin for review, rather than to keep a local snapshot updated. I guess you use the email for this (procmail truly is frightening), but I think for most people it would be for purposes of review.
Absolutely.
2. If you answer `yes' to #1 above, would you prefer unified or context diffs?
Don't care. I've attached an archive of the files that I use in my CVS repository to do emailed diffs. These came from Ken Coar (an Apache guy) as an extraction from the Apache repository. Yes, they do use Perl. I'm not a Perl guy, so I probably would break things if I tried to "fix" the scripts by converting them to Python (in fact, Greg Ward helped to improve log_accum.pl for me!). I certainly would not be averse to Python versions of these files, or other cleanups. I trimmed down the "avail" file, leaving a few examples. It works with cvs_acls.pl to provide per-CVS-module read/write access control. I'm currently running mod_dav, PyOpenGL, XML-SIG, PyWin32, and two other small projects out of this repository. It has been working quite well. Cheers, -g -- Greg Stein, http://www.lyra.org/

"GS" == Greg Stein <gstein@lyra.org> writes:
GS> I've been using diffs-in-checkin for review, rather than to
GS> keep a local snapshot updated.

Interesting; I hadn't thought about this use for the diffs.

GS> I've attached an archive of the files that I use in my CVS
GS> repository to do emailed diffs. These came from Ken Coar (an
GS> Apache guy) as an extraction from the Apache repository. Yes,
GS> they do use Perl. I'm not a Perl guy, so I probably would
GS> break things if I tried to "fix" the scripts by converting
GS> them to Python (in fact, Greg Ward helped to improve
GS> log_accum.pl for me!). I certainly would not be averse to
GS> Python versions of these files, or other cleanups.

Well, we all know Greg Ward's one of those subversive types, but then again it's great to have (hopefully now-loyal) defectors in our camp, just to keep us honest :) Anyway, thanks for sending the code, it'll come in handy if I get stuck. Of course, my P**l skills are so rusted I don't think even an oilcan-armed Dorothy could lube 'em up, so I'm not sure how much use I can put them to. Besides, I already have a huge kludge that gets run on each commit, and I don't think it'll be too hard to add diff generation... IF the informal vote goes that way. -Barry

"BAW" == Barry A Warsaw <bwarsaw@cnri.reston.va.us> writes:
BAW> There was a suggestion to start augmenting the checkin emails
BAW> to include the diffs of the checkin. This would let you keep
BAW> a current snapshot of the tree without having to do a direct
BAW> `cvs update'.

The voting has stopped, with the "yeah" vote slightly ahead of the "nay" vote. We'll go with context diffs, and we'll be implementing Greg Stein's approach with the xml-checkins list: truncating diffs to H number of lines at the top and T number of lines at the bottom, so as not to overwhelm incoming email. I'll try to get this going sometime today (no promises). You'll likely see a number of tests coming through python-checkins in the meantime. I'll send a message out when it's done. -Barry

Okay folks, I think I've got the diff thing working now. The trick (for you CVS heads) was that you can't do a `cvs diff' while you're executing a loginfo script. Lock contention (repeat after me: "I Love CVS!"). Anyway, let's see how you all like it. Note that based on a suggestion by Greg Stein, seconded by GvR, I do not send out the entire diff of every file (which could potentially be huge). I send out 20 lines from the head of the diff and 20 lines from the tail, and suppress everything in between. Those numbers can be easily tweaked, and I'm not sure what the ideal is. Let's see what the emails look like when stuff starts getting checked in. Enjoy, -Barry
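The head/tail suppression Barry describes amounts to something like this (a sketch; the real hook lives in a Perl loginfo script, and the function name is invented):

```python
def truncate_diff(lines, head=20, tail=20):
    # keep the first `head` and last `tail` lines of a checkin diff,
    # replacing the (possibly huge) middle with a one-line marker
    if len(lines) <= head + tail:
        return lines
    omitted = len(lines) - head - tail
    marker = "[...%d lines suppressed...]" % omitted
    return lines[:head] + [marker] + lines[-tail:]
```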

Guido van Rossum writes:
I'm hoping for several kind of responses to this email:
My list of things to do for 1.6 is:

* Translate re.py to C and switch to the latest PCRE 2 codebase (mostly done, perhaps ready for public review in a week or so).

* Go through the O'Reilly POSIX book and draw up a list of missing POSIX functions that aren't available in the posix module. This was sparked by Greg Ward showing me a Perl daemonize() function he'd written, and I realized that some of the functions it used weren't available in Python at all. (setsid() was one of them, I think.)

* A while back I got approval to add the mmapfile module to the core. The outstanding issue there is that the constructor has a different interface on Unix and Windows platforms.

  On Windows:

      mm = mmapfile.mmapfile("filename", "tag name", <mapsize>)

  On Unix, it looks like the mmap() function:

      mm = mmapfile.mmapfile(<filedesc>, <mapsize>,
                             <flags> (like MAP_SHARED),
                             <prot> (like PROT_READ, PROT_READWRITE))

  Can we reconcile these interfaces, have two different function names, or what?
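One way to reconcile the two constructors is a single wrapper that hides the platform split behind one signature (a sketch using today's mmap module, which eventually settled on much this shape; open_mmap and its parameters are my invention):

```python
import mmap
import os

def open_mmap(filename, length=0, writable=False):
    # one cross-platform signature: the wrapper opens the file
    # itself, so callers never touch fds, flags, or Windows tag names
    fd = os.open(filename, os.O_RDWR if writable else os.O_RDONLY)
    try:
        access = mmap.ACCESS_WRITE if writable else mmap.ACCESS_READ
        return mmap.mmap(fd, length, access=access)   # 0 = whole file
    finally:
        os.close(fd)          # the mapping holds its own reference
```

The access-mode enum is what lets one signature cover both the Unix prot/flags pair and the Windows tag-name style.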
- suggestions for new issues that maybe ought to be settled in 1.6
Perhaps we should figure out what new capabilities, if any, should be added in 1.6. Fred has mentioned weak references, and there are other possibilities such as ExtensionClass. -- A.M. Kuchling http://starship.python.net/crew/amk/ Society, my dear, is like salt water, good to swim in but hard to swallow. -- Arthur Stringer, _The Silver Poppy_

[Guido]
I'm specifically requesting not to have checkin privileges. So there. I see two problems: 1. When patches go thru you, you at least eyeball them. This catches bugs and design errors early. 2. For a multi-platform app, few people have adequate resources for testing; e.g., I can test under an obsolete version of Win95, and NT if I have to, but that's it. You may not actually do better testing than that, but having patches go thru you allows me the comfort of believing you do <wink>.

I'm specifically requesting not to have checkin privileges. So there.
I will force nobody to use checkin privileges. However I see that for some contributors, checkin privileges will save me and them time.
I will still eyeball them -- only after the fact. Since checkins are pretty public, being slapped on the wrist for a bad checkin is a pretty big embarrassment, so few contributors will check in buggy code more than once. Moreover, there will be more eyeballs.
I expect that the same mechanisms will apply. I have access to Solaris, Linux and Windows (NT + 98) but it's actually a lot easier to check portability after things have been checked in. And again, there will be more testers. --Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido]
I will force nobody to use checkin privileges.
That almost went without saying <wink>.
However I see that for some contributors, checkin privileges will save me and them time.
Then it's Good! Provided it doesn't hurt language stability. I agree that changing the system to mail out diffs addresses what I was worried about there.

Fredrik Lundh wrote:
But please don't add the current version as the default importer... its strategy is way too slow for real life apps (yes, I've tested this: imports typically take twice as long as with the builtin importer). I'd opt for an import manager which provides a useful API for import hooks to register themselves with. What we really need is not yet another complete reimplementation of what the builtin importer does, but rather a more detailed exposure of the various import aspects: finding modules and loading modules. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 43 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

Marc-Andre wrote:
I think imputil's emulation of the builtin importer is more of a demonstration than a serious implementation. As for speed, it depends on the test.
I'd opt for an import manager which provides a useful API for import hooks to register themselves with.
I think that rather than blindly chain themselves together, there should be a simple-minded manager. This could let the programmer prioritize them.
The first clause I sort of agree with - the current implementation is a fine implementation of a filesystem directory based importer. I strongly disagree with the second clause. The current import hooks are just such a detailed exposure; and they are incomprehensible and unmanageable. I guess you want to tweak the "finding" part of the builtin import mechanism. But that's no reason to ask all importers to break themselves up into "find" and "load" pieces. It's a reason to ask that the standard importer be, in some sense, "subclassable" (ie, expose hooks, or perhaps be an extension class like thingie). - Gordon

Gordon McMillan wrote:
IMHO the current import mechanism is good for developers who must work on the library code in the directory tree, but a disaster for sysadmins who must distribute Python applications either internally to a number of machines or commercially. What we need is a standard Python library file like a Java "Jar" file. Imputil can support this in 130 lines of Python. I have also written one in C. I like the imputil approach, but if we want to add a library importer to import.c, I volunteer to write it. I don't want to just add more complicated and unmanageable hooks which people will all use in different ways and just add to the confusion. It is easy to install packages by just making them into a library file and throwing it into a directory. So why aren't we doing it? Jim Ahlstrom

[imputil and friends] "James C. Ahlstrom" wrote:
Perhaps we ought to rethink the strategy under a different light: what are the real requirements we have for Python imports? Perhaps the outcome is only the addition of say one or two features, and those can probably easily be added to the builtin system... then we can just forget about the whole import hook dilemma for quite a while (AFAIK, this is how we got packages into the core -- people weren't happy with the import hook). Well, just an idea... I have other threads to follow :-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 43 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

Gordon McMillan wrote:
Agreed. I like some of imputil's features, but I think the API need to be redesigned.
Indeed. (A list of importers has been suggested, to replace the list of directories currently used.)
Based on how many people have successfully written import hooks, I have to agree. :-(
Agreed. Subclassing is a good way towards flexibility. And Jim Ahlstrom writes:
Unfortunately, you're right. :-(
Please volunteer to design or at least review the grand architecture -- see below.
You're so right!
It is easy to install packages by just making them into a library file and throwing it into a directory. So why aren't we doing it?
Rhetorical question. :-)

So here's a challenge: redesign the import API from scratch. Let me start with some requirements.

Compatibility issues:
---------------------

- the core API may be incompatible, as long as compatibility layers can be provided in pure Python
- support for rexec functionality
- support for freeze functionality
- load .py/.pyc/.pyo files and shared libraries from files
- support for packages
- sys.path and sys.modules should still exist; sys.path might have a slightly different meaning
- $PYTHONPATH and $PYTHONHOME should still be supported

(I wouldn't mind a splitting up of importdl.c into several platform-specific files, one of which is chosen by the configure script; but that's a bit of a separate issue.)

New features:
-------------

- Integrated support for Greg Ward's distribution utilities (i.e. a module prepared by the distutil tools should install painlessly)
- Good support for prospective authors of "all-in-one" packaging tools like Gordon McMillan's win32 installer or /F's squish. (But I *don't* require backwards compatibility for existing tools.)
- Standard import from zip or jar files, in two ways:
  (1) an entry on sys.path can be a zip/jar file instead of a directory; its contents will be searched for modules or packages
  (2) a file in a directory that's on sys.path can be a zip/jar file; its contents will be considered as a package (note that this is different from (1)!)
  I don't particularly care about supporting all zip compression schemes; if Java gets away with only supporting gzip compression in jar files, so can we.
- Easy ways to subclass or augment the import mechanism along different dimensions. For example, while none of the following features should be part of the core implementation, it should be easy to add any or all:
  - support for a new compression scheme to the zip importer
  - support for a new archive format, e.g. tar
  - a hook to import from URLs or other data sources (e.g. a "module server" imported via CORBA) (this needn't be supported through $PYTHONPATH though)
  - a hook that imports from compressed .py or .pyc/.pyo files
  - a hook to auto-generate .py files from other filename extensions (as currently implemented by ILU)
  - a cache for file locations in directories/archives, to improve startup time
  - a completely different source of imported modules, e.g. for an embedded system or PalmOS (which has no traditional filesystem)
- Note that different kinds of hooks should (ideally, and within reason) properly combine, as follows: if I write a hook to recognize .spam files and automatically translate them into .py files, and you write a hook to support a new archive format, then if both hooks are installed together, it should be possible to find a .spam file in an archive and do the right thing, without any extra action. Right?
- It should be possible to write hooks in C/C++ as well as Python
- Applications embedding Python may supply their own implementations, default search path, etc., but don't have to if they want to piggyback on an existing Python installation (even though the latter is fraught with risk, it's cheaper and easier to understand).

Implementation:
---------------

- There must clearly be some code in C that can import certain essential modules (to solve the chicken-or-egg problem), but I don't mind if the majority of the implementation is written in Python. Using Python makes it easy to subclass.
- In order to support importing from zip/jar files using compression, we'd at least need the zlib extension module and hence libz itself, which may not be available everywhere.
- I suppose that the bootstrap is solved using a mechanism very similar to what freeze currently uses (other solutions seem to be platform dependent).
- I also want to still support importing *everything* from the filesystem, if only for development.
(It's hard enough to deal with the fact that exceptions.py is needed during Py_Initialize(); I want to be able to hack on the import code written in Python without having to rebuild the executable all the time.)

Let's first complete the requirements gathering. Are these requirements reasonable? Will they make an implementation too complex? Am I missing anything?

Finally, to what extent does this impact the desire for dealing differently with the Python bytecode compiler (e.g. supporting optimizers written in Python)? And does it affect the desire to implement the read-eval-print loop (the >>> prompt) in Python? --Guido van Rossum (home page: http://www.python.org/~guido/)
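As a historical aside, requirement (1) above is essentially what the stdlib's zipimport support later provided. A minimal sketch in modern Python (the archive name "stdlib_extra.zip" and the module "greeting" are invented for this demo):

```python
import os
import sys
import tempfile
import zipfile

# Build a throwaway archive containing one module.
tmp = tempfile.mkdtemp()
archive = os.path.join(tmp, "stdlib_extra.zip")
with zipfile.ZipFile(archive, "w") as zf:
    # Stored uncompressed -- as the thread notes, compression is optional.
    zf.writestr("greeting.py", "MESSAGE = 'hello from the archive'\n")

# Requirement (1): the archive itself goes directly on sys.path.
sys.path.insert(0, archive)
import greeting
print(greeting.MESSAGE)   # prints: hello from the archive
```

The archive behaves exactly like a directory entry on sys.path: module lookup searches its members instead of filesystem entries.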

Guido van Rossum wrote:
Let's first complete the requirements gathering.
Yes.
I think you can get 90% of where you want to be with something much simpler. And the simpler implementation will be useful in the 100% solution, so it is not wasted time. How about if we just design a Python archive file format; provide code in the core (in Python or C) to import from it; provide a Python program to create archive files; and provide a Standard Directory to put archives in so they can be found quickly. For extensibility and control, we add functions to the imp module. Detailed comments follow:
Easily met by keeping the current C code.
These tools go well beyond just an archive file format, but hopefully a file format will help. Greg and Gordon should be able to control the format so it meets their needs. We need a standard format.
I don't like sys.path at all. It is currently part of the problem. I suggest that archive files MUST be put into a known directory. On Windows this is the directory of the executable, sys.executable. On Unix this is $PREFIX plus version, namely "%s/lib/python%s/" % (sys.prefix, sys.version[0:3]). Other platforms can have different rules.

We should also have the ability to append archive files to the executable or a shared library, assuming the OS allows this (Windows and Linux do allow it). This is the first location searched; it nails the archive to the interpreter, insulates us from an erroneous sys.path, and enables single-file Python programs.
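An aside on Jim's Unix formula: slicing sys.version breaks once minor versions reach two digits (e.g. 3.10), which is one reason sys.version_info is preferred today. A sketch of both forms:

```python
import os
import sys

# Jim's 1999 formula, which slices the version string:
legacy = "%s/lib/python%s/" % (sys.prefix, sys.version[0:3])

# A modern equivalent built from sys.version_info, which stays correct
# for two-digit minor versions:
modern = os.path.join(
    sys.prefix, "lib",
    "python%d.%d" % (sys.version_info[0], sys.version_info[1]))
print(modern)
```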
We don't need compression. The whole ./Lib is 1.2 Meg, and if we compress it to zero we save a Meg. Irrelevant. Installers provide compression anyway so when Python programs are shipped, they will be compressed then. Problems are that Python does not ship with compression, we will have to add it, we will have to support it and its current method of compression forever, and it adds complexity.
Sigh, this proposal does not provide for this. It seems like a job for imputil. But if the file format and import code is available from the imp module, it can be used as part of the solution.
- support for a new compression scheme to the zip importer
I guess compression should be easy to add if Python ships with a compression module.
- a cache for file locations in directories/archives, to improve startup time
If the Python library is available as an archive, I think startup will be greatly improved anyway.
Yes.
That's a good reason to omit compression. At least for now.
Yes, except that we need to be careful to preserve the freeze feature for users. We don't want to take it over.
Yes, we need a function in imp to turn archives off:

    import imp
    imp.archiveEnable(0)
I don't think it impacts these at all. Jim Ahlstrom

Agreed, but I'm not sure that it addresses the problems that started this thread. I can't really tell, since the message starting the thread just requested imputil, without saying which parts of it were needed. A followup claimed that imputil was a fine prototype but too slow for real work. I inferred that flexibility was requested. But maybe that was projection since that was on my own list. (I'm happy with the performance and find manipulating zip or jar files clumsy, so I'm not too concerned about all the nice things you can *do* with that flexibility. :-)
I think the standard format should be a subclass of zip or jar (which is itself a subclass of zip). We have already written (at CNRI, as yet unreleased) the necessary Python tools to manipulate zip archives; moreover 3rd party tools are abundantly available, both on Unix and on Windows (as well as in Java). Zip files also lend themselves to self-extracting archives and similar things, because the file index is at the end, so I think that Greg & Gordon should be happy.
I don't like sys.path at all. It is currently part of the problem.
Eh? That's the first time I've heard something bad about it. Maybe that's because you live on Windows -- on Unix, search paths are ubiquitous.
I suggest that archive files MUST be put into a known directory.
Why? Maybe this works on Windows; on Unix this is asking for trouble because it prevents users from augmenting the installation provided by the sysadmin. Even on newer Windows versions, users without admin perms may not be allowed to add files to that privileged directory.
OK for the executable. I'm not sure what the point is of appending an archive to the shared library? Anyway, does it matter (on Windows) if you add it to python16.dll or to python.exe?
OK, OK. I think most zip tools have a way to turn off the compression. (Anyway, it's a matter of more I/O time vs. more CPU time; hardware for both is getting better faster than we can tweak the code :-)
Well, the question is really if we want flexibility or archive files. I care more about the flexibility. If we get a clear vote for archive files, I see no problem with implementing that first.
If the Python library is available as an archive, I think startup will be greatly improved anyway.
Really? I know about all the system calls it makes, but I don't really see much of a delay -- I have a prompt in well under 0.1 second. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Think about multiple packages in multiple zip files. The zip files store file directories. That means we would need a sys.zippath to search the zip files. I don't want another PYTHONPATH phenomenon. Greg Stein and I once discussed this (and Gordon I think). They argued that the directories should be flattened. That is, think of all directories which can be reached on PYTHONPATH. Throw away all initial paths. The resultant archive has *.pyc at the top level, as well as package directories only. The search path is "." in every archive file. No directory information is stored, only module names, some with dots.
On Windows, just print sys.path. It is junk. A commercial distribution has to "just work", and it fails if a second installation (by someone else) changes PYTHONPATH to suit their app. I am trying to get to "just works", no excuses, no complications.
It works on Windows because programs install themselves in their own subdirectories, and can put files there instead of /windows/system32. This holds true for Windows 2000 also. A Unix-style installation to /windows/system32 would (may?) require "administrator" privilege. On Unix you are right. I didn't think of that because I am the Unix sysadmin here, so I can put things where I want. The Windows solution doesn't fit with Unix, because executables go in a ./bin directory and putting library files there is a no-no. Hmmmm... This needs more thought. Anyone else have ideas??
The point of using python16.dll is to append the Python library to it, and append to python.exe (or use files) for everything else. That way, the 1.6 interpreter is linked to the 1.6 Lib, upgrading to 1.7 means replacing only one file, and there is no wasted storage in multiple Lib's. I am thinking of multiple Python programs in different directories. But maybe you are right. On Windows, if python.exe can be put in /windows/system32 then it really doesn't matter.
Well, if Python now has its own compression that is built in and comes with it, then that is different. Maybe compression is OK.
I don't like flexibility, I like standardization and simplicity. Flexibility just encourages users to do the wrong thing. Everyone vote please. I don't have a solid feeling about what people want, only what they don't like.
So do I. I guess I was just echoing someone else's complaint. JimA

[Guido]
No problem (I created my own formats for relatively minor reasons). [JimA]
What if sys.path looked like: [DirImporter('.'), ZlibImporter('c:/python/stdlib.pyz'), ...]
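A hypothetical sketch of Gordon's importer-list idea. DirImporter, get_code, and find_module here are toy stand-ins for the names in his message, not imputil's real API:

```python
import os

class DirImporter:
    """Toy importer that serves .py source from one directory."""
    def __init__(self, path):
        self.path = path

    def get_code(self, modname):
        # Return the module's source text, or None if we don't have it.
        candidate = os.path.join(self.path, modname + ".py")
        if os.path.exists(candidate):
            with open(candidate) as f:
                return f.read()
        return None

def find_module(importers, modname):
    # Walk the importer list; the first hit wins, like sys.path order.
    for importer in importers:
        code = importer.get_code(modname)
        if code is not None:
            return code
    return None
```

A ZlibImporter in this scheme would be another object with the same get_code interface, reading members out of a compressed archive instead of a directory.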
While I do flat archives (no dots, but that's a different story), there's no reason the archive couldn't be structured. Flat archives are definitely simpler. [JimA]
    Py_Initialize();
    PyRun_SimpleString("import sys; del sys.path[1:]");

Yeah, there's a hole there. Fixable if you could do a little pre-Py_Initialize twiddling.
I suggest that archive files MUST be put into a known directory.
No way. Hard code a directory? Overwrite someone else's Python "standalone"? Write to a C: partition that is deliberately sized to hold nothing but Windows? Make network installations impossible?
There's nothing Unix-style about installing to /Windows/system32. 'Course *they* have symbolic links that actually work...
The official Windows solution is stuff in the registry about app paths and such. Putting the dlls in the exe's directory is a workaround which works and is more manageable than the official solution.
That's a handy trick on Windows, but it's got nothing to do with Python.
I've noticed that the people who think there should only be one way to do things never agree on what it is.
Everyone vote please. I don't have a solid feeling about what people want, only what they don't like.
Flexibility. You can put Christian's favorite Einstein quote here too.
Install some stuff. Deinstall some of it. Repeat (mixing up the order) until your registry and hard drive are shattered into tiny little fragments. It doesn't take long (there's lots of stuff a defragmenter can't touch once it's there). - Gordon

Gordon McMillan wrote:
Well, that changes the current meaning of sys.path.
Oops. I didn't mean a known directory you couldn't change. But I did mean a directory you shouldn't change. But you are right: the directory should be configurable. I would still like to see a highly encouraged directory, though. I don't yet have a good design for this. Anyone have ideas on an official way to find library files? I think a Python library file is a Good Thing, but it is not useful if the archive can't be found.

I am thinking of a busy SysAdmin with someone nagging him/her to install Python. SysAdmin doesn't want another headache. What if Python becomes popular and users want it on Unix and PCs? More work! There should be a standard way to do this that just works and is dumb-stupid-simple. This is a Python promotion issue. Yes, everyone here can make sys.path work, but that is not the point.
I agree completely.
It also works on Linux. I don't know about other systems.
Flexibility. You can put Christian's favorite Einstein quote here too.
I hope we can still have ease of use with all this flexibility. As I said, we need to promote Python. Jim Ahlstrom

Guido van Rossum wrote:
import d # from directory a/b/c/
import d # from directory a/b/c/
Since you were asking: I would like functionality equivalent to my latest import patch for a slightly different lookup scheme for module import inside packages to become a core feature. If it becomes a core feature I promise to never again start threads about relative imports :-) Here's the summary again:

"""
[The patch] changes the default import mechanism to work like this:

    try a.b.c.d
    try a.b.d
    try a.d
    try d
    fail

instead of just doing the current two-level lookup:

    try a.b.c.d
    try d
    fail

As a result, relative imports referring to higher level packages work out of the box without any ugly underscores in the import name. Plus the whole scheme is pretty simple to explain and straightforward.
"""

You can find the patch attached to the message "Walking up the package hierarchy" in the python-dev mailing list archive. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 42 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
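The walk-up lookup order above can be sketched as a small function that generates the candidate names (an illustrative sketch, not MAL's actual patch, which worked in C inside the import machinery):

```python
def lookup_order(package, name):
    """Candidate fully-qualified names for `import name` inside `package`."""
    parts = package.split(".") if package else []
    candidates = []
    while parts:
        candidates.append(".".join(parts + [name]))
        parts.pop()            # walk up one level: a.b.c -> a.b -> a
    candidates.append(name)    # finally, the top-level module
    return candidates

print(lookup_order("a.b.c", "d"))
# prints: ['a.b.c.d', 'a.b.d', 'a.d', 'd']
```

The first candidate that resolves wins; only if all of them fail does the import raise ImportError.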

On Thu, 18 Nov 1999, Guido van Rossum wrote:
In what ways? It sounds like you've applied some thought. Do you have any concrete ideas yet, or is it "just a feeling"? :-) I'm working through some changes from JimA right now, and would welcome other suggestions. I think there may be some outstanding stuff from MAL, but I'm not sure (Marc?)
... So here's a challenge: redesign the import API from scratch.
I would suggest starting with imputil and altering as necessary. I'll use that viewpoint below.
Which APIs are you referring to? The "imp" module? The C functions? The __import__ and reload builtins? I'm guessing some of imp, the two builtins, and only one or two C functions.
- support for rexec functionality
No problem. I can think of a number of ways to do this.
- support for freeze functionality
No problem. A function in "imp" must be exposed to Python to support this within the imputil framework.
- load .py/.pyc/.pyo files and shared libraries from files
No problem. Again, a function is needed for platform-specific loading of shared libraries.
- support for packages
No problem. Demo's in current imputil.
- sys.path and sys.modules should still exist; sys.path might have a slightly different meaning
I would suggest that both retain their *exact* meaning. We introduce sys.importers -- a list of importers to check, in sequence. The first importer on that list uses sys.path to look for and load modules. The second importer loads builtins and frozen code (i.e. modules not on sys.path). Users can insert/append new importers or alter sys.path as before. sys.modules continues to record name:module mappings.
- $PYTHONPATH and $PYTHONHOME should still be supported
No problem.
Easy enough. The standard importer can select the appropriate platform-specific module/function to perform the load. i.e. these can move to Modules/ and be split into a module-per-platform.
I don't know the specific requirements/functionality that would be required here (does Greg? :-), but I can't imagine any problem with this.
Um. *No* problem. :-)
While this could easily be done, I might argue against it. Old apps/modules that process sys.path might get confused. If compatibility is not an issue, then "No problem." An alternative would be an Importer instance added to sys.importers that is configured for a specific archive (in other words, don't add the zip file to sys.path, add ZipImporter(file) to sys.importers). Another alternative is an Importer that looks at a "sys.py_archives" list. Or an Importer that has a py_archives instance attribute.
No problem. This will slow things down, as a stat() for *.zip and/or *.jar must be done, in addition to *.py, *.pyc, and *.pyo.
I presume we would support whatever zlib gives us, and no more.
Presuming ZipImporter is a class (derived from Importer), then this ability is wholly dependent upon the author of ZipImporter providing the hook. The Importer class is already designed for subclassing (and its interface is very narrow, which means delegation is also *very* easy; see imputil.FuncImporter).
- support for a new archive format, e.g. tar
A cakewalk. Gordon, JimA, and myself each have archive formats. :-)
No problem at all.
- a hook that imports from compressed .py or .pyc/.pyo files
No problem at all.
- a hook to auto-generate .py files from other filename extensions (as currently implemented by ILU)
No problem at all.
- a cache for file locations in directories/archives, to improve startup time
No problem at all.
- a completely different source of imported modules, e.g. for an embedded system or PalmOS (which has no traditional filesystem)
No problem at all. In each of the above cases, the Importer.get_code() method just needs to grab the byte codes from the XYZ data source. That data source can be compressed, across a network, generated on the fly, or whatever. Each importer can certainly create a cache based on its concept of "location". In some cases, that would be a mapping from module name to filesystem path, or to a URL, or to a compiled-in, frozen module.
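Greg's "completely different source of imported modules" can be illustrated in modern importlib terms (not the imputil API under discussion). Here the module source lives in an in-memory dict instead of any filesystem; MemoryFinder, SOURCES, and the module name "palm_demo" are invented for the sketch:

```python
import importlib.abc
import importlib.util
import sys

# The "data source": module name -> source text, no filesystem involved.
SOURCES = {"palm_demo": "ANSWER = 42\n"}

class MemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path, target=None):
        if fullname in SOURCES:
            return importlib.util.spec_from_loader(fullname, self)
        return None              # not ours; let the next finder try

    def create_module(self, spec):
        return None              # use the default module object

    def exec_module(self, module):
        # The get_code()-style step: fetch code from the data source.
        exec(SOURCES[module.__name__], module.__dict__)

sys.meta_path.insert(0, MemoryFinder())
import palm_demo
print(palm_demo.ANSWER)          # prints: 42
```

The single narrow hook ("map this name to a module, or decline") survives in this design: find_spec either claims the name or returns None so the search continues.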
Ack. Very, very difficult. The imputil scheme combines the concepts of locating and loading into one step. There is only one "hook" in the imputil system. Its semantic is "map this name to a code/module object and return it; if you don't have it, then return None." Your compositing example is based on the capabilities of the find-then-load paradigm of the existing "ihooks.py". One module finds something (foo.spam) and the other module loads it (by generating a .py).

All is not lost, however. I can easily envision the get_code() hook as allowing any kind of return type. If it isn't a code or module object, then another hook is called to transform it. [ actually, I'd design it similarly: a *series* of hooks would be called until somebody transforms the foo.spam into a code/module object. ] The compositing would be limited only by the (Python-based) Importer classes. For example, my ZipImporter might expect to zip up .pyc files *only*. Obviously, you would want to alter this to support zipping any file, then use the suffix to determine what to do at unzip time.
- It should be possible to write hooks in C/C++ as well as Python
Use FuncImporter to delegate to an extension module. This is one of the benefits of imputil's single/narrow interface.
An application would have full control over the contents of sys.importers. For a restricted execution app, it might install an Importer that loads files from *one* directory only which is configured from a specific Win32 Registry entry. That importer could also refuse to load shared modules. The BuiltinImporter would still be present (although the app would certainly omit all but the necessary builtins from the build). Frozen modules could be excluded.
I posited once before that the cost of import is mostly I/O rather than CPU, so using Python should not be an issue. MAL demonstrated that a good design for the Importer classes is also required. Based on this, I'm a *strong* advocate of moving as much as possible into Python (to get Python's ease-of-coding with little relative cost).

The (core) C code should be able to search a path for a module and import it. It does not require dynamic loading or packages. This will be used to import exceptions.py, then imputil.py, then site.py. The platform-specific module that performs dynamic loading must be a statically linked module (in Modules/ ... it doesn't have to be in the Python/ directory).

site.py can complete the bootstrap by setting up sys.importers with the appropriate Importer instances (this is where an application can define its own policy). sys.path was initially set by the import.c bootstrap code (from the compiled-in path and environment variables). Note that imputil.py would not install any hooks when it is loaded. That is up to site.py.

This implies the core C code will import a total of three modules using its builtin system. After that, the imputil mechanism would be importing everything (site.py would .install() an Importer which then takes over the __import__ hook).

Further note that the "import" Python statement could be simplified to use only the hook. However, this would require the core importer to inject some module names into the imputil module's namespace (since it couldn't use an import statement until a hook was installed). While this simplification is "neat", it complicates the run-time system (the import statement is broken until a hook is installed). Therefore, the core C code must also support importing builtins. "sys" and "imp" are needed by imputil to bootstrap.

The core importer should not need to deal with dynamic-load modules. To support frozen apps, the core importer would need to support loading the three modules as frozen modules.
The builtin/frozen importing would be exposed thru "imp" for use by imputil for future imports. imputil would load and use the (builtin) platform-specific module to do dynamic-load imports.
Yes. I don't see this as a requirement, though. We wouldn't start to use these by default, would we? Or insist on zlib being present? I see this as more along the lines of "we have provided a standardized Importer to do this, *provided* you have zlib support."
The bootstrap that I outlined above could be done in C code. The import code would be stripped down dramatically because you'll drop package support and dynamic loading. Alternatively, you could probably do the path-scanning in Python and freeze that into the interpreter. Personally, I don't like this idea as it would not buy you much at all (it would still need to return to C for accessing a number of scanning functions and module importing funcs).
My outline above does not freeze anything. Everything resides in the filesystem. The C code merely needs a path-scanning loop and functions to import .py*, builtin, and frozen types of modules. If somebody nukes their imputil.py or site.py, then they return to Python 1.4 behavior where the core interpreter uses a path for importing (i.e. no packages). They lose dynamically-loaded module support.
I'm not a fan of the compositing due to it requiring a change to semantics that I believe are very useful and very clean. However, I outlined a possible, clean solution to do that (a secondary set of hooks for transforming get_code() return values). The requirements are otherwise reasonable to me, as I see that they can all be readily solved (i.e. they aren't burdensome). While this email may be long, I do not believe the resulting system would be complex. From the user-visible side of things, nothing would be changed. sys.path is still present and operates as before. They *do* have new functionality they can grow into, though (sys.importers). The underlying C code is simplified, and the platform-specific dynamic-load stuff can be distributed to distinct modules, as needed (e.g. BeOS/dynloadmodule.c and PC/dynloadmodule.c).
If the three startup files require byte-compilation, then you could have some issues (i.e. the byte-compiler must be present). Once you hit site.py, you have a "full" environment and can easily detect and import a read-eval-print loop module (i.e. why return to Python? just start things up right there). site.py can also install new optimizers as desired, a new Python-based parser or compiler, or whatever... If Python is built without a parser or compiler (I hope that's an option!), then the three startup modules would simply be frozen into the executable. Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein wrote:
We should retain the current order. I think it is: first builtin, next frozen, next sys.path. I really think frozen modules should be loaded in preference to sys.path. After all, they are compiled in.
Users can insert/append new importers or alter sys.path as before.
I agree with Greg that sys.path should remain as it is. A list of importers can add the extra functionality. Users will probably want to adjust the order of the list.
Yes, I agree. And I think the main() should be written in Python. Lots of Python should be written in Python.
But these can be frozen in (as you mention below). I dislike depending on sys.path to load essential modules. If they are not frozen in, then we need a command line argument to specify their path, with sys.path used otherwise. Jim Ahlstrom

Here's the promised response to Greg's response to my wishlist.
I actually think that the way the PVM (Python VM) calls the importer ought to be changed. Assigning to __builtin__.__import__ is a crock. The API for __import__ is a crock.
I'm guessing some of imp, the two builtins, and only one or two C functions.
All of those.
- support for rexec functionality
No problem. I can think of a number of ways to do this.
Agreed, I think that imputil can do this.
Agreed. It currently exports init_frozen() which is about the right functionality.
Is it useful to expose the platform differences? The current imp.load_dynamic() should suffice.
This is looking like the redesign I was looking for. (Note that imputil's current chaining is not good, since it's impossible to remove or reorder importers, which I think is a required feature; an explicit list would solve this.) Actually, the order is the other way around, but by now you should know that. It makes sense to have separate ones for builtin and frozen modules -- these have nothing in common.

There's another issue, which isn't directly addressed by imputil, although with clever use of inheritance it might be doable. I'd like more support for this, however. Quite orthogonally to the issue of having separate importers, I might want to recognize new extensions. Take the example of the ILU folks. They want to be able to drop a file "foo.isl" in any directory on sys.path and have the ILU stubber automatically run if you try to import foo (the client stubs) or foo__skel (the server skeleton). This doesn't fit in the sys.importers strategy, because they want to be able to drop their .isl files in any directory along sys.path. (Or, more likely, they want to have control over where in sys.path the directory/directories with .isl files are placed.) This requires an ugly modification to the _fs_import() function. (Which should have been a method, by the way, to make overriding it in a subclass of PathImporter easier!)

I've been thinking here along the lines of a strategy where the standard importer (the one that walks sys.path) has a set of hooks that define various things it could look for, e.g. .py files, .pyc files, .so or .dll files. This list of hooks could be changed to support looking for .isl files.

There's an old, subtle issue that could be solved through this as well: whether or not a .pyc file without a .py file should be accepted. Long ago (in Python 0.9.8) a .pyc file alone would never be loaded. This was changed at the request of a small but vocal minority of Python developers who wanted to distribute .pyc files without .py files.
It has occasionally caused frustration because sometimes developers move .py files around but forget to remove the .pyc files, and then the .pyc file is silently picked up if it occurs on sys.path earlier than where the .py was moved to. Having a set of hooks for various extensions would make it possible to have a default where lone .pyc files are ignored, but where one can insert a .pyc importer in the list of hooks that does the right thing here. (Of course, it may be possible that this whole feature of lone .pyc files should be replaced, since the same need is easily taken care of by zip importers.)

I also want to support (Jim A notwithstanding :-) a feature whereby different things besides directories can live on sys.path, as long as they are strings -- these could be added from the PYTHONPATH env variable. Every piece of code that I've ever seen that uses sys.path doesn't care if a directory named in sys.path doesn't exist -- it may try to stat various files in it, which also don't exist, and as far as it is concerned that is just an indication that the requested module doesn't live there. Again, we would have to dissect imputil to support various hooks that deal with different kinds of entities in sys.path. The default hook list would consist of a single item that interprets the name as a directory name; other hooks could support zip files or URLs. Jack's "magic cookies" could also be supported nicely through such a mechanism.
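The per-suffix hook table described above can be sketched as a list of (suffix, handler) pairs consulted in order. The names here (SUFFIX_HOOKS, find_source, py_handler) are invented for illustration; an ILU-style hook would append an ".isl" entry whose handler runs the stubber:

```python
import os

# Toy handler: "load" a .py file by returning its source text.
def py_handler(path):
    with open(path) as f:
        return f.read()

# Default hook list.  Note there is no ".pyc" entry: ignoring lone .pyc
# files is the default policy; one could insert a handler to allow them.
SUFFIX_HOOKS = [(".py", py_handler)]

def find_source(directory, modname, hooks=SUFFIX_HOOKS):
    # Try each registered suffix in order, like the proposed PathImporter.
    for suffix, handler in hooks:
        candidate = os.path.join(directory, modname + suffix)
        if os.path.exists(candidate):
            return handler(candidate)
    return None
```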
Users can insert/append new importers or alter sys.path as before.
sys.modules continues to record name:module mappings.
Yes. Note that the interpretation of __file__ could be problematic. To what value do you set __file__ for a module loaded from a zip archive?
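For what it's worth, the convention the stdlib's zipimport later settled on is archive path plus member path, a "path" that names no real file on disk. A modern sketch (archive and module names invented):

```python
import os
import sys
import tempfile
import zipfile

archive = os.path.join(tempfile.mkdtemp(), "lib.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipmod.py", "VALUE = 1\n")

sys.path.insert(0, archive)
import zipmod
# __file__ ends with ".../lib.zip/zipmod.py" -- archive plus member path.
print(zipmod.__file__)
```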
Again: what's the advantage of exposing the platform specificity?
Probably more support is required from the other end: once it's common for modules to be imported from zip files, the distutil code needs to support the creation and installation of such zip files. Also, there is a need for the install phase of distutil to communicate the location of the zip file to the Python installation.
:-)
Note that this is what I mention above for distutil support.
While this could easily be done, I might argue against it. Old apps/modules that process sys.path might get confused.
Above I argued that this shouldn't be a problem.
This would be harder for distutil: where does Python get the initial list of importers?
Another alternative is an Importer that looks at a "sys.py_archives" list. Or an Importer that has a py_archives instance attribute.
OK, but again distutil needs to be able to add to this list when it installs a package. (Note that package deinstallation should also be supported!) (Of course I don't require this to affect Python processes that are already running; but it should be possible to easily change the default search path for all newly started instances of a given Python installation.)
Fine, this is where the caching comes in handy.
That's it. :-)
Agreed. But since we're likely going to provide this as a standard feature, we must ensure that it provides this hook.
But maybe it's *too* narrow; some of the hooks I suggest above seem to require extra interfaces -- at least in some of the subclasses of the Importer base class. Note: I looked at the doc string for get_code() and I don't understand what the difference is between the modname and fqname arguments. If I write "import foo.bar", what are modname and fqname? Why are both present? Also, while you claim that the API is narrow, the multiple return values (also the different types for the second item) make it complicated.
See above -- I think this should be more integrated with sys.path than you are thinking of. The more I think about it, the more I see that the problem is that for you, the importer that uses sys.path is a final subclass of Importer (i.e. it is itself not further subclassed). Several of the hooks I want seem to require additional hooks in the PathImporter rather than new importers.
See above for sys.path integration remark.
Actually, I take most of this back. Importers that deal with new extension types often have to go through a file system to transform their data to .py files, and this is just too complicated. However it would be still nice if there was code sharing between the code that looks for .py and .pyc files in a zip archive and the code that does the same in a filesystem. Hm, maybe even that shouldn't be necessary, the zip file probably should contain only .pyc files... (Unrelated remark: I should really try to release the set of modules we've written here at CNRI to deal with zip files. Unfortunately zip files are hairy and so is our code.)
That's fine. I actually don't recall where the find-then-load API came from, I think it may be an artefact of the original implementation strategy. It is currently used as follows: we try to see if there's a .pyc and then we try to see if there's a .py; if both exist we compare the timestamps etc. to choose which one. But that's still a red herring.
I still don't understand why ihooks.py had to be so complicated. I guess I just had much less of an understanding of the issues. (It was also partly a compromise with an alternative design by Ken Manheimer, who basically forced me to support packages, originally through ni.py.)
OK. This could be a feature of a subclass of Importer.
Maybe not so great, since it sounds like the C code can't benefit from any of the infrastructure that imputil offers. I'm not sure about this one though.
This is one of the benefits of imputil's single/narrow interface.
Plus its vague specs? :-)
Actually there's little reason to exclude frozen modules or any .py/.pyc modules -- by definition, bytecode can't be dangerous. It's the builtins and extensions that need to be censored. We currently do this by subclassing ihooks, where we mask the test for builtins with a comparison to a predefined list of names.
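The censoring Guido describes -- letting all bytecode through while masking builtins and extensions against a predefined list of names -- reduces to a check like the following sketch (the list contents and function name are invented examples, not the actual rexec/ihooks code):

```python
# Only builtins/extensions need censoring; bytecode passes freely.
SAFE_BUILTINS = ["string", "math", "time"]   # example allow-list

def allow_import(modname, is_builtin):
    if not is_builtin:
        return True                  # .py/.pyc: "bytecode can't be dangerous"
    return modname in SAFE_BUILTINS  # builtins: masked against the list
```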
Agreed. However, how do you explain the slowdown (from 9 to 13 seconds, as I recall)? Are you a lousy coder? :-)
It does, however, need to import builtin modules. imputil currently imports imp, sys, strop, __builtin__, struct, and marshal; note that struct can easily be a dynamic loadable module, and so could strop in theory. (Note that strop will be unnecessary in 1.6 if you use string methods.) I don't think that this chicken-or-egg problem is particularly problematic though.
See earlier comments.
I think that algorithm (currently in getpath.c / getpathp.c) might also be moved to Python code -- imported frozen. Sadly, rebuilding with a new version of a frozen module might be more complicated than rebuilding with a new version of a C module, but writing and maintaining this code in Python would be *sooooooo* much easier that I think it's worth it.
(Three not counting the builtin modules.)
Same chicken-or-egg. We can be pragmatic. For a developer, I'd like a bit of robustness (all this makes it rather hard to debug a broken imputil, and that's a fair amount of code!).
Same question. Since that all has to be coded in C anyway, why not?
To support frozen apps, the core importer would need to support loading the three modules as frozen modules.
I'd like to see a description of how someone like Jim A would build a single-file application using the new mechanism. This could completely replace freeze. (Freeze currently requires a C compiler; that's bad.)
Sure.
Agreed. Zlib support is easy to get, but there are probably platforms where it's not. (E.g. maybe the Mac? I suppose that on the Mac, there would be some importer classes to import from a resource fork.)
Not the dynamic loading. But yes the package support.
Good. Though I think there's also a need for freezing everything. And when we go the route of the zip archive, the zip archive handling code needs to be somewhere -- frozen seems to be a reasonable choice.
But if the path guessing is also done by site.py (as I propose) the path will probably be wrong. A warning should be printed.
As you may see from my responses, I'm a big fan of having several different sets of hooks. I do withdraw the composition requirement though.
Another chicken-or-egg. No biggie.
You mean "why return to C?" I agree. It would be cool if somehow IDLE and Pythonwin would also be bootstrapped using the same mechanisms. (This would also solve the question "which interactive environment am I using?" that some modules and apps want to see answered because they need to do things differently when run under IDLE,for example.)
More power to hooks! --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum writes:
Not the case -- I know you've looked at some of my code in the KOE that ensures only real directories are on the path, and each is only there once (pathhack.py). Given that sys.path is often too long and includes duplicate entries in a large system (often one entry with and one without a trailing / for a given directory), it is useful to be able to distinguish between things that should be interpretable as paths and things that aren't. It should not be hard to declare that "cookies" or whatever have some special form, like "<cookie>".
It doesn't help that that code just plain stinks. I maintain that no one here understands the whole of it. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives

"FLD" == Fred L Drake, <fdrake@acm.org> writes:
FLD> It doesn't help that that code just plain stinks. I maintain FLD> that no one here understands the whole of it. I'm all for improving the code and getting it out. The real problem is that interfaces have been glommed on for every new use of a Zip file. (You want to read one off a socket and extract files before you've got the whole thing? No problem! Add a new class.) We need to figure out the common patterns for using the archives and write a new set of interfaces to support that. Jeremy

[Jeremy, on our Zip code]
If we gave you the code we currently have, would someone else in this forum be willing to redesign it? Eventually it would become part of the Python distribution. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote: [...]
Note that the interpretation of __file__ could be problematic. To what value do you set __file__ for a module loaded from a zip archive?
Makefiles use "archive(entry)" (this also supports nesting if needed). [...]
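The "archive(entry)" convention for __file__ mentioned above is easy to sketch; the helper name below is invented, and the nested form shown is one plausible reading of "supports nesting if needed":

```python
def archive_file_name(archive, entry):
    """Build an 'archive(entry)' style __file__ value."""
    return "%s(%s)" % (archive, entry)

flat = archive_file_name("/usr/lib/python/lib.zip", "string.pyc")
nested = archive_file_name("outer.zip", archive_file_name("inner.zip", "string.pyc"))
```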
This may be off-topic, but has anyone considered what it would take to load shared libs out of an archive? One way is to extract on-the-fly to a temporary area. A refinement is to leave extracted files there as cache, and perhaps even to extract to a file with a name derived from its MD5 digest (this way multiple users and even Python installations can share the cache). Would it be useful to define a "standard" area? -- Jean-Claude

Jean-Claude Wippler <jcw@equi4.com> wrote:
This may be off-topic, but has anyone considered what it would take to load shared libs out of an archive?
well, we do that in a number of applications. (lazy installers are really cool... if you've installed works, you've seen some weird stuff -- for example, when the application starts the first time, it's loading everything from inside the installer. the rest of the installation is done from within the application itself, using archives in the installation executable) I think things like this are better left for the application designers, though... </F>

Jean-Claude Wippler wrote:
I discovered the hard way this entry is not optional. I just used the archive file name for __file__.
IMHO putting shared libs in an archive is a bad idea because the OS can not use them there. They must be extracted as you say. But then storage is wasted by using space in the archive and the external file. Deleting them after use wastes time. Better to leave them out of the archive and provide for them in the installer. IMHO the archive is a basic simple feature, and people make installers on top of that. Archives shouldn't try to do it all. JimA

James C. Ahlstrom <jim@interet.com> wrote:
have you tried it? if not, why do you think you should be allowed to forbid others from doing it? in "the inmates are running the asylum", alan cooper points out that the *major* reason people all over the world love web applications are that there are no bloody installers. and here you are advocating that we all should be forced to use installers, when python makes it trivial to write self-installing apps. double-argh! (on the other hand, why do I complain? all pythonworks customers are going to be able to do all this anyway...). <rant size="major"> frankly, this "design by committee" (or is it "design by people who've never even been close to implementing something because they thought it was too hard, and thus think they're qualified to argue against those of us who didn't even realize that it was a hard problem"?) trend I've been seeing in all kinds of python forums makes me sooooo sad. the more of this I see (distutils-sig, doc-sig, here, c.l.python), the sadder I get, and the more I sympathise with John Skaller who's defining his own python-like universe... if someone needs me, I'll be down in the pub having a beer with the mad scientist, the shiny eff-bot, and mr. nitpicker. if we're not there, you'll find us in the lab, working on new string matching facilities for 1.6, SOAP [1], tkinter replacements for the masses, and whatever else we can come up with... see you! </rant> 1) http://www.newsalert.com/bin/story?StoryId=Coenz0bWbu0znmdKXqq

Fredrik Lundh wrote:
Huh ? Two points: 1. How can you be sure that people haven't tried implementing their ideas and for various reasons have come to some conclusion about those ideas ? 2. Would you seriously disqualify people from joining a discussion by the simple argument that they have not implemented anything yet ? Just take the Unicode discussion as example: it was very lively and resulted in a decent proposal which is now subject to further investigation by the implementors ;-) Many people have joined in even though they did not and/or will not implement anything. Still, their arguments were very useful to show up weaknesses in the proposal. Now, let's rather have a beer in the pub around the corner than go on ranting :-). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 27 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

Fredrik Lundh wrote:
[snip]
<rant size="major">
frankly, this "design by committee"...
[snip]
... see you!
</rant>
C'mon /F, it's a battle of ideas and that's the way it works before filtering the good ones from the bad ones, then focusing on the appropriate implementation. I'm in sync with the discussion, although I haven't posted my partial notes on it due to lack of time. But let me say that overall, this discussion is a good thing and the more opinions we get, the better. BTW, you just _can't_ leave like this and start playing solitaire at the bar, first, because we need beer too and it's unlikely that you'll find a bar we don't know already, and second, because it was you who revived this discussion with 1 word, repeated 3 times:
Thus, with no visible argumentation (so don't shoot on others when they argue instead of you), and with this one word, you pushed Guido to the extreme of suggesting a complete redesign of the import machinery from scratch, based on a "Grand Architecture" :-). Right? -- Right! This is a fact and a fair amount of the credit goes entirely to you! Since then, however, I haven't really seen your arguments, and I believe that nobody here got exactly your point. I, for one, may well argue against imputil as being just another brick on top of the grand mess. But because I haven't made the time to write properly my notes, I don't dare to express a partial opinion, nor blame those who argue good or bad in the meantime, when I'm silent. So, why are you showing us your back when you have clearly something to say, but like me, you haven't made the time to say it? Please don't waste my time with emotional rants ;-). Everybody here tries to contribute according to their knowledge, experience, and availability. Later, -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

Fredrik Lundh wrote:
James C. Ahlstrom <jim@interet.com> wrote:
IMHO putting shared libs in an archive is a bad idea because the OS
Dear Fredrik, I thought the point of Python-Dev was to propose designs and get feedback, right? Well, I got feedback :-). OK, I agree to alter my archive format so it provides the ability to store shared libs and not just *.pyd. I will add the string length and if needed a flag indicating the name is a shared lib. Now the details:
have you tried it? if not, why do you think you should be allowed to forbid others from doing it?
Yes I have tried it, and I am currently on my fourth version of an archive format which is based on formats by Greg Stein and Gordon McMillan. I hope it meets with the favor of the Grand Inquisition, and becomes the standard format. But maybe it won't. Oh well.
I am not forcing anyone to do anything, only proposing that shared libs are best handled directly by imputil and not the class within imputil which handles archive files. It is just a geeky design issue, nothing more. JimA

[Guido] big snip
I just left it alone (ie, as it was when I picked up the .pyc). Turns out OK, because then when the end user files a bug report, the developer can track it down.
As I recall: import foo.bar -> get_code(None, 'foo', 'foo') # returns foo -> get_code(<self>, 'bar', 'foo.bar')
Why are both present?
I think so the importer can choose between being tree structured or flat.
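Gordon's two-call sequence for "import foo.bar" can be made concrete with a toy class (the class, its module table, and the stored "code" strings are all invented for illustration; only the get_code(parent, modname, fqname) signature comes from imputil):

```python
class ToyImporter:
    """Toy stand-in for an imputil Importer subclass."""
    def __init__(self, tree):
        self.tree = tree   # nested dicts: package name -> {module: code}

    def get_code(self, parent, modname, fqname):
        # parent:  None for a top-level import, else the parent package
        # modname: the single trailing component ('bar' in 'foo.bar')
        # fqname:  the full dotted name ('foo.bar')
        if parent is None:
            return self.tree.get(modname)
        return parent.get(modname)

importer = ToyImporter({"foo": {"bar": "<code for foo.bar>"}})
# "import foo.bar" becomes two calls:
foo = importer.get_code(None, "foo", "foo")
bar = importer.get_code(foo, "bar", "foo.bar")
```

A flat importer could ignore parent and key everything off fqname instead, which is presumably the choice Gordon is pointing at.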
I have something working for Linux now. I froze exceptions.py. I hacked getpath.c so prefix = exec_prefix = executable's directory and the starting path is [prefix]. Although I did it differently, you could regard imputil.py and archive.py as frozen, too. (On Windows it's somewhat different, because the result uses the stock python15.dll.) This somewhat oversimplifies; and I haven't really thought out all the ways people might try to use sym links. I'm inclined to think the starting path should contain both the executable's real directory and the sym link's directory.
.... I do withdraw the composition requirement though.
Hooray! - Gordon

On Thu, 2 Dec 1999, Guido van Rossum wrote:
Something like sys.set_import_hook() ? The other alternative that I see would be to have the C code scan sys.importers, assuming each are callable objects, and call them with the appropriate params (e.g. module name). Of course, to move this scanning into Python would require something like sys.set_import_hook() unless Python looks for a hard-coded module and entrypoint.
We can provide Python code to provide compatibility for "imp" and the two hooks. Nothing we can do to the C code, though. I'm not sure what the import API looks like from C, and whether they could all stay. A brief glance looks like most could stay. [ removing any would change Python's API version, which might be "okay" ]
This comes up several times throughout this message, and in some off-list mail Guido and I have exchanged. Namely, "should dynamic loading be part of the core, or performed via a module?" I would rather see it become a module, rather than inside the core (despite the fact that the module would have to be compiled into the interpreter). I believe this provides more flexibility for people looking to replace/augment/update/fix dynamic loading on various architectures. Rather than changing the core, a person can just drop in another module. The isolation between the core and modules is nicer, aesthetically, to me. The modules would also be exposing Just Another Importer Function, rather than a specialized API in the builtin imp module. Also note that it is easier to keep a module *out* of a Python-based application, than it is to yank functions out of the core of Python. Frozen apps, embedded apps, etc could easily leave out dynamic loading. Are there strict advantages? Not any that I can think of right now (beyond a bit of ease-of-use mentioned above). It just feels better to me.
The chaining is an aspect of the current, singular import hook that Python uses. In the past, I've suggested the installation of a "manager" that maintains a list. sys.importers is similar in practice. Note that this Manager would be present with the sys.set_import_hook() scheme, while the Manager is implied if the core scans sys.importers.
Yes, JimA pointed this out. The latest imputil has corrected this. I combined the builtin and frozen Importers because they were just so similar. I didn't want to iterate over two Importers when a single one sufficed quite well. *shrug* Could go either way, really.
Correct: while imputil doesn't address this, the standard/default Importer classes *definitely* can.
I yanked that code out of the DirectoryImporter so that the PathImporter could use it. I could see a reorg that creates a FileSystemImporter that defines the method, and the other two just subclass from that.
Agreed. It should be easy to have a mapping of extension to handler. One issue: should there be an ordering to the extensions? Exercise for the reader to alter the data structures...
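One answer to the ordering question above: keep the mapping as an ordered list of (extension, handler) pairs rather than a dictionary, so the search order is explicit (handler names below are invented placeholders):

```python
# Ordered extension-to-handler table; a list of pairs preserves precedence,
# e.g. preferring ".so" over ".py" for the same module name.
def load_dynamic(path):  return ("dynamic", path)
def load_source(path):   return ("source", path)
def load_compiled(path): return ("compiled", path)

extension_handlers = [
    (".so",  load_dynamic),
    (".py",  load_source),
    (".pyc", load_compiled),
]

def handler_for(filename):
    for ext, handler in extension_handlers:
        if filename.endswith(ext):
            return handler
    return None
```

Inserting a new extension (say ".isl") is then just a list insertion at the desired precedence.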
I think, "too bad for them." :-) Having just a .pyc is a very nice feature. But how can you tell whether it was meant to be a plain .pyc or a mis-ordered one? To truly resolve that, you would need to scan the whole path, looking for a .py. However, maybe somebody put the .pyc there on purpose, to override the .py! --- begin slightly-off-topic --- Here is a neat little Bash script that allows you to use a .pyc as a CGI (to avoid parse overhead). Normally, you can't just drop a .pyc into the cgi-bin directory because the OS doesn't know how to execute it. Not a problem, I say... just append your .pyc to the following Bash script and execute! :-) #!/bin/bash exec - 3< $0 ; exec python -c 'import os,marshal ; f = os.fdopen(3, "rb") ; f.readline() ; f.readline() ; f.seek(8, 1) ; _c = marshal.load(f) ; del os, marshal, f ; exec _c' $@ (the script should be two lines; and no... you can't use readlines(2)) The above script will preserve stdin, stdout, and stderr. If the caller also use 3< ... well, that got overridden :-) The script doesn't work on Windows for two reasons, though: 1) Bash, 2) the "rb" mode followed by readline() Detailed info at the bottom of http://www.lyra.org/greg/python/ --- end of off-topic ---
Maybe. I'd still like to see plain .pyc files, but I know I can work around any change you might make here :-) (i.e. whatever you'd like to do... go for it)
I'm not in favor of this, but it is more-than-doable. Again: your discretion...
Specifically, the PathImporter would get "dissected" :-). No problem.
You don't (certainly in a way that is nice/compatible for modules that refer to it). This is why I don't like __file__ and __path__. They just don't make sense in archives or frozen code. Python code that relies on them will create problems when that code is placed into different packaging mechanisms.
See above.
I'm quite confident that something can be designed that would satisfy the needs here. Something akin to .pth files that a zip importer could read.
For most code, no, but as Fred mentioned (and I surmise), there are things out there assuming that sys.path contains strings which specify directories. Sure, we can do this (your discretion), but my feeling is to avoid it.
Default is just the two: BuiltinImporter and PathImporter. Adding ZipImporters (or anything else) at startup is TBD, but shouldn't pose a problem.
IFF caching is enabled for the particular platform and installation.
Correct -- the *subclasses*. I still maintain the imputil design of a single hook (get_code) is Right. I'll make a swipe at PathImporter in the next few weeks to add the capability for new extensions.
Gordon detailed this in another note... Yes, the multiple return values make it a bit more complicated, but I can't think of any reasonable alternatives. A bit more doc should do the trick, I'd guess.
Correct -- I've currently designed/implemented PathImporter as "final". I don't foresee a problem turning it into something that can be hooked at run-time, or subclassed at code-time. A detailing of the features needed would be handy: * allow alternative file suffixes, with functions or subclasses to map the file into a code/module object.
Gordon replies to this... All of the archives that myself, Gordon, and JimA have been using only store .pyc files. I don't see much code sharing between the filesystem and archive import code.
That would be my preference, rather than loading more into the Importer base class itself.
There isn't any infrastructure that needs to be accessed. get_code() is the call-point, and there is no mechanism provided to the callee to call back into the imputil system.
This is one of the benefits of imputil's single/narrow interface.
Plus its vague specs? :-)
Ouch. I thought I was actually doing quite a bit better than normal with that long doc-string on get_code :-(
True. My concern is an invader misusing one "type" of module for another. For example, let's say you've provided a selection of modules each exporting function FOO, and the user can configure which module to use. Can they do damage if some unrelated, frozen module also exports FOO? Minor issue, anyhow. All the functionality is there.
Heh :-) I have not spent *any* time working on optimization. Currently, each Importer in the chain redoes some work of the prior Importer. A bit of restructuring would split the common work out to a Manager, which then calls a method in the Importer (and passes all the computed work). Of course, a bit of profiling wouldn't hurt either. Some of the "imp" interfaces could possibly be refined to better support the BuiltinImporter or the dynamic load features. The question is still valid, though -- at the moment, I can't explain it because I haven't looked into it.
Note: after writing this, I realized there is really no need for the core to do the imputil import. site.py can easily do that.
It does, however, need to import builtin modules. imputil currently
Correct.
I knew about strop, but imputil would be harder to use today if it relied on the string methods. So... I've delayed that change. The struct module is used in a couple teeny cases, dealing with constructing a network-order, 4-byte, binary integer value. It would be easy enough to just do that with a bit of Python code instead.
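The "bit of Python code" replacing struct here is indeed short. A sketch of the two directions for a 4-byte, network-order (big-endian) integer, written in modern bytes notation (the 1.5-era version would use chr()/ord() on strings; function names are invented):

```python
def pack4(n):
    """Equivalent of struct.pack('>i', n) for non-negative n."""
    return bytes([(n >> 24) & 0xFF, (n >> 16) & 0xFF,
                  (n >> 8) & 0xFF, n & 0xFF])

def unpack4(b):
    """Equivalent of struct.unpack('>i', b)[0] for non-negative values."""
    return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]
```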
I don't think that this chicken-or-egg problem is particularly problematic though.
Right. In my ideal world, the core couldn't do a dynamic load, so that would need to be considered within the bootstrap process.
I think we can find a better way to freeze modules and to use them. Especially for the cases where we have specific "core" functions implemented in Python. (e.g. freezing parsers, compilers, and/or the read-eval loop) I don't foresee an issue that the build process becomes more complicated. If we nuke "makesetup" in favor of a Python script, then we could create a stub Python executable which runs the build script which writes the Setup file and the getpath*.c file(s).
Correct, although I'll modify my statement to "two plus the builtins".
True. I threw that out as an alternative, and then presented the counter argument :-)
It simplifies the core's import code to not deal with that stuff at all.
The portable mechanism for freezing will always need a compiler. Platform specific mechanisms (e.g. append to the .EXE, or use the linker to create a new ELF section) can optimize the freeze process in different ways. I don't have a design in my head for the freeze issues -- I've been considering that the mechanism would remain about the same. However, I can easily see that different platforms may want to use different freeze processes... hmm...
Exactly. And importer classes to load from Win32 resources (modifying a .EXE's resources post-link is cleaner than the append solution)
Sure.
All right. Doesn't Python already print a warning if it can't find site.py?
Yes. However, I've only recognized one so far. Propose more... I'm confident we can update the PathImporter design to accommodate (and retain the underlying imputil paradigm).
I do withdraw the composition requirement though.
:-)
Heh. Yah, that's what I meant :-)
Haven't thought on this. Should be doable, I'd think.
:-) You betcha! I believe my next order of business: * update PathImporter with the file-extension hook * dynload C code reorg, per the other email * create new-model site.py and trash import.c * review freeze mechanisms and process * design mechanism for frozen core functionality (eg. getpath*.c) (coding and building design) * shift core functions to Python, using above design I'll just plow ahead, but also recognize that any/all may change. i.e. I'll build examples/finals/prototypes and Guido can pick/choose/reimplement/etc as needed. I'm out next week, but should start on the above items by the end of the month (will probably do another mod_dav release in there somewhere). Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg, Great response. I think we know where we each stand. Please go ahead with a new design. (That's trust, not carte blanche.) Just one thought: the more I think about it, the less I like sys.importers: functionality which is implemented through sys.importers must necessarily be placed either in front of all of sys.path or after it. While this is helpful for "canned" apps that want *everything* to be imported from a fixed archive, I think that for regular Python installations sys.path should remain the point of attack. In particular, installing a new package (e.g. PIL) should affect sys.path, regardless of the way of delivery of the modules (shared libs, .py files, .pyc files, or a zip archive). I'm not too worried about code that inspects sys.path and expects certain invariants; that code is most likely interfering with the import mechanism so should be revisited anyway. On the lone .pyc issue: I'd like to see this disappear when using the filesystem, I see no use for it there if we support .pyc files in zip archives. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Fri, 3 Dec 1999, Guido van Rossum wrote:
Accepted gratefully. Thx.
Okay. I'll design with respect to this model. To be explicit/clear and to be sure I'm hearing you right: sys.path may contain Importer instances. Given the name FOO, the system will step through sys.path looking for the first occurrence of FOO (looking in a directory or delegating). FOO may be found with any number of (configurable) file extensions, which are ordered (e.g. ".so" before ".py" before ".isl").
The Benevolent Dictator has spoken. So be it. :-)
No problem. This actually creates a simplification in the system, as I'm seeing it now. I'm also seeing opportunities for a code reorg which may work towards MAL's issues with performance. I hope to have something in two or three weeks. I also hope people can be patient :-), but I certainly wouldn't mind seeing some alternative code! Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein wrote:
On Fri, 3 Dec 1999, Guido van Rossum wrote:
This is basically a gripe about this design spec. So if the answer turns out to be "we need this functionality so shut up" then just say that and don't flame me. This spec is painful. Suppose sys.path has 10 elements, and there are six file extensions. Then the simple algorithm is slow:

    for path in sys.path:        # Yikes, may not be a string!
        for ext in file_extensions:
            name = "%s.%s" % (module_name, ext)
            full_path = os.path.join(path, name)
            if os.path.isfile(full_path):
                # Process file here

And sys.path can contain class instances which only makes things slower. You could do a readdir() and cache the results, but maybe that would be slower. A better algorithm might be faster, but a lot more complicated.

In the context of archive files, it is also painful. It prevents you from saving a single dictionary of module names. Instead you must have len(sys.path) dictionaries. You could try to save in the archive information about whether (say) a foo.dll was present in the file system, but the list of extensions is extensible.

The above problem only exists to support equally-named modules; that is, to support a run-time choice of whether to load foo.pyc, foo.dll, foo.isl, etc. I claim (without having written it) that the fastest algorithm to solve the unique-name case is much faster than the fastest algorithm to solve the choose-among-equal-names case. Do we really need to support the equal-name case [Jim runs for cover...]? If so, how about inventing a new way to support it. Maybe if equal names exist, these must be pre-loaded from a known location? JimA

On Sat, 4 Dec 1999, James C. Ahlstrom wrote:
This is the algorithm that Python uses today, and my standard Importers follow.
And sys.path can contain class instances which only makes things slower.
IMO, we don't know this, or whether it is significant.
Who knows. BUT: the import process is now in Python -- it makes it *much* easier to run these experiments. We could not really do this when the import process is "hard-coded" in C code.
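One such experiment is the readdir()-and-cache variant JimA mentioned: list each directory once and answer all extension probes from the cached listing, instead of one stat per extension. A minimal sketch (the cache and function names are invented, not imputil code):

```python
import os

_dir_cache = {}   # directory -> set of filenames seen there

def module_exists(directory, modname, extensions):
    """Look for modname+ext in directory using one cached os.listdir()
    instead of one os.path.isfile() call per extension."""
    try:
        names = _dir_cache[directory]
    except KeyError:
        try:
            names = set(os.listdir(directory))
        except OSError:
            names = set()   # missing directory: nothing lives there
        _dir_cache[directory] = names
    for ext in extensions:  # extensions are ordered, e.g. ".so" before ".py"
        if modname + ext in names:
            return os.path.join(directory, modname + ext)
    return None
```

Whether this is actually faster (and how stale the cache may get during development) is exactly the kind of question that is now easy to measure from Python.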
I am not following this. What/where is the "single dictionary of module names" ? Are you referring to a cache? Or is this about building an archive? An archive would look just like we have now: map a name to a module. It would not need multiple dictionaries.
I don't understand what the problem is. I don't see one. We are still mapping a name to a module. sys.path defines a precedence. Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein wrote:
Agreed.
Agreed.
Agreed.
The "single dictionary of names" is in the single archive importer instance and has nothing to do with creating the archive. It is currently programmed this way. Suppose the user specifies by name 12 archive files to be searched. That is, the user hacks site.py to add archive names to the importer. The "single dictionary" means that the archive importer takes the 12 dictionaries in the 12 files and merges them together into one dictionary in order to speed up the search for a name. The good news is you can always just call the archive importer to get a module. The bad news is you can't do that for each entry on sys.path because there is no necessary identity between archive files and sys.path. The user specified the archive files by name, and they may or may not be on sys.path, and the user may or may not have specified them in the same order as sys.path even if they are. Suppose archive files must lie on sys.path and are processed in order. Then to find them you must know their name. But IMHO you want to avoid doing a readdir() on each element of sys.path and looking for files *.pyl. Suppose archive file names in general are the known name "lib.pyl" for the Python library, plus the names "package.pyl" where "package" can be the name of a Python package as a single archive file. Then if the user tries to import foo, imputil will search along sys.path looking for foo.pyc, foo.pyl, etc. If it finds foo.pyl, the archive importer will add it to its list of known archive files. But it must not add it to its single dictionary, because that would destroy the information about its position along sys.path. Instead, it must keep a separate dictionary for each element of sys.path and search the separate dictionaries under control of imputil. That is, get_code() needs a new argument for the element of sys.path being searched. Alternatively, you could create a new importer instance for each archive file found, but then you still have multiple dictionaries. 
They are in the multiple instances. All of this is needed only to support importing identically named modules; if there are none, there is no problem, because sys.path is then used only to find modules, not to disambiguate them. See also my separate reply to your other post, which discusses this same issue. JimA

On Mon, 6 Dec 1999, James C. Ahlstrom wrote:
Ah. There is the problem. In Guido's suggestion for the "next path of inquiry" :-), there is no "single dictionary of names". Instead, you have Importer instances as items in sys.path. Each instance maintains its own dictionary, and they are not (necessarily) combined. If we were to combine them, then we would need to maintain the ordering requirements implied by sys.path. However, this would be problematic if sys.path changed -- we would have to detect the situation and rebuild a merged dict.
The importer must be inserted into sys.path to establish a precedence. If the user wants to add 12 libraries... fine. But *all* of those modules will fall under a precedence defined by the Importer's position on sys.path.
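Greg's model could be sketched like this (a hypothetical illustration; the class name and the find_module helper are invented here, not part of imputil): each Importer instance sits directly on sys.path with its own dictionary, and precedence comes purely from position in the list.

```python
# Hypothetical sketch: Importer instances as sys.path items, each with
# its own dictionary; precedence is defined by position, so no merged
# dictionary is ever built or rebuilt.

class LibImporter:
    def __init__(self, modules):
        self.modules = modules          # this importer's own dictionary

    def get_code(self, name):
        return self.modules.get(name)

def find_module(name, path):
    # walk path in order; the first importer that knows the name wins
    for importer in path:
        code = importer.get_code(name)
        if code is not None:
            return code
    return None
```

All twelve libraries handed to one Importer would share that Importer's single position on sys.path, which is exactly the precedence rule described above.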
I do not believe that we will arbitrarily locate and open library files. They must be specified explicitly.
If the user installs ".pyl" as a recognized extension (i.e. installs it into the PathImporter), then the above scenario is possible. In my in-head design, I had not imagined any state being retained for extension-recognizer hooks. Of course, state can be retained simply by using a bound method as the hook function. get_code() would not need to change: the foo.pyl would be consulted at the appropriate time based on where it is found in sys.path. Note that file-extension hooks would definitely have a complete path to the target file. Those are not Importers, however (although they will closely follow the get_code() hook, since the extension is called from get_code).
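The bound-method trick mentioned above might look like this (a hypothetical sketch; the class, its methods, and the stand-in _parse are invented for illustration): the hook installed for ".pyl" is a bound method, so the instance it is bound to carries the state, and no extra machinery is needed in get_code().

```python
# Hypothetical sketch: an extension-recognizer hook retains state simply
# by being a bound method; the instance behind it holds the cache.

class PylRecognizer:
    def __init__(self):
        self.known = {}                 # path -> parsed table of contents

    def handle(self, pathname):
        # called with the complete path to a foo.pyl found on sys.path;
        # cache its contents so a second hit needs no re-parsing
        if pathname not in self.known:
            self.known[pathname] = self._parse(pathname)
        return self.known[pathname]

    def _parse(self, pathname):
        # stand-in for actually reading the archive's table of contents
        return {'source': pathname}

recognizer = PylRecognizer()
hook = recognizer.handle    # the bound method carries the cache with it
```

Installing `hook` as the ".pyl" handler gives the recognizer persistent state without changing the hook-calling convention at all.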

No need to worry about this: just don't merge the caches. Compared to the hundreds of failed open() calls that are done now, it's no big deal to do 12 failed Python dictionary lookups instead of one. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Tue, 7 Dec 1999, Guido van Rossum wrote:
Have no fear... I wasn't planning on this... complicates too much stuff for too little gain. Cheers, -g -- Greg Stein, http://www.lyra.org/
participants (14)
- Andrew M. Kuchling
- Barry A. Warsaw
- David Beazley
- Fred L. Drake, Jr.
- Fredrik Lundh
- Gordon McMillan
- Greg Stein
- Guido van Rossum
- James C. Ahlstrom
- Jean-Claude Wippler
- Jeremy Hylton
- M.-A. Lemburg
- Tim Peters
- Vladimir Marangozov