Re: [Python-Dev] Re: .DLL vs .PYD search order

On the Mac I've introduced "magic cookies" into sys.path, which allow you to do interesting searches (like searching for a DLL or PYC resource in the application itself) at known places in the import process. There isn't a cookie for "search along the standard MacOS DLL search path" (which is somewhat similar to the Windows DLL search path) because I haven't seen a reason for it, but there's nothing to stop it. And if you inserted that cookie it would be perfectly clear (at least, it should be) that only DLL modules will be found in that step, not .py modules.

Actually, I'm so happy with the magic cookie scheme that I've advocated at various times in the past that something similar also be used for determining where builtin modules and frozen modules appear in sys.path...

-- Jack Jansen | Jack.Jansen@oratrix.com | www.oratrix.nl/~jack

I see the magic cookies as a poor man's (but more compatible!) version of a chain of importers as advocated by Greg Stein and other imputil fans. I like the idea, except that I think the chain should be easier to manipulate than in the current imputil implementation. (I'll have more comments on Greg's comments later, when I've actually read them through.) --Guido van Rossum (home page: http://www.python.org/~guido/)

On Thu, 2 Dec 1999, Guido van Rossum wrote:
Anything in sys.path that is not a string pointing to a directory is not very compatible. My current proposal keeps the existing semantics for sys.path (the proposal adds functionality through other mechanisms, rather than changing or interfering with existing ones). I look forward to your comments! I'll definitely provide new solutions where you find problems :-) Cheers, -g -- Greg Stein, http://www.lyra.org/

Guido van Rossum <guido@CNRI.Reston.VA.US> wrote:
I know this has been asked before, but cannot recall any of the arguments against it: how about replacing Jack's magic cookies with importer objects? (in other words, if a path item is a string, import as usual. otherwise, ask the importer for a code object or maybe better, a module object). </F>

Fredrik Lundh wrote:
Plus, for backward compatibility, make sure that str(importerobj) returns something which resembles a non-existent directory. Note that the builtin importer skips non-string entries in sys.path, so the above will only be needed for existing import hooks.

Still, I would like to rephrase my 0.02 EUR which I already posted twice: why not start by thinking about what these importers would actually do first? If there are only a handful of wishes, we could just add them to the builtin machinery and be done with it... -- Marc-Andre Lemburg, http://www.lemburg.com/
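[Editor's note: a minimal sketch of the idea under discussion -- an importer object sitting directly on sys.path, plus the str() compatibility trick. The class name, its find() hook, and the fake path string are invented for illustration; none of this is an API that existed at the time.]

    import sys
    import types

    class ArchivePathEntry:
        """Hypothetical importer object that can sit on sys.path."""

        def __init__(self, archive):
            # archive: a dict mapping module name -> code object
            self.archive = archive

        def find(self, name):
            """Return a module object for `name`, or None if we don't have it."""
            code = self.archive.get(name)
            if code is None:
                return None
            module = types.ModuleType(name)
            exec(code, module.__dict__)
            sys.modules[name] = module
            return module

        def __str__(self):
            # Backward-compatibility trick: code that blindly treats
            # sys.path items as directory names sees a path that simply
            # does not exist.
            return '/nonexistent/archive-path-entry'

    def import_along_path(name):
        """Walk sys.path, treating strings as directories (handled by the
        builtin machinery, skipped here) and anything else as an importer."""
        for entry in sys.path:
            if isinstance(entry, str):
                continue
            module = entry.find(name)
            if module is not None:
                return module
        raise ImportError(name)

With something like this, the "chain of importers" is simply the order of such entries on sys.path.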

On Thu, 2 Dec 1999, M.-A. Lemburg wrote:
I'd rather see the builtin machinery move to Python, regardless of what system is used and/or what features are added. Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein wrote:
In the long run that's probably the right direction, but right now we are only talking about a very small set of additional features, which can easily be added to the existing code without too much fuss. Plus it won't slow things down, which is important since Python startup time is already an issue all by itself. The imputil.py approach of doing (a whole bunch of) recursive Python function calls to all kinds of importers will not speed this up, I'm afraid. An on-disk lookup table would speed this up, but it would also break the current logic in imputil.py, which puts importer independence above all.

IMHO, we should retreat to a more centralized interface, one which resembles a manager more than the agent interface implemented in imputil.py. Add-ons can then register themselves to say "hey, I can handle pyz archives" or "I know how to import .so modules" or "I provide a search function which you can call to have me scan my module container (directory, web site, archive)". The manager would take care of what to call and in which order, plus delegate requests to add-ons which implement the needed logic, e.g. add-ons for signature checking, unzipping archives, file-system lookup tables, etc. It could also trace its actions and keep an on-disk knowledge base of what it did in the past to find certain modules under certain conditions. Anyway, all this is extra magic for some future version of Python. -- Marc-Andre Lemburg, http://www.lemburg.com/
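[Editor's note: a rough sketch of the registration-based manager described above, under the assumption that each add-on exposes a simple "give me a module or None" callable. All names are invented; nothing like this shipped with Python 1.5/1.6.]

    class ImportManager:
        def __init__(self):
            self._handlers = []                 # list of (description, handler)

        def register(self, description, handler):
            """handler(name) returns a module object or None."""
            self._handlers.append((description, handler))

        def import_module(self, name):
            for description, handler in self._handlers:
                module = handler(name)
                if module is not None:
                    return module
            raise ImportError(name)

    def pyz_handler(name):
        """'hey, I can handle pyz archives' -- stub that never finds anything."""
        return None

    def shared_lib_handler(name):
        """'I know how to import .so modules' -- stub."""
        return None

    manager = ImportManager()
    manager.register("pyz archives", pyz_handler)
    manager.register("shared libraries (.so)", shared_lib_handler)

Signature checking, unzipping, caching, and the on-disk knowledge base would then be further registrations or wrappers around handlers, rather than logic duplicated in every importer.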

[Greg]
I'd rather see the builtin machinery move to Python, regardless of what system is used and/or what features are added.
[Marc]
I disagree. We should do the redesign right rather than tweaking the existing code.
I don't care about the current logic in imputil. It's only a prototype!
This makes sense.
I would say the manager API design and a basic set of specific handlers should go into 1.6. --Guido van Rossum (home page: http://www.python.org/~guido/)

MAL wrote:
but why? in my small-minded view of how python works, an importer carries out a very simple task: given a name, check if you have a module with that name, and install it. if you cannot, fail (in which case python asks the next importer along the path). why do you have to complicate things beyond that? why not just let Python provide a few base classes and mixins for people who want to create custom importers, and be done with it? rationale, please. </F>
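[Editor's note: a sketch of the "few base classes and mixins" picture, assuming the simple contract Fredrik states (given a name, return a module, or fail so the next importer gets a chance). The class names and the import_module()/scan() hooks are invented for illustration.]

    import os

    class ImporterBase:
        def import_module(self, name):
            """Return the module, or None so the next importer in the chain
            gets a chance."""
            raise NotImplementedError

    class DirectoryScanMixin:
        """Reusable piece: locate name + suffix somewhere along a path."""

        suffixes = ('.py', '.pyc')

        def scan(self, path, name):
            for directory in path:
                for suffix in self.suffixes:
                    candidate = os.path.join(directory, name + suffix)
                    if os.path.exists(candidate):
                        return candidate
            return None

    class SourceFileImporter(DirectoryScanMixin, ImporterBase):
        def __init__(self, path):
            self.path = path

        def import_module(self, name):
            filename = self.scan(self.path, name)
            if filename is None:
                return None          # fail; Python asks the next importer
            return self._load(name, filename)

        def _load(self, name, filename):
            # Compiling and executing the file is elided from the sketch.
            raise NotImplementedError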

Fredrik Lundh wrote:
Because importing in Python has become *much* more complicated over time. There are requests for new features which touch subjects such as storage mechanisms, lookups, signatures (for trusted code), lazy imports, etc. A chain of simple-minded importers won't work together too well, will duplicate work, and will degrade performance considerably due to the many recursive function calls. Also, centralized caching strategies are hard to implement across import handlers. -- Marc-Andre Lemburg, http://www.lemburg.com/

M.-A. Lemburg <mal@lemburg.com> wrote:
sorry, I still don't understand it. our applications already use different storage mechanisms, databases, signatures, lazy importing, version handling, etc, etc. now, if *we* have managed to build all that on top of an old version of imputil.py, how come it's not sufficient for the rest of you?
A chain of simple-minded importers won't work together too well
why? it sure works for us...
duplicate work
avoiding duplicate work is what object oriented design is all about. and last time I checked, Python had excellent support for that.
and will degrade performance considerably due to the many recursive function calls
now that's what I call premature optimization. and this scares the hell out of me: if the rest of the python-dev crowd doesn't seriously believe that Python is (or can be made) fast enough to implement things like this, why the heck are you using Python at all? am I the only one here who doesn't believe in Ousterhout's talk about "the great system vs. scripting language divide"? </F>

On Sat, 4 Dec 1999, Fredrik Lundh wrote:
I agree. The imputil mechanism has been proven in combat to work for many scenarios. I have not (yet) heard of a case where the model has proven insufficient.
Exactly. "Why?" Please provide an example.
Don't worry Fredrik... I'm with you on this one. I do not believe there is a problem with the speed. Nobody has yet profiled imputil to find out where/how the time is being spent. Nobody has tried to speed it up. Therefore, any claims about its performance are simply FUD. I claim that its interface is correct, and you (Fredrik) stated it well: "given a name, please give me a module if you can (otherwise None)." Underneath that semantic, there are a lot of things that can be done to alter the performance and organization. Claims about speed are entirely premature. Yes, I'm biased. But, in truth, I haven't seen a better mechanism yet. I've tossed out a few ideas on how imputil could be improved (based solely on guesswork rather than empirical profiling output). When those changes are completed and there is still an issue, then I'll admit defeat and wait for somebody else to provide a new design. Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein wrote:
See my reply to Fredrik.
Sorry, Greg, but that is simply not true. I've spent a few days trying to get more performance out of it and have succeeded, but in the end it wasn't enough to convince me of the approach.
Therefore, any claims about its performance are simply FUD.
BTW, did anybody mention that an import manager wouldn't be able to provide an API which is usable for imputil-style importers? I'm not arguing against the possibility of using imputil-style importers, just against making them the sole method of adding wisdom to Python imports. The imputil importers could well benefit from a manager providing logic to do basic things like importing shared libs, checking signatures, downloading modules from the web, etc. -- Marc-Andre Lemburg, http://www.lemburg.com/

On Sat, 4 Dec 1999, M.-A. Lemburg wrote:
You sent me your changes... I don't believe that you were aggressive enough. As I've mentioned before, I think it is quite possible to retain the general Importer style and get_code() interface, but to shift some functionality out (to be computed once) to a higher-level mechanism. The patches that you sent me did not do this, so I'm not surprised that you hit a wall. Ack. See? Now I'm getting into discussions about performance and implementation without truly knowing where the timing is spent. Eyeballing it, I have an idea, but it would be best to see profile output. My mantra is always "90% of the time you're wrong about where 90% of the time is being spent." I am unconcerned about performance, but will work on it so that I don't need to continue this conversation. That burden is on me.
Since the core will delegate out to Python (note: current working theory), then it certainly is not the "sole method" (since you can just replace the Python code). But there must be a default mechanism. The ihooks stuff was too complicated. imputil seems to be much easier. I'd love to see a third mechanism.... so I can steal ideas :-)
For shared libs, yes. For the others: geez... I don't want to see that in the core infrastructure. Shift that out to specialized Importers. The infrastructure ought to be teeny and agnostic about how to map a module name to a module. Side note to python-dev people: I apologize... I realize that I'm beginning to get a bit defensive here. I'm going to be at XML '99 until Friday, so that should give me a breather. When I get back, I'll skip the talk and do some code. Cheers, -g -- Greg Stein, http://www.lyra.org/
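[Editor's note: since profiling keeps coming up, here is what "profile it first" could look like with the standard profile/pstats modules. The module imported (ftplib) and the hook-installation comment are stand-ins -- pick any module not yet in sys.modules and whatever import hook you are actually testing.]

    import profile          # cProfile is the faster drop-in on later Pythons
    import pstats

    # Install the import hook under test first, e.g. (stand-in):
    #   import imputil; imputil.ImportManager().install()

    profile.run("import ftplib", "import.prof")

    stats = pstats.Stats("import.prof")
    stats.sort_stats("cumulative").print_stats(20)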

Greg Stein wrote: ...
My mantra is always "90% of the time you're wrong about where 90% of the time is being spent."
What a great sentence! We all know it, but many of us (especially me) forget about it during 90% of our coding time. Much better to spend that time on design (as you did). thanks - chris -- Christian Tismer, Applied Biometrics GmbH

M.-A. Lemburg wrote:
Greg Stein wrote:
Remember those comparisons of Perl and Python, to which you added cgipython? I've added to the list a version that uses an old version of imputil (probably the one you optimized) and a compressed std lib. Note that my Linux python (1.5.2) is built in the RedHat style - even struct and strop are .so's; so that accounts for the majority of the open calls. This is a full Python (runs code.py if you don't pass it a script name). For lack of a better name, I've called it "pykit".

First, the size of log files (in lines), i.e. number of system calls:

                  Solaris   Linux   IRIX[1]
    Perl               88      85        70
    Python            425     316       257
    cgipython         182
    pykit             136

Next, the number of "open" calls:

                  Solaris   Linux   IRIX
    Perl               16      10         9
    Python            107      71        48
    cgipython          33
    pykit               9

And the number of unsuccessful "open" calls:

                  Solaris   Linux   IRIX
    Perl                6       1         3
    Python             77      49        32
    cgipython          28
    pykit               2

Number of "mmap" calls:

                  Solaris   Linux   IRIX
    Perl               25      25         1
    Python             36      24         1
    cgipython          13
    pykit              21

This test would show off more if it went beyond startup. An import of a standard lib module in my stock Python involves 2 failed stats and 6 failed opens, then 2 successful opens and 2 fstats before the module is loaded. None of these occur in pykit. The downside (asking my Importer for a .so or a module not in the importer) takes no system calls, and involves a dozen or so lines of Python and a check of a dictionary. - Gordon
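[Editor's note: counts like the above are typically tallied from truss/strace logs. A small sketch of how that could be done; the log format assumed here (call name at the start of each line, ENOENT marking a failed open) is an assumption and varies by platform and tracer options.]

    import sys

    def count_calls(logfile):
        total = opens = failed_opens = mmaps = 0
        for line in open(logfile):
            total = total + 1
            if line.startswith('open('):
                opens = opens + 1
                if line.find('ENOENT') >= 0:    # unsuccessful open
                    failed_opens = failed_opens + 1
            elif line.startswith('mmap('):
                mmaps = mmaps + 1
        return total, opens, failed_opens, mmaps

    if __name__ == '__main__':
        print(count_calls(sys.argv[1]))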

Fredrik Lundh wrote:
I've tried to get (an older) imputil.py version up and running too. It did work, but only after some considerable tweaking, and even with integrated cache mechanisms it did not reach the performance of the builtin importer (which doesn't use the kinds of caching strategies I had built into imputil.py). Getting the whole setup to work wasn't easy at all, because of the way imputil importers delegate work, and things get even more confusing when they start to "take over" certain parts of packages by installing themselves as importers for a particular package.
An example: a path importer knows how to scan directories and how to use a path to determine the correct order. It can maybe also import .py/.pyc/.pyo files. Now what happens if it finds a shared lib as a module... the usual imputil way would be to delegate the request to some other importer which can handle shared libs... but wait: how does the shared lib importer know where to look? It will have to rescan the directories, etc...
See my example above. The agent approach used by imputil does not support OO design too well: even though you can avoid duplicate programming work on the importers by using a few base classes which implement dir scans, shared lib imports, etc., the imputil design does not provide a means to avoid duplicate actions taken by the importers.
Looks like you are in ranting mode here ;-) Seriously, I've checked my imputil.py version (with caches enabled) against the builtin importer and noticed a slowdown by a factor of more than 2. This was enough to convince me to look for other techniques to handle the problems I had at the time... you know, relative imports and things. -- Marc-Andre Lemburg, http://www.lemburg.com/
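[Editor's note: one way to address the rescan problem described a few paragraphs up, sketched as a shared listing cache that every importer in a chain consults. This is an editor's illustration, not code from the thread, and all names are invented.]

    import os

    class DirectoryCache:
        """Caches os.listdir() results so that a .py importer and a
        shared-lib importer share one scan per directory instead of
        each rescanning it."""

        def __init__(self):
            self._listings = {}

        def listing(self, directory):
            try:
                return self._listings[directory]
            except KeyError:
                try:
                    names = os.listdir(directory)
                except OSError:
                    names = []
                self._listings[directory] = names
                return names

        def find(self, directory, name, suffixes):
            names = self.listing(directory)
            for suffix in suffixes:
                if name + suffix in names:
                    return os.path.join(directory, name + suffix)
            return None

    cache = DirectoryCache()
    # Both importers would go through the same cache, e.g.:
    #   cache.find(directory, "foo", (".py", ".pyc"))   # source importer
    #   cache.find(directory, "foo", (".so",))          # shared-lib importer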

On Sat, 4 Dec 1999, M.-A. Lemburg wrote:
1) Yes, it was an older version and did not have the PathImporter class. As a by-product, the DirectoryImporters that it *did* have were much slower. It still did not support builtins, frozen modules, or dynamic loads. All of that is present now, so it works "out of the box" much better. 2) Performance: as I wrote in the other email, I don't believe that is an argument against the design. The imputil approach *will* be slower than the current Python mechanism, but there is some more coding to do to truly see how much. The side benefits (e.g. ZipImporter and caching) may outweigh the result. Time will tell.
I don't understand this. If it is relevant, then please expand. Thx.
No, the "usual imputil way" is that the PathImporter understands searching a path and loading stuff from that path. An Importer is a combination of locating and loading (since they are, typically, tightly bound). The next rev will allow user-plugging of support for new file types.
There is always a balance to be struck between independence and coupling. I chose to reduce coupling and increase independence. If you shift a bunch of stuff out of the Importers, then you will increase the coupling between the imputil framework and the Importers. That coupling will then close off future possibilities. Within the framework itself (e.g. between _import_hook and get_code), there is a lot of opportunity for change. Since that is behind the covers, it is no big deal to shift functionality around. I plan to do so.
I have run a long series of tests. Without doing any performance work on imputil, the ratio is 9 to 13. The 13 may have bumped up to about 15 or 16 when I added some dynamic loading code (I forget). Regardless, it is definitely less than a 2X increase. And that is with zero optimization. *shrug* I'm done. I'll do some code in a couple weeks. Cheers, -g -- Greg Stein, http://www.lyra.org/

"M.-A. Lemburg" wrote:
The above refers to an earlier but still very recent version of imputil. On that basis it is perfectly accurate. Here is another example from my own experience, almost identical to the above: one possible archive file format holds its list of archived *.pyc file names as keys in a dictionary. This is simple and efficient, but fails to correctly address the problem of shared libs (a.k.a. DLLs on Windows) with names identical to names of *.pyc files in the archive. For example, suppose foo.pyc is in the archive, and foo.dll is in a directory. Suppose sys.path is to be used to decide whether to load foo.pyc or foo.dll. Then an "archive importer" will fail to do this. Specifically, you can't see if foo.pyc is in the archive and then check sys.path, nor can you do the reverse. You must call the "archive importer" repeatedly for each element of sys.path and search the directory at the same time. JimA

On Sat, 4 Dec 1999, James C. Ahlstrom wrote:
What? The archive is independent of each .pyc's original position in sys.path. There is no reason/need to carry that information into an archive. If the archive contains "foo", then you're done. If it doesn't, then move on to the next element of sys.path (directory or Importer instance) and look there. Basically: if you deploy an archive, then all of its files will take precedence over any file found later on sys.path. This is exactly what sys.path is about: establishing precedence. If I understand you correctly, then you're trying to say there is some sort of interleaving that must occur. If so, then I don't understand why. Cheers, -g -- Greg Stein, http://www.lyra.org/
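[Editor's note: a sketch of the precedence rule Greg describes -- an archive importer answers only for names it actually contains and otherwise defers, so the walk simply continues with the next sys.path entry. Class and method names are invented, and the directory branch is stubbed out.]

    class ArchiveImporter:
        def __init__(self, contents):
            self.contents = contents        # e.g. {"foo": <code object>, ...}

        def find(self, name):
            if name in self.contents:
                return self.contents[name]  # actual loading elided
            return None                     # defer to later sys.path entries

    path = [ArchiveImporter({"foo": "<code for foo>"}), "/usr/lib/python1.5"]

    def locate(name):
        for entry in path:
            if isinstance(entry, str):
                # the normal directory search (foo.py / foo.pyc / foo.dll)
                # would go here; it is stubbed out in this sketch
                continue
            found = entry.find(name)
            if found is not None:
                return found
        return None

    # "foo" resolves from the archive and never reaches the directory;
    # "bar" falls through the archive and the (stubbed) directory search
    # would handle it.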

Greg Stein wrote:
Sorry, I am a little slow today. My daughter got me up at 6 am to work on her computer video editor. No disk space, fragmentation, 2 gig limit on AVI files, ........

Are you saying this? If foo is imported, the archive importer is consulted first to see if it can provide foo. If not, sys.path is searched for foo.pyc, foo.pyl etc., and if foo.pyl is found, then its contents are added to the single archive importer dictionary. The order of addition to the archive dictionary is determined by sys.path, and duplicate names are not entered because they lie later on sys.path. But once a file is recognized as in an archive, it effectively precedes all of sys.path.

Or this? If foo is imported, sys.path is searched for foo.pyc, foo.pyl, etc., and also all archive files found at each element of sys.path are searched for foo. If "bar" is imported, it may be found in foo.pyl. That is, there is an instance of an archive importer for each element of sys.path.

What if the user names an archive file not on sys.path? What order does it have? JimA

hmm. I think I see the problem here... you obviously attempted to use imputil to implement non-standard import behaviour on top of the standard storage system -- while we've used it to implement standard import behaviour on top of non-standard storage systems. I don't know if imputil is good enough for the former, and I don't think I care... I've spent too many nights debugging code that relied on clever, non-standard hacks. </F> PS. on the performance side of things, did you know that 're' can be up to ten times slower than 'regex'? but people don't complain -- probably because it allows them to do things they couldn't do before...

[/F]
Bad example: people do complain about this. Those who care a lot continue to use regex, temporarily pacified by the promise that re.py will get recoded in C and thus regain a good chunk of regex's speed. Those who care a whale of a lot continue to use Perl <0.9 wink>.

Guido van Rossum wrote:
Ok, then...
BTW, is there a timeline for the 1.6 release? I mean, which things will have to be in 1.6? Some recent topics as hints:

1. Unicode
2. Import Manager API + default handlers
3. Python-style coercion at the C type level
4. Rich comparisons
5. __doc__ string extraction tool

-- Marc-Andre Lemburg, http://www.lemburg.com/

"M.-A. Lemburg" wrote:
Greg Stein wrote:
I volunteer to write a Python archive in either Python or C. In fact, I currently have prototypes for both. But I have to agree with Greg here: I think a Python importer is the way to go. The C code is 300 lines, mostly in import.c and parallel to existing code. The Python archive is about 100 lines and is prettier and easier to read, alter, and re-use (obviously).
Plus it won't slow things down, which is important since Python startup time is already an issue all by itself. The
I think archive files should be fast, and should help, not hurt, startup time, provided that the use of sys.path is curtailed, os.readdir() is not needed, and the specifications are not complicated. Although archive files are my special concern, I realize that imputil is not just about archives. JimA

participants (9)
- Christian Tismer
- Fredrik Lundh
- Gordon McMillan
- Greg Stein
- Guido van Rossum
- Jack Jansen
- James C. Ahlstrom
- M.-A. Lemburg
- Tim Peters