Complexity of import.c

I see that my patch to import.c has been criticized as adding to the complexity of an already complex module. I don't entirely agree. If you want to have the standard library available as a zip module, you must have zip file import available in C. The patch I proposed in PEP 273 was a minimalist patch; it adds as little as possible while still keeping package import and preserving all import semantics. IMHO, the real complexity of import.c comes not from the double loop for path in sys.path: for suffix in (".pyc", ".pyo",...): fopen(...) and not from the caching of zip archive names. It comes from the use of import.c to perform package imports. Package imports are performed by creating recursive calls to functions that would otherwise be flat, and by replacing sys.path with the package path. This is darned confusing. At the time PEP 273 was discussed, I proposed moving package imports out of import.c into a Python module such as Gordon's iu.py or Greg's imputil.py. This was rejected because package users feared that package imports would be slowed down. The speed of Python startup and import was a concern. I understand that people want to generalize zip imports, and create a better import hook mechanism. Some feel that now is the time to do it. This is my humble opinion: Replacing import.c plus zip import with an equally complex import.c is a fundamental mistake. The current import.c is understood (by few), and the zip file import patch adds the minimum required. If a different import.c is accepted, then it should be MUCH simpler, and support only flat directory searches and zip directories. Package imports and import hooks should be moved to a Python module such as Gordon's iu.py. And import.c must bootstrap iu.py and all its imports from either a directory search or a zip file. An improved import hook mechanism should have its own PEP. I tried my best not to break the Macintosh port, which has a lot of special code. A replacement import.c should do the same. My patch modified many other Python .c and .py files to solve several difficult bootstrap problems. If you replace just import.c you will get a painful lesson in bootstrapping. You probably need these other patches even if you reject my import.c patch. Jim Ahlstrom

This is not the case. Much of the complexity of the patch stems from an (atleast perceived) attempt to reorganize import.c. For example: - why did is_builtin have to go? - Was the ExternalNames dictionary really necessary? - Why was the HAVE_READLINK chunk moved? - What is the purpose of Get/SetScriptName? Regards, Martin

Martin v. Löwis wrote:
I see that my patch to import.c has been criticized as adding to the complexity of an already complex module.
This is not the case. Much of the complexity of the patch stems from an (atleast perceived) attempt to reorganize import.c. For example:
I think package imports really are to blame, because they require the zip import mechanism to be within import.c. Remember that PEP 273 semantics require that a zip archive is equivalent to a directory hierarchy, not just a single directory. Then this must work on __path__: /a/b/c/myarchive.zip/subdir/ And consider a package import which changes the path, and then imports subpackages, each of which may also alter the path. I don't see how you can jump out of import.c on a sys.path item and make this work, because the new path is coming from package import, not the original sys.path.
- why did is_builtin have to go?
I introduced a dictionary of zip file contents to avoid repeated linear searches of the zip file. Since builtins are a constant list, it was easy to add builtin names too. I was not necessary to ditch is_builtin, but it was more natural to do so.
- Was the ExternalNames dictionary really necessary?
Yikes! that is where I record the zip file contents. That avoids the linear search. OK, so once I had a dictionary of zip names, it was natural to add other info, such as directory file names too.
- Why was the HAVE_READLINK chunk moved? - What is the purpose of Get/SetScriptName?
These changes and several others were necessary to be able to import the Python standard lib from a zip file. The problem was the attempt to import site.py before the sys.path was fully set up. This is arguably a bug in Python. The changes moved all sys.path setup to getpath and out of sysmodule.c. This isn't an issue of zip importing per se. It is the avoidance of phase errors during start up. For example, you need a full valid sys.path to be able to import zlib.pyd (on Windows) so you read a compressed zip archive. Then you import site.py plus whatever it imports. This must be available in the zip standard lib file, so zip importing must already work. Thus the need for careful bootstrapping. JimA

So what would have been the problem with linear search that makes this extra data structure strictly necessary to have?
I think you can't really cover all cases. For example, your patch fails to find zlib in the Unix build tree, as the build directory is added to sys.path only in site.py, and your patch won't attempt to import zlib outside InitZip. So it might have been good to explain somewhere what bootstrapping scenarios this patch is meant to support. Regards, Martin

James C. Ahlstrom wrote:
This has not been questioned.
I think my counterpatch (even if it's not done yet) shows this is simply not true.
Erm, I think package import is very recursive by *nature*. I'm not sure what function you mean that could be flat.
This is darned confusing.
So true!
It's also dubious how this would help: the current code works well and is stable. I would fear that moving any of this to Python would cause bootstrapping problems.
Yes, and luckily noone is proposing to do that. [ ... ]
I've learned them indeed over the last couple of days, but I didmanage to get it working by only touching import.c and importdl.h. But my current working version also touches pythonrun.[hc], as it has a new _PyImportHooks_Init function. (The reason this was neccesary was I wanted a subclass of ImportError in the zipimport module; I learned the hard way that you can't create new exceptions from C before the exceptions module is loaded... So I had to move my init code from _PyImport_Init to a separate function, which is called by Py_Initialize().)
You probably need these other patches even if you reject my import.c patch.
I doubt it. Just

On Monday, Dec 9, 2002, at 18:24 Europe/Amsterdam, James C. Ahlstrom wrote:
Actually, if the new improved import has the right set of hooks then the macintosh stuff can go away. 90% of it does basically the same as what zip-import does: it allows you to put files on sys.path and looks in those files for modules. The mumbo-jumbo about interning sys.path strings is so the code can cache whether a sys.path entry is a file or a directory, without that normal imports were slowed down by 30% or so. -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman -

This is not the case. Much of the complexity of the patch stems from an (atleast perceived) attempt to reorganize import.c. For example: - why did is_builtin have to go? - Was the ExternalNames dictionary really necessary? - Why was the HAVE_READLINK chunk moved? - What is the purpose of Get/SetScriptName? Regards, Martin

Martin v. Löwis wrote:
I see that my patch to import.c has been criticized as adding to the complexity of an already complex module.
This is not the case. Much of the complexity of the patch stems from an (atleast perceived) attempt to reorganize import.c. For example:
I think package imports really are to blame, because they require the zip import mechanism to be within import.c. Remember that PEP 273 semantics require that a zip archive is equivalent to a directory hierarchy, not just a single directory. Then this must work on __path__: /a/b/c/myarchive.zip/subdir/ And consider a package import which changes the path, and then imports subpackages, each of which may also alter the path. I don't see how you can jump out of import.c on a sys.path item and make this work, because the new path is coming from package import, not the original sys.path.
- why did is_builtin have to go?
I introduced a dictionary of zip file contents to avoid repeated linear searches of the zip file. Since builtins are a constant list, it was easy to add builtin names too. I was not necessary to ditch is_builtin, but it was more natural to do so.
- Was the ExternalNames dictionary really necessary?
Yikes! that is where I record the zip file contents. That avoids the linear search. OK, so once I had a dictionary of zip names, it was natural to add other info, such as directory file names too.
- Why was the HAVE_READLINK chunk moved? - What is the purpose of Get/SetScriptName?
These changes and several others were necessary to be able to import the Python standard lib from a zip file. The problem was the attempt to import site.py before the sys.path was fully set up. This is arguably a bug in Python. The changes moved all sys.path setup to getpath and out of sysmodule.c. This isn't an issue of zip importing per se. It is the avoidance of phase errors during start up. For example, you need a full valid sys.path to be able to import zlib.pyd (on Windows) so you read a compressed zip archive. Then you import site.py plus whatever it imports. This must be available in the zip standard lib file, so zip importing must already work. Thus the need for careful bootstrapping. JimA

So what would have been the problem with linear search that makes this extra data structure strictly necessary to have?
I think you can't really cover all cases. For example, your patch fails to find zlib in the Unix build tree, as the build directory is added to sys.path only in site.py, and your patch won't attempt to import zlib outside InitZip. So it might have been good to explain somewhere what bootstrapping scenarios this patch is meant to support. Regards, Martin

James C. Ahlstrom wrote:
This has not been questioned.
I think my counterpatch (even if it's not done yet) shows this is simply not true.
Erm, I think package import is very recursive by *nature*. I'm not sure what function you mean that could be flat.
This is darned confusing.
So true!
It's also dubious how this would help: the current code works well and is stable. I would fear that moving any of this to Python would cause bootstrapping problems.
Yes, and luckily noone is proposing to do that. [ ... ]
I've learned them indeed over the last couple of days, but I didmanage to get it working by only touching import.c and importdl.h. But my current working version also touches pythonrun.[hc], as it has a new _PyImportHooks_Init function. (The reason this was neccesary was I wanted a subclass of ImportError in the zipimport module; I learned the hard way that you can't create new exceptions from C before the exceptions module is loaded... So I had to move my init code from _PyImport_Init to a separate function, which is called by Py_Initialize().)
You probably need these other patches even if you reject my import.c patch.
I doubt it. Just

On Monday, Dec 9, 2002, at 18:24 Europe/Amsterdam, James C. Ahlstrom wrote:
Actually, if the new improved import has the right set of hooks then the macintosh stuff can go away. 90% of it does basically the same as what zip-import does: it allows you to put files on sys.path and looks in those files for modules. The mumbo-jumbo about interning sys.path strings is so the code can cache whether a sys.path entry is a file or a directory, without that normal imports were slowed down by 30% or so. -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman -
participants (4)
-
Jack Jansen
-
James C. Ahlstrom
-
Just van Rossum
-
Martin v. Löwis