<br><br><div><span class="gmail_quote">On 9/27/06, <b class="gmail_sendername">Phillip J. Eby</b> <<a href="mailto:pje@telecommunity.com">pje@telecommunity.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
At 02:11 PM 9/27/2006 -0700, Brett Cannon wrote:<br>>But it has been suggested here that the import machinery be rewritten in<br>>Python. Now I have never touched the import code since it has always had<br>>the reputation of being less than friendly to work with. I am asking for
<br>>opinions from people who have worked with the import machinery before if<br>>it is so bad that it is worth trying to re-implement the import semantics<br>>in pure Python or if in the name of time to just work with the C
<br>>code. Basically I will end up breaking up built-in, .py, .pyc, and<br>>extension modules into individual importers and then have a chaining class<br>>to act as a combined .pyc/.py combination importer (this will also make
<br>>writing out to .pyc files an optional step of the .py import).<br><br>The problem you would run into here would be supporting zip imports.</blockquote><div><br>I have not looked at zipimport so I don't know the exact issue in terms of how it hooks into the import machinery. But a C level API will most likely be needed.
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> It<br>would probably be more useful to have a mapping of file types to "format
<br>handlers", because then a filesystem importer or zip importer would then be<br>able to work with any .py/.pyc/.pyo/whatever formats, along with any new<br>ones that are invented, without reinventing the wheel.</blockquote>
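<div><br>A minimal sketch of the "mapping of file types to format handlers" idea, to make the proposal concrete. Everything named here (FORMAT_HANDLERS, register_format, load_py_source, DictStore, import_from) is hypothetical and not part of PEP 302 or any existing module; the only thing borrowed from PEP 302 is a data store that exposes get_data().<br></div>
<pre>
```python
import types

# Hypothetical registry mapping a file extension to a handler that
# turns raw data into a module object.
FORMAT_HANDLERS = {}


def register_format(extension, handler):
    """Register a handler for one extension, e.g. '.py' or '.ptl'."""
    FORMAT_HANDLERS[extension] = handler


def load_py_source(name, data):
    """Format handler for Python source: compile and run it in a new module."""
    module = types.ModuleType(name)
    exec(compile(data, name + ".py", "exec"), module.__dict__)
    return module


register_format(".py", load_py_source)


class DictStore:
    """Stand-in data store; a zip or filesystem store would offer the same get_data()."""

    def __init__(self, files):
        self.files = files

    def get_data(self, path):
        if path not in self.files:
            raise IOError(path)
        return self.files[path]


def import_from(store, name):
    """Ask the store for each registered extension in turn, in registration order."""
    for extension, handler in FORMAT_HANDLERS.items():
        try:
            data = store.get_data(name + extension)
        except IOError:
            continue
        return handler(name, data)
    raise ImportError(name)
```
</pre>
<div>Because the handlers only ever see a name and a blob of data, the same registry could sit behind any store that provides get_data().<br></div>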
<div><br>So you are saying the zipimporter would then pull out of the zip file the individual file to import and pass that to the format-specific importer?<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Thus, whether it's file import, zip import, web import, or whatever, the<br>same handlers would be reusable, and when people invent new extensions like<br>.ptl, .kid, etc., they can just register format handlers instead.</blockquote>
<div><br>So a separation of data store from data interpretation for importation. My only worry is a possible explosion of checks for the various data types. If you are using the file data store and had .py, .pyc, .so, module.so, .ptl, and .kid registered, that could mean a check per registered type on every import, which might hurt performance. And I am assuming that a web import would decide based on the extension of the resulting web address? Checking for the various types might not work well for other data store types, either. I guess you would need a way to register with the data store exactly which types of data interpretation you want it to check.
<br><br>The other option is to just have the data store do its magic and somehow know what kind of data interpretation is needed for the string returned (e.g., a database data store might implicitly only store .py code and thus know that it will only return a string of source). Then that string and the supposed file extension are passed to the next step of creating a module from that data string.
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Format handlers could of course be based on the PEP 302 protocol, and<br>simply accept a "parent importer" with a get_data() method. So, let's say
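<br>you have a PyImporter:<br><br>&nbsp;&nbsp;&nbsp;&nbsp;class PyImporter:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;def __init__(self, parent_importer):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;self.parent = parent_importer<br><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;def find_module(self, fullname):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;path = fullname.split('.')[-1]+'.py'<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;try:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;source = self.parent.get_data(path)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;except IOError:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return None<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else:<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return PySourceLoader(source)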
<br><br>See what I mean? The importers and loaders thus don't have to do direct<br>filesystem operations.</blockquote><div><br>I think so. Basically you want more of a way to stack importers, so that the basic importers are just passed the string they are supposed to load from. Other importers higher in the chain can handle getting that string.
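<br><br>To check my understanding, here is a rough, self-contained version of that stacking. PyImporter follows the quoted code; PySourceLoader and MemoryParent are my own guesses at what the missing pieces might look like, so treat them as assumptions rather than anything PEP 302 defines.
<pre>
```python
import types


class PySourceLoader:
    """Assumed loader: holds source text and executes it into a fresh module."""

    def __init__(self, source):
        self.source = source

    def load_module(self, fullname):
        module = types.ModuleType(fullname)
        module.__loader__ = self
        exec(compile(self.source, fullname, "exec"), module.__dict__)
        return module


class PyImporter:
    """Format-level importer: never touches the filesystem itself."""

    def __init__(self, parent_importer):
        self.parent = parent_importer

    def find_module(self, fullname):
        path = fullname.split(".")[-1] + ".py"
        try:
            source = self.parent.get_data(path)
        except IOError:
            return None
        return PySourceLoader(source)


class MemoryParent:
    """Assumed parent importer; a zip or web importer would expose the same get_data()."""

    def __init__(self, files):
        self.files = files

    def get_data(self, path):
        if path not in self.files:
            raise IOError(path)
        return self.files[path]
```
</pre>
Swapping MemoryParent for something that reads from a zip file or a URL would leave PyImporter untouched, which I take to be the point.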
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Of course, to fully support .pyc timestamp checking and writeback, you'd<br>need some sort of "stat" or "getmtime" feature on the parent importer, as
<br>well as perhaps an optional "save_data" method. These would be extensions<br>to PEP 302, but welcome ones.</blockquote><div><br>You could instead pass along the string naming the location the data came from. That would allow the stat calls required for .pyc files to be made as needed, without having to implement methods just for this one use case.
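<br><br>For comparison, here is a rough sketch of what the extended parent importer from the quoted proposal might look like, with getmtime() and save_data() next to get_data(). MemoryStore and get_code are made-up names, and the cache layout (an 8-byte timestamp followed by a marshalled code object) is a simplification I invented for the example; the real .pyc format has its own header.
<pre>
```python
import marshal


class MemoryStore:
    """Hypothetical parent importer offering the proposed extra methods."""

    def __init__(self):
        self.files = {}  # maps path to a (mtime, data) pair

    def save_data(self, path, data, mtime=0):
        self.files[path] = (mtime, data)

    def get_data(self, path):
        if path not in self.files:
            raise IOError(path)
        return self.files[path][1]

    def getmtime(self, path):
        if path not in self.files:
            raise IOError(path)
        return self.files[path][0]


def get_code(store, name):
    """Return (code, from_cache), recompiling when the cached stamp is stale."""
    src_path = name + ".py"
    cache_path = name + ".pyc"
    src_mtime = store.getmtime(src_path)
    try:
        cached = store.get_data(cache_path)
    except IOError:
        cached = None
    if cached is not None:
        stamp = int.from_bytes(cached[:8], "little")
        if stamp == src_mtime:
            return marshal.loads(cached[8:]), True
    # Cache missing or out of date: compile the source and write it back.
    code = compile(store.get_data(src_path), src_path, "exec")
    payload = src_mtime.to_bytes(8, "little") + marshal.dumps(code)
    store.save_data(cache_path, payload)
    return code, False
```
</pre>
The writeback step is the only place save_data() is needed, so a read-only store could presumably leave it out and skip caching.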
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Anyway, based on my previous work with pkg_resources, pkgutil, zipimport,<br>import.c, etc., I would say this is how I'd want to structure a<br>reimplementation of the core system. And if it were for Py3K, I'd probably<br>treat sys.path and all the import hooks associated with it as a single<br>meta-importer on
sys.meta_path -- listed after a meta-importer for handling<br>frozen and built-in modules. (I.e., the meta-importer that uses sys.path<br>and its path hooks would be last on sys.meta_path.)</blockquote><div><br>Ah, interesting idea! Could even go as far as removing
sys.path and just making it an attribute of the base importer if you really wanted to make it just meta_path for imports.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
In other words, sys.meta_path is really the only critical import hook from<br>the raw interpreter's point of view. sys.path, however, (along with<br>sys.path_hooks and sys.path_importer_cache) is critical from the<br>perspective of users, applications, etc., as there has to be some way to
<br>get things onto Python's path in the first place.<br><br></blockquote></div><br>Yeah, I think I get it. I don't know how much it simplifies things for users, but I think it might make things easier for people writing alternative importers.
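<br><br>A toy sketch of that layout, as I understand it: one meta-importer for built-ins, followed by a single meta-importer that owns the path, the path hooks, and the importer cache. All class and function names are hypothetical, the "loaders" are placeholder tuples, and nothing here is registered with the real sys.meta_path.
<pre>
```python
class BuiltinMetaImporter:
    """Stand-in for the frozen/built-in meta-importer; knows a fixed set of names."""

    def __init__(self, names):
        self.names = set(names)

    def find_module(self, fullname, path=None):
        return ("builtin", fullname) if fullname in self.names else None


class PathMetaImporter:
    """Hypothetical meta-importer wrapping a path, its hooks, and an importer cache."""

    def __init__(self, path, path_hooks):
        self.path = list(path)
        self.path_hooks = list(path_hooks)
        self.importer_cache = {}

    def _importer_for(self, entry):
        # Ask each hook in turn; cache None when no hook accepts the entry.
        if entry not in self.importer_cache:
            importer = None
            for hook in self.path_hooks:
                try:
                    importer = hook(entry)
                    break
                except ImportError:
                    continue
            self.importer_cache[entry] = importer
        return self.importer_cache[entry]

    def find_module(self, fullname, path=None):
        for entry in (self.path if path is None else path):
            importer = self._importer_for(entry)
            if importer is None:
                continue
            loader = importer.find_module(fullname)
            if loader is not None:
                return loader
        return None


class MemoryImporter:
    """Toy path hook for entries written as 'mem:name1,name2'."""

    def __init__(self, entry):
        if not entry.startswith("mem:"):
            raise ImportError(entry)
        self.entry = entry
        self.names = set(entry[4:].split(","))

    def find_module(self, fullname):
        return (self.entry, fullname) if fullname in self.names else None


def find_loader(meta_path, fullname):
    """Walk the meta path in order; the path-based importer is listed last."""
    for meta in meta_path:
        loader = meta.find_module(fullname)
        if loader is not None:
            return loader
    raise ImportError(fullname)


path_importer = PathMetaImporter(["/nonexistent", "mem:spam,eggs"], [MemoryImporter])
meta_path = [BuiltinMetaImporter(["sys"]), path_importer]
```
</pre>
With this shape the interpreter only needs to know about meta_path, while the path list stays available as an attribute of the last entry.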
<br><br>-Brett<br>