Mailman 3 December 2002 - Python-Dev

Proposed changes to linuxaudiodev
by Greg Ward Dec. 10, 2002

Hi all --

I've been hacking sporadically on the linuxaudiodev module for the last couple of days. Initial results are in patch #645786 (www.python.org/sf/645786). The main goal of this patch is to make linuxaudiodev objects a thinner wrapper around the underlying OSS device driver, while still providing some highish-level convenience methods -- see my comments in the patch for details.

There's a slight theoretical risk of backwards incompatibility, though. Since this module has never been documented, and since its current behaviour is such that doing anything really funky is severely curtailed, I very much doubt this will be a problem. Is anyone doing anything with linuxaudiodev more sophisticated than playing silly beep sounds? Should I ask around on python-list to be sure I won't ruin anyone's day with this change?

Oh yeah: someone noted in the CVS history that the module is misnamed. This is true; it should probably have been called ossaudiodev -- OSS is the current standard audio API used by Linux and, I think, various BSDs. It's also available for a bunch of commercial Unices.

        Greg
--
Greg Ward <gward(a)python.net>    http://www.gerg.ca/
Just because you're paranoid doesn't mean they *aren't* out to get you.

Complexity of import.c
by James C. Ahlstrom Dec. 10, 2002

I see that my patch to import.c has been criticized as adding to the complexity of an already complex module. I don't entirely agree.

If you want to have the standard library available as a zip module, you must have zip file import available in C. The patch I proposed in PEP 273 was a minimalist patch; it adds as little as possible while still keeping package import and preserving all import semantics.

IMHO, the real complexity of import.c comes not from the double loop

    for path in sys.path:
        for suffix in (".pyc", ".pyo",...):
            fopen(...)

and not from the caching of zip archive names. It comes from the use of import.c to perform package imports. Package imports are performed by creating recursive calls to functions that would otherwise be flat, and by replacing sys.path with the package path. This is darned confusing.

At the time PEP 273 was discussed, I proposed moving package imports out of import.c into a Python module such as Gordon's iu.py or Greg's imputil.py. This was rejected because package users feared that package imports would be slowed down. The speed of Python startup and import was a concern.

I understand that people want to generalize zip imports, and create a better import hook mechanism. Some feel that now is the time to do it. This is my humble opinion:

Replacing import.c plus zip import with an equally complex import.c is a fundamental mistake. The current import.c is understood (by few), and the zip file import patch adds the minimum required.

If a different import.c is accepted, then it should be MUCH simpler, and support only flat directory searches and zip directories. Package imports and import hooks should be moved to a Python module such as Gordon's iu.py. And import.c must bootstrap iu.py and all its imports from either a directory search or a zip file.

An improved import hook mechanism should have its own PEP.

I tried my best not to break the Macintosh port, which has a lot of special code. A replacement import.c should do the same.

My patch modified many other Python .c and .py files to solve several difficult bootstrap problems. If you replace just import.c you will get a painful lesson in bootstrapping. You probably need these other patches even if you reject my import.c patch.

Jim Ahlstrom
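
For readers who want to see how small the flat search in the message above is when written out, here is a minimal sketch in present-day Python 3. It is not the C code in import.c: the function name `find_flat`, its parameters and the default suffix tuple are hypothetical choices made only for illustration, and packages, zip archives and caching are deliberately left out.

```python
import os
import tempfile


def find_flat(name, search_path, suffixes=(".py", ".pyc")):
    """Return the first existing file for `name`, or None.

    Hypothetical helper: one loop over path entries, one loop over
    suffixes, nothing else (no packages, no zip files, no caching).
    """
    for directory in search_path:
        for suffix in suffixes:
            candidate = os.path.join(directory, name + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


# Usage: two throwaway directories, the module lives in the second one.
first_dir = tempfile.mkdtemp()
second_dir = tempfile.mkdtemp()
with open(os.path.join(second_dir, "spam.py"), "w") as f:
    f.write("# empty demo module\n")

found = find_flat("spam", [first_dir, second_dir])
missing = find_flat("eggs", [first_dir, second_dir])
```

The sketch only illustrates the claim that the double loop itself is not where the complexity lies; everything the message calls confusing (recursive package handling, swapping sys.path for a package path) is absent here by design.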

Re: [Patches] Patch for xmlrpc encoding
by M.-A. Lemburg Dec. 10, 2002

Ragnar Kjørstad wrote:
> Hi
>
> The dumps-method in xmlrpclib has the following comment:
>     All 8-bit strings in the data structure are assumed to use the
>     packet encoding. Unicode strings are automatically converted,
>     where necessary.
>
> This doesn't work very well. In our particular case we're using latin_1
> as our default encoding, and we're using UTF-8 for the packet encoding.
> We can't really change the default encoding, because the sql-modules
> transfer latin_1 encoded data and we can't change the packet encoding to
> latin_1 because the xmlrpc-client (php) doesn't work with that.
>
> The attached patch changes xmlrpclib to convert strings to unicode using
> the default encoding, and then convert them back to strings with the
> packet encoding. If unicode is not available it falls back to the old
> behaviour.

I believe this is overkill. If you need this behaviour, subclass the Marshaller in xmlrpclib and add your feature to that subclass. Then replace the Marshaller class in xmlrpclib with your subclass.

Aside: xmlrpclib should support subclassing the Marshaller and Unmarshaller more transparently. Currently, the two are hard-coded into the rest of xmlrpclib without the possibility to provide your own subclasses without tweaking xmlrpclib from the outside.

> I guess for performance it could check if the defaultencoding is the
> same as the packet-encoding, but my guess is that it hardly ever is, so
> no reason to optimize for it.
>
> Note; I'm not at all sure this is the best way to fix the problem. If
> it's not, please feel free to ignore this patch, or even better - tell
> me what the preferable fix is.

Please post patches using the SourceForge patch manager.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/
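
The subclassing route suggested in the reply could look roughly like the sketch below. It is written against Python 3's `xmlrpc.client` (the successor of `xmlrpclib`), where the 8-bit string type is `bytes`, so it is an approximation of the 2002 situation rather than a reproduction of it. The class name `Latin1Marshaller`, the method name `dump_latin1_bytes` and the choice of latin-1 as the application encoding are assumptions taken from the poster's description, not part of the library.

```python
import xmlrpc.client


class Latin1Marshaller(xmlrpc.client.Marshaller):
    """Hypothetical subclass: treat bytes values as latin-1 encoded text."""

    # Copy the dispatch table so the base class is left untouched.
    dispatch = dict(xmlrpc.client.Marshaller.dispatch)

    def dump_latin1_bytes(self, value, write):
        # Decode with the application's encoding, then let the stock
        # string handler escape the text and write it as <string>.
        self.dump_unicode(value.decode("latin-1"), write)

    dispatch[bytes] = dump_latin1_bytes


# Marshal one latin-1 encoded byte string as an XML-RPC parameter list.
params_xml = Latin1Marshaller().dumps((b"bl\xe5b\xe6r",))
```

Used directly like this, the subclass needs no changes to the library. Getting the module-level helper functions to use it is a separate matter and, as the aside in the reply points out, is where the hard-coding gets in the way.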

zipimport, round 4
by Just van Rossum Dec. 9, 2002

Here's a new patch. Changes include:
- pyc time stamp checking (thanks Skip and </F>!)
- better python -v output
- also works when zipimport is built dynamically

I've written a note about various aspects of the patch (pasted below) but I'm not sure it's PEP-ready yet. Comments are more than welcome!

Just

---------------------

This note is in a way an addendum to PEP 273. I fully agree with the points and goals of the PEP, except for the sections "Efficiency", "Directory Imports" and "Custom Imports". However, I disagree strongly with much of the design and implementation of PEP 273. This note presents an alternative, with a matching implementation.

A brief history of import.c -- digging through its cvs log.

When Python was brand new, there were only builtin modules and .py files. Then .pyc support was added and about half a year later support for dynamically loaded C extensions was implemented. Then came frozen modules. Then Guido rewrote much of import.c, introducing the filedescr struct {suffix, mode, type}, allowing for some level of (builtin) extension of the import mechanism. This was just before Python 1.1 was released. Since then, the only big change has been package support (in 1997), which added a lot of complexity. (The __import__ hook was quietly added in 1995, it's not even mentioned in the log entry of ceval.c r2.69, I had to do a manual binary search to find it...)

All later import extensions were either implemented using the filedescr mechanism and/or hardcoded in find_module and load_module. This ranges from reading byte code from Macintosh resources to Windows registry-based imports. Every single time this involved another test in find_module() and another branch in the load_module() switch. "This has to stop."

The PEP 273 implementation.

Obviously the PEP 273 implementation has to add *something* to import.c, but it makes a few mistakes:
- it's badly factored (for example it adds the full implementation of reading zip files to import.c.)
- it adds a questionable new feature: directory caching. The original author claimed this was needed for zip imports, but instead of solving the problem locally for zip files the feature is added to the builtin import mechanism as a whole. Although this causes some speedup (especially for network file system imports), this is bad, for several reasons:
  - it's not strictly *needed* for builtin import
  - it's not a frequent feature request from users (as far as I know)
  - it makes import.c even more complicated than it already is (note that I say "complicated", not "complex")
  - it changes semantics: if a module is added to the file system *after* the directory contents have been cached, it will not be found. This might only be a real issue for an IDE that runs code inside the IDE process, but still.

A different approach.

An __import__ hook is close to useless for the problem at hand, as it needs to reimplement much of import.c from scratch. This can be witnessed in Guido's old ihooks.py, Greg Stein's imputil.py and Gordon McMillan's iu.py, each of which is fairly complex, and not compatible with any other. So we either have to add just another import type to import.c for zip archives, or we can add a more general import hook. Let's assume for a moment we want to do the *former*.

The most important goal is for zip file names on sys.path and PYTHONPATH to "just work" -- as if a zip archive is just another directory. So when traversing sys.path, each item must be checked for zip-file-ness, and if it is, the zip file's file index needs to be read so we can determine whether the module being imported is in there.

I went for an OO approach, and represent a zip file with an instance of the zipimporter class. Obviously it's quite expensive to read the zip file index again and again, so we have to maintain a cache of zipimporter objects. The most Pythonic approach would be to use a dict, using the sys.path item as the key.
This cache could be private to the zip import mechanism, but it makes sense to also cache the fact that a sys.path item is *not* a zip file. A simple solution is to map such a path item to None. By now it makes more sense to have this cache available in import.c.

The zipimporter protocol.

The zipimporter's constructor takes one argument: a path to a zip archive. It will raise an exception if the file is not found or if it's not a zip file.

The import mechanism works in two steps: 1) find the module, 2) if found, load the module. The zipimporter object follows this pattern, it has two methods:

    find_module(name):
        Returns None if the module wasn't found, or the
        zipimporter object itself if it was.

    load_module(fullname):
        Load the module, return it (or propagate an exception).

The main path traversing loop in import.c will then look like this (omitting the caching mechanics for brevity):

    def find_module(name, path):
        if isbuiltin(name):
            return builtin_filedescr
        if isfrozen(name):
            return frozen_filedescr
        if path is None:
            path = sys.path
        for p in path:
            try:
                v = zipimporter(p)
            except ZipImportError:
                pass
            else:
                w = v.find_module(name)
                if w is not None:
                    return w
            ...handle builtin file system import...

Packages.

Paths to subdirectories of the zip archive must also work, on sys.path for one, but most importantly for pkg.__path__. For example: "Archive.zip/distutils/". Such a path will most likely be added *after* "StdLib.zip" has been read (after all, the parent package is *always* loaded before any submodules), so all I need to do is strip the sub path, and look up the bare .zip path in the cache. A *new* zipimporter instance is then created, which references the same (internal, but visible) file directory info as the "bare" zipimporter object. A .prefix contains the sub path:

    >>> from zipimport import zipimporter
    >>> z = zipimporter("Archive.zip/distutils")
    >>> z.archive
    'Archive.zip'
    >>> z.prefix
    'distutils/'
    >>>

Beyond zipimport.

So there we are, zipimport works, with just a relatively minor impact on import.c. The obvious next question is: what about other import types, whether future needs for the core, or third party needs? It turns out the above approach is *easily* generalized to handle *arbitrary* path-based import hooks. Instead of just checking for zip-ness, it can check a list of candidates (again, caching cruft omitted):

    def find_module(name, path):
        if isbuiltin(name):
            return builtin_filedescr
        if isfrozen(name):
            return frozen_filedescr
        if path is None:
            path = sys.path
        for p in path:
            v = None
            for hook in sys.import_hooks:
                try:
                    v = hook(p)
                except ImportError:
                    pass
                else:
                    break
            if v is not None:
                w = v.find_module(name)
                if w is not None:
                    return w
            ...handle builtin file system import...

Now, one tiny step further, and we have something that fairly closely mimics Gordon McMillan's iu.py. That tiny step is what Gordon calls the "metapath". It works like this:

    def find_module(name, path):
        for v in sys.meta_path:
            w = v.find_module(name, path)
            if w is not None:
                return w
        # fall through to builtin behavior
        if isbuiltin(name):
            return builtin_filedescr
        [ rest same as above ]

An item on sys.meta_path can override *anything*, and does not need an item on sys.path to get invoked. The find_module() method of such an object has an extra argument: path. It is None or parent.__path__. If it's None, sys.path is implied.

The Patch.

I've modified import.c to support all of the above. Even the path handler cache is exposed: sys.path_importers. (This is what Gordon calls "shadowpath"; I'm not happy with either name, I'm open to suggestions.) The sys.meta_path addition isn't strictly necessary, but it's a useful feature and I think generalizes and exposes the import mechanism to the maximum of what is possible with the current state of import.c.

The protocol details are open to discussion. They are partly based on what's relatively easily doable in import.c.
Other than that I've tried to follow common sense as to what is practical for writing import hooks.

The patch is not yet complete, especially regarding integration with the imp module: you can't currently use the imp module to invoke any import hook. I have some ideas on how to do this, but I'd like to focus on the basics first. Also: the reload() function is currently not supported. This will be easy to fix later.

I *thought* about allowing objects on sys.path (which would then work as an importer object) but for now I've not done it as sys.meta_path makes it somewhat redundant. It would be easy to do, though: it would add another 10 lines or so to import.c.

I've tested the zipimporter module both as a builtin and as a shared lib: it works for me in both configurations. But when building it dynamically: it _has_ to be available on sys.path *before* site.py is run. When running from the build dir on unix: add the appropriate build/lib.* dir to your PYTHONPATH and it should work.
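
The note leaves out the caching mechanics in its pseudocode, so here is a minimal, self-contained sketch of how the hook list, the importer cache and the two-method protocol might fit together. It is written in present-day Python 3 and never touches the real `sys` attributes: `import_hooks` and `path_importers` are plain module-level stand-ins for the names used in the note, and `DictImporter`, `get_importer` and the `"<demo>"` path string are hypothetical inventions for illustration.

```python
import types


class DictImporter:
    """Toy importer following the find_module/load_module protocol.

    Hypothetical: serves module source from an in-memory dict that is
    registered under a made-up path string, instead of a zip archive.
    """
    archives = {}  # path string -> {module name: source text}

    def __init__(self, path):
        if path not in self.archives:
            raise ImportError("not a dict archive: %r" % (path,))
        self.sources = self.archives[path]

    def find_module(self, name):
        return self if name in self.sources else None

    def load_module(self, fullname):
        # A real hook would also register the module in sys.modules.
        module = types.ModuleType(fullname)
        exec(self.sources[fullname], module.__dict__)
        return module


import_hooks = [DictImporter]   # stand-in for sys.import_hooks
path_importers = {}             # stand-in for sys.path_importers


def get_importer(path_item):
    """Return the cached importer for path_item, or None if no hook wants it."""
    if path_item not in path_importers:
        importer = None
        for hook in import_hooks:
            try:
                importer = hook(path_item)
            except ImportError:
                continue
            break
        path_importers[path_item] = importer   # a None result is cached too
    return path_importers[path_item]


def find_module(name, path):
    for path_item in path:
        importer = get_importer(path_item)
        if importer is not None:
            loader = importer.find_module(name)
            if loader is not None:
                return loader
        # The note handles the builtin file system import here; omitted.
    return None


DictImporter.archives["<demo>"] = {"hello": "GREETING = 'hi'\n"}
search_path = ["/no/such/dir", "<demo>"]
loader = find_module("hello", search_path)
module = loader.load_module("hello")
```

Caching the `None` result is the point made earlier in the note: a path item that no hook accepts is remembered as such, so the hooks are not asked about it again on the next import.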

zipimport, round 3 (or would that be 37?)
by Just van Rossum Dec. 9, 2002

Here's the current state of my stuff. I'd rather post it now than fiddle more: I'm too tired, I'd only break stuff...

I currently build the zipimport module as a builtin to avoid some bootstrapping pitfalls (*). It should still work as a shared lib *provided* it is available on sys.path *before* site.py is run. The same goes more or less for zlib unless you're using uncompressed archives.

The test suite ran fine with the std lib packaged as a zip file (minus the test suite itself, as it globs for *.py in Lib/test/), apart from three tests that do stuff with module.__file__ (eg. email can't find its data files ;-).

Highlights:
- zip file names on sys.path work
- zip file names on PYTHONPATH work
- sys.path still contains strings
- module.__file__ is still a regular string, looking like this:
    '/path/to/archive.zip/path/to/module.pyc'
- pkg.__path__ still contains a regular string, that looks like this:
    '/path/to/archive.zip/path/to/packagedir/'

I'll write a bit about the mechanics tomorrow, but in the meantime I would like to see some test results/bug reports/bug fixes from Windows developers. Please don't start with the full monty (ie. the std lib), try a small test archive first please ;-)

There's still plenty to do regarding imp module integration, minor things like reload() not being supported yet, etc. Nevertheless I think the import.c patch is remarkably small for what it accomplishes. Needless to say, the zipimport module itself also needs work.

Fun, fun!

Just

*) It's mostly a problem when you're running Python from the source directory, when site.py adds stuff to sys.path to get a running system: by then it's too late to find zipimport. It should be less of a problem with a regular install, but I haven't tried.
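
The highlights listed in the message can be tried out with the zip import support that ships in present-day Python 3. The sketch below is an assumption-laden stand-in for testing the 2002 patch: it uses today's standard library, the archive and the package name `zipdemo_pkg` are made up for the demonstration, and because the archive holds source files the reported file name ends in .py rather than .pyc.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a tiny archive containing one module inside a package directory.
archive = os.path.join(tempfile.mkdtemp(), "archive.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipdemo_pkg/__init__.py", "")
    zf.writestr("zipdemo_pkg/mod.py", "ANSWER = 42\n")

sys.path.insert(0, archive)      # sys.path still holds a plain string
importlib.invalidate_caches()
mod = importlib.import_module("zipdemo_pkg.mod")
pkg = sys.modules["zipdemo_pkg"]

print(mod.__file__)   # a regular string that starts with the archive path
print(pkg.__path__)   # its entries start with the archive path as well
```

Anything that treats `mod.__file__` as a real file system path will still fail, which is the kind of breakage the message reports for the three failing tests.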

RE: [Python-Dev] New and Improved Import Hooks
by Moore, Paul Dec. 9, 2002

From: Gordon McMillan [mailto:gmcm@hypernet.com]
> BTW, iu does allow non string objects in sys.path.
> If it exposes "getmod(name)", that will be tried
> in preference to iu's own machinery.

I just tried this out. As far as I can see, iu works perfectly with a simple getmod() method. Why does your patch require find_module() and load_module()? I'm not criticising, just trying to understand the requirements... (Anything to make writing an import hook easier!!!)

You'll notice I'm wavering over my dislike of having objects on sys.path :-)

My interest in import hooks stems from wanting to be able to implement something like Java's 'executable jar file' support, where an application, data and all, can be packaged up as a single file. In theory, there's no real problem with this (I can write import hooks to load from a zip file right now), but in practice, the bootstrapping problems are quite nasty. The more in-core (or at least standard library) support there is for these features, the less of a bootstrap problem I have...

Paul.

"""
Example code for iu (adds an object to sys.path which satisfies
*all* requests for modules and packages with a new empty module...)
"""
import iu
import imp
import sys

iu.ImportManager().install()

class Tester:
    def getmod(self, name):
        print "Requested", name, locals()
        m = imp.new_module(name)
        # Either set __path__ to contain an object,
        # or set __path__ to contain a path, and
        # __importsub__ to the object...
        m.__path__ = [ Tester() ]
        sys.modules[name] = m
        return m

sys.path.append(Tester())

Zip imports, PEP 273, and the .zip extension
by Paul Moore Dec. 9, 2002

PEP 273 does not mandate a particular file extension for zip files. However, both current implementations (Jim Ahlstrom's patch and Just's import hooks) recognise zip files via the extension ".zip".

I'd like to suggest that this should not be the case, and that zipfiles should be recognised by the embedded signature, as is done by zipfile.is_zipfile(). This will (slightly) slow down the initial checking for zipfiles, as it requires a file read, but if the result of the check is cached the performance impact will be minimal.

The main advantage of this is in generality. However, I have a specific usage in mind - namely, concatenating a Python script with its supporting code as a zip archive. (Zip archives can have arbitrary data prepended without affecting their validity). Then, simply by executing

    sys.path.append(sys.argv[0])

the script can access its own embedded modules.

One problem with getting this to work is that at present, the Python parser will choke on data appended to the script. This can be avoided under Windows by a system-specific hack, of embedding a ^Z character in the script (which text-mode I/O treats as EOF) just before the appended data.

In the longer term, a language-mandated "stop parsing here" token, like Perl's __END__, might be useful. But that's not something I'm proposing right now (unless an expert in Python's parser steps up and says "oh, that's easy - I can implement that in a couple of lines of code" :-))

Any comments?

Paul.
--
This signature intentionally left blank
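
The two ideas in this message, detection by signature and an archive with data prepended, can be tried with present-day Python 3. The following is a sketch under assumptions: it relies on today's `zipfile` and built-in zip import rather than either 2002 patch, the file name `bundle_without_zip_suffix` and the module name `embedded_demo` are invented for the demonstration, and the leading bytes merely stand in for a script (the parser problem described above is not addressed).

```python
import importlib
import os
import sys
import tempfile
import zipfile

bundle = os.path.join(tempfile.mkdtemp(), "bundle_without_zip_suffix")

with open(bundle, "wb") as f:
    # Arbitrary leading bytes, standing in for the script text.
    f.write(b"#!/usr/bin/env python\n# script text would go here\n")
    # The archive is written after the prefix, into the same file.
    with zipfile.ZipFile(f, "w") as zf:
        zf.writestr("embedded_demo.py", "WHERE = 'inside the bundle'\n")

# The check looks at the archive signature, not at the file name.
looks_like_zip = zipfile.is_zipfile(bundle)

sys.path.append(bundle)     # a real script would append sys.argv[0]
importlib.invalidate_caches()
embedded = importlib.import_module("embedded_demo")
```

The import succeeds although the file has no ".zip" extension and does not start with zip data, which is the generality argued for above.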

Import compromise
by Aahz Dec. 9, 2002

Guido, I call for a Pronouncement:

I'm seeing some convergence on basic criteria needed for a Grand Import Redesign, but I'm not seeing convergence on the actual design itself. Time is running short for 2.3, and people making hasty decisions make poor decisions. OTOH, the complaints raised about the current patch for PEP 273 have merit, I think. On the gripping hand, we really want PEP 273 in some form for 2.3.

So here's my proposed compromise:

Split the PEP 273 patch into two parts. The first part is the absolutely critical modifications to import.c, which basically boil down to making a function call every time you access a '*.zip' string on sys.path. Everything else goes into zipimport.c, a compiled-in module like socket. (I'd even support making zipimport.c an include file for import.c)

Yes, this means we lose the caching for non-ZIP files for 2.3 -- I vote YAGNI. Yes, this makes more work over the long haul, but it saves stress and time now. Yes, it's more work than just accepting the current PEP 273 patch, but I think there are too many good arguments against that.

zipimport will *NOT* be a publicly exposed library for 2.3, possibly never.

--
Aahz (aahz(a)pythoncraft.com)           <*>         http://www.pythoncraft.com/

"To me vi is Zen. To use vi is to practice zen. Every command is a koan. Profound to the user, unintelligible to the uninitiated. You discover truth everytime you use it." --reddy(a)lion.austin.ibm.com

__getstate__() not inherited when __slots__ present
by Greg Ward Dec. 9, 2002

If a class and its superclass both define __slots__, it appears that __getstate__() is not inherited from the superclass. Example:

"""
class Base (object):
    __slots__ = []

    def __getstate__ (self):
        return tuple([getattr(self, attr) for attr in self.__slots__])

class Thing (Base):
    __slots__ = ['a', 'b']

    def __init__ (self):
        self.a = 42
        self.b = "boo!"

abstract = Base()
thing = Thing()

print abstract.__getstate__
print thing.__getstate__
"""

When I run this with a not-quite-current CVS Python:

    <bound method Base.__getstate__ of <__main__.Base object at 0x401f73b8>>
    <built-in function __getstate__>

The upshot of this is that I can't just define one __getstate__() in the superclass of a bunch of __slots__-using classes -- I guess I'll have to set __getstate__ manually for each class. ;-(

If this is a feature, is it documented anywhere?

(BTW, I see the same behaviour with Python 2.2.2.)

        Greg
--
Greg Ward <gward(a)python.net>    http://www.gerg.ca/
"Very funny, Scotty. Now beam my *clothes* down."
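
For comparison, here is a sketch of the arrangement the message is after, written for present-day Python 3, where a method defined on the superclass is looked up on the subclass in the ordinary way. It says nothing about the Python 2.2 behaviour reported above. The helper `_all_slots` and the `__setstate__` method are additions made for illustration; the original example has neither.

```python
import copy


class Base(object):
    __slots__ = ()

    def _all_slots(self):
        # Each class lists only the slots it adds itself, so collect
        # the names from every class in the MRO, base classes first.
        names = []
        for klass in reversed(type(self).__mro__):
            names.extend(klass.__dict__.get("__slots__", ()))
        return names

    def __getstate__(self):
        return tuple(getattr(self, name) for name in self._all_slots())

    def __setstate__(self, state):
        for name, value in zip(self._all_slots(), state):
            setattr(self, name, value)


class Thing(Base):
    __slots__ = ("a", "b")

    def __init__(self):
        self.a = 42
        self.b = "boo!"


thing = Thing()
state = thing.__getstate__()    # found on Base through normal inheritance
clone = copy.copy(thing)        # copy goes through __getstate__/__setstate__
```

One `__getstate__` on the superclass then serves every slots-using subclass, which is what the original example was trying to achieve.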

Re: Another approach for the import mechanism
by Chermside, Michael Dec. 9, 2002

Gustavo Niemeyer writes:
> - Don't have to change path to use compressed packages (at least
>   not if you want to provide compressed packages, individual
>   compressed modules or the standard library).
>
> - Don't have to specify the compression type hardcoded.
>
> - Allows one to ship a package inside a zip file, without asking
>   the user to change his path, and without hacking the package.
>
> - Allows one to compress a single file (foobar.py.bz2).
>
> I believe that my proposal is quite clear now. If there are no
> additional supporters, there's no reason to go on.

I think this proposal would make sense IF compression were an important goal here. But to me, it isn't. Zip does two things... it aggregates into a single file (maintaining directory structure) and it compresses. Of the two, I find the aggregation important and the compression a mere side effect.

This is why I really don't care much about switching to a different compression (or aggregation) format -- one standard way to do it is more useful to me than a BETTER way. It is why I don't care about compressing a single file. Basically, disk space is cheap, but effort to keep track of (and distribute) complex file hierarchies isn't.

> Thanks to everyone who discussed.

And thanks for your contributions too. I didn't realize exactly what I was looking for from .zip until you explained so clearly the benefits of your proposal.

-- Michael Chermside
