Re: [Python-Dev] Autoloading? (Making Queue.Queue easier to use)
Guido van Rossum writes: Code that *doesn't* need Queue but does use threading shouldn't have to pay for loading Queue.py.
Greg Ewing responds:
What we want in this kind of situation is some sort of autoloading mechanism, so you can import something from a module and have it trigger the loading of another module behind the scenes to provide it.
John Camera comments:
Bad idea unless it is tied to a namespace, so that users know where this auto-loaded functionality is coming from. Otherwise it's just as bad as 'from xxx import *'.
John, I think what Greg is suggesting is that we include Queue in the threading module, but that we use a Clever Trick(TM) to address Guido's point by not actually loading the Queue code until the first time (if ever) that it is used. I'm not familiar with the clever trick Greg is proposing, but I do agree that _IF_ everything else were equal, then Queue seems to belong in the threading module. My biggest reason is that I think anyone who is new to threading probably shouldn't use any communication mechanism OTHER than Queue or something similar which has been carefully designed by someone knowledgeable. -- Michael Chermside
Michael> I'm not familiar with the clever trick Greg is proposing, but I
Michael> do agree that _IF_ everything else were equal, then Queue seems
Michael> to belong in the threading module. My biggest reason is that I
Michael> think anyone who is new to threading probably shouldn't use any
Michael> communication mechanism OTHER than Queue or something similar
Michael> which has been carefully designed by someone knowledgeable.

Is the Queue class very useful outside a multithreaded context? The notion of a queue as a data structure has meaning outside of threaded applications. Its presence might seduce a new programmer into thinking it is subtly different than it really is. A cursory test suggests that it works, though q.get() on an empty queue seems a bit counterproductive. Also, Queue objects are probably quite a bit less efficient than lists. Taken as a whole, perhaps a stronger attachment with the threading module isn't such a bad idea.

Skip
On 10/12/05, skip@pobox.com <skip@pobox.com> wrote:
Is the Queue class very useful outside a multithreaded context?
No. It was designed specifically for inter-thread communication.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
On 10/12/05, Michael Chermside <mcherm@mcherm.com> wrote:
I'm not familiar with the clever trick Greg is proposing, but I do agree that _IF_ everything else were equal, then Queue seems to belong in the threading module. My biggest reason is that I think anyone who is new to threading probably shouldn't use any communication mechanism OTHER than Queue or something similar which has been carefully designed by someone knowledgeable.
I *still* disagree. At some level, Queue is just an application of threading, while the threading module provides the basic API (never mind that there's an even more basic API, the thread module -- it's too low-level to consider and we actively recommend against it, at least I hope we do).

While at this point there may be no other "applications" of threading in the standard library, that may not remain the case; it's quite possible that some of the discussions of threading APIs will eventually lead to a PEP proposing a different threading paradigm built on top of the threading module.

I'm using the word "application" loosely here because I realize one person's application is another's primitive operation. But I object to the idea that just because A and B are often used together or A is recommended for programs using B that A and B should live in the same module. We don't put urllib and httplib in the socket module either!

Now, if we had a package structure, I would sure like to see threading and Queue end up as neighbors in the same package. But I don't think it's right to package them all up in the same module.

(Not to say that autoloading is a bad idea; I'm -0 on it for myself, but I can see use cases; but it doesn't change my mind on whether Queue should become threading.Queue. I guess I didn't articulate my reasoning for being against that well previously and tried to hide behind the load time argument.)

BTW, Queue.Queue violates a recent module naming standard; it is now considered bad style to name the class and the module the same. Modules and packages should have short all-lowercase names, classes should be CapWords. Even the same but different case is bad style. (I'd suggest queueing.Queue except nobody can type that right. :)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido> At some level, Queue is just an application of threading, while
Guido> the threading module provides the basic API ...

While Queue is built on top of threading Lock and Condition objects, it is a highly useful synchronization mechanism in its own right, and is almost certainly easier to use correctly (at least for novices) than the lower-level synchronization objects the threading module provides. If threading is the "friendly" version of thread, perhaps Queue should be considered the "friendly" synchronization object.

(I'm playing the devil's advocate here. I'm fine with Queue being where it is.)

Skip
skip@pobox.com wrote:
Guido> At some level, Queue is just an application of threading, while
Guido> the threading module provides the basic API ...
While Queue is built on top of threading Lock and Condition objects, it is a highly useful synchronization mechanism in its own right, and is almost certainly easier to use correctly (at least for novices) than the lower-level synchronization objects the threading module provides. If threading is the "friendly" version of thread, perhaps Queue should be considered the "friendly" synchronization object.
(I'm playing the devil's advocate here. I'm fine with Queue being where it is.)
If we *don't* make Queue a part of the basic threading API (and I think Guido is right that it doesn't need to be), then I suggest we create a threadtools module.

So the thread-related API would actually have three layers:
  - _thread (currently "thread") for the low-level guts
  - threading for the basic thread API that any threaded app needs
  - threadtools for the more complex "application-specific" items

Initially threadtools would just contain Queue, but other candidates for inclusion in the future might be standard implementations of:
  - PeriodicTimer (see below)
  - FutureCall (threading out a call, only blocking when you need the result)
  - QueueThread (a thread with "inbox" and "outbox" Queues)
  - ThreadPool (up to the application to make sure the Threads are reusable)
  - threading related decorators

Cheers,
Nick.

P.S. PeriodicTimer would be a variant of threading Timer which simply replaces the run method with:

  def run(self):
      while 1:
          self.finished.wait(self.interval)
          if self.finished.isSet():
              break
          self.function(*self.args, **self.kwargs)

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.blogspot.com
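For readers on current Python 3, the PeriodicTimer idea from the P.S. can be sketched as a small subclass of threading.Timer. This is only an illustration under the assumption that Timer keeps its documented cancel() behaviour and its interval, function, args, kwargs and finished attributes; no threadtools module with this class exists in the standard library.

```python
import threading


class PeriodicTimer(threading.Timer):
    """Call function(*args, **kwargs) every `interval` seconds until cancel()."""

    def run(self):
        # Event.wait() returns False on timeout and True once cancel()
        # has set the `finished` event, which ends the loop.
        while not self.finished.wait(self.interval):
            self.function(*self.args, **self.kwargs)
```

Typical use would be `t = PeriodicTimer(5.0, callback)`, then `t.start()`, and later `t.cancel()` to stop it. Note that the interval is measured between the end of one call and the start of the next, so a slow callback makes the period drift.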
Nick> So the thread-related API would actually have three layers:
Nick> - _thread (currently "thread") for the low-level guts
Nick> - threading for the basic thread API that any threaded app needs
Nick> - threadtools for the more complex "application-specific" items

Nick> Initially threadtools would just contain Queue, but other candidates for
Nick> inclusion in the future might be standard implementations of:
Nick> - PeriodicTimer (see below)
Nick> - FutureCall (threading out a call, only blocking when you need the result)
Nick> - QueueThread (a thread with "inbox" and "outbox" Queues)
Nick> - ThreadPool (up to the application to make sure the Threads are reusable)
Nick> - threading related decorators

Given your list of stuff to go in a threadtools module, I still think you need something to hold Lock, RLock, Condition and Semaphore. See my previous post (subject: Threading and synchronization primitives) about a threadutils module to hold these somewhat lower-level sync primitives. In most cases I don't think programmers need them. OTOH, providing some higher level abstractions seems to make sense.

(I have to admit I have no idea what a QueueThread's outbox queue would be used for. Queues are generally multi-producer, single-consumer objects. It makes sense for a thread to have an inbox. I'm not so sure about an outbox.)

Skip
On Thu, Oct 13, 2005, skip@pobox.com wrote:
Given your list of stuff to go in a threadtools module, I still think you need something to hold Lock, RLock, Condition and Semaphore. See my previous post (subject: Threading and synchronization primitives) about a threadutils module to hold these somewhat lower-level sync primitives. In most cases I don't think programmers need them. OTOH, providing some higher level abstractions seems to make sense. (I have to admit I have no idea what a QueueThread's outbox queue would be used for. Queues are generally multi-producer, single-consumer objects. It makes sense for a thread to have an inbox. I'm not so sure about an outbox.)
If you look at my thread tutorial, the spider thread pool uses a single-producer, multiple-consumer queue to feed URLs to the retrieving threads. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair
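A single-producer, multiple-consumer arrangement of this kind can be sketched roughly as follows. This is not the code from Aahz's tutorial; `run_pool`, `handler` and `num_workers` are hypothetical names, and the module is spelled `queue` as in Python 3 (it was `Queue` when this thread was written). The second queue plays the role of an "outbox" that collects results.

```python
import queue
import threading


def run_pool(items, handler, num_workers=4):
    """Feed items from one producer to several consumer threads."""
    tasks = queue.Queue()     # shared "inbox" read by every worker
    results = queue.Queue()   # shared "outbox" for finished work
    stop = object()           # sentinel telling a worker to exit

    def worker():
        while True:
            item = tasks.get()
            if item is stop:
                break
            results.put(handler(item))

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for item in items:        # the single producer
        tasks.put(item)
    for _ in workers:         # one sentinel per worker
        tasks.put(stop)
    for w in workers:
        w.join()

    # All workers have exited, so draining without blocking is safe here.
    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

Results come back in completion order, not submission order, and a handler that raises would end its worker thread early; a real pool would need to deal with both.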
Guido van Rossum wrote:
BTW, Queue.Queue violates a recent module naming standard; it is now considered bad style to name the class and the module the same. Modules and packages should have short all-lowercase names, classes should be CapWords. Even the same but different case is bad style.
unfortunately, this standard seems to result in generic "spamtools" modules into which people throw everything that's even remotely related to "spam", followed by complaints about bloat and performance from users, followed by various more or less stupid attempts to implement lazy loading of hidden internal modules, followed by more complaints from users who no longer have a clear view of what's really going on in there... I think I'll stick to the old standard for a few more years... </F>
On 10/13/05, Fredrik Lundh <fredrik@pythonware.com> wrote:
Guido van Rossum wrote:
BTW, Queue.Queue violates a recent module naming standard; it is now considered bad style to name the class and the module the same. Modules and packages should have short all-lowercase names, classes should be CapWords. Even the same but different case is bad style.
unfortunately, this standard seems to result in generic "spamtools" modules into which people throw everything that's even remotely related to "spam", followed by complaints about bloat and performance from users, followed by various more or less stupid attempts to implement lazy loading of hidden internal modules, followed by more complaints from users who no longer have a clear view of what's really going on in there...
I think I'll stick to the old standard for a few more years...
Yeah, until you've learned to use packages. :(

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
BTW, Queue.Queue violates a recent module naming standard; it is now considered bad style to name the class and the module the same. Modules and packages should have short all-lowercase names, classes should be CapWords. Even the same but different case is bad style.
unfortunately, this standard seems to result in generic "spamtools" modules into which people throw everything that's even remotely related to "spam", followed by complaints about bloat and performance from users, followed by various more or less stupid attempts to implement lazy loading of hidden internal modules, followed by more complaints from users who no longer have a clear view of what's really going on in there...
I think I'll stick to the old standard for a few more years...
Yeah, until you've learned to use packages. :(
what do packages have to do with this? does this new module naming standard only apply to toplevel package names? </F>
unfortunately, this standard seems to result in generic "spamtools" modules into which people throw everything that's even remotely related to "spam", followed by complaints about bloat and performance from users, followed by various more or less stupid attempts to implement lazy loading of hidden internal modules, followed by more complaints from users who no longer have a clear view of what's really going on in there...
BTW, what's the performance problem in importing unnecessary stuff (assuming pyc files are already generated) ? Has it been evaluated somewhere ?
Antoine Pitrou wrote:
unfortunately, this standard seems to result in generic "spamtools" modules into which people throw everything that's even remotely related to "spam", followed by complaints about bloat and performance from users, followed by various more or less stupid attempts to implement lazy loading of hidden internal modules, followed by more complaints from users who no longer have a clear view of what's really going on in there...
BTW, what's the performance problem in importing unnecessary stuff (assuming pyc files are already generated) ?
larger modules can easily take 0.1-0.2 seconds to import (at least if they use enough external dependencies). that may not be a lot of time in itself, but it can result in several seconds extra startup time for a larger program. importing unneeded modules also adds to the process size, of course. you don't need to import too many modules to gobble up a couple of megabytes... </F>
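One rough way to check figures like these on a given machine is to time a fresh import. The helper below is a minimal sketch, not a rigorous benchmark; `time_import` is a hypothetical name. It removes only the named module from sys.modules, so dependencies that are already loaded are not re-imported and the number tends to underestimate a cold start. Python 3.7 and later also offer `python -X importtime` for a per-module breakdown.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the seconds taken to import module_name after forgetting it."""
    # Drop any cached copy so the import machinery really runs again.
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start
```

Calling `time_import("decimal")` several times and comparing the first result with later ones gives a feel for how much of the cost is file system and bytecode loading versus module execution.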
>> BTW, what's the performance problem in importing unnecessary stuff
>> (assuming pyc files are already generated) ?

Fredrik> larger modules can easily take 0.1-0.2 seconds to import (at
Fredrik> least if they use enough external dependencies).

I wish it was that short. At work we use lots of SWIG-wrapped C++ libraries. Whole lotta dynamic linking goin' on... In our case I don't think autoloading would help all that much. We actually use all that stuff. The best we could do would be to defer the link step for a couple seconds.

Skip
Michael Chermside wrote:
John, I think what Greg is suggesting is that we include Queue in the threading module, but that we use a Clever Trick(TM) to address Guido's point by not actually loading the Queue code until the first time (if ever) that it is used.
I wasn't actually going so far as to suggest doing this, rather pointing out that, if we had an autoloading mechanism, this would be an obvious use case for it.
I'm not familiar with the clever trick Greg is proposing,
I'll see if I can cook up an example of it to show. Be warned, it is very hackish...

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg.ewing@canterbury.ac.nz        +--------------------------------------+
I wrote:
I'll see if I can cook up an example of it to show. Be warned, it is very hackish...
Well, here it is. It's even slightly uglier than I thought it would be due to the inability to change the class of a module these days.

When you run it, you should get

  Imported my_module
  Loading the spam module
  Glorious processed meat product!
  Glorious processed meat product!

#--------------------------------------------------------------
#
#  test.py
#

import my_module

print "Imported my_module"
my_module.spam()
my_module.spam()

#
#  my_module.py
#

import autoloading
autoloading.register(__name__, {'spam': 'spam_module'})

#
#  spam_module.py
#

print "Loading the spam module"

def spam():
    print "Glorious processed meat product!"

#
#  autoloading.py
#

import sys

class AutoloadingModule(object):

    def __getattr__(self, name):
        modname = self.__dict__['_autoload'][name]
        module = __import__(modname, self.__dict__, {}, [name])
        value = getattr(module, name)
        setattr(self, name, value)
        return value

def register(module_name, mapping):
    module = sys.modules[module_name]
    m2 = AutoloadingModule()
    m2.__name__ = module.__name__
    m2.__dict__ = module.__dict__
    # Drop all references to the original module before assigning
    # the _autoload attribute. Otherwise, when the original module
    # gets cleared, _autoload is set to None.
    sys.modules[module_name] = m2
    del module
    m2._autoload = mapping

#--------------------------------------------------------------

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg.ewing@canterbury.ac.nz        +--------------------------------------+
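On Python 3.7 and later a similar effect is possible without replacing the module object, because a module can supply its own `__getattr__` hook (PEP 562). The sketch below is not Greg's code and is only one way to do it; `make_autoloading_module` and the mapping passed to it are hypothetical, and the module is built in memory purely so the example is self-contained.

```python
import importlib
import sys
import types


def make_autoloading_module(name, mapping):
    """Create a module whose attributes in `mapping` load on first access.

    `mapping` maps an attribute name to the module that provides it,
    much like the dict passed to register() in the example above.
    """
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Invoked only when normal lookup in the module namespace fails.
        try:
            provider = mapping[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(provider), attr)
        setattr(module, attr, value)  # cache it so the hook is not hit again
        return value

    module.__getattr__ = __getattr__  # stored in the module's __dict__
    sys.modules[name] = module
    return module
```

In an ordinary source file the same hook would simply be written as a top-level `def __getattr__(name):` in the module itself, with no helper function needed.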
I just tried to implement an autoloader using a technique I'm sure I used in an earlier Python version, but it no longer seems to be allowed.

I'm trying to change the __class__ of a newly-imported module to a subclass of types.ModuleType, but I'm getting

  TypeError: __class__ assignment: only for heap types

Have the rules concerning assignment to __class__ been made more restrictive recently?

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg.ewing@canterbury.ac.nz        +--------------------------------------+
At 01:47 PM 10/13/2005 +1300, Greg Ewing wrote:
I just tried to implement an autoloader using a technique I'm sure I used in an earlier Python version, but it no longer seems to be allowed.
I'm trying to change the __class__ of a newly-imported module to a subclass of types.ModuleType, but I'm getting
TypeError: __class__ assignment: only for heap types
Have the rules concerning assignment to __class__ been made more restrictive recently?
It happened in Python 2.3, actually. The best way to work around this is to add an instance of your subclass to sys.modules *first*, then call reload() on it to make the normal import process work. PEAK uses this to implement lazy loading. Actually, for your purposes, you might be able to just replace the module object and copy its contents to the new module's dictionary.
Phillip J. Eby wrote:
At 01:47 PM 10/13/2005 +1300, Greg Ewing wrote:
I'm trying to change the __class__ of a newly-imported module to a subclass of types.ModuleType
It happened in Python 2.3, actually.
Is there a discussion anywhere about the reason this was done? It would be useful if this capability could be regained somehow without breaking things.

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg.ewing@canterbury.ac.nz        +--------------------------------------+
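For what it's worth, this capability did come back for modules: since Python 3.5 the `__class__` of a module object may be assigned, provided the new class is a subclass of types.ModuleType. A minimal sketch of the technique on a current interpreter follows; the `_autoload` attribute mirrors the name used elsewhere in this thread, and the class below is illustrative, not an existing library class.

```python
import importlib
import types


class AutoloadingModule(types.ModuleType):
    """Module subclass that imports selected attributes on first use."""

    def __getattr__(self, name):
        # Reached only when the attribute is missing from the module dict.
        mapping = self.__dict__.get("_autoload", {})
        if name not in mapping:
            raise AttributeError(name)
        value = getattr(importlib.import_module(mapping[name]), name)
        setattr(self, name, value)  # cache for later lookups
        return value


def enable_autoloading(module, mapping):
    """Switch an existing module object over to AutoloadingModule."""
    module._autoload = mapping
    module.__class__ = AutoloadingModule  # allowed for modules since 3.5
    return module
```

A module would opt in from its own body with something like `enable_autoloading(sys.modules[__name__], {'spam': 'spam_module'})`, where the mapping is whatever the module wants to defer.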
Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
Phillip J. Eby wrote:
At 01:47 PM 10/13/2005 +1300, Greg Ewing wrote:
I'm trying to change the __class__ of a newly-imported module to a subclass of types.ModuleType
It happened in Python 2.3, actually.
Is there a discussion anywhere about the reason this was done? It would be useful if this capability could be regained somehow without breaking things.
Well, I think it's undesirable that you be able to do this to, e.g., strings. Modules are something of a greyer area, I guess. Cheers, mwh -- You sound surprised. We're talking about a government department here - they have procedures, not intelligence. -- Ben Hutchings, cam.misc
At 04:02 PM 10/13/2005 +0100, Michael Hudson wrote:
Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
Phillip J. Eby wrote:
At 01:47 PM 10/13/2005 +1300, Greg Ewing wrote:
I'm trying to change the __class__ of a newly-imported module to a subclass of types.ModuleType
It happened in Python 2.3, actually.
Is there a discussion anywhere about the reason this was done? It would be useful if this capability could be regained somehow without breaking things.
Well, I think it's undesirable that you be able to do this to, e.g., strings. Modules are something of a greyer area, I guess.
Actually, it's desirable to be *able* to do it for anything. But certainly for otherwise-immutable objects it can lead to aliasing issues. For mutable objects, it's *very* desirable, and I think the rules added in 2.3 might have been overly strict, as they disallow you changing any built-in type to a non built-in type, even if the allocator is the same. It seems to me the safety check could perhaps be reduced to just checking whether the old and new classes have the same tp_free. (Apart from the layout and other inheritance-related checks, I mean.) (By the way, for an example use case other than modules, note that somebody wrote an "observables" package that could detect mutation of lists and dictionaries in Python 2.2 using __class__ changes, which then became useless as of Python 2.3.)
Why not lazily import modules by importing them when they are needed (i.e. inside functions), and not in the top-level module scope?

On 10/13/05, Phillip J. Eby <pje@telecommunity.com> wrote:
At 04:02 PM 10/13/2005 +0100, Michael Hudson wrote:
Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
Phillip J. Eby wrote:
At 01:47 PM 10/13/2005 +1300, Greg Ewing wrote:
I'm trying to change the __class__ of a newly-imported module to a subclass of types.ModuleType
It happened in Python 2.3, actually.
Is there a discussion anywhere about the reason this was done? It would be useful if this capability could be regained somehow without breaking things.
Well, I think it's undesirable that you be able to do this to, e.g., strings. Modules are something of a greyer area, I guess.
Actually, it's desirable to be *able* to do it for anything. But certainly for otherwise-immutable objects it can lead to aliasing issues.
For mutable objects, it's *very* desirable, and I think the rules added in 2.3 might have been overly strict, as they disallow you changing any built-in type to a non built-in type, even if the allocator is the same. It seems to me the safety check could perhaps be reduced to just checking whether the old and new classes have the same tp_free. (Apart from the layout and other inheritance-related checks, I mean.)
(By the way, for an example use case other than modules, note that somebody wrote an "observables" package that could detect mutation of lists and dictionaries in Python 2.2 using __class__ changes, which then became useless as of Python 2.3.)
Eyal Lotem <eyal.lotem@gmail.com> wrote:
Why not lazily import modules by importing them when they are needed (i.e. inside functions), and not in the top-level module scope?
Because then it wouldn't be automatic. The earlier portion of this discussion came from...

    import module
    #module.foo does not reference a module
    module.foo
    #now module.foo references a module

The discussion is about how we can get that kind of behavior.

 - Josiah
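For comparison, the manual approach Eyal describes looks roughly like the sketch below: the import statement sits inside the function, so the module is loaded on the first call and found in sys.modules on later calls. The function name and the choice of colorsys are arbitrary illustrations. It works, but every function that needs the module must repeat the import, which is the "not automatic" part.

```python
def to_hls(r, g, b):
    """Convert an RGB triple to HLS, importing colorsys only when called."""
    # Deferred import: executed on each call, but after the first call it
    # is just a cheap lookup of the already-loaded module in sys.modules.
    import colorsys
    return colorsys.rgb_to_hls(r, g, b)
```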
Josiah Carlson wrote:
The earlier portion of this discussion came from...
    import module
    #module.foo does not reference a module
    module.foo
    #now module.foo references a module
Or more generally, module.foo now references *something*, not necessarily a module. (In my use case it's a class.)

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg.ewing@canterbury.ac.nz        +--------------------------------------+
Phillip J. Eby wrote:
Actually, it's desirable to be *able* to do it for anything. But certainly for otherwise-immutable objects it can lead to aliasing issues.
Even for immutables, it could be useful to be able to add behaviour that doesn't mutate anything.

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg.ewing@canterbury.ac.nz        +--------------------------------------+
At 01:32 PM 10/14/2005 +1300, Greg Ewing wrote:
Phillip J. Eby wrote:
Actually, it's desirable to be *able* to do it for anything. But certainly for otherwise-immutable objects it can lead to aliasing issues.
Even for immutables, it could be useful to be able to add behaviour that doesn't mutate anything.
I meant that just changing its class is a mutation, and since immutables can be shared or cached, that could lead to problems. So I do think it's a reasonable implementation limit to disallow changing the __class__ of an immutable.
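The restriction is easy to observe from Python code. The following sketch, using made-up class names, shows that current CPython lets an instance of an ordinary user-defined class change to a layout-compatible class, while the same assignment on a built-in tuple is refused with TypeError.

```python
class Plain:
    pass


class Loud(Plain):
    def shout(self):
        return "HELLO"


class Record(tuple):
    """A user-supplied tuple subclass, as in the database use case."""


def try_set_class(obj, new_class):
    """Return True if obj.__class__ could be reassigned, False on TypeError."""
    try:
        obj.__class__ = new_class
    except TypeError:
        return False
    return True
```

Here `try_set_class(Plain(), Loud)` succeeds and the object gains `shout()`, whereas `try_set_class((1, 2, 3), Record)` fails, so shared or cached tuples cannot be retyped behind another holder's back.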
Phillip J. Eby wrote:
I meant that just changing its class is a mutation, and since immutables can be shared or cached, that could lead to problems. So I do think it's a reasonable implementation limit to disallow changing the __class__ of an immutable.
That's a fair point. Although I was actually thinking recently of a use case for changing the class of a tuple, inside a Pyrex module for database access. The idea was that the user would be able to supply a custom subclass of tuple for returning the records. To avoid extra copying of the data, I was going to create a normal uninitialised tuple, stuff the data into it, and then change its class to the user-supplied one. But seeing as all this would be happening in Pyrex where the normal restrictions don't apply anyway, I suppose it wouldn't matter if user code wasn't allowed to do this. Greg
participants (12)
- Aahz
- Antoine Pitrou
- Eyal Lotem
- Fredrik Lundh
- Greg Ewing
- Guido van Rossum
- Josiah Carlson
- Michael Chermside
- Michael Hudson
- Nick Coghlan
- Phillip J. Eby
- skip@pobox.com