Re: [Twisted-Python] What to do when a service fails to start, also, deferred and startService
Hi Glyph

Thanks for the detailed reply.

"glyph" == glyph <glyph@divmod.com> writes:
glyph> On 27 Nov, 05:16 pm, terry@jon.es wrote:
glyph> For me, baroque and elaborate start-up dances are a code smell.
glyph> Services should be as independent as possible. Of course, sometimes
glyph> some kind of initialization conversation is unavoidable, but I do
glyph> like to try to keep it as short as possible.

I do too. Sometimes it takes (me at least) a few iterations before you see how best to do that.

glyph> I think you're misunderstanding what a "service" is. The word is,
glyph> perhaps, a bit too lofty for its humble job. A service is just an
glyph> event notification mechanism that tells you when it's time to start
glyph> up, and when it's time to shut down.

glyph> I can understand why it would be attractive to misunderstand in this
glyph> way, though: IService doesn't do very much, you have requirements
glyph> that it doesn't cover, and if it were the thing you understand it to
glyph> be then it would cover those requirements. I'm sure that would be
glyph> nicer for you :).

glyph> This might seem a bit inconsistent, since stopService uses the
glyph> return of a Deferred. However, this is for a very specific reason,
glyph> not a generalized error-handling case: you may need to prevent the
glyph> *rest* of the system (specifically, the reactor) from completely
glyph> shutting down until you've managed to cleanly shut down whatever
glyph> you're trying to shut down on potentially remote systems.

glyph> startService has no such problem though; the service subsystem has
glyph> told you "It's time to start up!" - its job is done, and the reactor
glyph> isn't going away as part of service startup, so it's your
glyph> responsibility as an application author to make sure your other
glyph> dependencies are properly initialized.

OK, this is helpful - I have been looking at it from a different point of view, as you've guessed.
But if something does go wrong, you've got a failure propagating its way down an errback chain, eventually (unless an errback switches you back to the callback chain) popping out the end and causing the reactor to issue an Unhandled Error message. So you can't indicate that the service has failed to start by throwing, because the exception is going to pop harmlessly out the end of the deferred chain as a generic unhandled error and will not cause Twisted to know that the service couldn't start.
glyph> The key question here is: indicate to whom? If you want to indicate
glyph> it to some other object, well, try:except: or addErrback and call a
glyph> method on that object. Nothing magic about it.

I have code written as a Twisted plugin. So I have a class implementing IServiceMaker and IPlugin, and I create an instance of that class which gets found when I invoke twistd from the command line.

So in my case I want to indicate to twistd that the service that my class provides a makeService method to create, but which I do not set in motion, has failed to start and that twistd should exit, or do something other than cheerfully tell me that there's been an Unhandled Error.

Does that make more sense? Sorry, I should have said I was using twistd.

glyph> In what way would you expect the service mechanism to "deal with"
glyph> returning a Deferred? Stop starting other services? Print out some
glyph> different log message?

I'm not sure what should happen. I'm sitting at the command line, I've asked twistd to start something for me, there's clearly been a problem doing so (and this doesn't have to be baroque, maybe I just couldn't listen on a specific port I wanted, or maybe my code somehow raised an Exception), but I don't seem to have a mechanism for having twistd take any notice at all.

I'm just talking about the case where startService calls something that returns a deferred and there's an Exception that comes back down the Deferred chain as a failure. I suppose if startService raises an Exception itself directly, something else happens - maybe twistd exits.

glyph> IService is a very, very simple interface. If you want to respond
glyph> to failures from startService (deferred failures, exceptions, or
glyph> whatever else) in a useful way, then you can write your own
glyph> implementation of it which manages startup order, keeps track of
glyph> dependencies, and maintains a state machine that handles stopService
glyph> appropriately if called in mid-startup.
glyph> I don't think that having to implement an interface with 6 methods
glyph> on it could be considered "cruel and unusual". If you think so you
glyph> may want to investigate options other than Twisted: you will
glyph> frequently be expected to implement interfaces with methods on them
glyph> ;-).

:-)

glyph> There's no need to "track down and subclass" lots of things. Your
glyph> IService wants the things that it contains to have a richer
glyph> interface which allows for error handling, dependencies, and
glyph> propagation, so simply write a single wrapper for simpler IService
glyph> objects that expands the interface to do the other things that
glyph> you're interested in.

In the case of a service being started by twistd, it doesn't seem as simple as you describe, but maybe that's my lack of understanding again. I can easily subclass IService, but something else is calling the startService method of that subclass. And that thing, whatever it is, is not expecting me to return a deferred. So if my startService has for some reason got its hands on a deferred, it can't simply hand it back to its caller and have something (twistd in my case) see that an error occurred.

It does feel like I have to track down what this something else might be. Either working from my IServiceMaker implementation or working from /usr/bin/twistd to find where startService is called is not trivial (you guys wrote it, I'm sure the logic is all much clearer to you).

After looking through a few files I wind up at twisted/application/app.py, which has a startApplication function that calls service.IService(application).startService(). So I guess that's what is calling my startService.

So I could make my own startApplication function, but I then have the same problem, I wind up with a deferred on my hands and my caller is not expecting me to return it. Plus, the startApplication function sits at the top level of twisted/application/app.py, so I have to find whatever is calling that.
That seems to be twisted/scripts/_twistd_unix.py, which imports twisted.application.app and has a top-level startApplication that calls app.startApplication. But who is calling that? Looks like twisted/scripts/twistd.py is, and that's called by /usr/bin/twistd.

So should I write my own twistd?

All this doesn't seem to be a matter of simple subclassing. Plus, I can't just go in and start editing the top-level functions in twisted/application/app.py and twisted/scripts/_twistd_unix.py or code that imports app, etc.

Sorry for so many questions - I really don't know if I'm missing something simple here. I do enjoy digging into all this, and I appreciate your apparently limitless patience. I wish I knew it all better. Twisted is complex and it's a pretty good bet that anything you think of or run into as a n00b has been thought of or encountered before, and that whatever way you think of to solve it will probably be non-optimal, or plain wrong, or in ignorance of a solution someone much more experienced has already implemented, or... etc. Hence my many questions.

glyph> This all strikes me as totally straightforward and easy, and I don't
glyph> think I'm any kind of super-genius for being able to write a few
glyph> Python classes that call a few simple start/stop methods in the
glyph> order that I want them to run in :).

I should have mentioned that I want to use twistd. In fact I have something like a process pool running in one service and I talk to it from another machine. I say "hey, process pool, start me up the following service (a twistd service)" and I would then like to know if that service started, and if not then why not. So having twistd fail or report an error if it can't start a service would be useful.

glyph> Doing either of those things would definitely be wrong. There's no
glyph> reason to sys.exit or reactor.stop if your application can't start
glyph> up, unless your management system specifically calls for such a
glyph> thing.
Maybe this is a case where it's (semi-)justified. At least if I called sys.exit the twistd process would go away, instead of sitting there acting as though nothing's wrong :-) I can also, of course, try interacting with the service I think I just started on the remote machine, and if I can't then I can tell the original process pool to kill the twistd process. But that seems a pretty roundabout alternative to just having twistd notice that something went awry when calling startService.

glyph> In the future, even the Twisted plugin code might be starting some
glyph> things in addition to your application. As I mentioned above, a
glyph> good reason to do that is to perform diagnostics on failed startups
glyph> :).

I assume you really mean "startups" and not "services". In which case, I'm 100% sure there's something funny here, but I can't figure it out. I'd love to know, and I'm smiling broadly in any case. Too bad email loses so much humor..... we can always try though.

Thanks again,
Terry
On 28 Nov, 03:38 pm, terry@jon.es wrote:
Hi Glyph
Thanks for the detailed reply.
No problem, sorry it took so long to get back to this; I definitely left the conversation halfway through.
"glyph" == glyph <glyph@divmod.com> writes:
glyph> On 27 Nov, 05:16 pm, terry@jon.es wrote:

glyph> The key question here is: indicate to whom? If you want to indicate
glyph> it to some other object, well, try:except: or addErrback and call a
glyph> method on that object. Nothing magic about it.
I have code written as a Twisted plugin. So I have a class implementing IServiceMaker and IPlugin, and I create an instance of that class which gets found when I invoke twistd from the command line.
So in my case I want to indicate to twistd that the service that my class provides a makeService method to create, but which I do not set in motion, has failed to start and that twistd should exit, or do something other than cheerfully tell me that there's been an Unhandled Error.
Does that make more sense? Sorry, I should have said I was using twistd.
Yes. And I think that this is a good use-case for making IService a bit deeper than it is. twistd (and, one day, other tools that might be able to start IService objects) potentially needs some more information so it can make a more intelligent high-level decision about what to do next. My previous messages were basically saying "you can't do what you want with IService". i.e. you can't write your own code to do something clever and somehow have dependencies and notifications to users of "twistd" fall out of that. But that shouldn't be taken as an indication that twistd itself shouldn't be improved. For reasons I brought up in previous messages, exiting is not always the right thing to do. But in some cases (*none* of the services on offer were able to start up, for example) it might be.
I'm not sure what should happen. I'm sitting at the command line, I've asked twistd to start something for me, there's clearly been a problem doing so (and this doesn't have to be baroque, maybe I just couldn't listen on a specific port I wanted, or maybe my code somehow raised an Exception), but I don't seem to have a mechanism for having twistd take any notice at all.
Keep in mind that you may be sitting on the other end of a blackberry which needs to get an email when 'twistd' fails to start up after a server reboot - you're not necessarily at a command line. One of the reasons that no mechanism has yet evolved for propagating interesting status messages about services is that there's no obvious channel for those messages to be communicated to.
In the case of a service being started by twistd, it doesn't seem as simple as you describe, but maybe that's my lack of understanding again. I can easily subclass IService, but something else is calling the startService method of that subclass. And that thing, whatever it is, is not expecting me to return a deferred. So if my startService has for some reason got its hands on a deferred, it can't simply hand it back to its caller and have something (twistd in my case) see that an error occurred.
It does feel like I have to track down what this something else might be.
It's not that you've failed to understand something - you have correctly identified that there is nothing to understand :). You need to go find that thing ("whatever it is" ;-)) in twistd that's invoking the very first call to IService.startService (and privilegedStartService as well) and propose a concrete way to make it smarter. (I've elided your exploration of twisted code, but you're basically looking in the right direction.)
So should I write my own twistd?
No way, man, it's open source! Not writing your own is the whole point! Open a ticket, write a patch. If you're not really sure what the patch should do, we can continue this discussion about reporting startup errors and service dependencies here. This poorly-specified ticket indicates that other developers have had similar problems in the past: http://twistedmatrix.com/trac/ticket/1572
All this doesn't seem to be a matter of simple subclassing. Plus, I can't just go in and start editing the top-level functions in twisted/application/app.py and twisted/scripts/_twistd_unix.py or code that imports app, etc.
Sure you can. Then, you take the results of running 'svn diff' on the copy of Twisted where you did that, and... :)
Sorry for so many questions - I really don't know if I'm missing something simple here. I do enjoy digging into all this, and I appreciate your apparently limitless patience. I wish I knew it all better. Twisted is complex and it's a pretty good bet that anything you think of or run into as a n00b has been thought of or encountered before, and that whatever way you think of to solve it will probably be non-optimal, or plain wrong, or in ignorance of a solution someone much more experienced has already implemented, or... etc. Hence my many questions.
Unfortunately it's equally likely that it's come up a million times and a half dozen people know it's an unresolved issue but it's never been filed as a ticket and never been clearly framed as a specific problem rather than a vague architectural queasiness. The value of a good ticket should not be underestimated.
In fact I have something like a process pool running in one service and I talk to it from another machine. I say "hey, process pool, start me up the following service (a twistd service)" and I would then like to know if that service started, and if not then why not. So having twistd fail or report an error if it can't start a service would be useful.
Quite so. But, this is a great example of how this is a use-case beyond twistd's current capabilities, but not out of the scope of its eventual ambition. It isn't currently designed to manage processes on remote machines. Maybe ampoule has something which does? I don't know, that's a tricky area which requires a lot of thought beyond this one feature.
glyph> Doing either of those things would definitely be wrong. There's no
glyph> reason to sys.exit or reactor.stop if your application can't start
glyph> up, unless your management system specifically calls for such a
glyph> thing.
Maybe this is a case where it's (semi-)justified. At least if I called sys.exit the twistd process would go away, instead of sitting there acting as though nothing's wrong :-) I can also, of course, try interacting with the service I think I just started on the remote machine, and if I can't then I can tell the original process pool to kill the twistd process. But that seems a pretty roundabout alternative to just having twistd notice that something went awry when calling startService.
It may be worthwhile, before writing a patch, to devise your *own* error-reporting channel on top of twistd. For example, a dedicated error-report-processing server listening on a dedicated port. Then you can specify your own enhanced-IService interface that handles dependencies and deferreds and whatever else you need, and an adapter that hooks it up to twistd via IService but reports errors to your external error-reporting channel. Once you're sure that design works, it should be straightforward to change the implementation of twistd to honor your expanded interface, and provide an alternate implementation of the error-reporting channel.
glyph> In the future, even the Twisted plugin code might be starting some
glyph> things in addition to your application. As I mentioned above, a
glyph> good reason to do that is to perform diagnostics on failed startups
glyph> :).
I assume you really mean "startups" and not "services". In which case, I'm 100% sure there's something funny here, but I can't figure it out. I'd love to know, and I'm smiling broadly in any case. Too bad email loses so much humor..... we can always try though.
By "startups" I simply meant "attempts to start up a service", i.e. calls to startService. I'm sure there is a truckload of subtle and completely unintentional humor in there somewhere though.
participants (2)
- glyph@divmod.com
- Terry Jones