[Twisted-Python] Logging observers and threads

When an error occurs in a database connection thread, adbapi invokes log.err from that thread, which in turns causes log observers to be invoked in this thread. Mantissa's TracebackCollector has a log observer which logs items as Items in an Axiom store; this means that the log observer tries to run SQLite queries in the wrong thread, and explodes. However, I'm not sure whose fault this is; it seems there are three possibilities:
1) log.err and log.msg should not be called from a non-reactor thread, making this an adbapi bug. 2) twisted.python.log should not invoke log observers in a non-reactor thread, making this a twisted.python.log bug. 3) log observers should expect to be inovked in non-reactor threads, making this a Mantissa bug.
Can anyone give a definitive answer on this? The "nothing except reactor.callFromThread is thread-safe" principle would suggest that 1) is the case, but logging seems like it might be a special case.

On 07/12/2012 05:17 AM, Tristan Seligmann wrote:
- log observers should expect to be inovked in non-reactor threads,
making this a Mantissa bug.
In theory this is the case - the logging howto says "The observer needs to be thread safe if you anticipate using threads in your program." The API docs don't mention threads, though..
In practice however, probably every custom observer gets this wrong, so we really need some infrastructure to make this easier or to fix it. Possibly we should have some way to indicate an observer is thread-safe, assume by default they are not, and if they are not wrap them in a reactor.callFromThread wrapper.
I would also like some infrastructure to make the file log observer run in its own thread, but that's almost the opposite use case :)
-Itamar

On 11:32 am, itamar@itamarst.org wrote:
On 07/12/2012 05:17 AM, Tristan Seligmann wrote:
- log observers should expect to be inovked in non-reactor threads,
making this a Mantissa bug.
In theory this is the case - the logging howto says "The observer needs to be thread safe if you anticipate using threads in your program." The API docs don't mention threads, though..
In practice however, probably every custom observer gets this wrong, so we really need some infrastructure to make this easier or to fix it. Possibly we should have some way to indicate an observer is thread- safe, assume by default they are not, and if they are not wrap them in a reactor.callFromThread wrapper.
The logging system is already a bottleneck in some applications. Doing even more work to try to make it properly thread-safe will probably slow it down even more.
I think it was a mistake to try to make it possible to use twisted.python.log free-threaded. I think we should think about getting rid of this claimed feature. Perhaps it could be replaced with a more explicit, probably more restricted, logging feature for non-reactor threads.
Unless someone can magically fix the threading issues without hurting performance. Then, great, go for it.
I would also like some infrastructure to make the file log observer run in its own thread, but that's almost the opposite use case :)
Sure, but that's probably based on top of an asynchronous file I/O API that's not explicitly thread-based (but probably is thread-based in practice, because what other kind of asynchronous file I/O even exists).
Also, to explicitly answer Tristan's question:
It's a bug in the Mantissa log observer.
Jean-Paul

For logging could we look at implementing posix aio on platforms that support it? I would be willing to help code that.
Justin On Jul 12, 2012 8:51 AM, exarkun@twistedmatrix.com wrote:
On 11:32 am, itamar@itamarst.org wrote:
On 07/12/2012 05:17 AM, Tristan Seligmann wrote:
- log observers should expect to be inovked in non-reactor threads,
making this a Mantissa bug.
In theory this is the case - the logging howto says "The observer needs to be thread safe if you anticipate using threads in your program." The API docs don't mention threads, though..
In practice however, probably every custom observer gets this wrong, so we really need some infrastructure to make this easier or to fix it. Possibly we should have some way to indicate an observer is thread- safe, assume by default they are not, and if they are not wrap them in a reactor.callFromThread wrapper.
The logging system is already a bottleneck in some applications. Doing even more work to try to make it properly thread-safe will probably slow it down even more.
I think it was a mistake to try to make it possible to use twisted.python.log free-threaded. I think we should think about getting rid of this claimed feature. Perhaps it could be replaced with a more explicit, probably more restricted, logging feature for non-reactor threads.
Unless someone can magically fix the threading issues without hurting performance. Then, great, go for it.
I would also like some infrastructure to make the file log observer run in its own thread, but that's almost the opposite use case :)
Sure, but that's probably based on top of an asynchronous file I/O API that's not explicitly thread-based (but probably is thread-based in practice, because what other kind of asynchronous file I/O even exists).
Also, to explicitly answer Tristan's question:
It's a bug in the Mantissa log observer.
Jean-Paul
Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

On 01:56 pm, justin.venus@gmail.com wrote:
For logging could we look at implementing posix aio on platforms that support it? I would be willing to help code that.
As far as I know, all of the implementations of POSIX AIO are relatively low quality. They include a number of tricky, low-level constraints (eg, char* to write must be page aligned) as well as some sad functional limitations such as blocking instead of being asynchronous if too many AIOs are started concurrently (with this limit applying to AIOs *system wide* in some cases).
If there are any good implementations, or prospects of thes implementations getting good sometime in the near future, it might be worth exploring, but I don't think it's something we can rely on now.
See also this stackoverflow question: http://bit.ly/NqWd5b
Corrections welcome, I haven't checked on the state of this in a few months.
Jean-Paul
Justin On Jul 12, 2012 8:51 AM, exarkun@twistedmatrix.com wrote:
On 11:32 am, itamar@itamarst.org wrote:
On 07/12/2012 05:17 AM, Tristan Seligmann wrote:
- log observers should expect to be inovked in non-reactor threads,
making this a Mantissa bug.
In theory this is the case - the logging howto says "The observer
needs
to be thread safe if you anticipate using threads in your program."
The
API docs don't mention threads, though..
In practice however, probably every custom observer gets this wrong,
so
we really need some infrastructure to make this easier or to fix it. Possibly we should have some way to indicate an observer is thread- safe, assume by default they are not, and if they are not wrap them in a reactor.callFromThread wrapper.
The logging system is already a bottleneck in some applications. Doing even more work to try to make it properly thread-safe will probably slow it down even more.
I think it was a mistake to try to make it possible to use twisted.python.log free-threaded. I think we should think about getting rid of this claimed feature. Perhaps it could be replaced with a more explicit, probably more restricted, logging feature for non-reactor threads.
Unless someone can magically fix the threading issues without hurting performance. Then, great, go for it.
I would also like some infrastructure to make the file log observer
run
in its own thread, but that's almost the opposite use case :)
Sure, but that's probably based on top of an asynchronous file I/O API that's not explicitly thread-based (but probably is thread-based in practice, because what other kind of asynchronous file I/O even exists).
Also, to explicitly answer Tristan's question:
It's a bug in the Mantissa log observer.
Jean-Paul
Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

As far as I know, all of the implementations of POSIX AIO are relatively low quality. They include a number of tricky, low-level constraints (eg, char* to write must be page aligned) as well as some sad functional limitations such as blocking instead of being asynchronous if too many AIOs are started concurrently (with this limit applying to AIOs *system wide* in some cases).
If there are any good implementations, or prospects of thes implementations getting good sometime in the near future, it might be worth exploring, but I don't think it's something we can rely on now.
I have been looking into "libuv" a little .. it not only provides wrappers around epoll/kqueue/IOCP for networking, but also provides a unified API for asynchronous File I/O .. I _think_ on Posix it uses the usual background thread pool approach to work around flaky AIO (and also to provide asynch _open_ and everyhting else from Posix File IO), but it also on Windows uses Overlapped IO when possible (which is the right thing to do there, since asynch File IO works since Windows NT 3.5 using that). It also does asynch DNS lookup etc.

Le Jul 12, 2012 à 8:26 AM, Tobias Oberstein tobias.oberstein@tavendo.de a écrit :
As far as I know, all of the implementations of POSIX AIO are relatively low quality. They include a number of tricky, low-level constraints (eg, char* to write must be page aligned) as well as some sad functional limitations such as blocking instead of being asynchronous if too many AIOs are started concurrently (with this limit applying to AIOs *system wide* in some cases).
If there are any good implementations, or prospects of thes implementations getting good sometime in the near future, it might be worth exploring, but I don't think it's something we can rely on now.
I have been looking into "libuv" a little .. it not only provides wrappers around epoll/kqueue/IOCP for networking, but also provides a unified API for asynchronous File I/O .. I _think_ on Posix it uses the usual background thread pool approach to work around flaky AIO (and also to provide asynch _open_ and everyhting else from Posix File IO), but it also on Windows uses Overlapped IO when possible (which is the right thing to do there, since asynch File IO works since Windows NT 3.5 using that). It also does asynch DNS lookup etc.
libuv is a giant pile of new, unaudited C code that didn't even have a passing test suite until a month or two ago. Please feel free to write a reactor for it; better interop with node.js would be great, but it's definitely not a place we should go looking for new core features.
The tricky part of this is not implementation; on today's OSes (except maaaaybe solaris?), you just have to use threads or subprocesses, end of story. The tricky part is finding an interface for things that is nice and general and isn't just a Deferred that fires with a Deferred with a Deferred on top.
So, http://tm.tl/1956, probably. If we're all lucky I'll have some time to move the ball on that on Saturday.
-glyph

libuv is a giant pile of new, unaudited C code that didn't even have a passing test suite until a month or two ago. Please feel free to write a reactor for it;
From what I've read, libuv has designed it's API around the pattern
A. "do operation and get notified when done"
instead of
B. "get notified when ready and then do operation".
The native Windows API is A., Posix the B and operation = read, write, open, stat, ..
I think A. is more natural, since how do wait for a file to be ready to get open'ed/stat'ed?
Do the open/stat and get notified when done seems to be straight forward rgd API.
Do you think a library with API pattern A would make a nice basis for a Twisted reactor?
better interop with node.js would be great, but it's definitely not a place we should go looking for new core features.
;) I knew it would be controversial .. if only for the fact that it's from the node guys ... with support by Microsoft.
The tricky part of this is not implementation; on today's OSes (except maaaaybe solaris?), you just have to use threads or subprocesses, end of story.
Weeeell, Windows. People say IOCP/OverlappedIO works. Without threads. For file I/O also .. don't think for open/stat/.. though.
On Linux there is eventfd ..
The tricky part is finding an interface for things that is nice and general and isn't just a Deferred that fires with a Deferred with a Deferred on top.
So, http://tm.tl/1956, probably. If we're all lucky I'll have some time to move the ball on that on Saturday.
-glyph _______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

On 12 Jul, 07:33 pm, tobias.oberstein@tavendo.de wrote:
libuv is a giant pile of new, unaudited C code that didn't even have a passing test suite until a month or two ago. Please feel free to write a reactor for it;
From what I've read, libuv has designed it's API around the pattern
A. "do operation and get notified when done"
instead of
B. "get notified when ready and then do operation".
The native Windows API is A., Posix the B and operation = read, write, open, stat, ..
I think A. is more natural, since how do wait for a file to be ready to get open'ed/stat'ed?
This isn't really attacking the important part of the matter. I suppose it's a necessary decision to make, but it still leaves 99% of the API unspecified.
Twisted essentially always aims for A, so it's probably been an unspoken assumption between the people (at least with whom I've been) discussing this topic that this is what we'd want in Twisted.
Having answered that question, there's still the question of what operations to provide and what the interface to those operations is.
For the recorded efforts so far, see this wiki page:
http://twistedmatrix.com/trac/wiki/Specification/AsynchronousFileInputOutput
which has been sitting, largely untouched, for several years now, due to apparent lack of interest.
Rather than beginning the entire effort anew, I'd rather see additions to and criticisms of the work done already.
Jean-Paul

Le Jul 12, 2012 à 6:47 AM, exarkun@twistedmatrix.com a écrit :
I think it was a mistake to try to make it possible to use twisted.python.log free-threaded. I think we should think about getting rid of this claimed feature. Perhaps it could be replaced with a more explicit, probably more restricted, logging feature for non-reactor threads.
Perhaps so. Does this really need to be more complicated than just reactor.callFromThread, though?
-glyph
participants (6)
-
exarkun@twistedmatrix.com
-
Glyph
-
Itamar Turner-Trauring
-
Justin Venus
-
Tobias Oberstein
-
Tristan Seligmann