multiprocessing module in async db query

Sheng shengcer at gmail.com
Tue Mar 8 15:25:46 EST 2011


This looks like a tornado problem, but trust me, it is almost entirely
about the mechanics of the multiprocessing module.

I borrowed the idea from http://gist.github.com/312676 to implement an
async db query web service using tornado.

import multiprocessing
import tornado.web

p = multiprocessing.Pool(4)

class QueryHandler(tornado.web.RequestHandler):
    ...
    @tornado.web.asynchronous
    def get(self):
        ...
        # the callback must be the bound method self.callback_func
        p.apply_async(async_func, [sql_command, arg1, arg2, arg3],
                      callback=self.callback_func)

    def callback_func(self, data):
        self.write(data)
        self.finish()  # an @asynchronous handler must finish the request

def async_func(sql_command, arg1, arg2, arg3):
    '''
    Do the actual query job in a worker process.
    '''
    ...
    # data is the query result of executing sql_command
    return data

So the workflow is like this:

get() --> apply_async() hands the query off to a pool worker, which
runs async_func() --> when async_func() returns, callback_func is
invoked with async_func's return value as its argument and sends the
query result to the client.
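For reference, the callback mechanism above can be boiled down to a tiny self-contained demo with plain multiprocessing, no tornado (`square` and `demo` are made-up names, not from the code above): the callback is invoked with the worker's return value once the task completes.

```python
import multiprocessing

def square(x):
    # stand-in for async_func: runs in a pool worker process
    return x * x

def demo():
    results = []
    pool = multiprocessing.Pool(2)
    # stand-in for callback_func: called with square's return value.
    # Note it runs in the pool's result-handler thread, not the main
    # thread -- the source of most tornado integration headaches.
    r = pool.apply_async(square, (5,), callback=results.append)
    r.wait()
    pool.close()
    pool.join()
    return results

if __name__ == "__main__":
    print(demo())  # -> [25]
```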

So the problem is that the query result produced by sql_command might
be too big to hold in memory all at once, and in our case it is all
stored in the variable "data". Can I return from the async method
early, say immediately after the query comes back with the first
result set, and then stream the results to the browser? In other
words, can async_func somehow notify callback_func to prepare to
receive the data before async_func actually returns?
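One way to get that early-notification behaviour is to bypass apply_async's single return value entirely and have the worker push result chunks onto a multiprocessing.Queue as they are fetched, with a sentinel marking the end. This is only a sketch under assumptions: `fetch_chunks` fakes the database, and the consuming loop stands in for repeated self.write()/self.flush() calls in the handler.

```python
import multiprocessing

def fetch_chunks(q, n_chunks):
    # hypothetical stand-in for the real query: instead of returning
    # one big "data" object, push each result set onto the queue as
    # soon as it is fetched
    for i in range(n_chunks):
        q.put([i * 10 + j for j in range(10)])  # one chunk of rows
    q.put(None)  # sentinel: the query is finished

def stream_query(n_chunks=3):
    q = multiprocessing.Queue()
    worker = multiprocessing.Process(target=fetch_chunks, args=(q, n_chunks))
    worker.start()
    chunks = []
    while True:
        chunk = q.get()        # blocks until the next chunk is ready
        if chunk is None:
            break
        chunks.append(chunk)   # in the handler: self.write(...); self.flush()
    worker.join()
    return chunks

if __name__ == "__main__":
    print(len(stream_query()))  # -> 3
```

Note the blocking q.get() would itself have to be kept off the IOLoop thread in real tornado code; this only illustrates the multiprocessing side of the question.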
