Question about asyncio and blocking operations

Sat Jan 23 10:44:33 EST 2016

On Sat, Jan 23, 2016 at 7:38 AM, Frank Millman <frank at chagford.com> wrote:
> Here is the difficulty. The recommended way to handle a blocking operation
> is to run it as a task in a different thread, using run_in_executor(). This
> method is a coroutine. An implication of this is that any method that calls
> it must also be a coroutine, so I end up with a chain of coroutines
> stretching all the way back to the initial event that triggered it.

This seems to be a common misapprehension about asyncio programming.
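While coroutines are the focus of the library, they're built on
futures, so by working at a slightly lower level you can handle them
as futures instead. In fact run_in_executor() is not really a coroutine
at all: it is an ordinary method that returns a future. So while this
would be the typical way to use run_in_executor: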

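from asyncio import get_event_loop

async def my_coroutine(stuff):
    # Run the blocking call in the default executor (a thread pool).
    value = await get_event_loop().run_in_executor(
        None, blocking_function, stuff)
    result = await do_something_else_with(value)
    return result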
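To see that the event loop itself stays free while the blocking call runs, here is a small self-contained sketch for Python 3.7+ (it uses asyncio.run() and asyncio.get_running_loop()). blocking_function is a hypothetical stand-in that just sleeps and reports which thread it ran on; passing None as the executor selects the loop's default thread pool.

```python
import asyncio
import threading
import time

def blocking_function(stuff):
    # Hypothetical stand-in for a blocking call (e.g. a database query).
    time.sleep(0.1)
    return stuff, threading.get_ident()

async def my_coroutine(stuff):
    loop = asyncio.get_running_loop()
    # None -> run in the loop's default thread-pool executor.
    value, thread_id = await loop.run_in_executor(None, blocking_function, stuff)
    return value * 2, thread_id

async def main():
    # Both blocking calls are in flight at once; the loop never blocks.
    return await asyncio.gather(my_coroutine(1), my_coroutine(2))

results = asyncio.run(main())
main_thread = threading.get_ident()
print([value for value, _ in results])  # [2, 4]
```

Each result carries the identifier of a worker thread, which differs from the main thread's, confirming the sleep happened off the event loop.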

This is also a perfectly valid way to use it:

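from asyncio import get_event_loop

def normal_function(stuff):
    loop = get_event_loop()
    # run_in_executor() already returns a future, so there is no need to
    # wrap it in a task (create_task() only accepts coroutines anyway).
    future = loop.run_in_executor(None, blocking_function, stuff)
    # do_something_else will be called with the finished future.
    future.add_done_callback(do_something_else)
    return future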
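A runnable sketch of the callback style, assuming Python 3.7+. blocking_function and do_something_else are hypothetical stand-ins. Note that normal_function is a plain function, but it still has to be called while the event loop is running, and something must keep the loop running until the work completes.

```python
import asyncio
import time

def blocking_function(stuff):
    # Hypothetical blocking work.
    time.sleep(0.05)
    return stuff * 10

results = []

def do_something_else(future):
    # Done-callbacks receive the finished future; result() re-raises
    # any exception raised in the worker thread.
    results.append(future.result())

def normal_function(stuff):
    # Not a coroutine: it returns a future immediately.
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(None, blocking_function, stuff)
    future.add_done_callback(do_something_else)
    return future

async def main():
    future = normal_function(4)
    # The loop must keep running until the executor job finishes.
    value = await future
    await asyncio.sleep(0)  # one extra loop iteration for the callback
    return value

value = asyncio.run(main())
print(value, results)  # 40 [40]
```

The design point is that only the outermost driver needs to be async; everything between it and the blocking call can pass futures around as ordinary values.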

> I use a cache to store frequently used objects, but I wait for the first
> request before I actually retrieve it from the database. This is how it
> worked -
>
> # cache of database objects for each company
> class DbObjects(dict):
>     def __missing__(self, company):
>         db_object = self[company] = get_db_object_from_database()
>         return db_object
> db_objects = DbObjects()
>
> Any function could ask for db_cache.db_objects[company]. The first time it
> would be read from the database, on subsequent requests it would be returned
> from the dictionary.
>
> Now get_db_object_from_database() is a coroutine, so I have to change it to
>         db_object = self[company] = await get_db_object_from_database()
>
> But that is not allowed, because __missing__() is not a coroutine.
>
> I fixed it by replacing the cache with a function -
>
> # cache of database objects for each company
> db_objects = {}
> async def get_db_object(company):
>     if company not in db_objects:
>         db_object = db_objects[company] = await get_db_object_from_database()
>     return db_objects[company]
>
> Now the calling functions have to call 'await
> db_cache.get_db_object(company)'
>
> Ok, once I had made the change it did not feel so bad.

This all sounds pretty reasonable to me.
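For reference, a self-contained version of that async cache, assuming Python 3.7+. get_db_object_from_database here is a hypothetical stand-in that counts how often it is called, and unlike the quoted code it takes the company as an argument (an assumption about the real function).

```python
import asyncio

calls = 0

async def get_db_object_from_database(company):
    # Hypothetical stand-in for the real database read.
    global calls
    calls += 1
    await asyncio.sleep(0.01)
    return {"company": company}

# cache of database objects for each company
db_objects = {}

async def get_db_object(company):
    if company not in db_objects:
        db_objects[company] = await get_db_object_from_database(company)
    return db_objects[company]

async def main():
    first = await get_db_object("acme")
    second = await get_db_object("acme")
    return first is second

same = asyncio.run(main())
print(same, calls)  # True 1
```

One caveat: two coroutines that miss the cache for the same company at the same time will both query the database, because the entry is only stored after the await completes. Storing the task itself in the dict (and awaiting it on every lookup) avoids that.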

> Now I have another problem. I have some classes which retrieve some data
> from the database during their __init__() method. I find that it is not
> allowed to call a coroutine from __init__(), and it is not allowed to turn
> __init__() into a coroutine.
>
> I imagine that I will have to split __init__() into two parts, put the
> database functionality into a separately-callable method, and then go
> through my app to find all occurrences of instantiating the object and
> follow it with an explicit call to the new method.
>
> Again, I can handle that without too much difficulty. But at this stage I do
> not know what other problems I am going to face, and how easy they will be
> to fix.
>
> So I thought I would ask here if anyone has been through a similar exercise,
> and if what I am going through sounds normal, or if I am doing something
> fundamentally wrong.
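The two-step split described above can be wrapped in an async factory method so that each call site becomes a single await. A minimal sketch, assuming Python 3.7+; Report, load, create and fetch_rows are hypothetical names, not part of anyone's actual code.

```python
import asyncio

async def fetch_rows(params):
    # Hypothetical stand-in for the database read.
    await asyncio.sleep(0.01)
    return [params, params + 1]

class Report:
    def __init__(self, params):
        # Part 1: plain, synchronous setup only.
        self.params = params
        self.rows = None

    async def load(self):
        # Part 2: the database work, which needs to await.
        self.rows = await fetch_rows(self.params)

    @classmethod
    async def create(cls, params):
        # Factory coroutine: construct, then load, in one awaitable call.
        obj = cls(params)
        await obj.load()
        return obj

async def main():
    report = await Report.create(5)
    return report.rows

rows = asyncio.run(main())
print(rows)  # [5, 6]
```

Call sites still have to be coroutines, but they no longer need to remember the second step.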

For the __init__ problem, this is where it would make sense to me to hold
on to a future instead of awaiting a coroutine directly. You can structure
your __init__ method like this:

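from asyncio import get_event_loop

class MyClass:  # placeholder name
    def __init__(self, params):
        self.params = params
        # Start the DB lookup now, but don't wait for it here.
        self.db_object_future = get_event_loop().create_task(
                get_db_object(params))

    async def method_depending_on_db_object(self):
        # Suspends until the lookup started in __init__ has finished.
        db_object = await self.db_object_future
        result = do_something_with(db_object)
        return result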

The caveat with this is that while __init__ itself doesn't need to be
a coroutine, any method that depends on the DB lookup does need to be
(or at least needs to return a future).
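A runnable sketch of that pattern, assuming Python 3.7+. get_db_object here is a hypothetical stand-in that sleeps briefly, and MyClass is a placeholder name. One assumption worth stating: create_task() needs a running event loop, so the object has to be constructed from code already running on the loop; asyncio.get_running_loop() makes that explicit by raising RuntimeError otherwise.

```python
import asyncio

async def get_db_object(params):
    # Hypothetical stand-in for the real asynchronous lookup.
    await asyncio.sleep(0.01)
    return {"params": params}

class MyClass:
    def __init__(self, params):
        self.params = params
        # Start the lookup now; requires a running event loop.
        self.db_object_future = asyncio.get_running_loop().create_task(
            get_db_object(params))

    async def method_depending_on_db_object(self):
        # Awaiting a finished task just returns its stored result,
        # so repeated calls don't repeat the lookup.
        db_object = await self.db_object_future
        return db_object["params"]

async def main():
    obj = MyClass(7)  # plain, synchronous construction
    a = await obj.method_depending_on_db_object()
    b = await obj.method_depending_on_db_object()
    return a, b

result = asyncio.run(main())
print(result)  # (7, 7)
```

Because the task is created once and then awaited as many times as needed, this also doubles as a per-instance cache of the lookup.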