async/sync library reuse
Hello everyone,

After using asyncio for a while, I'm struggling to find information about how to support both synchronous and asynchronous use cases in the same library. Imagine you have a package for HTTP requests and you want to give the user the choice between a synchronous and an asynchronous interface. Right now the approach the community is following is to create separate libraries, one for each version. This is far from ideal for several reasons, some I can think of:

- Code duplication: most of the functionality is the same in both libraries; the only difference is the sync/async behavior.
- Some new async libraries lack functionality compared to their sync siblings. Others reintroduce bugs that the sync version solved long ago, etc.
- Different interfaces for the user for the exact same functionality.

In summary, in some cases it looks like reinventing the wheel. So now comes the question: is there any documentation or guide on best practices for supporting this kind of duality? I've been playing a bit with this on my own, but I really don't know whether I'm doing something stupid or not. Simple example:

"""
import asyncio


class MyConnector:

    @classmethod
    async def get(cls, key):
        return key


class AsyncClient:

    async def get(self, key):
        return await MyConnector.get(key)


class SyncClient:

    def __init__(self):
        self.loop = asyncio.get_event_loop()

    def get(self, key):
        return self.loop.run_until_complete(MyConnector.get(key))


def sync_call():
    client = SyncClient()
    print(client.get("sync_key"))


async def async_call():
    client = AsyncClient()
    print(await client.get("async_key"))


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(async_call())
    sync_call()
"""

This is in case the underlying connector is asynchronous already. If it's synchronous and you want to support both modes, you have to rewrite the I/O interactions of MyConnector into a new AsyncMyConnector to support asyncio, and then use one or the other accordingly in the upper classes.

Am I doing it right, or is there a better/alternative way?

Thanks for your time,
Manuel
Hello, Manuel. The answer to your problem is to refactor the libraries in the "sans I/O" style. Take a look here: http://sans-io.readthedocs.io/
-- Luciano Ramalho | Author of Fluent Python (O'Reilly, 2015) | http://shop.oreilly.com/product/0636920032519.do | Technical Principal at ThoughtWorks | Twitter: @ramalhoorg
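To make the sans-I/O suggestion concrete, here is a minimal sketch of the idea applied to Manuel's example. All the names (GetProtocol, SyncClient, AsyncClient) and the line-based wire format are made up for illustration; real sans-I/O libraries such as h11 and h2 are linked from the site above.

"""
class GetProtocol:
    # Pure protocol logic: bytes in, bytes out. No sockets, no event loop.

    def request_bytes(self, key):
        return key.encode("utf-8") + b"\n"

    def parse_response(self, data):
        return data.decode("utf-8").strip()


class SyncClient:
    # Thin blocking shell around the shared protocol core.

    def __init__(self, sock):
        self._sock = sock
        self._proto = GetProtocol()

    def get(self, key):
        self._sock.sendall(self._proto.request_bytes(key))
        return self._proto.parse_response(self._sock.recv(4096))


class AsyncClient:
    # Thin asyncio shell around the same core.

    def __init__(self, reader, writer):
        self._reader, self._writer = reader, writer
        self._proto = GetProtocol()

    async def get(self, key):
        self._writer.write(self._proto.request_bytes(key))
        await self._writer.drain()
        return self._proto.parse_response(await self._reader.read(4096))
"""

Only the two thin shells perform I/O; all the protocol logic lives once, in GetProtocol, and is shared by both interfaces.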
I would say that this is something that we as a community are still figuring out. I really like the Sans-IO approach, and it's a really valuable piece of the solution, but it doesn't solve the whole problem by itself - you still need to actually do I/O, and this means things like error handling and timeouts that aren't obviously a natural fit to the Sans-IO approach, and this means you may still have some tricky code that can end up duplicated. (Or maybe the Sans-IO approach can be extended to handle these things too?)

There are active discussions happening in projects like urllib3 [1] and packaging [2] about what the best strategy to take is. And the options vary a lot depending on whether you need to support python 2 etc.

If you figure out a good approach I think everyone would be interested to hear it :-)

-n

[1] https://github.com/shazow/urllib3/pull/1068#issuecomment-294422348
[2] Here's the same API implemented three different ways:
    Using deferreds: https://github.com/pypa/packaging/pull/87
    "traditional" sans-IO: https://github.com/pypa/packaging/pull/88
    Using the "effect" library: https://github.com/dstufft/packaging/pull/1

-- Nathaniel J. Smith -- https://vorpus.org
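To make the duplication concern concrete: even with a shared sans-IO core, each I/O shell still ends up owning its own timeout machinery. A small hypothetical illustration:

"""
# Even sharing a sans-IO protocol core, timeout logic has to be written
# once per I/O layer, with different primitives in each.
import asyncio
import socket


def sync_recv(sock, timeout):
    sock.settimeout(timeout)           # blocking-socket timeout machinery
    try:
        return sock.recv(4096)
    except socket.timeout:
        raise TimeoutError("read timed out")


async def async_recv(reader, timeout):
    try:                               # event-loop timeout machinery
        return await asyncio.wait_for(reader.read(4096), timeout)
    except asyncio.TimeoutError:
        raise TimeoutError("read timed out")
"""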
Just to leave this breadcrumb here - I've said this before, without having thought about it in depth, but I'm pretty sure that in something like Python 4, async needs to become a "first-class citizen" - that is, from the inside out, right in the bowels of the REPL loop.

If async is the default, and synchronous calls are just a special case (e.g. single-task async), then I'd expect two things (at least): developers would have an easier time and make fewer mistakes in async programming (the language would handle more), and libraries would be unified, since async and sync would be the same.

Maybe there's something that would make this not make sense, but I'd be really surprised. Larry's GIL removal work intuitively seems an enabler for this kind of (potential) work...

-y
Yarko Tymciurak wrote on 09.06.2017 at 09:19:

> ...pretty sure that in something like Python 4, async needs to become a "first-class citizen" - that is, from the inside out, right in the bowels of the REPL loop.

Python 4 will be nothing more than the next minor release after 3.9, because Guido hates double-digit minor versions :)

> If async is the default, and synchronous calls are just a special case (e.g. single-task async), then I'd expect two things (at least): developers would have an easier time... and libraries would be unified, since async and sync would be the same.

Are you suggesting the removal of the "await", "async with" and "async for" structures? Those were added deliberately so developers can spot the yield points in a coroutine function. Not having them would give us something like gevent, where you can never tell when your task is going to be adjourned in favor of another.
On Fri, Jun 9, 2017 at 3:05 AM Alex Grönholm <alex.gronholm@nextday.fi> wrote:
> Are you suggesting the removal of the "await", "async with" and "async for" structures?
Actually I was not thinking of that... but I was thinking of processing in the language, rather than in a library... In any case, I don't have answers, only a vision which keeps coming up. My interest is not in providing "a solution", but rather in generating a reasoned discussion...
Yarko Tymciurak wrote on 09.06.2017 at 11:49:
Then explain what you mean by making async a first-class citizen in Python. In my mind it already is, courtesy of "async def", "await" et al. having been added to the language syntax itself and the inclusion of the asyncio module in the standard library. The only other thing that could have been done is to tie the language syntax to a single event loop implementation, but that was deliberately left out.
On Fri, Jun 9, 2017 at 3:57 AM Alex Grönholm <alex.gronholm@nextday.fi> wrote:
I'm sorry - I thought that was clear when I said it would be in the REPL loop itself and not in a library, and thus it wouldn't require two versions of every library. That's what I meant.

Right now it's coming from the outside in - that is, from applications, closer in, toward an attempt at a common library. I'm suggesting it start from the inside of the language out, so that everything has that support and it is not just a library; any code could then take advantage of either single or multiple async tasks, the goal being that there need only be one version of libraries. At least that's the discussion I'm calling for.

Does that help?
Yarko, I think your vision is too far out. Maybe something like that could become a reality in Python 5 - it would require all extensions to become aware of the async stuff (adding it to Python doesn't automatically add it to C!).

Also, the GIL has nothing to do with this: async tasks all run in the same thread, and if there were no GIL it would not be any different (else two cooperating tasks could be run on different threads, and you'd be back to pre-emptive scheduling and the ensuing race conditions).

(Note that I refer to Python 4 as Python after the Gilectomy - it needs to be a new major version, since the C API changes dramatically as C extensions will no longer have the protection of the GIL.)

--Guido
-- --Guido van Rossum (python.org/~guido)
On 9 Jun 2017, at 06:48, Nathaniel Smith <njs@pobox.com> wrote:
> There are active discussions happening in projects like urllib3 [1] and packaging [2] about what the best strategy to take is.
Let me take a moment to elaborate on some of the thinking that has gone on for urllib3/Requests. We have an unusual set of constraints that are worth understanding, so I'll throw out all the ideas we had and why they were rejected (and indeed, why you may not want to reject them).

1. Implement the core library in asyncio, add a synchronous shim on top of it in terms of asyncio.run_until_complete().

This works great in many ways: you get a nice async-based library implementation, you correctly prioritise people using the async case over those using the synchronous one, and you can expect wide support and interop thanks to asyncio's role as the common event loop implementation. However, you don't support more novel async paradigms like those used by curio and trio.

More damningly for urllib3/Requests, this also limits your supported Python versions to 3.5 and later. There are also some efficiency concerns. Finally, unless you're willing to only support 3.7, you end up needing to pass loop arguments around, which is pretty gross.

2. Have an abstract low-level I/O interface and "bleach" it (remove the keywords async/await) on Python 2.

This would require writing all your code in terms of a small number of abstract I/O operations with "async" in front of their names, e.g. "async def send", "async def recv", and so on. You can then implement these across multiple I/O backends, and also provide a synchronous one that still has "async" in front of it and just doesn't ever use the word "await". You can then provide a code transformation at install time on Python 2 that transforms the codebase, removing all the words "async" and "await" and leaving behind a synchronous-only codebase.

The advantages here are better support for novel async paradigms (e.g. curio and trio), the ability to write more native backends for non-asyncio I/O models (e.g. Twisted/Tornado), and having a single codebase that handles sync and async.

There are myriad disadvantages. The first is the most obvious: the code your users run is not the same as the code you shipped. While the transformation is small and pretty easy to understand, that doesn't remove its risks. It also makes debugging harder and more painful. On top of that, your Python 3 synchronous code looks pretty ugly because you have to write the word "await" around it even though it is not in fact asynchronous (technically you *don't* have to do that, but I guarantee IDEs will get mad).

More subtly, this causes problems for backpressure and task management on event loops. It turns out defining your low-level I/O primitives is not trivial. In urllib3's case, one of the things we'd need is either the equivalent of 'async def select()' or 'async def new_task'. In the first case, writing this would require careful management of futures/deferreds and various bits of state in order to correctly suspend execution on event loops. In the second case, the synchronous version of this is called "threading.Thread", and that has a number of issues. I'd say that if you're going to use threads you may as well just always use threads, but more importantly it has substantially different semantics from all async task management, which makes it difficult to reason about and to ensure that the code is sensible.

This approach is also entirely untested, at any scale. It's simply not clear that it works yet. All the tooling would need to be written.

3. Just use Twisted/Tornado.

This variation on number (1) turns out to get you surprisingly close to our actual goal. Twisted and Tornado support Python 2 and Python 3; when async/await are present they integrate fairly nicely with them, and they give you the added advantage of allowing your Python 2 users to write asynchronous code so long as they buy into the relevant async ecosystem. It also means that you can use the run_until_complete model for your Python 2 synchronous code.

However, these also have some downsides. Twisted, the library I know better, doesn't yet integrate as cleanly with async/await as we'd like: that's coming sometime this year, probably with the landing of 3.7. Additionally, Twisted has no equivalent of asyncio.run_until_complete(), which would mean that someone would have to add the relevant Twisted support (either restartable or instantiable reactors, neither of which Twisted has yet). This also adds a potentially sizeable external dependency, which isn't necessarily all that fun.

4. ??? Who knows.

Right now there is no clarity about what we're going to do. It's possible that the answer will end up being "nothing at the moment" and that we'll wait for the ecosystem to progress for a while before making the change. Either way, it's clear that there is no easy answer to this problem.

Cory
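For a sense of what the "bleaching" transform in option 2 would amount to, here is a rough hypothetical before/after sketch; no such tool is named in the thread, and the Backend/fetch names are invented:

"""
# Hypothetical source, written once against an abstract async I/O interface.
class Backend:
    async def send(self, data): ...
    async def recv(self, n): ...


async def fetch(backend, request):
    await backend.send(request)
    return await backend.recv(4096)

# An install-time "bleaching" pass on Python 2 would strip every "async"
# and "await" token from the same file, leaving plain synchronous code:
#
# class Backend:
#     def send(self, data): ...
#     def recv(self, n): ...
#
# def fetch(backend, request):
#     backend.send(request)
#     return backend.recv(4096)
"""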
Cory, I really like your approach #1. You can make it work all the way back to Python 3.3 by using @coroutine and yield from. That's not pretty, but for libraries the goal shouldn't primarily be prettiness of the implementation - prettiness of the API is much more important, and that's preserved by asyncio's compatibility (code you write that's compatible with Python 3.3 and the latest asyncio from PyPI should still run on Python 3.7 and provide a modern async/await-based API for applications written for 3.7).

Also, I don't think the situation with explicitly passing loop= is as terrible as you seem to think. If you rely on the default event loop, you rely on there *being* a default event loop, but there will always be one unless an app goes out of its way to create an event loop and then make it not the default loop. Only the asyncio tests do that. There are a few things you can't do unless you pass an event loop (such as scheduling callbacks before the event loop is started), but other than that it's really not such a big deal as people seem to think it is. (You mostly see the pattern because asyncio itself uses that pattern, because it needs to be robust for the extreme use case where someone *does* hide the active event loop. But there will never be two active event loops.)

--Guido
-- --Guido van Rossum (python.org/~guido)
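For concreteness, a minimal sketch of the approach Guido describes: a generator-based coroutine that runs on Python 3.3+ while remaining awaitable from async/await code. (@asyncio.coroutine was the spelling of the era; it was later removed in Python 3.11.)

"""
import asyncio


@asyncio.coroutine
def get(key):
    yield from asyncio.sleep(0)  # stand-in for real I/O
    return key


def get_sync(key):
    # Synchronous shim in the style of approach #1.
    return asyncio.get_event_loop().run_until_complete(get(key))

# On Python 3.5+, async callers can simply write:  result = await get("key")
"""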
On 9 Jun 2017, at 16:40, Guido van Rossum <guido@python.org> wrote:
> Also, I don't think the situation with explicitly passing loop= is as terrible as you seem to think.
My concern with multiple loops boils down to the fact that urllib3 supports being used in a multithreaded context where each thread can independently make forward progress on one request. To establish that with a synchronous codebase, you either need one event loop per thread, or you need to spawn a background thread on startup that owns the only event loop in the process.

Generally speaking, I've not had positive results with libraries spawning their own threads in Python. In my experience this has tended to lead to programs that deadlock mysteriously or that fail to terminate in the face of a Ctrl+C. So I tend to prefer to have users spawn their own threads, which would make me want a "one-event-loop-per-thread" model: hence needing a loop parameter to pass around prior to 3.6.

I admit that my concerns here regarding libraries spawning their own threads may be overblown: after my series of negative experiences I basically never went back to that model, and it may be that the problems were more user error than anything else. However, I feel comfortable saying that libraries spawning their own Python threads is definitely subtle and hard to get right, at the very least.

Cory
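A rough sketch of the "one-event-loop-per-thread" model Cory describes, using a thread-local to lazily create one loop per thread; the helper names are hypothetical, not urllib3 code:

"""
import asyncio
import threading

_local = threading.local()


def get_thread_loop():
    # Each thread lazily creates and caches its own event loop.
    loop = getattr(_local, "loop", None)
    if loop is None:
        loop = asyncio.new_event_loop()
        _local.loop = loop
    return loop


def run_sync(coro):
    # Threads make progress independently, each driving its own loop.
    return get_thread_loop().run_until_complete(coro)
"""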
On Fri, Jun 9, 2017 at 11:51 AM Cory Benfield <cory@lukasa.co.uk> wrote:
> My concern with multiple loops boils down to the fact that urllib3 supports being used in a multithreaded context where each thread can independently make forward progress on one request.
Yeah, one event loop per thread is probably the way to go for integration with synchronous codebases. A dedicated event loop thread may perform better, but libraries that spawn threads are problematic.
> So I tend to prefer to have users spawn their own threads, which would make me want a "one-event-loop-per-thread" model: hence needing a loop parameter to pass around prior to 3.6.
You can avoid the loop parameter on older versions of asyncio (at least as long as the default event loop policy is used) by manually setting your event loop as current before calling run_until_complete (and resetting it afterwards).

Tornado's run_sync() method is equivalent to asyncio's run_until_complete(), and Tornado supports multiple IOLoops in this way. We use this to expose a synchronous version of our AsyncHTTPClient: https://github.com/tornadoweb/tornado/blob/62e47215ce12aee83f951758c96775a43...

-Ben
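A small sketch of the set-and-restore pattern Ben describes, assuming the default event loop policy; the helper name is hypothetical:

"""
import asyncio


def run_with_loop(loop, coro):
    # Install `loop` as the thread's current event loop so library code
    # calling get_event_loop() inside the coroutine sees it, then restore.
    try:
        old_loop = asyncio.get_event_loop()
    except RuntimeError:
        old_loop = None
    asyncio.set_event_loop(loop)
    try:
        return loop.run_until_complete(coro)
    finally:
        asyncio.set_event_loop(old_loop)
"""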
Sorry, a bit off topic, but I would like to figure out why, in older Python versions prior to this commit [1], get_event_loop() was not considered deterministic. Does anybody know the reason behind this change?

[1] https://github.com/python/cpython/commit/600a349781bfa0a8239e1cb95fac29c7c4a...
-- --pau
In theory it's possible to create two event loops (using new_event_loop()), then set one as the default event loop (using set_event_loop()), then run the other one (using run_forever() or run_until_complete()). To tasks running in the latter event loop, get_event_loop() would nevertheless return the former.
--Guido van Rossum (python.org/~guido)
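A minimal sketch of the situation just described. Note that on interpreters that include the commit Pau cites, get_event_loop() called from inside a running loop returns the running loop instead, which is exactly the change being asked about:

"""
import asyncio

loop_a = asyncio.new_event_loop()
loop_b = asyncio.new_event_loop()
asyncio.set_event_loop(loop_a)  # loop_a becomes the thread's default

async def which_loop():
    # We are running on loop_b, but older get_event_loop() answers
    # with the thread's default loop, i.e. loop_a.
    print(asyncio.get_event_loop() is loop_a)  # True before the commit

loop_b.run_until_complete(which_loop())
"""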
And what about the rationale for having multiple loop instances in the same thread, switching between them? I'm still trying to find out what patterns need this... Do you have an example?

By the way, thanks for the first explanation.
--pau
Multiple loops in the same thread is purely theoretical -- the API allows it but there's no use case. It might be necessary if a platform has a UI-only event loop that cannot be extended to do I/O -- the only solution to do background I/O might be to alternate between two loops. (Though in that case I would still prefer a thread for the background I/O.)
--Guido van Rossum (python.org/~guido)
In Tornado this comes up sometimes in initialization scenarios:

"""
def main():
    # Since main is synchronous, we need a synchronous HTTP client
    with tornado.httpclient.HTTPClient() as client:
        # HTTPClient creates its own event loop and runs it behind the
        # scenes. This is not the same as the event loop under which
        # main() is running.
        resp = client.fetch(url)

if __name__ == '__main__':
    IOLoop.current().add_callback(main)
    IOLoop.current().start()
"""

This is never an ideal scenario (it would be better to make main() a coroutine and use an async HTTP client), but it does sometimes come up as the most expedient option.

This scenario is also why methods like EventLoop.is_running() tend to be misguided - the question of "can I use this event loop" is not directly related to "is this event loop running".

-Ben
Unit tests, at least. Running every test in its own loop is crucial for test isolation.
Thanks, Andrew Svetlov
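One common shape for that kind of isolation, as a sketch (the base class here is invented for illustration, not any framework's actual helper):

"""
import asyncio
import unittest

class LoopIsolatedTestCase(unittest.TestCase):
    def setUp(self):
        # A fresh loop per test: pending callbacks and open transports
        # from one test cannot leak into the next.
        self.loop = asyncio.new_event_loop()
        asyncio.set_event_loop(self.loop)

    def tearDown(self):
        self.loop.close()
        # Make any accidental reliance on the default loop fail loudly.
        asyncio.set_event_loop(None)

class TestExample(LoopIsolatedTestCase):
    def test_sleep(self):
        self.loop.run_until_complete(asyncio.sleep(0))
"""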
Yes, but not co-existing, I hope!
--Guido van Rossum (python.org/~guido)
Yes, but with one exception: a default event loop created at module import time might co-exist with a loop created for a test. It leads to mystic hangs, you know. Recall code like:

"""
class A:
    mongodb = motor.motor_asyncio.AsyncIOMotorClient()
"""
Thanks, Andrew Svetlov
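The failure mode distills to something like the sketch below, using a plain callback in place of motor (the Client class is invented for illustration; none of this is motor's actual API):

"""
import asyncio

class Client:
    def __init__(self):
        # Captures whatever loop is the default at construction time.
        self.loop = asyncio.get_event_loop()
        self.ready = False

    def connect(self):
        # Completion is signalled via the captured loop...
        self.loop.call_soon(self._on_connected)

    def _on_connected(self):
        self.ready = True

client = Client()  # module import time: binds to the default loop

async def use_client():
    client.connect()
    while not client.ready:        # ...but that loop never runs under
        await asyncio.sleep(0.01)  # the test's loop, so this spins forever

def test():
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(use_client())  # the mystic hang
"""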
Honestly I think we're in agreement. There's never a use for one loop running while another is the default. There are some rare use cases for multiple loops running, but before the mentioned commit it was up to the app to switch the default loop when running a loop. The commit took the ability to screw up there out of the user's hands.
--Guido van Rossum (python.org/~guido)
Agreed in general, but current asyncio may still shoot you in the foot. The solution (at least for my unittest example) might be adding top-level functions for running asyncio code (asyncio.run() and asyncio.run_forever(), as Yury Selivanov proposed in https://github.com/python/asyncio/pull/465). After that we could raise a warning in `asyncio.get_event_loop()` if the loop was not set explicitly by `asyncio.set_event_loop()`.
Thanks, Andrew Svetlov
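Roughly the shape being proposed, as a sketch (this is a paraphrase of the idea, not the code in the pull request):

"""
import asyncio

def run(coro):
    # A fresh loop per call, installed as current for the duration and
    # torn down afterwards, so no half-configured default loop leaks
    # into the next caller -- or the next test.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()
        asyncio.set_event_loop(None)
"""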
I think we're getting way beyond the rationale Pau Freixes requested...
--Guido van Rossum (python.org/~guido)
On Fri, Jun 9, 2017 at 8:51 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
On 9 Jun 2017, at 16:40, Guido van Rossum <guido@python.org> wrote:
Also, I don't think the situation with explicitly passing loop= is as terrible as you seem to think. If you rely on the default event loop, you rely on there *being* a default event loop, but there will always be one unless an app goes out of its way to create an event loop and then make it not the default loop. Only the asyncio tests do that. There are a few things you can't do unless you pass an event loop (such as scheduling callbacks before the event loop is started), but other than that it's really not such a big deal as people seem to think it is. (You mostly see the pattern because asyncio itself uses that pattern, because it needs to be robust for the extreme use case where someone *does* hide the active event loop. But there will never be two active event loops.)
At least one of us is still confused. The one-event-loop-per-thread model is supported in asyncio without passing the loop around explicitly. The get_event_loop() implementation stores all its state in a thread-local instance, so it returns the thread's event loop. (Because this is an "advanced" model, you have to explicitly create the event loop with new_event_loop() and make it the default loop for the thread with set_event_loop().)

All in all, I'm a bit curious why you would need to use asyncio at all when you've got a thread per request anyway.

I agree there are problems with threads that are hidden from an app. Hence asyncio allows you to set the executor where it runs things you pass to run_in_executor() (including some of its own, esp. getaddrinfo()).

One note about the one-event-loop-per-thread model: threads should be very cautious touching each other's event loops. This should only be done using call_soon_threadsafe()!

--Guido van Rossum (python.org/~guido)
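In code, that model looks roughly like this (handle_requests is a placeholder coroutine standing in for real per-thread work):

"""
import asyncio
import threading

async def handle_requests():
    await asyncio.sleep(0.1)  # stand-in for real I/O

def worker():
    # The "advanced" model: each thread explicitly creates its own loop
    # and installs it as that thread's default.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        loop.run_until_complete(handle_requests())
    finally:
        loop.close()
    # If another thread ever needs to poke this loop, the only safe
    # entry point is loop.call_soon_threadsafe(callback, *args).

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
"""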
On 9 Jun 2017, at 17:28, Guido van Rossum <guido@python.org> wrote:
At least one of us is still confused. The one-event-loop-per-thread model is supported in asyncio without passing the loop around explicitly. The get_event_loop() implementation stores all its state in a thread-local instance, so it returns the thread's event loop. (Because this is an "advanced" model, you have to explicitly create the event loop with new_event_loop() and make it the default loop for the thread with set_event_loop().)
Aha, ok, so the confused one is me. I did not know this. =) That definitely works a lot better. It admittedly works less well if someone is doing their own custom event loop stuff, but that’s probably an acceptable limitation up until the time that Python 2 goes quietly into the night.
All in all, I'm a bit curious why you would need to use asyncio at all when you've got a thread per request anyway.
Yeah, so this is a bit of a diversion from the original topic of this thread but I think it’s an idea worth discussing in this space. I want to reframe the question a bit if you don’t mind, so shout if you think I’m not responding to quite what you were asking. In my understanding, the question you’re implicitly asking is this:

"If you have a thread-safe library today (that is, one that allows users to do threaded I/O with appropriate resource pooling and management), why move to a model built on asyncio?”

There are many answers to this question that differ for different libraries with different uses, but for HTTP libraries like urllib3 here are our reasons.

The first is that it turns out that even for HTTP/1.1 you need to write something that amounts to a partial event loop to properly handle the protocol. Good HTTP clients need to watch for responses while they’re uploading body data, because if a response arrives during that process the body upload should be terminated immediately. This is also required for sensibly handling things like Expect: 100-continue, as well as for spotting other intermediate responses and connection teardowns without throwing exceptions.

Today urllib3 does not do this, and it has caused us pain, so our v2 branch includes a backport of the Python 3 selectors module and a hand-written, partially-complete event loop that only handles the specific cases we need. This is an extra thing for us to debug and maintain, and ultimately it’d be easier to just delegate the whole thing to event loops written by others who promise to maintain them and make them efficient.

The second answer is that I believe good asyncio support in libraries is a vital part of the future of this language, and “good” asyncio support IMO does as little as possible to block the main event loop. Running all of the complex protocol parsing and state manipulation of the Requests stack on a background thread is not cheap, and involves a lot of GIL swapping around. We have found several bug reports complaining about using Requests with largish numbers of threads, indicating that our big stack of Python code really does cause contention on the GIL if used heavily. In general, having to defer to a thread to run *Python* code in asyncio is IMO a nasty anti-pattern that should be avoided where possible. It is much less bad to defer to a thread to then block on a syscall (e.g. to get an “async” getaddrinfo), but doing so to run a big stack of Python code is vastly less pleasant for the main event loop.

For this reason, we’d ideally treat asyncio as the first-class citizen and retrofit on the threaded support, rather than the other way around. This goes doubly when you consider the other reasons for wanting to use asyncio.

The third answer is that HTTP/2 makes all of this much harder. HTTP/2 is a *highly* concurrent protocol. Connections send a lot of control frames back and forth that are invisible to the user working at the semantic HTTP level, but that nonetheless need relatively low-latency turnaround (e.g. PING frames). It turns out that in the traditional synchronous HTTP model urllib3 only gets access to the socket to do work when the user calls into our code. If the user goes a “long” time without calling into urllib3, we take a long time to process any data off the connection. In the best case this causes latency spikes as we process all the data that queued up in the socket. In the worst case, it causes us to lose connections we should have been able to keep because we failed to respond to a PING frame in a timely manner.

My experience is that purely synchronous libraries handling HTTP/2 simply cannot provide a positive user experience. HTTP/2 flat-out *requires* either an event loop or a dedicated background thread, and in practice in your dedicated background thread you’d also just end up writing an event loop (see answer 1 again). For this reason, it is basically mandatory for HTTP/2 support in Python to either use an event loop or to spawn a dedicated C thread that does not hold the GIL to do the I/O (as this thread will be regularly woken up to handle I/O events).

Hopefully this (admittedly horrifyingly long) response helps illuminate why we’re interested in asyncio support. It should be noted that if we find ourselves unable to get it in the short term, we may simply resort to offering an “async” API that amounts to running in a thread-pool executor, but I won’t be thrilled about it. ;)

Cory
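To make answer 1 concrete, here is a toy sketch (not urllib3's actual code) of the kind of partial event loop described above: using the selectors module to watch for an early response while uploading a request body. The function name, chunk handling, and timeout are invented for illustration; real code would also need to handle partial sends and actually parse the response.

"""
import selectors
import socket

def upload_watching_for_response(sock, chunks, timeout=5.0):
    # Send body chunks, but abort the upload if the server responds
    # early (e.g. an error response or an intermediate 1xx).
    sel = selectors.DefaultSelector()
    sock.setblocking(False)
    sel.register(sock, selectors.EVENT_READ | selectors.EVENT_WRITE)
    chunks = iter(chunks)
    try:
        while True:
            ready = sel.select(timeout=timeout)
            if not ready:
                raise socket.timeout("no progress on the connection")
            for key, events in ready:
                if events & selectors.EVENT_READ:
                    # The server spoke first: stop uploading and hand
                    # the raw response bytes back to the caller.
                    return sock.recv(65536)
                if events & selectors.EVENT_WRITE:
                    chunk = next(chunks, None)
                    if chunk is None:
                        return None  # body fully sent, no early response
                    sock.send(chunk)  # real code must track partial sends
    finally:
        sel.unregister(sock)
"""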
Great write-up! I actually find the async nature of HTTP (both versions) a compelling reason to switch to asyncio. For HTTP/1.1 this sounds mostly like it would make the implementation easier; for HTTP/2 it sounds like it would just be better for the user side as well (if the user just wants one resource they can safely continue to use the synchronous HTTP/1.1 version of the API).
On Fri, Jun 9, 2017 at 9:55 AM, Cory Benfield <cory@lukasa.co.uk> wrote:
On 9 Jun 2017, at 17:28, Guido van Rossum <guido@python.org> wrote:
At least one of us is still confused. The one-event-loop-per-thread model is supported in asyncio without passing the loop around explicitly. The get_event_loop() implementation stores all its state in a thread-locals instance, so it returns the thread's event loop. (Because this is an "advanced" model, you have to explicitly create the event loop with new_event_loop() and make it the default loop for the thread with set_event_loop().)
Aha, ok, so the confused one is me. I did not know this. =) That definitely works a lot better. It admittedly works less well if someone is doing their own custom event loop stuff, but that’s probably an acceptable limitation up until the time that Python 2 goes quietly into the night.
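For concreteness, a minimal sketch of the one-event-loop-per-thread model described above, assuming plain threading; the worker/task names are invented and error handling is kept to a minimum.

"""
import asyncio
import threading

async def task(name):
    await asyncio.sleep(0.1)
    return name

def worker(name):
    # Each thread creates its own loop and installs it as the thread
    # default, so code running in this thread can keep calling
    # asyncio.get_event_loop() without passing the loop around.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        print(loop.run_until_complete(task(name)))
    finally:
        loop.close()

threads = [threading.Thread(target=worker, args=("t%d" % i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
"""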
--
--Guido van Rossum (python.org/~guido)
...so I really am enjoying the conversation.

Guido - re: "vision too far out": yes, for people trying to struggle with async support in their libraries, now... but that is also part of my motivation. Python 5? Sure... (I may have to watch it come into use from the grave, but hopefully not... ;-) ). Anyway, from back-porting and tactical "implement now" concerns, to plans for the next release, to plans for the next version of Python, to brainstorming much less concrete future versions - all are an interesting continuum.

Re: GIL... sure, sort of, and sort of not. I was thinking "as long as major changes are going on... think about additional structural changes..." More to the point: as I see it, people have a hard time thinking about async in the cooperative-multitasking (CMT) sense, and thus disappointments happen around blocking (missed, or unexpected, e.g. hardware failures). Cory (in his reply - and, yeah: nice writeup!) hints at what I generally structurally like:

"...we’d ideally treat asyncio as the first-class citizen and retrofit on the threaded support, rather than the other way around"

Structurally, async is lightweight overhead compared to threads, which are lightweight compared to processes, so a natural app flow seems to run from lightest-weight on out. To me, this seems practical for making life easier for developers, because you can imagine "promoting" an async task caught unexpectedly blocking to a thread, while still having the lightest-weight loop keep control over it (promotion out, as well as cancellation while promoted).

As for multiple task loops, or loops off in a thread, I haven't thought about it too much, but this seems like nothing new nor unreasonable. I'm thinking of the base stations we talk over in our mobile connections, which are multiple diskless servers that hot-promote to "master" server status on hardware failure (or live capacity upgrade, i.e. inserting processors). This pattern seems both reasonable and useful in this context, i.e. the concept of a master loop (which implies communication/control channels - a complication). With some thought and some reasonable ground rules and simplifications, I would expect much can be done.

Appreciate the discussions!

- Yarko
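The closest thing today's asyncio offers to the "promotion" sketched above is run_in_executor, which pushes blocking work onto a thread while the event loop keeps control of the await. A minimal sketch, with the function names invented for illustration:

"""
import asyncio
import time

def blocking_work():
    # Stand-in for a task caught unexpectedly blocking.
    time.sleep(1)
    return "done"

async def main():
    loop = asyncio.get_event_loop()
    # "Promote" the blocking work to a thread; the awaiting coroutine
    # can be cancelled or wrapped in asyncio.wait_for(), though the
    # thread itself still runs to completion.
    result = await loop.run_in_executor(None, blocking_work)
    print(result)

asyncio.get_event_loop().run_until_complete(main())
"""

Note this is promotion in only a limited sense: the loop controls the await, not the thread, so true preemption or cancellation of the promoted work would need the deeper runtime support Yarko is describing.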
So, I've been playing a bit with the information I saw in this thread (thank you all for the responses) and I got something super simple working: https://gist.github.com/argaen/056a43b083a29f76ac6e2fa97b3e08d1

What I like about this (and that's what I was aiming for) is that the user uses the same class/interface no matter whether it's inside the asyncio world or not. So both `await fn()` and `fn()` work, producing the expected results.

Now some cons (that in the case of my library are acceptable):

- This aims only for asyncio compatibility; other async frameworks like trio, curio, etc. wouldn't work
- No Python 2 compatibility (although Nathaniel's idea of bleaching could still be applied)
- I guess it adds some overhead to both the sync and async versions; I will do some benchmarking when I have time (actually this will decide whether I do the integration or not)

Pros:

- The user is agnostic to the async/sync implementation. If you are in the asyncio world, just use `await fn()`, and if not, `fn()`. Both will work
- There is compatibility between classes using this approach
- No duplication of code

I haven't thought yet about async context managers, iteration and so on, but I guess there is a way to fix that too (or not, I have no idea). One fun part of all this is whether it's possible (meaning easily) to also reuse the tests to test both the sync and the async versions... :rolling_eyes:
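For readers who don't want to click through to the gist, here is a rough sketch of the kind of dual-mode wrapper being described, reconstructed from the description above rather than copied from the gist. The decorator name `dual` and the `Cache` class are invented, and it assumes Python 3.5+-era asyncio where get_event_loop() returns the thread's (possibly running) loop.

"""
import asyncio
import functools

def dual(coro_fn):
    # If an event loop is already running, hand back the coroutine so
    # the caller can `await` it; otherwise drive it to completion here.
    @functools.wraps(coro_fn)
    def wrapper(*args, **kwargs):
        coro = coro_fn(*args, **kwargs)
        loop = asyncio.get_event_loop()
        if loop.is_running():
            return coro
        return loop.run_until_complete(coro)
    return wrapper

class Cache:
    @dual
    async def get(self, key):
        await asyncio.sleep(0)  # stand-in for real async I/O
        return key

cache = Cache()
print(cache.get("sync_key"))  # plain call, no loop running

async def main():
    print(await cache.get("async_key"))  # awaited call inside the loop

asyncio.get_event_loop().run_until_complete(main())
"""

One caveat of this trick: inside a running loop, a bare `fn()` silently returns an unawaited coroutine rather than raising, so misuse is easy to miss without warnings enabled.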
participants (10)
- Alex Grönholm
- Andrew Svetlov
- Ben Darnell
- Cory Benfield
- Guido van Rossum
- Luciano Ramalho
- manuel miranda
- Nathaniel Smith
- Pau Freixes
- Yarko Tymciurak