
Hi all,

Just wondering what it would look like if coroutines were awaited by default, so you would only have to use "noawait" when you do *not* want to await a coroutine:

    async def test():
        return do_something()

    # it's awaited here by default: we get the result, not a coroutine
    result1 = test()

    # not awaiting here because you want to do_something_else first
    coroutine = noawait test()
    do_something_else()
    result2 = await coroutine

Then, you could chain code again like this:

    foo_of_result = test().foo

Instead of:

    foo_of_result = (await test()).foo

Thank you in advance for your replies. Have a great weekend!

-- ∞
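For reference, under today's semantics calling an async function does not run its body at all; it only builds a coroutine object, which then has to be awaited (or handed to an event loop) to produce a result. A minimal runnable sketch of the current behavior:

```python
import asyncio

async def test():
    return 42

coro = test()            # calling does NOT run the body; it builds a coroutine
print(type(coro))        # <class 'coroutine'>

result = asyncio.run(coro)   # an event loop awaits it and produces the result
print(result)            # 42
```

The proposal above would invert this: `test()` alone would behave like `asyncio.run`/`await` does today, and `noawait test()` would behave like today's bare call.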

The use of `await` is deliberate: it signals to you as a developer that the event loop may very well pause at that point and context switch to another coroutine. Take that away and you lose that signal. You also would never know from your code whether you need to have an event loop running or not to make such a call.

On Fri, Jun 12, 2020 at 2:03 PM J. Pic <jpic@yourlabs.org> wrote:

Thank you for your comments,

In my experience, 99.9% of the time I don't want to do_something_else() before I await do_something(). I could just as well change this code:

    foo.bar

To:

    foo.__getattribute__('bar')

This would signal that I'm calling the __getattribute__ function. But the reason I'm not doing that is, well, it's not convenient for me, but also I don't really care what happens: I just want the bar attribute of foo when I write foo.bar. In the same fashion, when I call result = test(), I don't really care about lower-level details, I just want the result of calling the test function. How would you refute that comparison?

-- ∞

I mean, if it pauses because of some blocking IO and switches to another coroutine, that's just a BIG win ... I have a hard time trying to figure out how that signal could be useful to me as a developer. On the other hand, I have to write (await test()).bar ...

On Sat, Jun 13, 2020 at 7:29 AM J. Pic <jpic@yourlabs.org> wrote:
I mean, if it pauses because of some blocking IO and switches to another coroutine, that's just a BIG win ... I have a hard time trying to figure out how that signal could be useful to me as a developer. On the other hand, I have to write (await test()).bar ...
Exactly what threading is for. If you don't care where the context switches happen and just want everything to behave sanely by default, use threads, not coroutines. The entire point of coroutines is that you know exactly where they can switch contexts. ChrisA
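As a sketch of the threading alternative: with a thread pool, "call and get the result" is the default and deferral is the explicit case, which is roughly the ergonomics the proposal asks for. `fetch` and its `sleep` here are hypothetical stand-ins for blocking work:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(n):
    # stands in for blocking IO; the OS is free to switch threads anywhere
    time.sleep(0.01)
    return n * 2

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(fetch, n) for n in (1, 2, 3)]   # like "noawait"
    results = [f.result() for f in futures]                # like "await"

print(results)  # [2, 4, 6]
```

The trade-off is exactly the one discussed below: the OS may preempt a thread at any point, so you give up knowing where switches happen.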

On Sat, Jun 13, 2020 at 12:11 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Perhaps, but the fact is that the described use-case is VERY closely aligned with threads. They behave exactly as the OP is hoping for. So it's up to the OP to explain why threads *wouldn't* be appropriate, which could be something like "ahh but I need 2000 of these at once, I tried it with threads and it cost too much RAM". But without a justification of the inappropriateness of threads, I stand by the recommendation. ChrisA

I'm not sure I should rewrite my asyncio code to threading; I'll just keep going with asyncio, maybe one day it'll be possible to get the best of both worlds ... On #python they generally tell me that the extra syntax is worth it because of the internal details, and that rewriting to threads does not seem like a good idea in the long run. People tell me "you can use threads if you want, but why not keep your codebase in asyncio", so really I'm not sure what I /should/ do.

Basically, my use case is about subprocess call programming, so I've got await on 10% of my SLOCs; it's a toy project at https://yourlabs.io/oss/shlax

Actually, it's not really much of a problem for me. I'm just trying to get a feel for the complaints that I've read about Python async, and get a proper idea of the refutations to them, from this thread you might have seen: https://news.ycombinator.com/item?id=23496994

I guess we can say "curiosity killed the cat" once again ;) Thank you all for your replies, Have a great weekend !

On Sat, Jun 13, 2020 at 8:14 PM J. Pic <jpic@yourlabs.org> wrote:
Your project is fundamentally about instantiating processes? Then any overhead you'd see from threading rather than asyncio is going to be completely lost in the noise. The actual fundamental work of what you're doing is WAY more than a few extra context switches. Code whichever way makes you happier, and don't take any notice of microbenchmarks. ChrisA

Your project is fundamentally about instantiating processes?
Yes, that and spawning other subprocesses depending on the outcome of the previous one, sometimes concurrently per-host but mostly sequentially, and clearly most of the time will be spent waiting for subprocesses, because that's all my commands do.
Then any overhead you'd see from threading rather than asyncio is going to be completely lost in the noise.
Absolutely, and if I were to use this to orchestrate a bunch of subprocesses on thousands of servers, it'd probably happen on a blade with hundreds of gigs of RAM and many cores anyway, so I don't think hardware is really going to be a problem here.

That said, I'm also considering a rewrite of my asyncio code in gevent with stdlib patching. I was already happy with eventlet back when doing OpenStack code, and gevent seems even better; it might be the opportunity to get the best of both worlds.

Sorry, I think this discussion got out of the scope of this list. I'll be back with more thoughts on python-list in a few weeks after I gather() more insight. Have a great weekend ;)

While I agree that we should avoid telling people to "just use threads", I think Chris was instead informing the OP that their desired behavior is already present in threads, and if they don't want to be concerned at all about context switching, OS threads should be considered as an alternative. Of course, this doesn't take into account the lower memory usage of coroutines, the shorter context switching delay, etc. But that's not necessary for all use cases. On Fri, Jun 12, 2020 at 10:11 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

On Sat, Jun 13, 2020 at 02:09:53PM +1200, Greg Ewing wrote:
Ignorant question here... isn't that at least in part *because* they are designed to be concurrent not parallel? Coroutines are lighter weight than threads because they don't need all the machinery to pre-emptively run threads in parallel; threads are lighter weight than processes because they don't need to be in separate memory spaces enforced by the OS. So if you give up the manual concurrency of coroutines and use them as if they were threads, doesn't that just make them like threads, including roughly the same overhead thereof? -- Steven

On 13/06/20 3:02 pm, Steven D'Aprano wrote:
Well, sort of. Threading mechanisms provided by typical desktop OSes tend to come with a lot of baggage that isn't strictly necessary for the concept of preemptive scheduling. A real time OS designed for embedded devices, for instance, likely provides preemptive threads with far less overhead. -- Greg

That switch means other things can now occur. You have to know that when your coroutine yields, other things may mutate state, which means you now need to check that no previous assumptions have become invalid. Event loops typically being single-threaded means you don't need to lock and worry about race conditions, but it doesn't let you off the hook: you still can't assume nothing mutated while you weren't looking. Plus execution order is now non-deterministic, which is something else you need to worry about. To be upfront, as a steering council member I wouldn't vote to approve a PEP that proposed this change.

On Fri, Jun 12, 2020 at 2:27 PM J. Pic <jpic@yourlabs.org> wrote:
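The point about state mutating at a switch point can be demonstrated in a few lines; the shared dict here is a hypothetical stand-in for any state a coroutine assumed was stable across an await:

```python
import asyncio

state = {"value": 1}

async def mutator():
    state["value"] = 99

async def reader():
    before = state["value"]
    await asyncio.sleep(0)      # explicit switch point: mutator can run here
    after = state["value"]      # previous read is no longer valid
    return before, after

async def main():
    task = asyncio.create_task(mutator())   # schedule, but don't await directly
    return await reader()

before, after = asyncio.run(main())
print(before, after)  # 1 99
```

Without the `await asyncio.sleep(0)` there is no switch point, and `after` would still be 1: the `await` keyword marks exactly where this can happen.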

Allowing the 'await' keyword to be omitted would also present some semantic and implementation difficulties. The mechanism behind 'await' is essentially the same as 'yield from', so you're more or less asking for 'yield from' to be automagically applied to the result of an expression. But this would require reading the programmer's mind, because it's quite legitimate to create an iterator and keep it around to be yielded from later. Likewise, it's legitimate to create an awaitable object and then await it later. (Personally I think it *shouldn't* be legitimate to do that in the case of await, but Guido thinks otherwise, so it is the way it is.) -- Greg

On Sun, Jun 14, 2020 at 9:54 AM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
If it isn't, then how do you start multiple tasks in parallel?

    async def get_thing(id):
        await spam(id)
        await ham(id)
        return await internet()

    needed_things = [53, 110, 587]
    tasks = [get_thing(id) for id in needed_things]
    # ... now what?

Somehow you need to have three tasks run concurrently, and if you weren't allowed to create an awaitable without immediately awaiting it, how would you do that? ChrisA
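For reference, the standard answer today is to hand the un-awaited coroutines to asyncio.gather(). The bodies below are stand-ins, since spam/ham/internet aren't defined in the example above:

```python
import asyncio

async def get_thing(id):
    # stand-in for the spam/ham/internet awaits in the example above
    await asyncio.sleep(0.01)
    return id * 2

async def main():
    needed_things = [53, 110, 587]
    coros = [get_thing(id) for id in needed_things]  # awaitables, not yet running
    return await asyncio.gather(*coros)              # run all three concurrently

results = asyncio.run(main())
print(results)  # [106, 220, 1174]
```

This only works because creating the awaitable and awaiting it are separate steps, which is exactly the separation under discussion.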

On 14/06/20 12:05 pm, Chris Angelico wrote:
There would need to be a primitive that takes an async def function and creates an awaitable from it. The API for spawning tasks would then take an async function and use this primitive to get things rolling. So it wouldn't be impossible to separate the two, but you would have to go out of your way to do it. It wouldn't be the usual way to do things. (For more on this, look up the discussions about my "cofunctions" idea. It was very similar to async/await, except that the operations of calling an async function and awaiting the result were fused into a single syntactic entity.) -- Greg

On Sun, Jun 14, 2020 at 11:29 AM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Hmm, I think I see what you mean. So awaiting it would be "spam(123)" and getting an awaitable for later would be "spam.defer(123)"? That would make reasonable sense. Still, I don't think it's of value, since part of the point of coroutines is knowing exactly where a context switch could happen. If you're fine with invisible context switches, you're probably better off with threads, because they're not vulnerable to unexpectedly blocking actions (a common culprit being name lookups before network transactions - you can connect sockets asynchronously, but gethostbyname will block the current thread). ChrisA

These "unexpectedly blocking actions" can be identified in asyncio's debug mode. Specifically, any callback or task step that has a duration greater than 100ms will be logged. Then, the user can take a closer look at the offending long-running step. If it's like socket.gethostbyname() and is a blocking IO-bound function call, it can be executed in a thread pool using loop.run_in_executor(None, socket.gethostbyname, hostname) to avoid blocking the event loop. In 3.9, there's also a roughly equivalent higher-level function that doesn't require access to the event loop: asyncio.to_thread(socket.gethostbyname, hostname).

With the default duration of 100ms, it likely wouldn't pick up on socket.gethostbyname(), but that can rather easily be adjusted via the modifiable loop.slow_callback_duration attribute. Here's a quick, trivial example:

```
import asyncio
import socket

async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = .01  # 10ms
    socket.gethostbyname("python.org")

asyncio.run(main(), debug=True)
# If asyncio.run() is not an option, debug mode can also be enabled via
# loop.set_debug(), the -X dev CLI option, or the PYTHONASYNCIODEBUG env var.
```

Output (3.8.3):

Executing <Task finished name='Task-1' coro=<main() done, defined at asyncio_debug_ex.py:5> result=None created at /usr/lib/python3.8/asyncio/base_events.py:595> took 0.039 seconds

This is a bit more involved than it is for working with threads; I just wanted to demonstrate one method of addressing the problem, as it's a decently common issue. For more details about asyncio's debug mode, see https://docs.python.org/3/library/asyncio-dev.html#debug-mode.

On Sat, Jun 13, 2020 at 9:44 PM Chris Angelico <rosuav@gmail.com> wrote:
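A companion sketch of the loop.run_in_executor() fix mentioned above; blocking_lookup and the address it returns are hypothetical stand-ins for socket.gethostbyname, so the example stays off the network:

```python
import asyncio
import time

def blocking_lookup(host):
    # stands in for socket.gethostbyname(host): a synchronous,
    # potentially slow call that would otherwise stall the event loop
    time.sleep(0.01)
    return "93.184.216.34"   # hypothetical address

async def resolve(host):
    loop = asyncio.get_running_loop()
    # hand the blocking call to the default thread pool; the event loop
    # stays free to run other coroutines while it executes
    return await loop.run_in_executor(None, blocking_lookup, host)

address = asyncio.run(resolve("example.com"))
print(address)
```

On 3.9+, `await asyncio.to_thread(blocking_lookup, host)` would be the shorter equivalent.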

On Sun, Jun 14, 2020 at 2:16 PM Kyle Stanley <aeros167@gmail.com> wrote:
IOW the solution to the problem is to use threads. You can see here why I said what I did: threads specifically avoid this problem, and the only way for asyncio to avoid it is to use threads. (Yes, you can asynchronously do a DNS lookup rather than using gethostbyname, but the semantics aren't identical, and you may seriously annoy someone who uses other forms of name resolution. So that doesn't count.)

As an additional concern, you don't always know which operations are going to be slow. For instance, maybe during testing (with debug=True), your DNS lookups are always reasonably fast, but then some time after deployment, you find that they're stalling you out. How much effort is it to change this over? How many other things are going to be slow, and can you find them all? That's why threads are so convenient for these kinds of jobs.

Disadvantages of threads:

1) Overhead. If you make one thread for each task, your maximum simultaneous tasks can potentially be capped. Irrelevant if each task is doing things with far greater overhead anyway.

2) Unexpected context switching. Unless you use locks, a context switch can occur at any time. The GIL ensures that this won't corrupt Python's internal data structures, but you have to be aware of it with any mutable globals or shared state.

3) Steven D'Aprano is terrified of them and will rail on you for using threads.

Disadvantages of asyncio:

1) Code complexity. You have to explicitly show which things are waiting on which others.

2) Unexpected LACK of context switching. Unless you use await, a context switch cannot occur.

Take your pick. Figure out what your task needs. Both exist for good reasons.

ChrisA
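The "unexpected LACK of context switching" point can be shown directly: a coroutine that never awaits runs to completion before anything else gets a turn. `hog` and `polite` are hypothetical coroutines, and hog's time.sleep() stands in for any non-async blocking call:

```python
import asyncio
import time

order = []

async def hog():
    order.append("hog-start")
    time.sleep(0.02)          # blocking call with no await: nothing else runs
    order.append("hog-end")

async def polite():
    order.append("polite")

async def main():
    await asyncio.gather(hog(), polite())

asyncio.run(main())
print(order)  # ['hog-start', 'hog-end', 'polite']
```

Even though both coroutines run "concurrently", polite only gets scheduled after hog finishes, because hog contains no await to switch at.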

In the case of the above example, I'd say it's more so "use coroutines by default and threads as needed" rather than just using threads, but fair enough. I'll concede that point.
That's very situationally dependent, but for any IO-bound call with a variable time where async isn't an option (either because it's not available, standardized, widespread, etc.), I'd advise using loop.run_in_executor()/to_thread() preemptively. This is easier said than done of course, and it's very possible for some to be glossed over. If one is missed though, I don't think it's too much effort to change it over; IMO the main challenge is more so with locating all of them in production for a large, existing codebase.
3) Steven D'Aprano is terrified of them and will rail on you for using threads.
Haha, I've somehow completely missed that. I CC'd Steven in the response, since I'm curious as to what he has to say about that.
Take your pick. Figure out what your task needs. Both exist for good reasons.
Completely agreed, threads and coroutines are two completely different approaches, with neither one being clearly superior for all situations. Even as someone who's invested a significant amount of time in helping to improve asyncio recently, I'll admit that I decently often encounter users that would be better off using threads. Particularly for code that isn't performance or resource critical, or when it involves a reasonably small number of concurrent operations that aren't expected to scale in volume significantly. The fine-grained control over context switching (which can be a pro or a con), shorter switch delay, and lower resource usage from coroutines isn't always worth the added code complexity. On Sun, Jun 14, 2020 at 12:43 AM Chris Angelico <rosuav@gmail.com> wrote:

Clarification: this pretty much applies to any non-async IO-bound call that can block the event loop. You can definitely get away with ignoring some that have a consistently negligible duration, but I would not *directly* call any of them that could vary significantly in time (or are consistently long running) within a coroutine. Otherwise, it's a complete gamble as to how long it stalls the rest of the program, which is generally not desirable to say the least. On Sun, Jun 14, 2020 at 1:42 AM Kyle Stanley <aeros167@gmail.com> wrote:

On 14/06/20 1:42 pm, Chris Angelico wrote:
Hmm, I think I see what you mean. So awaiting it would be "spam(123)"
No, it would still be "await spam(123)". It's just that you wouldn't be able to separate the await from the call, so this wouldn't be allowed:

    a = spam(123)
    await a

Incidentally, I think that "async" and "await" are terrible names for what they represent. In my idea they were spelled "codef" and "cocall". But we seem to be stuck with async/await now. -- Greg

participants (6)

- Brett Cannon
- Chris Angelico
- Greg Ewing
- J. Pic
- Kyle Stanley
- Steven D'Aprano