> for any IO-bound call with a variable time where async isn't an option (either because it's not available, standardized, widespread, etc.), I'd advise using loop.run_in_executor()/to_thread() preemptively.

Clarification: this pretty much applies to any non-async IO-bound call that can block the event loop. You can definitely get away with ignoring some that have a consistently negligible duration, but within a coroutine I would not directly call any that could vary significantly in duration (or that are consistently long-running). Otherwise, it's a complete gamble as to how long the call stalls the rest of the program, which is generally not desirable, to say the least.

On Sun, Jun 14, 2020 at 1:42 AM Kyle Stanley <aeros167@gmail.com> wrote:
> IOW the solution to the problem is to use threads. You can see here
> why I said what I did: threads specifically avoid this problem and the
> only way for asyncio to avoid it is to use threads.

In the case of the above example, I'd say it's more so "use coroutines by default and threads as needed" rather than just using threads, but fair enough. I'll concede that point.

> For instance, maybe during testing (with debug=True), your
> DNS lookups are always reasonably fast, but then some time after
> deployment, you find that they're stalling you out. How much effort is
> it to change this over? How many other things are going to be slow,
> and can you find them all?

That's very situationally dependent, but for any IO-bound call with a variable time where async isn't an option (either because it's not available, standardized, widespread, etc.), I'd advise using loop.run_in_executor()/to_thread() preemptively. This is easier said than done, of course, and it's very possible for some calls to be overlooked. If one is missed, though, I don't think it takes much effort to change it over; IMO the main challenge is locating all of them in production in a large, existing codebase.
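
As a rough sketch of that advice (the `resolve` helper name is made up for illustration and isn't from the thread), a blocking lookup can be wrapped once and awaited like any other coroutine:

```python
import asyncio
import socket


async def resolve(hostname: str) -> str:
    """Resolve a hostname without blocking the event loop (hypothetical helper)."""
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    # On 3.9+, asyncio.to_thread(socket.gethostbyname, hostname) is roughly equivalent.
    return await loop.run_in_executor(None, socket.gethostbyname, hostname)


async def main() -> None:
    # Each lookup runs in a worker thread, so the two calls can overlap
    # with each other and with anything else scheduled on the loop.
    addresses = await asyncio.gather(resolve("localhost"), resolve("127.0.0.1"))
    print(addresses)


if __name__ == "__main__":
    asyncio.run(main())
```

Wrapping the call in one place like this should also make it easier to grep for later if a direct socket.gethostbyname() slips into a coroutine somewhere else.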

> 3) Steven D'Aprano is terrified of them and will rail on you for using threads.

Haha, I've somehow completely missed that. I CC'd Steven in the response, since I'm curious as to what he has to say about that.

> Take your pick. Figure out what your task needs. Both exist for good reasons.

Completely agreed. Threads and coroutines are two very different approaches, with neither one being clearly superior in all situations. Even as someone who's invested a significant amount of time in helping to improve asyncio recently, I'll admit that I fairly often encounter users who would be better off using threads, particularly for code that isn't performance or resource critical, or that involves a reasonably small number of concurrent operations that aren't expected to scale significantly in volume. The fine-grained control over context switching (which can be a pro or a con), shorter switch delay, and lower resource usage of coroutines aren't always worth the added code complexity.
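
For comparison, a threads-only version of a small batch of lookups might look something like the following. This is only a sketch; `resolve_all` is a hypothetical name, and using one worker per hostname is an assumption that only makes sense for a small number of operations:

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve_all(hostnames):
    """Resolve each hostname in a worker thread; results keep the input order."""
    # max_workers must be at least 1, even for an empty input list.
    with ThreadPoolExecutor(max_workers=max(1, len(hostnames))) as pool:
        # pool.map() runs the blocking calls concurrently across the workers.
        return list(pool.map(socket.gethostbyname, hostnames))


if __name__ == "__main__":
    print(resolve_all(["localhost", "127.0.0.1"]))
```

There's no event loop or await anywhere, which is most of the appeal when the concurrency needs are this modest.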



On Sun, Jun 14, 2020 at 12:43 AM Chris Angelico <rosuav@gmail.com> wrote:
On Sun, Jun 14, 2020 at 2:16 PM Kyle Stanley <aeros167@gmail.com> wrote:
>
> > If you're fine with invisible context switches, you're probably better
> > off with threads, because they're not vulnerable to unexpectedly
> > blocking actions (a common culprit being name lookups before network
> > transactions - you can connect sockets asynchronously, but
> > gethostbyname will block the current thread).
>
> These "unexpectedly blocking actions" can be identified in asyncio's debug mode. Specifically, any callback or task step that has a duration greater than 100ms will be logged. Then, the user can take a closer look at the offending long running step. If it's like socket.gethostbyname() and is a blocking IO-bound function call, it can be executed in a thread pool using loop.run_in_executor(None, socket.gethostbyname, hostname) to avoid blocking the event loop. In 3.9, there's also a roughly equivalent higher-level function that doesn't require access to the event loop: asyncio.to_thread(socket.gethostbyname, hostname).
>
> With the default duration of 100ms, it likely wouldn't pick up on socket.gethostbyname(), but it can rather easily be adjusted via the modifiable loop.slow_callback_duration attribute.
>
> Here's a quick, trivial example:
> ```
> import asyncio
> import socket
>
> async def main():
>     loop = asyncio.get_running_loop()
>     loop.slow_callback_duration = 0.01  # 10 ms (the default is 0.1, i.e. 100 ms)
>     socket.gethostbyname("python.org")  # blocking call made directly in the coroutine
>
> asyncio.run(main(), debug=True)
> # If asyncio.run() is not an option, debug mode can also be enabled via:
> #     loop.set_debug(True)
> #     the -X dev command-line option
> #     the PYTHONASYNCIODEBUG=1 environment variable
> ```
> Output (3.8.3):
> Executing <Task finished name='Task-1' coro=<main() done, defined at asyncio_debug_ex.py:5> result=None created at /usr/lib/python3.8/asyncio/base_events.py:595> took 0.039 seconds
>
> This is a bit more involved than it is for working with threads; I just wanted to demonstrate one method of addressing the problem, as it's a decently common issue. For more details about asyncio's debug mode, see https://docs.python.org/3/library/asyncio-dev.html#debug-mode.
>

IOW the solution to the problem is to use threads. You can see here
why I said what I did: threads specifically avoid this problem and the
only way for asyncio to avoid it is to use threads. (Yes, you can
asynchronously do a DNS lookup rather than using gethostbyname, but
the semantics aren't identical, and you may seriously annoy someone
who uses other forms of name resolution. So that doesn't count.) As an
additional concern, you don't always know which operations are going
to be slow. For instance, maybe during testing (with debug=True), your
DNS lookups are always reasonably fast, but then some time after
deployment, you find that they're stalling you out. How much effort is
it to change this over? How many other things are going to be slow,
and can you find them all?

That's why threads are so convenient for these kinds of jobs.

Disadvantages of threads:
1) Overhead. If you make one thread for each task, your maximum
simultaneous tasks can potentially be capped. Irrelevant if each task
is doing things with far greater overhead anyway.
2) Unexpected context switching. Unless you use locks, a context
switch can occur at any time. The GIL ensures that this won't corrupt
Python's internal data structures, but you have to be aware of it with
any mutable globals or shared state.
3) Steven D'Aprano is terrified of them and will rail on you for using threads.
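
To illustrate point 2, here is a minimal sketch of guarding shared state with a lock. The `Counter` class and `count_with_threads` function are hypothetical names used only for this example:

```python
import threading


class Counter:
    """A shared counter whose updates are protected by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # "self.value += 1" is a read-modify-write sequence; holding the lock
        # keeps a context switch between the read and the write from losing
        # an update made by another thread.
        with self._lock:
            self.value += 1


def count_with_threads(num_threads: int, increments_per_thread: int) -> int:
    counter = Counter()

    def work() -> None:
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(count_with_threads(4, 10_000))
```

Without the lock, the final count isn't guaranteed to equal num_threads * increments_per_thread, even though the GIL keeps the interpreter itself consistent.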

Disadvantages of asyncio:
1) Code complexity. You have to explicitly show which things are
waiting on which others.
2) Unexpected LACK of context switching. Unless you use await, a
context switch cannot occur.
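
Point 2 can be seen with a small timing experiment. This is only a sketch: the `ticker` and `worst_gap` names and the 10 ms / 200 ms durations are arbitrary choices for illustration, and the exact numbers will vary by machine:

```python
import asyncio
import time


async def ticker(gaps: list, stop: asyncio.Event) -> None:
    """Record the time between consecutive wake-ups until asked to stop."""
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(0.01)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def worst_gap(blocking: bool) -> float:
    """Return the longest pause the ticker saw while other work was running."""
    gaps: list = []
    stop = asyncio.Event()
    task = asyncio.create_task(ticker(gaps, stop))
    await asyncio.sleep(0.05)  # let the ticker wake up a few times first
    if blocking:
        time.sleep(0.2)  # no await: the loop cannot switch to the ticker
    else:
        await asyncio.sleep(0.2)  # await: the ticker keeps running meanwhile
    stop.set()
    await task
    return max(gaps)


if __name__ == "__main__":
    print("blocking call:", asyncio.run(worst_gap(blocking=True)))
    print("awaited call: ", asyncio.run(worst_gap(blocking=False)))
```

With the blocking call, the ticker's worst gap should be at least as long as the sleep itself; with the awaited call, it should stay close to the ticker's own 10 ms interval.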

Take your pick. Figure out what your task needs. Both exist for good reasons.

ChrisA
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/AJ2EOLSWSOAPSUG7BOM5MF3CHP3BHS3H/
Code of Conduct: http://python.org/psf/codeofconduct/