for any IO-bound call with a variable time where async isn't an option
(either because it's not available, standardized, widespread, etc.), I'd advise using loop.run_in_executor()/to_thread() preemptively.
Clarification: this pretty much applies to any non-async IO-bound call that can block the event loop. You can definitely get away with ignoring some that have a consistently negligible duration, but I would not *directly* call any of them that could vary significantly in time (or are consistently long running) within a coroutine. Otherwise, it's a complete gamble as to how long it stalls the rest of the program, which is generally not desirable to say the least.
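To make the stall concrete, here's a minimal sketch. It assumes time.sleep() as a stand-in for any blocking IO-bound call, and the names blocking_io, heartbeat, and count_ticks are made up for illustration. It counts how often a background coroutine gets to run while the blocking call is in progress, once called directly and once through loop.run_in_executor():

```python
import asyncio
import time


def blocking_io(seconds):
    # Stand-in for a blocking IO-bound call (assumption: time.sleep models it).
    time.sleep(seconds)


async def heartbeat(ticks):
    # Records a timestamp every 10ms, but only when the event loop is free to run it.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def count_ticks(offload):
    loop = asyncio.get_running_loop()
    ticks = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    before = len(ticks)
    if offload:
        # Runs in the default thread pool; the event loop keeps going.
        await loop.run_in_executor(None, blocking_io, 0.2)
    else:
        # Called directly inside the coroutine; nothing else can run meanwhile.
        blocking_io(0.2)
    after = len(ticks)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return after - before


if __name__ == "__main__":
    print("direct call, heartbeat ticks:", asyncio.run(count_ticks(False)))
    print("offloaded, heartbeat ticks:", asyncio.run(count_ticks(True)))
```

With the direct call, the heartbeat never gets a turn while blocking_io() runs; with the offloaded version it keeps ticking. The exact tick count in the second case depends on the machine.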
On Sun, Jun 14, 2020 at 1:42 AM Kyle Stanley firstname.lastname@example.org wrote:
IOW the solution to the problem is to use threads. You can see here
why I said what I did: threads specifically avoid this problem and the only way for asyncio to avoid it is to use threads.
In the case of the above example, I'd say it's more so "use coroutines by default and threads as needed" rather than just using threads, but fair enough. I'll concede that point.
For instance, maybe during testing (with debug=True), your
DNS lookups are always reasonably fast, but then some time after deployment, you find that they're stalling you out. How much effort is it to change this over? How many other things are going to be slow, and can you find them all?
That's very situationally dependent, but for any IO-bound call with a variable time where async isn't an option (either because it's not available, standardized, widespread, etc.), I'd advise using loop.run_in_executor()/to_thread() preemptively. This is easier said than done of course and it's very possible for some to be glossed over. If it's missed though, I don't think it's too much effort to change it over; IMO the main challenge is more so with locating all of them in production for a large, existing codebase.
- Steven D'Aprano is terrified of them and will rail on you for using
Haha, I've somehow completely missed that. I CC'd Steven in the response, since I'm curious as to what he has to say about that.
Take your pick. Figure out what your task needs. Both exist for good
Completely agreed, threads and coroutines are two completely different approaches, with neither one being clearly superior for all situations. Even as someone who's invested a significant amount of time in helping to improve asyncio recently, I'll admit that I fairly often encounter users who would be better off using threads, particularly for code that isn't performance or resource critical, or when it involves a reasonably small number of concurrent operations that aren't expected to scale in volume significantly. The fine-grained control over context switching (which can be a pro or a con), shorter switch delay, and lower resource usage from coroutines aren't always worth the added code complexity.
On Sun, Jun 14, 2020 at 12:43 AM Chris Angelico email@example.com wrote:
On Sun, Jun 14, 2020 at 2:16 PM Kyle Stanley firstname.lastname@example.org wrote:
you're fine with invisible context switches, you're probably better off with threads, because they're not vulnerable to unexpectedly blocking actions (a common culprit being name lookups before network transactions - you can connect sockets asynchronously, but gethostbyname will block the current thread).
These "unexpectedly blocking actions" can be identified in asyncio's
debug mode. Specifically, any callback or task step that has a duration greater than 100ms will be logged. Then, the user can take a closer look at the offending long running step. If it's like socket.gethostbyname() and is a blocking IO-bound function call, it can be executed in a thread pool using loop.run_in_executor(None, socket.gethostbyname, hostname) to avoid blocking the event loop. In 3.9, there's also a roughly equivalent higher-level function that doesn't require access to the event loop: asyncio.to_thread(socket.gethostbyname, hostname).
With the default duration of 100ms, it likely wouldn't pick up on
socket.gethostbyname(), but it can rather easily be adjusted via the modifiable loop.slow_callback_duration attribute.
Here's a quick, trivial example:
    import asyncio
    import socket


    async def main():
        loop = asyncio.get_running_loop()
        loop.slow_callback_duration = .01  # 10ms
        socket.gethostbyname("python.org")

    asyncio.run(main(), debug=True)
    # If asyncio.run() is not an option, it can also be enabled via:
    # loop.set_debug(True)
    # using -X dev
    # PYTHONASYNCIODEBUG env var
Output (3.8.3):

    Executing <Task finished name='Task-1' coro=<main() done, defined at asyncio_debug_ex.py:5> result=None created at /usr/lib/python3.8/asyncio/base_events.py:595> took 0.039 seconds
This is a bit more involved than it is for working with threads; I just
wanted to demonstrate one method of addressing the problem, as it's a decently common issue. For more details about asyncio's debug mode, see https://docs.python.org/3/library/asyncio-dev.html#debug-mode.
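For completeness, here's a rough sketch of what the fix could look like once debug mode has flagged the lookup. The resolve() helper is a hypothetical name, not part of asyncio, and "127.0.0.1" is only used so that the example runs without network access (gethostbyname returns an IPv4 address string unchanged):

```python
import asyncio
import socket


async def resolve(hostname):
    # Hypothetical helper: runs the blocking lookup in the loop's default
    # thread pool (executor=None) so the event loop is not stalled.
    # On 3.9+, asyncio.to_thread(socket.gethostbyname, hostname) is roughly equivalent.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, socket.gethostbyname, hostname)


async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.01  # 10ms, same threshold as the example above
    print(await resolve("127.0.0.1"))


if __name__ == "__main__":
    asyncio.run(main(), debug=True)
```

Since the lookup now happens in a worker thread, the time it takes no longer counts against the task step that awaits it.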
IOW the solution to the problem is to use threads. You can see here why I said what I did: threads specifically avoid this problem and the only way for asyncio to avoid it is to use threads. (Yes, you can asynchronously do a DNS lookup rather than using gethostbyname, but the semantics aren't identical, and you may seriously annoy someone who uses other forms of name resolution. So that doesn't count.) As an additional concern, you don't always know which operations are going to be slow. For instance, maybe during testing (with debug=True), your DNS lookups are always reasonably fast, but then some time after deployment, you find that they're stalling you out. How much effort is it to change this over? How many other things are going to be slow, and can you find them all?
That's why threads are so convenient for these kinds of jobs.
Disadvantages of threads:
1) Overhead. If you make one thread for each task, your maximum simultaneous tasks can potentially be capped. Irrelevant if each task is doing things with far greater overhead anyway.
2) Unexpected context switching. Unless you use locks, a context switch can occur at any time. The GIL ensures that this won't corrupt Python's internal data structures, but you have to be aware of it with any mutable globals or shared state.
3) Steven D'Aprano is terrified of them and will rail on you for using threads.
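As an illustration of the second point, a minimal sketch of guarding shared state with a lock. The counter dict and the add_many/run_workers names are invented for this example. The read-modify-write on the counter is not atomic, so each update is done while holding a threading.Lock:

```python
import threading


def add_many(counter, lock, n):
    for _ in range(n):
        # Without the lock, a thread switch between the read and the write
        # of counter["value"] could lose updates.
        with lock:
            counter["value"] += 1


def run_workers(num_threads=4, n=10_000):
    counter = {"value": 0}  # shared mutable state
    lock = threading.Lock()
    threads = [
        threading.Thread(target=add_many, args=(counter, lock, n))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(run_workers())
```

With the lock held around every increment, the total always equals num_threads * n.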
Disadvantages of asyncio:
1) Code complexity. You have to explicitly show which things are waiting on which others.
2) Unexpected LACK of context switching. Unless you use await, a context switch cannot occur.
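The second point can be seen in a small sketch (the worker and run_pair names are made up for illustration). Two coroutines append to a shared log; without an await inside the loop, the first one runs to completion before the second one starts, and with await asyncio.sleep(0) they take turns:

```python
import asyncio


async def worker(name, log, cooperative):
    for i in range(3):
        log.append(f"{name}{i}")
        if cooperative:
            # Explicit yield point: lets the event loop switch to another task.
            await asyncio.sleep(0)


async def run_pair(cooperative):
    log = []
    await asyncio.gather(
        worker("a", log, cooperative),
        worker("b", log, cooperative),
    )
    return log


if __name__ == "__main__":
    print("no await in loop:  ", asyncio.run(run_pair(False)))
    print("await in each step:", asyncio.run(run_pair(True)))
```

This is the flip side of the thread case: nothing can interrupt a coroutine between awaits, which is convenient for shared state but means a long stretch without an await holds up everything else.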
Take your pick. Figure out what your task needs. Both exist for good reasons.
ChrisA