On Sun, Jun 14, 2020 at 2:16 PM Kyle Stanley aeros167@gmail.com wrote:
If you're fine with invisible context switches, you're probably better off with threads, because they're not vulnerable to unexpectedly blocking actions (a common culprit being name lookups before network transactions - you can connect sockets asynchronously, but gethostbyname will block the current thread).
These "unexpectedly blocking actions" can be identified in asyncio's debug mode. Specifically, any callback or task step that has a duration greater than 100ms will be logged. Then, the user can take a closer look at the offending long running step. If it's like socket.gethostbyname() and is a blocking IO-bound function call, it can be executed in a thread pool using loop.run_in_executor(None, socket.gethostbyname, hostname) to avoid blocking the event loop. In 3.9, there's also a roughly equivalent higher-level function that doesn't require access to the event loop: asyncio.to_thread(socket.gethostbyname, hostname).
With the default duration of 100ms, it likely wouldn't pick up on socket.gethostbyname(), but it can rather easily be adjusted via the modifiable loop.slow_callback_duration attribute.
Here's a quick, trivial example:
import asyncio
import socket


async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = .01  # 10ms
    socket.gethostbyname("python.org")

asyncio.run(main(), debug=True)
# If asyncio.run() is not an option, it can also be enabled via:
# loop.set_debug()
# using -X dev
# PYTHONASYNCIODEBUG env var
Output (3.8.3):
Executing <Task finished name='Task-1' coro=<main() done, defined at asyncio_debug_ex.py:5> result=None created at /usr/lib/python3.8/asyncio/base_events.py:595> took 0.039 seconds
This is a bit more involved than it is for working with threads; I just wanted to demonstrate one method of addressing the problem, as it's a decently common issue. For more details about asyncio's debug mode, see https://docs.python.org/3/library/asyncio-dev.html#debug-mode.
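For reference, the run_in_executor fix described above could look roughly like the sketch below. The names resolve and resolver are made up for illustration; the resolver is a parameter only so that the blocking call can be swapped out when testing, and it defaults to socket.gethostbyname.

```python
import asyncio
import socket


async def resolve(hostname, resolver=socket.gethostbyname):
    """Run a blocking name lookup in the default thread pool.

    `resolve` and `resolver` are hypothetical names for this sketch.
    """
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default executor (a thread pool).
    # On 3.9+, asyncio.to_thread(resolver, hostname) is roughly equivalent.
    # The event loop stays free to run other tasks while a worker thread blocks.
    return await loop.run_in_executor(None, resolver, hostname)


if __name__ == "__main__":
    # "localhost" is used here only so the demo does not depend on external DNS.
    print(asyncio.run(resolve("localhost")))
```

Note that this only moves the blocking call off the event loop thread; the lookup itself still takes as long as it takes.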
IOW the solution to the problem is to use threads. You can see here why I said what I did: threads specifically avoid this problem and the only way for asyncio to avoid it is to use threads. (Yes, you can asynchronously do a DNS lookup rather than using gethostbyname, but the semantics aren't identical, and you may seriously annoy someone who uses other forms of name resolution. So that doesn't count.)

As an additional concern, you don't always know which operations are going to be slow. For instance, maybe during testing (with debug=True), your DNS lookups are always reasonably fast, but then some time after deployment, you find that they're stalling you out. How much effort is it to change this over? How many other things are going to be slow, and can you find them all?
That's why threads are so convenient for these kinds of jobs.
Disadvantages of threads:
1) Overhead. If you make one thread for each task, your maximum simultaneous tasks can potentially be capped. Irrelevant if each task is doing things with far greater overhead anyway.
2) Unexpected context switching. Unless you use locks, a context switch can occur at any time. The GIL ensures that this won't corrupt Python's internal data structures, but you have to be aware of it with any mutable globals or shared state.
3) Steven D'Aprano is terrified of them and will rail on you for using threads.
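To illustrate point 2, here is a minimal sketch of guarding shared state with a lock. Counter and run_workers are hypothetical names invented for this example. Without the lock, "self.value += 1" is a read-modify-write that a context switch can interrupt partway through, so updates could be lost.

```python
import threading


class Counter:
    """Shared mutable state guarded by a lock (hypothetical example class)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write below atomic with respect
        # to other threads calling increment().
        with self._lock:
            self.value += 1


def run_workers(n_threads=4, n_increments=10_000):
    """Increment one shared counter from several threads and return the total."""
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers())
```

With the lock in place, the total is always n_threads * n_increments.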
Disadvantages of asyncio:
1) Code complexity. You have to explicitly show which things are waiting on which others.
2) Unexpected LACK of context switching. Unless you use await, a context switch cannot occur.
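A small sketch of the second asyncio point, with worker and run_pair as made-up names: two tasks append to a shared list, and they only interleave when each one explicitly awaits. Without an await inside the loop, the first task runs to completion before the second gets a turn, which is exactly what happens when a blocking call sneaks into a coroutine.

```python
import asyncio


async def worker(name, log, cooperative):
    """Append three entries to log, optionally yielding after each one."""
    for i in range(3):
        log.append(f"{name}{i}")
        if cooperative:
            # The only place where the event loop can switch to another task.
            await asyncio.sleep(0)


async def run_pair(cooperative):
    """Run two workers concurrently and return the order in which they ran."""
    log = []
    await asyncio.gather(
        worker("A", log, cooperative),
        worker("B", log, cooperative),
    )
    return log


if __name__ == "__main__":
    print(asyncio.run(run_pair(False)))  # ['A0', 'A1', 'A2', 'B0', 'B1', 'B2']
    print(asyncio.run(run_pair(True)))   # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```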
Take your pick. Figure out what your task needs. Both exist for good reasons.
ChrisA