Re: [Python-Dev] A more flexible task creation

From: Tin Tvrtković <tinchester@gmail.com>
Date: Wed, 13 Jun 2018 22:45:22 +0200

Hi, I've been using asyncio a lot lately and have encountered this problem several times: imagine you want to run a lot of queries against a database. Spawning 10000 tasks in parallel will probably cause a lot of them to fail. What you need is a task pool of sorts, to limit concurrency and do only, say, 20 requests in parallel. If we were doing this synchronously, we wouldn't spawn 10000 threads using 10000 connections; we would use a thread pool with a limited number of threads and submit the jobs into its queue. To me, tasks are (somewhat) logically analogous to threads. The solution that first comes to mind is to create an AsyncioTaskExecutor with a submit(coro, *args, **kwargs) method: put a reference to the coroutine and its arguments into an asyncio queue, then spawn n tasks pulling from this queue and awaiting the coroutines. It'd probably be useful to have this in the stdlib at some point.
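[Editor's note: a minimal sketch of the kind of pool described above, under the assumption that it is created inside a running event loop. AsyncioTaskExecutor is a hypothetical name from the email, not an existing asyncio API.]

    import asyncio
    import logging

    class AsyncioTaskExecutor:
        """Hypothetical task pool: n worker tasks drain a queue of jobs."""

        def __init__(self, n_workers=20):
            # Create this inside a running event loop.
            self._queue = asyncio.Queue()
            self._workers = [asyncio.ensure_future(self._work())
                             for _ in range(n_workers)]

        async def _work(self):
            while True:
                coro_fn, args, kwargs = await self._queue.get()
                try:
                    await coro_fn(*args, **kwargs)
                except asyncio.CancelledError:
                    raise
                except Exception:
                    logging.exception('job failed')
                finally:
                    self._queue.task_done()

        async def submit(self, coro_fn, *args, **kwargs):
            # Store the coroutine *function* and its arguments; the coroutine
            # object is only created when a worker picks the job up.
            await self._queue.put((coro_fn, args, kwargs))

        async def join(self):
            await self._queue.join()
            for worker in self._workers:
                worker.cancel()

Inside a coroutine, usage might look like: pool = AsyncioTaskExecutor(n_workers=20), then await pool.submit(run_query, record_id) for each job (run_query being whatever coroutine function performs one request), and finally await pool.join().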

On Thu, 14 Jun 2018 at 17:40, Tin Tvrtković <tinchester@gmail.com> wrote:
> It'd probably be useful to have this in the stdlib at some point.
Probably a good idea, yes, because it seems a rather common use case. OTOH, I did something similar but for a different use case. In my case, I have a Watchdog class that takes a list of (coro, *args, **kwargs). It ensures there is always a task running for each of the coroutines, and it watches the tasks: if they crash, they are automatically restarted (with logging). There is also a stop() method to cancel the watchdog-managed tasks and await them. My use case exists because I tend to write a lot of singleton-style objects which need bookkeeping tasks or redis pubsub listening tasks, and my primary concern is not starting lots of tasks; it is that the few tasks I have must be restarted if they crash, forever. This is why I think it's not that hard to write "sugar" APIs on top of asyncio, and everyone's needs will be different. The strict API compatibility requirements of the core Python stdlib, coupled with the very long feature release life cycles of Python, make me think this sort of thing is perhaps better built in a utility library on top of asyncio, rather than inside asyncio itself. 18 months is a long, long time to iterate on these features. I can't wait for Python 3.8...
--
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert
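[Editor's note: a rough sketch of what such a Watchdog could look like -- this is not Gustavo's actual code; the class and method names are assumptions for illustration only.]

    import asyncio
    import logging

    class Watchdog:
        """Keep one task alive per coroutine spec; restart crashed ones."""

        def __init__(self, specs):
            # specs: list of (coro_fn, args, kwargs) tuples
            self._specs = specs
            self._tasks = []

        def start(self):
            self._tasks = [asyncio.ensure_future(self._supervise(fn, args, kwargs))
                           for fn, args, kwargs in self._specs]

        async def _supervise(self, fn, args, kwargs):
            while True:
                try:
                    await fn(*args, **kwargs)
                except asyncio.CancelledError:
                    raise
                except Exception:
                    logging.exception('task %r crashed, restarting', fn)

        async def stop(self):
            for task in self._tasks:
                task.cancel()
            await asyncio.gather(*self._tasks, return_exceptions=True)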

A lot of my late requests come from my attempt to group some of that in a lib: https://github.com/Tygs/ayo Most of it works, although I got rid of context() recently, but the lazy task part really fails. Indeed, the API allows you to do:

    async with ayo.scope() as run:
        task_list = run.all(foo(), foo(), foo())
        run.asap(bar())
        await task_list.gather()
        run.asap(baz())

scope() returns a nursery-like object, and this works perfectly, with the usual guarantees of Trio's nursery, but working in asyncio right now. However, I tried to add to the mix:

    async with ayo.scope(max_concurrency=2) as run:
        task_list = run.all(foo(), foo(), foo())
        run.asap(bar())
        await task_list.gather()
        run.asap(baz())

And I can't get it to work. task_list will right now contain a list of tasks and None, because some tasks are not scheduled immediately. That's why I wanted lazy tasks. I tried to create my own lazy tasks, but it never really worked. I'm going to try to go down the road of wrapping the unscheduled coro in a future-like object as suggested by Yury. But having that built in seems logical, elegant, and just good design in general: __init__ should not have side effects.
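[Editor's note: a minimal sketch of the "wrap the unscheduled coro in a future-like object" idea -- not ayo's or Yury's actual design, just an illustration of deferring coroutine creation until scheduling time.]

    import asyncio

    class LazyTask:
        """Hold a coroutine function + arguments; become a real Task on demand."""

        def __init__(self, coro_fn, *args, **kwargs):
            self._coro_fn = coro_fn
            self._args = args
            self._kwargs = kwargs
            self._task = None

        def schedule(self):
            # The coroutine object is only created here, so an unscheduled
            # LazyTask costs a few references rather than a coroutine frame.
            if self._task is None:
                self._task = asyncio.ensure_future(
                    self._coro_fn(*self._args, **self._kwargs))
            return self._task

        def __await__(self):
            return self.schedule().__await__()

    # A real future-like wrapper would also need cancel(), done(), result(),
    # etc. so that a scope can manage it exactly like a scheduled Task.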

On Fri, 15 Jun 2018 at 09:18, Michel Desmoulin <desmoulinmichel@gmail.com> wrote:
> Ah, good idea.
> To be honest, I see "async with" being abused everywhere in asyncio lately. I like to have objects with start() and stop() methods, but everywhere I see async context managers. Fine, add nurseries or whatever, but please also have a simple start() / stop() public API. "async with" is only good for functional programming. If you want to go for a more object-oriented style, you tend to have start() and stop() methods in your classes, which will call start() and stop() (or close()) methods recursively on nested resources. Some of the libraries (aiopg, I'm looking at you) don't support start/stop or open/close well.
I tend to slightly agree, but OTOH if asyncio had been designed not to schedule tasks automatically on __init__, I bet there would have been other users complaining "why didn't task XX run?" or "why do tasks need a start() method, that is clunky!". You can't please everyone... Also, in

    task_list = run.all(foo(), foo(), foo())

as soon as you call foo(), you are instantiating a coroutine, which consumes memory, while the task may not even be scheduled for a long time (if you have 5000 potential tasks but only execute 10 at a time, for example). But if you do as Yury suggested, you instead accept a function reference, foo, which is a singleton; you can have many references to the function foo, but they only create coroutine objects when the task is actually about to be scheduled, so it's more efficient in terms of memory.

--
Gustavo J. A. M. Carneiro
Gambit Research
"The universe is always one step beyond logic." -- Frank Herbert
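[Editor's note: a quick way to see the difference -- a standalone sketch, not from the thread, with zzz standing in for any coroutine function.]

    import asyncio

    async def zzz(seconds):
        await asyncio.sleep(seconds)

    # Eager: 5000 coroutine objects (and their frames) exist up front.
    eager = [zzz(0.005) for _ in range(5000)]
    for coro in eager:
        coro.close()   # avoid "coroutine was never awaited" warnings

    # Lazy: only references to the function and its argument are stored;
    # the coroutine object is created right before the task is scheduled.
    lazy = [(zzz, 0.005) for _ in range(5000)]
    fn, arg = lazy[0]
    ready = fn(arg)    # created only when it is about to run
    ready.close()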

Wouldn't calling __aenter__ and __aexit__ manually work for you? I started coding begin() and stop(), but I removed them, as I couldn't find a use case for them. And what exactly is the use case that doesn't work with `async with`? The whole point is to spot the boundaries of the tasks' execution easily. If you start()/stop() randomly, it kind of defeats the purpose. It's a genuine question though. I can totally accept that I overlooked a valid use case.
Well, ensure_future([schedule_immediately=True]) and asyncio.create_task([schedule_immediately=True]) would take care of that. They are the entry points for task creation and scheduling.
Yes, but this has the benefit of accepting any awaitable, not just a coroutine. You don't have to wonder what to pass, or in which form; it's always the same. Too many APIs are hard to understand because you never know whether they accept a callback, a coroutine function, a coroutine, a task, a future... For the same reason, requests.get() creates and destroys a session every time. It's inefficient, but way easier to understand, and it fits the majority of the use cases.
I made some tests, and the memory consumption is indeed radically smaller if you just store references, compared to storing the same unique raw coroutine. However, this is a rare case. It assumes that:

- you have a lot of tasks
- you have a max concurrency
- the max concurrency is very small
- most tasks reuse a similar combination of callables and parameters

It's a very specific, narrow case. Also, everything you store on the scope will be wrapped into a Future object whether it's scheduled or not, so that you can cancel it later, so the memory saving is not as large as it seems. I didn't want to compromise the quality of the current API for the general case for an edge-case optimization. On the other hand, this is a low-hanging fruit, and on platforms such as the Raspberry Pi, where asyncio has a lot to offer, it can make a big difference to shave 20% off the memory consumption of a specific workload. So I listened and implemented an escape hatch:

    import random
    import asyncio
    import ayo

    async def zzz(seconds):
        await asyncio.sleep(seconds)
        print(f'Slept for {seconds} seconds')

    @ayo.run_as_main()
    async def main(run_in_top):
        async with ayo.scope(max_concurrency=10) as run:
            for _ in range(10000):
                run.from_callable(zzz, 0.005)  # or run.asap(zzz(0.005))

This only lazily creates the awaitable (here the coroutine) on scheduling. I see a 15% memory saving for the WHOLE program when using `from_callable()`, so it's definitely a good feature to have, thank you. But again, and I hope Yury is reading this because he will implement that for uvloop, and this will trickle down to asyncio: I think we should not compromise the main API for this. asyncio is hard enough to grok, and too many concepts fly around. The average Python programmer has experienced far easier things in past Python encounters. If we want, one day, for asyncio to be considered the clean AND easy way to do async, we need to work on the API. asyncio.run() is a step in the right direction (although again, I wish we had implemented that 2 years ago when I talked about it, instead of being told no). Now if we add nurseries, they should hide the rest of the complexity, not add to it.

Excuse my ignorance (or maybe it's a vocabulary thing), but I'm trying to understand the problem here. But if I have this right:

> I've been using asyncio a lot lately and have encountered this problem several times. Imagine you want to do a lot of queries against a database; spawning 10000 tasks in parallel will probably cause a lot of them to fail.

async is not parallel -- all the tasks run in the same thread (unless you explicitly spawn another thread), only one task is running at once, and the task switching happens when a task specifically yields control. If it matters in what order the tasks are performed, then you should not be using async. So why do queries fail with 10000 tasks? Or ANY number? If the async DB access code is written right, a given query should not "await" unless it is in a safe state to do so. So what am I missing here???

> What you need is a task pool of sorts, to limit concurrency and do only 20 requests in parallel.

Still wrapping my head around the vocabulary, but async is not concurrent.

> If we were doing this synchronously, we wouldn't spawn 10000 threads using 10000 connections,

and threads aren't synchronous -- but they are concurrent.

> we would use a thread pool with a limited number of threads and submit the jobs into its queue.

because threads ARE concurrent, and there is no advantage to having more threads than can actually run at once, and having many more does cause thread-switching performance issues.

> To me, tasks are (somewhat) logically analogous to threads.

Kinda -- in the sense that they are run (and completed) in arbitrary order. But they are different, and that difference is key to this issue. As Yury expressed interest in this idea, there must be something I'm missing. What is it?

-CHB

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R
7600 Sand Point Way NE, Seattle, WA 98115
(206) 526-6959 voice, (206) 526-6329 fax, (206) 526-6317 main reception
Chris.Barker@noaa.gov

On Thu, Jun 14, 2018 at 9:17 PM Chris Barker via Python-Dev <python-dev@python.org> wrote:
> Excuse my ignorance (or maybe it's a vocabulary thing), but I'm trying to understand the problem here.
Vocabulary-wise, 'queue depth' might be a suitable mental aid for what people actually want to limit. The practical issue is most likely something to do with hitting timeouts when trying to queue too much work onto a service.

--
Joni Orponen

On 14Jun2018 1214, Chris Barker via Python-Dev wrote:
If the task isn't actually doing the work, but merely waiting for it to finish, then you can end up overloading the thing that *is* doing the work (e.g. the network interface, database server, other thread/process, file system, etc.). Single-threaded async is actually all about *waiting* - it provides a convenient model to do other tasks while you are waiting for the first (as well as a convenient model to indicate what should be done after it completes - there are two conveniences here). If the underlying thing you're doing *can* run in parallel, but becomes less efficient the more times you do it (for example, most file system operations fall into this category), you will want to limit how many tasks you *start*, not just how many you are waiting for. I often use semaphores for this when I need it, and it looks like asyncio.Semaphore() is sufficient:

    import asyncio

    task_limiter = asyncio.Semaphore(4)

    async def my_task():
        await task_limiter.acquire()
        try:
            await do_db_request()
        finally:
            task_limiter.release()

Cheers,
Steve
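[Editor's note: for what it's worth, asyncio.Semaphore also works as an async context manager, which makes the acquire/release pair slightly less clunky. Same logic as above, with do_db_request still a placeholder.]

    import asyncio

    task_limiter = asyncio.Semaphore(4)

    async def my_task():
        async with task_limiter:
            await do_db_request()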

On Thu, Jun 14, 2018 at 10:03 PM Steve Dower <steve.dower@python.org> wrote:
Yeah, a semaphore logically fits exactly, but:

* I feel this API is somewhat clunky, even if you use an 'async with'.
* my gut feeling is that spawning a thousand tasks and having them all fight over the same semaphore and the scheduler is going to be much less efficient than a small number of tasks draining a queue.

On Thu, Jun 14, 2018 at 3:31 PM, Tin Tvrtković <tinchester@gmail.com> wrote:
Fundamentally, a Semaphore is a queue: https://github.com/python/cpython/blob/9e7c92193cc98fd3c2d4751c87851460a33b9... ...so the two approaches are more analogous than it might appear at first. The big difference is what objects are in the queue. For a web scraper, the options might be either a queue where each entry is a URL represented as a str, or a queue where each entry is (effectively) a Task object with an attached coroutine object. So I think the main differences you'll see in practice are:

- A Task + coroutine aren't terribly big -- maybe a few kilobytes -- but definitely larger than a str, so the Semaphore approach will take more RAM. Modern machines have lots of RAM, so for many use cases this is still probably fine (50,000 tasks is really not that many). But there will certainly be some situations where the str queue fits in RAM but the Task queue doesn't.

- If you create all those Task objects up front, that front-loads a chunk of work (i.e., allocating all those objects!) which would otherwise be spread throughout the queue processing. So you'll see a noticeable pause up front before the code starts working.

-n

--
Nathaniel J. Smith -- https://vorpus.org
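[Editor's note: to make the contrast concrete, here is a rough sketch of the two shapes for a scraper; fetch() is a placeholder coroutine, not something from the thread.]

    import asyncio

    async def fetch(url):
        ...   # placeholder: perform the actual HTTP request here

    # Shape 1: a queue of plain str URLs, drained by a handful of workers.
    async def scrape_with_queue(urls, n_workers=20):
        queue = asyncio.Queue()
        for url in urls:
            queue.put_nowait(url)          # only strings are stored

        async def worker():
            while True:
                try:
                    url = queue.get_nowait()
                except asyncio.QueueEmpty:
                    return
                await fetch(url)

        await asyncio.gather(*[worker() for _ in range(n_workers)])

    # Shape 2: one Task (plus coroutine object) per URL, throttled by a Semaphore.
    async def scrape_with_semaphore(urls, limit=20):
        sem = asyncio.Semaphore(limit)

        async def throttled(url):
            async with sem:
                await fetch(url)

        await asyncio.gather(*[throttled(url) for url in urls])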

Other folks have already chimed in, so I'll be to the point. Try writing a simple asyncio web scraper (using, say, the aiohttp library) and create 5000 tasks for scraping different sites. My prediction is that a whole lot of them will time out for various reasons. Other responses inline.

On Thu, Jun 14, 2018 at 9:15 PM Chris Barker <chris.barker@noaa.gov> wrote:
> async is not parallel -- all the tasks will be run in the same thread [...] and only one task is running at once [...]

asyncio is mostly used for IO-heavy workloads (note the name). If you're doing IO in asyncio, it is most definitely parallel. The point of it is having a large number of open network connections at the same time.
Imagine you have a batch job you need to do. You need to fetch a million records from your database, and you can't use a query to get them all - you need a million individual "get" requests. Even if Python were infinitely fast and your bandwidth were infinite, could your database handle opening a million new connections in parallel, in a very short time? Mine sure can't; even a few hundred extra connections would be a potential problem. So you want to do the work in chunks, but still not one by one.
> and threads aren't synchronous -- but they are concurrent.
Using threads implies coupling threads with IO: doing requests one at a time in a given thread. This is generally called 'synchronous IO', as opposed to asynchronous IO / asyncio.
Weeell technically threads in CPython aren't really concurrent (when running Python bytecode), but for doing IO they are in practice. When doing IO, there absolutely is an advantage to using more threads than can run at once (in CPython only one thread running Python can run at once). You can test it out yourself by writing a synchronous web scraper (using maybe the requests library) and trying to scrape using a threadpool vs using a single thread. You'll find the threadpool version is much faster.
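[Editor's note: a minimal way to try that comparison using only the standard library (urllib.request instead of the requests library mentioned above; the URL list is a placeholder workload).]

    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    URLS = ['https://www.python.org'] * 20   # placeholder workload

    def fetch(url):
        with urllib.request.urlopen(url, timeout=10) as resp:
            return len(resp.read())

    start = time.perf_counter()
    serial = [fetch(u) for u in URLS]
    print('single thread:', time.perf_counter() - start)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=10) as pool:
        pooled = list(pool.map(fetch, URLS))
    print('thread pool:  ', time.perf_counter() - start)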

On Thu, Jun 14, 2018 at 8:14 PM, Chris Barker via Python-Dev <python-dev@python.org> wrote:
> Excuse my ignorance (or maybe it's a vocabulary thing), but I'm trying to understand the problem here.
All tasks need resources, and bookkeeping for such tasks is likely to slow things down. More importantly, with an uncontrolled number of tasks you can end up with uncontrolled use of resources, decreasing efficiency to levels well below what is attainable with sensible conservation of resources. Imagine, if you will, a task that starts by allocating 1GB of memory. Would you want 10,000 of those?

participants (9):
- Chris Barker
- Gustavo Carneiro
- Joni Orponen
- Michel Desmoulin
- Nathaniel Smith
- Steve Dower
- Steve Holden
- Tin Tvrtković
- Yury Selivanov