> (b) Why limit coroutines? It's just another Python object and has no
> operating resources associated with it. Perhaps your definition of
> coroutine is different, and you are thinking of OS threads?
This was my primary concern with the proposed PEP. At the moment, it's rather trivial to create one million coroutines, and each individual coroutine object takes up very little memory compared to an OS thread.
There are also practical use cases for having a large number of coroutine objects, such as asynchronously:
1) Handling a large number of concurrent clients on a continuously running web server that receives a significant amount of traffic (see the sketch after this list).
2) Sending a large number of concurrent database transactions to run on a cluster of database servers.
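As a concrete illustration of case 1), here is a minimal sketch of my own (not from the original discussion) of an asyncio echo server; the host, port, and handler name are placeholders. Every accepted connection gets its own handler coroutine, so the number of live coroutine objects scales directly with the number of concurrent clients:
```
import asyncio

# Hypothetical echo server: asyncio schedules one handle_client() coroutine
# per accepted connection, so a busy server naturally accumulates a large
# number of live coroutines/tasks.
async def handle_client(reader, writer):
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle_client, "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```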
I don't know that anyone currently runs production code that results in 1 million coroutine objects within the same interpreter at once, but workloads like these tend to grow over time, so even if nobody hits that number today, they eventually might. Arbitrarily placing a limit on the total number of coroutine objects doesn't make sense to me for that reason.
OS threads, on the other hand, take up significantly more memory. In a recent (but entirely unrelated) discussion where the memory usage of threads was brought up, Victor Stinner wrote a program demonstrating that each OS thread takes up approximately 13.2 kB on Linux, which I verified on kernel version 5.3.8. See
https://bugs.python.org/msg356596.
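That program isn't reproduced here, but the general technique is simple; the following is a rough sketch of my own (not the script from the linked message) that estimates per-thread memory from the RSS delta around starting a batch of idle threads. The thread count and helper names are arbitrary, and the printed figure is only a rough estimate since thread stacks are committed lazily:
```
import os
import threading

def rss_kb():
    # Read this process's resident set size (in kB) from /proc (Linux only).
    with open(f"/proc/{os.getpid()}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

def main(count=1_000):
    stop = threading.Event()
    before = rss_kb()
    threads = [threading.Thread(target=stop.wait) for _ in range(count)]
    for t in threads:
        t.start()
    after = rss_kb()
    print(f"~{(after - before) / count:.1f} kB per thread")
    stop.set()
    for t in threads:
        t.join()

if __name__ == "__main__":
    main()
```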
For comparison, I just wrote a similar program to compare the memory usage between 1M threads and 1M coroutines:
```
import asyncio
import threading
import sys
import os


def wait(event):
    event.wait()


class Thread(threading.Thread):
    def __init__(self):
        super().__init__()
        self.stop_event = threading.Event()
        self.started_event = threading.Event()

    def run(self):
        # Signal that the thread has started, then block until stop() is called.
        self.started_event.set()
        self.stop_event.wait()

    def stop(self):
        self.stop_event.set()
        self.join()


def display_rss():
    # Print this process's resident set size from /proc (Linux only).
    os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")


async def test_mem_coros(count):
    print("Coroutine memory usage before:")
    display_rss()
    coros = tuple(asyncio.sleep(0) for _ in range(count))
    print("Coroutine memory usage after creation:")
    display_rss()
    await asyncio.gather(*coros)
    print("Coroutine memory usage after awaiting:")
    display_rss()


def test_mem_threads(count):
    print("Thread memory usage before:")
    display_rss()
    threads = tuple(Thread() for _ in range(count))
    print("Thread memory usage after creation:")
    display_rss()
    for thread in threads:
        thread.start()
    print("Thread memory usage after starting:")
    display_rss()
    # Wait until every thread has actually entered run().
    for thread in threads:
        thread.started_event.wait()
    print("Thread memory usage after running:")
    display_rss()
    for thread in threads:
        thread.stop()
    print("Thread memory usage after stopping:")
    display_rss()


if __name__ == '__main__':
    count = 1_000_000
    arg = sys.argv[1]
    if arg == 'threads':
        test_mem_threads(count)
    elif arg == 'coros':
        asyncio.run(test_mem_coros(count))
```
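For reference, the script takes a single command-line argument selecting which test to run; assuming it is saved as temp.py (the filename shown in the traceback below), it is invoked as either:
```
python3 temp.py coros
python3 temp.py threads
```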
Here are the results:
1M coroutine objects:
```
Coroutine memory usage before:
VmRSS: 14800 kB
Coroutine memory usage after creation:
VmRSS: 651916 kB
Coroutine memory usage after awaiting:
VmRSS: 1289528 kB
```
1M OS threads:
```
Thread memory usage before:
VmRSS: 14816 kB
Thread memory usage after creation:
VmRSS: 4604356 kB
Traceback (most recent call last):
  File "temp.py", line 60, in <module>
    test_mem_threads(count)
  File "temp.py", line 44, in test_mem_threads
    thread.start()
  File "/usr/lib/python3.8/threading.py", line 852, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
```
(Python version: 3.8)
(Linux kernel version: 5.3.8)
As the results above show, the 1M OS threads can't even all be started at once, and the memory taken up just to create the 1M thread objects is ~3.6x more than it costs to concurrently await the 1M coroutine objects (4604356 kB vs. 1289528 kB). Based on that, I think it would be reasonable to place a limit of 1M on the total number of OS threads. It seems unlikely that a system would be able to properly handle 1M threads at once anyway, whereas that seems entirely feasible with 1M coroutine objects, especially on a high-traffic server.