<div dir="ltr"><div>Other folks have already chimed in, so I'll keep this to the point. Try writing a simple asyncio web scraper (using, say, the aiohttp library) and create 5000 tasks for scraping different sites. My prediction is a whole lot of them will time out for various reasons.<br></div><div><br></div><div>Other responses inline.<br></div><div><br><div class="gmail_quote"><div dir="ltr">On Thu, Jun 14, 2018 at 9:15 PM Chris Barker &lt;<a href="mailto:chris.barker@noaa.gov">chris.barker@noaa.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">async is not parallel -- all the tasks will be run in the same thread (Unless you explicitly spawn another thread), and only one task is running at once, and the task switching happens when the task specifically releases itself.</blockquote><div><br></div><div>asyncio is mostly used for IO-heavy workloads (note the name). If you're doing IO in asyncio, the IO itself is most definitely parallel. The point of it is having a large number of open network connections at the same time.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">So why do queries fail with 10000 tasks? or ANY number? If the async DB access code is written right, a given query should not "await" unless it is in a safe state to do so.</div></div></div></blockquote><div><br></div><div>Imagine you have a batch job you need to do. You need to fetch a million records from your database, and you can't use a query to get them all - you need a million individual "get" requests. Even if Python were infinitely fast, and your bandwidth were infinite, can your database handle opening a million new connections in parallel, in a very short time? Mine sure can't; even a few hundred extra connections would be a potential problem.
So you want to do the work in chunks, but still not one by one.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">and threads aren't synchronous -- but they are concurrent.</div></div></div></blockquote><div><br></div><div>Using threads implies coupling threads with IO: each thread does its requests one at a time. That is generally called 'synchronous IO', as opposed to asynchronous IO/asyncio.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> because threads ARE concurrent, and there is no advantage to having more threads than can actually run at once, and having many more does cause thread-switching performance issues.</div></div></div></div></blockquote><div><br></div><div>Weeell, technically threads in CPython aren't really concurrent (when running Python bytecode), but for doing IO they are in practice. When doing IO, there absolutely is an advantage to using more threads than can run at once (in CPython, only one thread can execute Python bytecode at a time). You can test it out yourself by writing a synchronous web scraper (using, say, the requests library) and trying to scrape using a threadpool vs using a single thread. You'll find the threadpool version is much faster.<br></div></div></div></div>
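<div>To make the "in chunks, but still not one by one" idea above concrete, here is a minimal sketch of capping the number of in-flight requests with <code>asyncio.Semaphore</code>. It is only an illustration under assumptions: <code>fetch_record</code>, <code>fetch_all</code> and <code>max_in_flight</code> are hypothetical names, and the database "get" is simulated with <code>asyncio.sleep</code> rather than a real driver.</div>
<pre>
```python
import asyncio


async def fetch_record(record_id, limiter, stats):
    # Hypothetical stand-in for one database "get" request.
    # The semaphore lets only a bounded number of these run their IO at once.
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulated network round trip
        stats["active"] -= 1
    return record_id * 2  # dummy payload so the results can be checked


async def fetch_all(record_ids, max_in_flight=10):
    # Create the semaphore inside the running event loop.
    limiter = asyncio.Semaphore(max_in_flight)
    stats = {"active": 0, "peak": 0}
    # gather() returns results in the same order as its arguments.
    results = await asyncio.gather(
        *(fetch_record(r, limiter, stats) for r in record_ids)
    )
    return results, stats["peak"]


results, peak = asyncio.run(fetch_all(range(100), max_in_flight=10))
print(len(results), "records fetched, peak in-flight requests:", peak)
```
</pre>
<div>Note that this still creates every coroutine up front; it only bounds how many are doing IO at any given moment, which is the part the database cares about.</div>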