Howdy fellows,

I had a thought of adding a global thread executor to Python. It could use something like concurrent.futures and be created on demand. There are times when I find myself needing to run just a single function in a thread, because there's practically no other way. The function is fast-acting and runs only infrequently throughout the code. Take socket.getaddrinfo for example: even in non-blocking mode, this function will always block. Even asyncio solves it by running it on a different thread. Every program, library, or piece of code that wants to use it in an asynchronous manner has to initialize a thread (with all the respective overhead) for the use of a single function. This happens in plenty of libraries, each initializing its own thread for a one-night stand.

If we had a global ThreadPoolExecutor, we could use it exactly for that. The threads would be shared, and the overhead would only occur once. Users of the executor would know that it's a limited resource that may be full at times, and as responsible programmers would not use it for infinite loops, clogging the whole system. Even if a programmer uses it irresponsibly (shame), we still have future.result(timeout) to cover us and issue a warning if we must.

Any input would be appreciated,
Bar Harel
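[A minimal sketch of what such an on-demand global executor could look like, using the getaddrinfo case from the message above. The name get_global_executor and the worker count are illustrative assumptions, not a proposed API.]

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

_lock = threading.Lock()
_executor = None

def get_global_executor():
    """Return a process-wide ThreadPoolExecutor, created lazily on first use."""
    global _executor
    with _lock:  # guard lazy creation against concurrent first callers
        if _executor is None:
            _executor = ThreadPoolExecutor(max_workers=8,
                                           thread_name_prefix="global-pool")
    return _executor

# getaddrinfo always blocks, so push it onto the shared pool and keep a
# timeout handy in case a task misbehaves.
future = get_global_executor().submit(socket.getaddrinfo, "localhost", 80)
print(future.result(timeout=30))
```

Every caller gets the same pool, so the thread overhead is paid once per process rather than once per library.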
On Feb 15, 2020, at 05:40, Bar Harel <bzvi7919@gmail.com> wrote:
If we had a global ThreadPoolExecutor, we could use it exactly for that. The threads would be shared, and the overhead would only occur once. Users of the executor would know that it's a limited resource that may be full at times, and as responsible programmers would not use it for infinite loops, clogging the whole system.
You can already do this trivially in any application: just create an executor and store it as a global in some module, or attach it to some other global object like the config or the run loop, or whatever. Presumably the goal here is that having it come with Python would mean lots of third-party libraries would start using it. Similar to GCD (Grand Central Dispatch): its default dispatch queues would only be a minor convenience that you could trivially build yourself, but the real benefit is that they're widely used by third-party Objective-C and Swift libraries because they've been there from the start and Apple has encouraged their use.

The question is, if we added, say, a concurrent.futures.get_shared_thread_pool_executor function today, would people change all of the popular libraries to start using it? Probably not, because then they'd all have to start requiring Python 3.10. In which case we wouldn't get the benefits. The solution to that is of course to have a backport on PyPI. But you can create that same library today and try to get libraries to start using it, and then it could be added to the stdlib once it's clear there's a lot of uptake and everyone is happy with the API.

The only advantage I can see from putting it in the stdlib now along with creating that PyPI library is that it might be easier to proselytize for it. But I think you need to make the case that it really would be easy to proselytize for a stdlib feature with a backport, but hard with just a PyPI library. The fact that it wasn't there from the start like GCD's default queues were means people have already come up with other solutions and they might not want a different one. Plus, you have to propose a specific design and make sure everyone's happy with that, because once it goes into the stdlib, its interface is fixed forever. Do people want just a single flat shared executor, or do they want to be able to specify different priority/QoS for tasks?
(GCD provides five queues to its shared thread pool, not just one, so you can make sure your user-initiated request doesn’t get blocked by a bunch of bulk background requests.) Do we need a shared process pool executor also? Who controls the max thread count? (There are Java executors that provide a way for libs to increase it, but not decrease it below what the app wanted.) Do asyncio apps really want the same behavior from a global shared executor as GUI apps, non-asyncio network apps, games, etc.? Do servers and clients want the same behavior? And so on. If you’re only building a PyPI library, you can guess at all of this and see what people say, but if you add it to the stdlib you have to get it right the first time.
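[To make the priority/QoS question above concrete, here is one possible shape for GCD-style tiers: separate shared pools so bulk background work cannot starve user-initiated requests. The tier names and worker counts are illustrative assumptions, not a design proposal.]

```python
from concurrent.futures import ThreadPoolExecutor

# One shared pool per quality-of-service tier, in the spirit of GCD's
# global queues. Sizes are arbitrary for the sake of the sketch.
_pools = {
    "user-initiated": ThreadPoolExecutor(max_workers=4,
                                         thread_name_prefix="qos-ui"),
    "background": ThreadPoolExecutor(max_workers=2,
                                     thread_name_prefix="qos-bg"),
}

def submit(qos, fn, *args, **kwargs):
    """Submit work to the shared pool for the given QoS tier."""
    return _pools[qos].submit(fn, *args, **kwargs)

bulk = submit("background", sum, range(1_000_000))    # slow bulk job
quick = submit("user-initiated", str.upper, "hello")  # user-facing job
print(quick.result(), bulk.result())
```

Because the tiers are separate executors, the quick user-facing task never waits behind queued background work.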
The question is, if we added, say, a concurrent.futures.get_shared_thread_pool_executor function today, would people change all of the popular libraries to start using it? Probably not, because then they’d all have to start requiring Python 3.10. In which case we wouldn’t get the benefits.
You can say that about every Python feature. Using a new feature restricts you to the latest Python version. Eventually it will trickle down.
The fact that it wasn’t there from the start like GCD’s default queues were means people have already come up with other solutions and they might not want a different one.
There are no other solutions to that problem. Up until now, people created their own solutions, and their own threads, out of necessity. They simply didn't have a choice. I've witnessed a large project that used many third-party packages and had no fewer than 148 (!!!) threads, since every package had to create its own thread, waiting in the background and doing nothing.

Since I'm a backend developer, I can see it being highly used in my field. Metrics? Server statistics? Lots of times it's a fire-and-forget mechanism that just needs to execute stuff in the background. But heck, even using logging.QueueListener creates an unavoidable thread.

Right now there's no standardized way to do so. I thought of concurrent.futures with a get_global_pool() as a simple solution, but did not know about GCD. We can try to implement something equivalent on a provisional basis that newer packages will use. There's no need to port older code to use it, as it will slowly fade away.
once it goes into the stdlib, its interface is fixed forever.
Well, it just means we have some thinking to do. Rome wasn't built in a day. With all the benefits of creating a PyPI package, I'm afraid the adoption rate won't be high if it isn't a standard. The solution can be either starting in the stdlib on a provisional basis, or developing and releasing under a well-known developer umbrella, much like PyPA or aio-libs. These questions are eventually up for discussion. I'm just a single developer with a small perspective, and don't pretend to know the answer :-) Perhaps more devs will weigh in and give their opinions.

Always happy to learn,
Bar Harel
So there's one reason to put this in the stdlib: so that stdlib modules (like logging.QueueListener, which you mentioned) can use it. There should still be a backport so that third-party packages have a simple way of using this on earlier Python versions. A PEP would be the best way to make progress with this -- it would both ensure that the API is widely usable and that it will be widely used (assuming the PyPI package is available for a wide range of versions, e.g. 3.5 and later).

On Sat, Feb 15, 2020 at 3:12 PM Bar Harel <bzvi7919@gmail.com> wrote:
--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him
participants (3)
- Andrew Barnert
- Bar Harel
- Guido van Rossum