multiprocessing and physical CPU cores count
This is a follow-up to a feature request which recently appeared on the psutil bug tracker: https://code.google.com/p/psutil/issues/detail?id=427
I don't know whether the proposal makes sense for psutil per se, but it certainly made me think about multiprocessing.cpu_count() and the fact that it currently returns the number of virtual CPUs (physical + logical).
Given that multiple processes cannot take any advantage of hyper threading technology, maybe it makes sense for multiprocessing to expose a physical_cpu_count() function in order to preemptively figure out how many processes to spawn.
The same thing is discussed here: https://groups.google.com/forum/#!msg/nzpug/_5sFW9BEMQ4/Y4laXRNlXkMJ
Thoughts?
--- Giampaolo
https://code.google.com/p/pyftpdlib/
https://code.google.com/p/psutil/
https://code.google.com/p/pysendfile/
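For readers who want to compare the two figures being discussed, here is a minimal sketch. It assumes Python 3.4 or later for os.cpu_count(), and it assumes a psutil release that accepts cpu_count(logical=False), an argument added to psutil after this thread was written. The helper name cpu_counts is hypothetical.

```python
import os

try:
    import psutil  # third-party and optional; assumed to be a recent release
except ImportError:
    psutil = None


def cpu_counts():
    """Return (logical, physical); either value may be None if unknown."""
    # os.cpu_count() reports logical CPUs, i.e. hardware threads.
    logical = os.cpu_count()
    physical = None
    if psutil is not None:
        # Assumption: this psutil version supports the `logical` argument.
        physical = psutil.cpu_count(logical=False)
    return logical, physical


if __name__ == "__main__":
    logical, physical = cpu_counts()
    print("logical CPUs:  ", logical)
    print("physical cores:", physical)
```

On a machine with hyper-threading enabled the two numbers would normally differ; without psutil installed the physical count is simply reported as None.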
On Thu, 12 Sep 2013 20:59:46 +0200 "Giampaolo Rodola'" <g.rodola@gmail.com> wrote:
This is a follow up of a feature request which recently appeared on psutil bug tracker: https://code.google.com/p/psutil/issues/detail?id=427
I don't know whether the proposal makes sense for psutil per-se but it certainly made me think about multiprocessing.cpu_count() and the fact that it currently returns the number of virtual CPUs (physical + logical).
Given that multiple processes cannot take any advantage of hyper threading technology
Of course they can. The CPU doesn't distinguish between different kinds of "threads", they can either belong to the same process or to different ones. Regards Antoine.
On Thu, Sep 12, 2013 at 12:10 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 12 Sep 2013 20:59:46 +0200 "Giampaolo Rodola'" <g.rodola@gmail.com> wrote:
This is a follow up of a feature request which recently appeared on psutil bug tracker: https://code.google.com/p/psutil/issues/detail?id=427
I don't know whether the proposal makes sense for psutil per-se but it certainly made me think about multiprocessing.cpu_count() and the fact that it currently returns the number of virtual CPUs (physical + logical).
Given that multiple processes cannot take any advantage of hyper threading technology
Of course they can. The CPU doesn't distinguish between different kinds of "threads", they can either belong to the same process or to different ones.
Regards
Antoine.
Antoine's claim is backed by a document written by Intel: http://software.intel.com/en-us/articles/performance-insights-to-intel-hyper.... Specifically, in the section "Software Use of Intel HT Technology".
On Thu, Sep 12, 2013 at 9:10 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 12 Sep 2013 20:59:46 +0200 "Giampaolo Rodola'" <g.rodola@gmail.com> wrote:
This is a follow up of a feature request which recently appeared on psutil bug tracker: https://code.google.com/p/psutil/issues/detail?id=427
I don't know whether the proposal makes sense for psutil per-se but it certainly made me think about multiprocessing.cpu_count() and the fact that it currently returns the number of virtual CPUs (physical + logical).
Given that multiple processes cannot take any advantage of hyper threading technology
Of course they can. The CPU doesn't distinguish between different kinds of "threads", they can either belong to the same process or to different ones.
Of course you're right, I'm sorry. I should have phrased my statement more carefully before sending the email. Then the question is whether having physical CPU cores count can be useful. --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/
On Thu, 12 Sep 2013 21:26:07 +0200 "Giampaolo Rodola'" <g.rodola@gmail.com> wrote:
On Thu, Sep 12, 2013 at 9:10 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 12 Sep 2013 20:59:46 +0200 "Giampaolo Rodola'" <g.rodola@gmail.com> wrote:
This is a follow up of a feature request which recently appeared on psutil bug tracker: https://code.google.com/p/psutil/issues/detail?id=427
I don't know whether the proposal makes sense for psutil per-se but it certainly made me think about multiprocessing.cpu_count() and the fact that it currently returns the number of virtual CPUs (physical + logical).
Given that multiple processes cannot take any advantage of hyper threading technology
Of course they can. The CPU doesn't distinguish between different kinds of "threads", they can either belong to the same process or to different ones.
Of course you're right, I'm sorry. I should have phrased my statement more carefully before sending the email. Then the question is whether having physical CPU cores count can be useful.
I suppose it doesn't hurt :-) I don't think it belongs specifically in multiprocessing, though. Perhaps in the platform module? (unless you want to contribute psutil to the stdlib?) Regards Antoine.
On Thu, Sep 12, 2013 at 9:32 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 12 Sep 2013 21:26:07 +0200 "Giampaolo Rodola'" <g.rodola@gmail.com> wrote:
On Thu, Sep 12, 2013 at 9:10 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 12 Sep 2013 20:59:46 +0200 "Giampaolo Rodola'" <g.rodola@gmail.com> wrote:
This is a follow up of a feature request which recently appeared on psutil bug tracker: https://code.google.com/p/psutil/issues/detail?id=427
I don't know whether the proposal makes sense for psutil per-se but it certainly made me think about multiprocessing.cpu_count() and the fact that it currently returns the number of virtual CPUs (physical + logical).
Given that multiple processes cannot take any advantage of hyper threading technology
Of course they can. The CPU doesn't distinguish between different kinds of "threads", they can either belong to the same process or to different ones.
Of course you're right, I'm sorry. I should have phrased my statement more carefully before sending the email. Then the question is whether having physical CPU cores count can be useful.
I suppose it doesn't hurt :-) I don't think it belongs specifically in multiprocessing, though. Perhaps in the platform module?
I'd be +0.5 for multiprocessing because:
- cpu_count() is already there
- physical_cpu_count() will likely be used by multiprocessing users only
...but my main concern was first figuring out whether it might actually make sense to distinguish between virtual and physical CPUs in a real world app.
(unless you want to contribute psutil to the stdlib?)
That's something I'd be happy to do if there's general approval but I guess that's for another thread. --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/
On 12.09.2013 21:51, Giampaolo Rodola' wrote:
I'd be +0.5 for multiprocessing because:
- cpu_count() is already there
- physical_cpu_count() will likely be used by multiprocessing users only
...but my main concern was first figuring out whether it might actually make sense to distinguish between virtual and physical CPUs in a real world app.
I would go one step further and expose the topology of the CPUs. It's much, much more complicated than just physical and logical CPUs.
For example, with Intel CPUs, two hyper-threading units have different registers but share the same L1 and L2 cache. All CPU cores inside a physical processor share a common L3 cache. Multiple processors on machines with several processor slots have to communicate through QPI (QuickPath Interconnect). ccNUMA (cache-coherent non-uniform memory access) ensures that memory barriers sync these caches when a process uses multiple processors. Every processor has its own memory banks, so 'remote' memory is more expensive to access.
Other processors have a different internal structure. Some aren't ccNUMA ...
Christian
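To make the topology point concrete, here is a minimal sketch of how hyper-thread siblings could be grouped into cores. It assumes a Linux system, where each logical CPU exposes a file such as /sys/devices/system/cpu/cpu0/topology/thread_siblings_list holding a CPU list like "0,4" or "0-1". The function names and the sample data are hypothetical, and cache or NUMA layout is not covered.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def group_by_core(sibling_lists):
    """Collapse per-CPU sibling lists into one tuple per physical core.

    sibling_lists maps a logical CPU number to the text of its
    thread_siblings_list file.
    """
    cores = {tuple(parse_cpu_list(text)) for text in sibling_lists.values()}
    return sorted(cores)


# Hypothetical data: 4 logical CPUs, 2 cores with 2 hardware threads each.
SAMPLE_SIBLINGS = {0: "0,2", 1: "1,3", 2: "0,2", 3: "1,3"}

if __name__ == "__main__":
    print(group_by_core(SAMPLE_SIBLINGS))
```

The length of the returned list is the physical core count, and each tuple names the logical CPUs that share one core's execution resources.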
On Thu, 12 Sep 2013 22:19:45 +0200 Christian Heimes <christian@python.org> wrote:
On 12.09.2013 21:51, Giampaolo Rodola' wrote:
I'd be +0.5 for multiprocessing because:
- cpu_count() is already there
- physical_cpu_count() will likely be used by multiprocessing users only
...but my main concern was first figuring out whether it might actually make sense to distinguish between virtual and physical CPUs in a real world app.
I would go one step further and expose the topology of the CPUs. It's much, much more complicated than just physical and logical CPUs.
I'm not sure what the point would be. From the point of view of an application programmer, the CPU topology is an almost esoteric detail. This would be appropriate for a third-party "system information" package, IMO (with memory speed, number of PCIe channels, cache associativity, etc.).
Regards
Antoine.
From: Python-ideas [mailto:python-ideas-bounces+anikom15=gmail.com@python.org] On Behalf Of Antoine Pitrou
Sent: Thursday, September 12, 2013 2:16 PM
To: python-ideas@python.org
Subject: Re: [Python-ideas] multiprocessing and physical CPU cores count
I'm not sure what the point would be. From the point of view of an application programmer, the CPU topology is an almost esoteric detail. This would be appropriate for a third-party "system information" package, IMO (with memory speed, number of PCIe channels, cache associativity, etc.).
Regards
Antoine.
Isn't the whole point of a high-level language to be able to not have to know about the hardware?
On Sep 12, 2013, at 19:17, Westley Martínez <anikom15@gmail.com> wrote:
From: Python-ideas [mailto:python-ideas-bounces+anikom15=gmail.com@python.org] On Behalf Of Antoine Pitrou
Sent: Thursday, September 12, 2013 2:16 PM
To: python-ideas@python.org
Subject: Re: [Python-ideas] multiprocessing and physical CPU cores count
I'm not sure what the point would be. From the point of view of an application programmer, the CPU topology is an almost esoteric detail. This would be appropriate for a third-party "system information" package, IMO (with memory speed, number of PCIe channels, cache associativity, etc.).
Regards
Antoine.
Isn't the whole point of a high-level language to be able to not have to know about the hardware?
Most programmers won't care; they'll just use the default value for multiprocessing.Pool. But the implementation of multiprocessing, or any similar third-party module like pp, needs that information, so it can pick a good default value so the programmers don't have to. Also, very occasionally, you need to build a pool of processes manually. So if the module has the info, it might as well expose it.
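As an illustration of the "good default value" mentioned above, here is a minimal sketch of how a pool size could be chosen. It assumes Python 3.4 or later and mirrors the documented behaviour of multiprocessing.Pool, which falls back to the CPU count when no size is given. The helper name default_worker_count is hypothetical and not part of any library.

```python
import os


def default_worker_count(requested=None):
    """Pick a worker count the way a process pool might by default.

    requested: an explicit number of workers, or None to use the
    number of logical CPUs reported by the operating system.
    """
    if requested is None:
        # os.cpu_count() may return None; fall back to one worker.
        return os.cpu_count() or 1
    if requested < 1:
        raise ValueError("number of processes must be at least 1")
    return requested


if __name__ == "__main__":
    print("workers by default:", default_worker_count())
    print("workers on request:", default_worker_count(2))
```

The result could then be passed explicitly, for example as multiprocessing.Pool(processes=default_worker_count()), in the rare cases where the pool has to be built by hand.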
Python 3.4 has os.cpu_count().
Victor
On 12 Sept. 2013 21:52, "Giampaolo Rodola'" <g.rodola@gmail.com> wrote:
On Thu, Sep 12, 2013 at 9:32 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 12 Sep 2013 21:26:07 +0200 "Giampaolo Rodola'" <g.rodola@gmail.com> wrote:
On Thu, Sep 12, 2013 at 9:10 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 12 Sep 2013 20:59:46 +0200 "Giampaolo Rodola'" <g.rodola@gmail.com> wrote:
This is a follow up of a feature request which recently appeared on psutil bug tracker: https://code.google.com/p/psutil/issues/detail?id=427
I don't know whether the proposal makes sense for psutil per-se but it certainly made me think about multiprocessing.cpu_count() and the fact that it currently returns the number of virtual CPUs (physical + logical).
Given that multiple processes cannot take any advantage of hyper threading technology
Of course they can. The CPU doesn't distinguish between different kinds of "threads", they can either belong to the same process or to different ones.
Of course you're right, I'm sorry. I should have phrased my statement more carefully before sending the email. Then the question is whether having physical CPU cores count can be useful.
I suppose it doesn't hurt :-) I don't think it belongs specifically in multiprocessing, though. Perhaps in the platform module?
I'd be +0.5 for multiprocessing because:
- cpu_count() is already there
- physical_cpu_count() will likely be used by multiprocessing users only
...but my main concern was first figuring out whether it might actually make sense to distinguish between virtual and physical CPUs in a real world app.
(unless you want to contribute psutil to the stdlib?)
That's something I'd be happy to do if there's general approval but I guess that's for another thread.
--- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/
On 12.09.2013 21:51, Giampaolo Rodola' wrote:
On Thu, Sep 12, 2013 at 9:32 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Then the question is whether having physical CPU cores count can be useful.
I suppose it doesn't hurt :-) I don't think it belongs specifically in multiprocessing, though. Perhaps in the platform module?
I'd be +0.5 for multiprocessing because:
- cpu_count() is already there
- physical_cpu_count() will likely be used by multiprocessing users only
...but my main concern was first figuring out whether it might actually make sense to distinguish between virtual and physical CPUs in a real world app.
I'm with Antoine here: both APIs would make more sense in the platform or os module. Victor mentioned that there already is an os.cpu_count() in Python 3.4, so perhaps add it there. Do you need C code for determining the physical count?
(unless you want to contribute psutil to the stdlib?)
That's something I'd be happy to do if there's general approval but I guess that's for another thread.
I'd love to see psutil in the stdlib, but also be warned: once the code lives in the stdlib, a) making changes is difficult, and adding new features as well, b) you are bound by the Python release cycle.
For a package such as psutil, it may actually be better to keep it outside the stdlib, since the outside world changes regularly and doesn't adhere to the Python release cycle or feature for patch level releases ;-)
-- Marc-Andre Lemburg
eGenix.com Professional Python Services directly from the Source (#1, Sep 13 2013)
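On the question of whether C code is needed: on Linux, at least, a pure-Python approach looks feasible. The following is a minimal sketch that counts distinct (physical id, core id) pairs in the text of /proc/cpuinfo. It assumes an x86-style cpuinfo layout; those fields can be missing on other architectures or inside some virtual machines, in which case the function returns None. The function name and the sample text are hypothetical.

```python
# Hypothetical excerpt: 3 logical CPUs, two of which share core 0.
SAMPLE_CPUINFO = """processor\t: 0
physical id\t: 0
core id\t\t: 0

processor\t: 1
physical id\t: 0
core id\t\t: 0

processor\t: 2
physical id\t: 0
core id\t\t: 1
"""


def count_physical_cores(cpuinfo_text):
    """Count distinct (physical id, core id) pairs, or return None."""
    cores = set()
    physical_id = core_id = None
    # The extra empty string flushes the last record.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends the record of one logical CPU.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores) or None


if __name__ == "__main__":
    print(count_physical_cores(SAMPLE_CPUINFO))
```

On a real Linux machine the function would be fed the contents of /proc/cpuinfo; other platforms need different mechanisms, which is one argument for leaving this to a package such as psutil.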
On 12/09/2013 7:59pm, Giampaolo Rodola' wrote:
Given that multiple processes cannot take any advantage of hyper threading technology then maybe it makes sense for multiprocessing to expose a physical_cpu_count() function in order to preemptively figure out how many processes to spawn.
Do you have a reference? Wikipedia may not be reliable, but it seems to think otherwise:
Hyper-threading works by duplicating certain sections of the processor—
those that store the architectural state— but not duplicating the main
execution resources. This allows a hyper-threading processor to appear as
the usual "physical" processor and an extra "logical" processor to the
host operating system (HTT-unaware operating systems see two "physical"
processors), allowing the operating system to schedule two threads or
processes simultaneously and appropriately.
^^^^^^^^^
-- Richard
On Thu, Sep 12, 2013 at 9:27 PM, Richard Oudkerk <shibturn@gmail.com> wrote:
On 12/09/2013 7:59pm, Giampaolo Rodola' wrote:
Given that multiple processes cannot take any advantage of hyper threading technology then maybe it makes sense for multiprocessing to expose a physical_cpu_count() function in order to preemptively figure out how many processes to spawn.
Do you have a reference? Wikipedia may not be reliable, but it seems to think otherwise:
Hyper-threading works by duplicating certain sections of the processor—
those that store the architectural state— but not duplicating the main
execution resources. This allows a hyper-threading processor to appear as
the usual "physical" processor and an extra "logical" processor to the
host operating system (HTT-unaware operating systems see two "physical"
processors), allowing the operating system to schedule two threads or
processes simultaneously and appropriately.
^^^^^^^^^
No, I was wrong. Please ignore that statement. I got confused by the name "hyper-threading" and erroneously thought it only affected threads. =) --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/
participants (9)
- Andrew Barnert
- Antoine Pitrou
- Chris Kaynor
- Christian Heimes
- Giampaolo Rodola'
- M.-A. Lemburg
- Richard Oudkerk
- Victor Stinner
- Westley Martínez