Setuptools detection of host cpuinfo and/or header file availability
Hi Yoval,
I had a question about setuptools that Wes suggested I run by you... do you know if there's a canonical (and, hopefully by extension, reliable and cross-platform...) way of checking for host CPU capabilities and header file availability in a Python setuptools script to configure a build?
This is for potentially including code using SSE intrinsics into pandas.
Stephen
I believe Numexpr does this (not sure it's 'reliable', but it's cross-platform).
I think it does this at runtime, but it could also be done in setup.
On Apr 26, 2013, at 5:12 PM, Stephen Lin <swlin@post.harvard.edu> wrote:
_______________________________________________ Pandas-dev mailing list Pandas-dev@python.org http://mail.python.org/mailman/listinfo/pandas-dev
Well, I can do a runtime cpuid check, but there's some expense to that (not so much for the check itself as for building both code paths into the same executable and making it possible to select between them at runtime...). Also, that still leaves the header availability issue.
On Fri, Apr 26, 2013 at 5:36 PM, Jeff Reback <jeffreback@gmail.com> wrote:
Hi Stephen,
Nothing I know of that you can't google just as well as me or better. There's PyCPUID, which explicitly reports SSEx support; you could document it in the README and use it if it's available at build time.
By "including intrinsics", do you mean setting compiler optimization flags that preclude the binary from running on older CPUs? Or are you planning to embed asm in the Cython files? I wouldn't put it past you.
icc can automatically bloat the binary with parallel paths and choose at runtime like you describe, IIRC.
Yoval
On 04/27/2013 12:41 AM, Stephen Lin wrote:
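For illustration, here is a minimal sketch of a build-time check for host CPU features that could sit in a setup script. It does not use PyCPUID; instead it assumes the features can be read from `/proc/cpuinfo` on x86 Linux or from `sysctl` on Intel Macs, and it returns an empty set everywhere else. The function names `host_cpu_flags` and `host_has_sse2` are made up for this example.

```python
import platform
import subprocess


def host_cpu_flags():
    """Return a set of lowercase CPU feature names, or an empty set if unknown."""
    system = platform.system()
    try:
        if system == "Linux":
            # x86 kernels list features on a "flags" line in /proc/cpuinfo.
            with open("/proc/cpuinfo") as f:
                for line in f:
                    key, _, value = line.partition(":")
                    if key.strip() == "flags":
                        return set(value.lower().split())
        elif system == "Darwin":
            # Intel Macs expose features through sysctl; the call fails
            # (and we fall through) where the key does not exist.
            out = subprocess.run(
                ["sysctl", "-n", "machdep.cpu.features"],
                capture_output=True,
                text=True,
                check=True,
            ).stdout
            return set(out.lower().split())
    except (OSError, subprocess.CalledProcessError):
        pass
    # Unknown platform or unreadable source: report nothing rather than guess.
    return set()


def host_has_sse2():
    """True only when the build host positively reports SSE2."""
    return "sse2" in host_cpu_flags()
```

Note that this only describes the machine running the setup script. A binary built on one host and installed on another would still need the runtime check discussed in this thread.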
OK, thanks anyway; I haven't found anything authoritative online, unfortunately.
Also, not assembly but SSE intrinsics: http://stackoverflow.com/questions/11228855/header-files-for-simd-intrinsics
The headers have data types that correspond to the SIMD registers and intrinsic functions that correspond to instructions, but it's not inline asm... instruction scheduling and register allocation are done for you. I don't know if all the compilers we care about ship these headers, though.
Stephen
On Fri, Apr 26, 2013 at 6:16 PM, yoval p. <yoval@gmx.com> wrote:
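On the header availability question, one common approach is simply to try compiling a tiny file that includes the header. The sketch below is not an official setuptools facility; it assumes a GCC- or Clang-style compiler driver that accepts `-c` and `-o` (MSVC would need different flags), and it looks for the compiler in the `CC` environment variable or on `PATH`. The helper names `find_c_compiler` and `has_header`, and the macro `HAVE_EMMINTRIN_H` in the usage comment, are hypothetical.

```python
import os
import shlex
import shutil
import subprocess
import tempfile


def find_c_compiler():
    """Return the compiler command as a list, or None if none is found."""
    cc = os.environ.get("CC")
    if cc:
        # CC may carry flags, e.g. "gcc -m64" (POSIX-style quoting assumed).
        return shlex.split(cc)
    for name in ("cc", "gcc", "clang"):
        path = shutil.which(name)
        if path:
            return [path]
    return None


def has_header(header, extra_args=()):
    """Return True if a file that includes <header> compiles, else False."""
    compiler = find_c_compiler()
    if compiler is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "check.c")
        obj = os.path.join(tmp, "check.o")
        with open(src, "w") as f:
            f.write("#include <%s>\nint main(void) { return 0; }\n" % header)
        try:
            result = subprocess.run(
                compiler + list(extra_args) + ["-c", src, "-o", obj],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # The compiler command could not be started at all.
            return False
        return result.returncode == 0


# Possible use in a setup script (HAVE_EMMINTRIN_H is a made-up macro name):
# macros = [("HAVE_EMMINTRIN_H", "1")] if has_header("emmintrin.h", ["-msse2"]) else []
```

The resulting list could then be passed as `define_macros` to an extension, so the C code can guard the intrinsics path with `#ifdef`.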
> icc can automatically bloat the binary with parallel paths and choose at runtime like you describe IIRC.
Also, yeah, I've heard that it does that, but I don't trust a compiler to do this optimally :D How could it know where to place the check? You don't want to do redundant checks within tight loops, but you also don't want to create too many parallel code paths; there are also ABI/linking issues if it duplicates entire functions...
Stephen
Conventional wisdom says that these days trying to outdo optimizing compilers is a fool's errand. Do you have numbers showing that manually coding at the instruction level can do significantly better than the compiler?
If so, you're probably a modern-day [John Henry](https://en.wikipedia.org/wiki/John_Henry_(folklore)), in which case try not to drop dead at the end of a long and agonizing Python packaging nightmare, as we'd like to see more of your tricks in pandas in the future.
Yoval
On 04/27/2013 02:14 AM, Stephen Lin wrote:
The compiler won't use the intrinsics on its own, unfortunately, since it doesn't know about the alignment and buffer size guarantees.
I will be fixing this in llvm/clang soon enough :)
Stephen
On Fri, Apr 26, 2013 at 7:46 PM, yoval p. <yoval@gmx.com> wrote:
Also, here's the data :) https://github.com/pydata/pandas/issues/3146
On Fri, Apr 26, 2013 at 7:55 PM, Stephen Lin <swlin@post.harvard.edu> wrote:
participants (3)
- Jeff Reback
- Stephen Lin
- yoval p.