[SciPy-Dev] Boost for stats

Sam Wallan samwallan at icloud.com
Wed Feb 17 21:09:22 EST 2021


Hello,

I’ve been working on a spreadsheet that compares Boost and SciPy. I looked at statistical distributions, special functions, and ODE solvers. Here’s the Google Sheets link:

https://docs.google.com/spreadsheets/d/1zVaau6k1_0yQNW107D81RVCirWEN8sXwcYaWj2g8UNY/edit?usp=sharing

I’ve left it in suggestion mode with that sharing link, so if anyone has any thoughts, please feel free to leave a comment. It looks like Boost may have a lot to add!

Regards, 

Sam



> On Feb 15, 2021, at 4:48 AM, scipy-dev-request at python.org wrote:
> 
> Today's Topics:
> 
>   1. Re: Boost for stats (Neal Becker)
>   2. Re: Boost for stats (Hans Dembinski)
>   3. Re: Boost for stats (Ralf Gommers)
>   4. Re: Boost for stats (Neal Becker)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Mon, 15 Feb 2021 07:23:38 -0500
> From: Neal Becker <ndbecker2 at gmail.com>
> To: SciPy Developers List <scipy-dev at python.org>
> Subject: Re: [SciPy-Dev] Boost for stats
> Message-ID:
> 	<CAG3t+pHTnLa+EHL5G=_Esvi1unvYO0+DNnv8RxGKryuTS+jBUg at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
> 
> I have been using pybind11 (and its predecessor before it,
> boost::python) to package C++ code for Python use for many years,
> including some of the Boost libraries.
> pybind11 is easy to use and is much better than e.g. Cython for
> packaging C++ code.  pybind11 is also header-only.
> 
> I would also like to draw the attention of anyone interested in
> scientific software and C++ to a wonderful header-only library,
> xtensor
> https://xtensor.readthedocs.io/en/latest/
> 
> On Mon, Feb 15, 2021 at 7:15 AM Hans Dembinski <hans.dembinski at gmail.com> wrote:
>> 
>> 
>>> On 15. Feb 2021, at 08:26, Andrew Nelson <andyfaff at gmail.com> wrote:
>>> 
>>> My questions would be:
>>> 
>>> - how portable is the boost code in general?
>> 
>> It is very portable. The core goal of Boost is to offer implementations with quality and portability on par with the C++ standard library implementations. Non-portable extensions are sometimes used to speed things up, but there is always a standard-compliant vanilla version. In practice, maintainers test portability with CI on Windows, OSX, and Linux, using various versions of gcc, clang, MSVC, and Intel; see e.g.
>> 
>> https://github.com/boostorg/math/blob/develop/.github/workflows/ci.yml
>> 
>> and the Boost build farm from the days before free CI for OSS was easily available,
>> 
>> https://www.boost.org/development/tests/master/developer/move.html
>> 
>> Not all compilers/platforms are fully compliant, of course. Boost uses workarounds to combat that and submits bug reports on the compiler bug trackers.
>> 
>>> - how easy is it to install the library?
>> 
>> As Nicholas mentioned, Boost.Math and Boost.Histogram are header-only, so it is sufficient to include the headers.
>> 
>> Best regards,
>> Hans
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at python.org
>> https://mail.python.org/mailman/listinfo/scipy-dev
> 
> 
> 
> -- 
> Those who don't understand recursion are doomed to repeat it
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Mon, 15 Feb 2021 13:35:25 +0100
> From: Hans Dembinski <hans.dembinski at gmail.com>
> To: SciPy Developers List <scipy-dev at python.org>
> Subject: Re: [SciPy-Dev] Boost for stats
> Message-ID: <13411649-82A4-4EC1-A58C-FAA3DDFF11D1 at gmail.com>
> Content-Type: text/plain;	charset=us-ascii
> 
> 
>> On 15. Feb 2021, at 01:47, Warren Weckesser <warren.weckesser at gmail.com> wrote:
>> 
>> * The Boost histogram library might provide some benefits over the
>>  existing NumPy and SciPy options.  (Hans Dembinski, the author
>>  of the histogram library, has already commented in this email
>>  thread.)
> 
> I would happily support this. We currently offer a Python front-end to Boost.Histogram
> https://github.com/scikit-hep/boost-histogram
> which includes a numpy.histogram compatible interface.
> 
> Switching to Boost.Histogram may offer performance benefits, see
> https://boost-histogram.readthedocs.io/en/latest/notebooks/PerformanceComparison.html
> 
> Compared to np.histogram we saw a 1.7 times speedup single-threaded, and more if multiple threads are used. Compared to np.histogram2d we saw an 11 times speedup. These numbers should probably be checked more carefully before decisions are made.
> 
> Boost.Histogram offers generalised histograms with arbitrary accumulators per cell, so it could also replace the implementations of https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic.html and friends.
> 
> Best regards,
> Hans
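
To illustrate what "arbitrary accumulators per cell" means for something like binned_statistic, here is a minimal pure-Python sketch. It is not the Boost.Histogram or boost-histogram API; the names MeanAccumulator and binned_mean are made up for this example, and the binning convention (half-open bins, last bin closed on the right) is an assumption chosen to mirror numpy.histogram.

```python
from bisect import bisect_right


class MeanAccumulator:
    """Hypothetical per-cell accumulator: running mean of the values seen."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        self.count += 1
        self.total += value

    @property
    def mean(self):
        # An empty cell has no defined mean.
        return self.total / self.count if self.count else float("nan")


def binned_mean(x, values, edges):
    """Mean of `values`, grouped by the bin of `edges` that each x falls into."""
    cells = [MeanAccumulator() for _ in range(len(edges) - 1)]
    last = len(cells) - 1
    for xi, vi in zip(x, values):
        # Bins are half-open [lo, hi); the last bin also includes its right
        # edge (assumed convention). Out-of-range samples are dropped.
        idx = last if xi == edges[-1] else bisect_right(edges, xi) - 1
        if 0 <= idx <= last:
            cells[idx].add(vi)
    return [cell.mean for cell in cells]


means = binned_mean([0.5, 0.7, 1.5, 3.0], [1.0, 3.0, 10.0, 4.0], [0, 1, 2, 3])
print(means)  # [2.0, 10.0, 4.0]
```

Swapping MeanAccumulator for a class that tracks, say, a sum or a maximum changes the statistic without touching the binning loop, which is the appeal of the accumulator design.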
> 
> ------------------------------
> 
> Message: 3
> Date: Mon, 15 Feb 2021 13:41:51 +0100
> From: Ralf Gommers <ralf.gommers at gmail.com>
> To: SciPy Developers List <scipy-dev at python.org>
> Subject: Re: [SciPy-Dev] Boost for stats
> Message-ID:
> 	<CABL7CQjYZh0CyA6Kx5FULw2KaYMmdrLbm0Jecztc5+4z+r8OJg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> On Mon, Feb 15, 2021 at 1:35 PM Hans Dembinski <hans.dembinski at gmail.com>
> wrote:
> 
>> 
>> Boost.Histogram offers generalised histograms with arbitrary accumulators
>> per cell, so it could also replace the implementations of
>> https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic.html
>> and friends.
>> 
> 
> That would be really nice. binned_statistic is currently pure Python, and
> can be a performance hotspot (I've seen multiple cases of that in dealing
> with image and geospatial data).
> 
> Cheers,
> Ralf
> 
> ------------------------------
> 
> Message: 4
> Date: Mon, 15 Feb 2021 07:47:48 -0500
> From: Neal Becker <ndbecker2 at gmail.com>
> To: SciPy Developers List <scipy-dev at python.org>
> Subject: Re: [SciPy-Dev] Boost for stats
> Message-ID:
> 	<CAG3t+pH-Eq2KfEwRbN0UVRSNmdSkY-DmhD=C-Tvo6bnPKWNH_w at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
> 
> One thing I've missed with the current scipy histogram is the ability
> to do 'online' or 'incremental' collection of the histogram data.  For
> this reason I have written my own histogram code.  I am often
> collecting data from Monte Carlo simulations and want to accumulate
> stats from data that arrives in batches.
> I don't know if boost-histogram supports this, but if so I would find
> this very welcome.
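
As a rough picture of the 'online' collection described above, the following standard-library sketch fixes the bin edges up front and adds each batch's counts to a running total. The class name IncrementalHistogram and its fill method are hypothetical; this is not boost-histogram code and says nothing about what that library supports.

```python
from bisect import bisect_right


class IncrementalHistogram:
    """Hypothetical histogram that accumulates counts across batches."""

    def __init__(self, edges):
        # Edges are fixed at construction so that counts from different
        # batches refer to the same bins and can simply be summed.
        self.edges = list(edges)
        self.counts = [0] * (len(self.edges) - 1)

    def fill(self, batch):
        last = len(self.counts) - 1
        for x in batch:
            # Half-open bins, last bin closed on the right (assumed
            # convention); samples outside the range are ignored.
            idx = last if x == self.edges[-1] else bisect_right(self.edges, x) - 1
            if 0 <= idx <= last:
                self.counts[idx] += 1
        return self


hist = IncrementalHistogram([0.0, 1.0, 2.0, 3.0])
for batch in ([0.1, 0.9, 1.5], [2.5, 2.9], [0.2, 3.0, 7.0]):
    hist.fill(batch)  # each call adds to the existing counts
print(hist.counts)  # [3, 1, 3]
```

Because only the counts are stored, memory use does not grow with the number of batches, which is what makes this pattern convenient for long-running simulations.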
> 
> On Mon, Feb 15, 2021 at 7:42 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:
>> 
>> 
>> 
>> That would be really nice. binned_statistic is currently pure Python, and can be a performance hotspot (I've seen multiple cases of that in dealing with image and geospatial data).
>> 
>> Cheers,
>> Ralf
>> 
> ------------------------------
> 
> End of SciPy-Dev Digest, Vol 208, Issue 13
> ******************************************


