[SciPy-User] Big performance hit when using frozen distributions on scipy 0.16.0

Ralf Gommers ralf.gommers at gmail.com
Fri Oct 28 15:03:19 EDT 2016


On Sat, Oct 29, 2016 at 6:37 AM, Nicolas Chopin <nicolas.chopin at ensae.fr>
wrote:

> Yes, as I have just said, I agree that it is the creation of the frozen
> dist that explains the difference.
>
> I do need to create a *lot* of frozen distributions, there is no way
> around that in what I do.
>

Whatever you can do with frozen distributions you can also do with the
regular non-frozen ones, so I doubt that that's true.
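
For instance, here is a minimal sketch of that equivalence. It assumes a
normal distribution, and the loc and scale values are arbitrary illustrations,
not taken from your code:

```python
import numpy as np
from scipy import stats

x = np.linspace(-2.0, 2.0, 5)
loc, scale = 1.5, 0.5  # illustrative values only

# Frozen: parameters are bound once, when the frozen object is created.
frozen = stats.norm(loc=loc, scale=scale)
logp_frozen = frozen.logpdf(x)

# Non-frozen: the same parameters are passed on every call instead.
logp_direct = stats.norm.logpdf(x, loc=loc, scale=scale)

# Sampling works the same way; with the same seed both give the same draws.
draw_frozen = frozen.rvs(size=3, random_state=0)
draw_direct = stats.norm.rvs(loc=loc, scale=scale, size=3, random_state=0)

print(np.allclose(logp_frozen, logp_direct))
print(np.allclose(draw_frozen, draw_direct))
```

The non-frozen calls go through the shared stats.norm instance, so no new
frozen object is built per set of parameters.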


> Typically, one run may involve O(10^8) frozen distributions;
> for each of these I may either simulate a vector (of size 10^2-10^3), or
> compute the log-pdf of a vector of the same size, or both.
>

You haven't explained what's wrong with simply using the rvs() and logpdf()
methods from the distribution instances provided in the stats namespace.
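
If it is the object-style interface that matters, one possible sketch is a
thin wrapper that only stores the parameters and forwards to the non-frozen
methods. LightNormal is a hypothetical name, not a scipy class, and the normal
distribution is only an assumed example:

```python
import numpy as np
from scipy import stats


class LightNormal:
    """Hypothetical stand-in for a frozen normal distribution.

    It stores loc and scale and forwards every call to the shared,
    non-frozen scipy.stats.norm instance, so creating one does not
    create a frozen scipy distribution.
    """

    __slots__ = ("loc", "scale")

    def __init__(self, loc=0.0, scale=1.0):
        # No argument checking here; scipy validates on each call.
        self.loc = loc
        self.scale = scale

    def rvs(self, size=None, random_state=None):
        return stats.norm.rvs(loc=self.loc, scale=self.scale,
                              size=size, random_state=random_state)

    def logpdf(self, x):
        return stats.norm.logpdf(x, loc=self.loc, scale=self.scale)


dist = LightNormal(loc=2.0, scale=3.0)
x = dist.rvs(size=1000, random_state=0)
logp = dist.logpdf(x)
```

Such an object can be passed around like a frozen distribution as long as the
calling code only uses the methods defined on it.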

Ralf



>
> On Fri, 28 Oct 2016 at 19:29 Evgeni Burovski <evgeny.burovskiy at gmail.com>
> wrote:
>
>> On Fri, Oct 28, 2016 at 7:53 PM, Nicolas Chopin <nicolas.chopin at ensae.fr>
>> wrote:
>> >  Hi list,
>> > I'm working on a package that does some complicated Monte Carlo
>> > experiments.
>> > The package passes around frozen distributions quite a lot. Trying to
>> > understand why certain parts were so slow, I did a bit of profiling, and
>> > stumbled upon this:
>> >
>> > > %timeit x = scipy.stats.norm.rvs(size=1000)
>> > > 10000 loops, best of 3: 49.3 µs per loop
>> >
>> > > %timeit dist = scipy.stats.norm(); x = dist.rvs(size=1000)
>> > > 1000 loops, best of 3: 512 µs per loop
>> >
>> > So a 10x penalty when using a frozen dist, even if the size of the
>> > simulated vector is 1000. This is using scipy 0.16.0 on Ubuntu 16.04. I
>> > cannot replicate this problem on another machine with scipy 0.13.3 and
>> > Ubuntu 14.04 (there is a penalty, but it's much smaller).
>> >
>> > In the profiler, I can see that a lot of time is spent doing string
>> > operations (such as expandtabs) in order to generate the doc. In the
>> > source, I see that this may depend on a certain -OO flag?
>> >
>> > I do realise that instantiating a frozen distribution requires some
>> > argument checking and what not, but here it looks too expensive. For my
>> > package, this amounts to hours spent on ... tab expansion?
>> >
>> > Anyway, I'd like to ask:
>> > (a) Is this a known problem? I could not find anything online about
>> > this.
>> > (b) Is this going to be fixed in some future version of scipy?
>> > (c) Is there a way to fix this with *this* version of scipy using this
>> > flag mentioned in the source, and then how?
>> > (d) Or should I instead manually redefine my own distribution objects?
>> > (It's really convenient for what I'm trying to do to define
>> > distributions as objects with methods rvs, logpdf, and so on.)
>> >
>> > Many thanks for reading this! :-)
>> > All the best
>>
>>
>> Why are you including the construction time in your timings? Surely,
>> if you use frozen distributions for some MC work, you're not
>> recreating frozen instances in hot loops?
>>
>>
>> In [4]: %timeit norm.rvs(size=100, random_state=123)
>> The slowest run took 142.68 times longer than the fastest. This could
>> mean that an intermediate result is being cached.
>> 10000 loops, best of 3: 74.2 µs per loop
>>
>> In [5]: %timeit dist = norm(); dist.rvs(size=100, random_state=123)
>> The slowest run took 4.40 times longer than the fastest. This could
>> mean that an intermediate result is being cached.
>> 1000 loops, best of 3: 796 µs per loop
>>
>> In [6]: %timeit dist = norm()
>> The slowest run took 4.89 times longer than the fastest. This could
>> mean that an intermediate result is being cached.
>> 1000 loops, best of 3: 672 µs per loop
>>
>> > (b) Is this going to be fixed in some future version of scipy?
>> > (c) is there a way to fix this with *this* version of scipy using this
>> > flag mentioned in the source, and then how?
>>
>> You could of course try reverting
>> https://github.com/scipy/scipy/pull/3245 for your local copy of scipy.
>> It went into scipy 0.14, so it is the likely suspect.
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> https://mail.scipy.org/mailman/listinfo/scipy-user
>>