> Yes, as I have just said, I agree that it is the creation of the frozen
> dist that explains the difference.
> I do need to create a *lot* of frozen distributions, there is no way
> around that in what I do.

Whatever you can do with frozen distributions you can also do with the
regular non-frozen ones, so I doubt that that's true.
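
To make that concrete, here is a minimal sketch of the equivalence. The
parameter values and variable names are made up for illustration; the only
scipy calls used are stats.norm(...), .rvs() and .logpdf().

```python
import numpy as np
from scipy import stats

# Illustrative parameters (assumptions, not taken from the original post).
loc, scale = 1.5, 2.0
x = np.linspace(-3.0, 3.0, 7)

# Frozen: the parameters are bound once, when the object is constructed.
frozen = stats.norm(loc=loc, scale=scale)
logpdf_frozen = frozen.logpdf(x)
draws_frozen = frozen.rvs(size=5, random_state=0)

# Non-frozen: the same parameters are passed on every call instead.
logpdf_direct = stats.norm.logpdf(x, loc=loc, scale=scale)
draws_direct = stats.norm.rvs(loc=loc, scale=scale, size=5, random_state=0)

# Both routes give the same densities and, with the same seed, the same draws.
assert np.allclose(logpdf_frozen, logpdf_direct)
assert np.allclose(draws_frozen, draws_direct)
```

The non-frozen calls skip the construction of the frozen object entirely,
which is the step the timings quoted below point at.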

> Typically, one run may involve O(10^8) frozen distributions;
> for each of these I may either simulate a vector (of size 10^2-10^3), or
> compute the log-pdf of a vector of the same size, or both.

You haven't explained what's wrong with simply using the rvs() and logpdf()
methods from the distribution instances provided in the stats namespace.
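
If what you want is an object with rvs() and logpdf() methods that you can
pass around, something along these lines might do. This is only a sketch:
the class LightNormal is hypothetical (it is not part of scipy), and it
simply stores the parameters and forwards them to the stats.norm instance.

```python
import numpy as np
from scipy import stats


class LightNormal:
    """Hypothetical lightweight stand-in for a frozen normal (not in scipy)."""

    def __init__(self, loc=0.0, scale=1.0):
        # Only store the parameters; nothing else happens at construction.
        self.loc = loc
        self.scale = scale

    def rvs(self, size=1, random_state=None):
        # Forward to the non-frozen distribution instance.
        return stats.norm.rvs(loc=self.loc, scale=self.scale,
                              size=size, random_state=random_state)

    def logpdf(self, x):
        return stats.norm.logpdf(x, loc=self.loc, scale=self.scale)


dist = LightNormal(loc=1.0, scale=2.0)
sample = dist.rvs(size=100, random_state=0)
log_density = dist.logpdf(sample)
```

Constructing such an object costs two attribute assignments, so creating
many of them should be cheap; whether that is enough for your use case is
something you would have to measure.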


>> > Hi list,
>> > I'm working on a package that does some complicated Monte Carlo
>> > experiments.
>> > The package passes around frozen distributions quite a lot. Trying to
>> > understand why certain parts were so slow, I did a bit of profiling, and
>> > stumbled upon this:
>> >
>> > %timeit x = scipy.stats.norm.rvs(size=1000)
>> > 10000 loops, best of 3: 49.3 µs per loop
>> >
>> > %timeit dist = scipy.stats.norm(); x = dist.rvs(size=1000)
>> > 1000 loops, best of 3: 512 µs per loop
>> >
>> > So there is a 10x penalty when using a frozen dist, even if the size of
>> > the simulated vector is 1000. This is using scipy 0.16.0 on Ubuntu 16.04.
>> > I cannot replicate this problem on another machine with scipy 0.13.3 and
>> > Ubuntu 14.04 (there is a penalty, but it's much smaller).
>> >
>> > In the profiler, I can see that a lot of time is spent doing string
>> > operations (such as expand_tabs) in order to generate the docstring. In
>> > the source, I see that this may depend on a certain -OO flag?
>> >
>> > I do realise that instantiating a frozen distribution requires some
>> > argument checking and whatnot, but here it looks too expensive. For my
>> > package, this amounts to hours spent on ... tab expansion?
>> >
>> > Anyway, I'd like to ask:
>> > (a) Is this a known problem? I could not find anything online about this.
>> > (b) Is this going to be fixed in some future version of scipy?
>> > (c) Is there a way to fix this with *this* version of scipy using this
>> > flag mentioned in the source, and then how?
>> > (d) Or should I instead manually redefine my own distribution objects?
>> > (It's really convenient for what I'm trying to do to define
>> > distributions as objects with methods rvs, logpdf, and so on.)
>> >
>> > Many thanks for reading this! :-)
>> > All the best
>>
>> Why are you including the construction time in your timings? Surely,
>> if you use frozen distributions for some MC work, you're not
>> recreating frozen instances in hot loops?
>>
>> In [4]: %timeit norm.rvs(size=100, random_state=123)
>> The slowest run took 142.68 times longer than the fastest. This could
>> mean that an intermediate result is being cached.
>> 10000 loops, best of 3: 74.2 µs per loop
>> In [5]: %timeit dist = norm(); dist.rvs(size=100, random_state=123)
>> The slowest run took 4.40 times longer than the fastest. This could
>> mean that an intermediate result is being cached.
>> 1000 loops, best of 3: 796 µs per loop
>> In [6]: %timeit dist = norm()
>> The slowest run took 4.89 times longer than the fastest. This could
>> mean that an intermediate result is being cached.
>> 1000 loops, best of 3: 672 µs per loop
>>
>> > (b) Is this going to be fixed in some future version of scipy?
>> > (c) is there a way to fix this with *this* version of scipy using this
>> > flag mentioned in the source, and then how?
>>
>> You could of course try reverting
>> https://github.com/scipy/scipy/pull/3245 for your local copy of scipy.
>> It went into scipy 0.14, so it is the likely suspect.
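
To check on your own install where the time goes, the three cases timed in
the quoted message can be separated outside IPython with the standard
timeit module. A rough sketch, with the repeat count n chosen arbitrarily;
the numbers will depend on the scipy version and the machine:

```python
import timeit
from scipy import stats

n = 200  # arbitrary number of repetitions, an assumption for this sketch

# Cost of constructing a frozen distribution, and nothing else.
t_construct = timeit.timeit(lambda: stats.norm(), number=n) / n

# Cost of sampling through the non-frozen instance.
t_direct = timeit.timeit(lambda: stats.norm.rvs(size=100), number=n) / n

# Cost of sampling from a frozen distribution that is built once and reused.
dist = stats.norm()
t_reused = timeit.timeit(lambda: dist.rvs(size=100), number=n) / n

print("construct frozen: %.1f us" % (t_construct * 1e6))
print("direct rvs:       %.1f us" % (t_direct * 1e6))
print("reused frozen:    %.1f us" % (t_reused * 1e6))
```

If the first figure dominates, the cost is in construction rather than in
rvs() itself.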
