[Numpy-discussion] Deprecate zipf distribution?
Charles R Harris
charlesr.harris at gmail.com
Sat Oct 7 11:29:12 EDT 2017
The current NumPy implementation of the truncated zipf distribution has
two problems:
- Extremely poor performance when the parameter `a` is near 1. For
instance, when `a = 1.000001` a simple change in the implementation speeds
things up by a factor of 1,657. When the parameter is closer to 1, the
algorithm effectively hangs.
- Because the distribution is truncated, say to integers in the range of
int64, the parameter could be allowed to take all values > 0, even though
the untruncated series diverges. There is some indication that such values
of `a` can be useful in modeling because of the heavy distribution in the
tail.
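
To illustrate the second point, here is a minimal sketch showing that a truncated distribution with probabilities proportional to `k**(-a)` is well defined for any `a > 0`, including `a <= 1`, because the normalizing sum is finite. The function names and the small cutoff `n_max` are hypothetical, chosen only for illustration. The sketch enumerates the whole support in plain Python, so it is not how NumPy samples and it would not scale to a cutoff such as 2**44.

```python
import random

def truncated_zeta_weights(a, n_max):
    """Unnormalized weights k**(-a) for k = 1..n_max (hypothetical helper)."""
    if a <= 0:
        raise ValueError("a must be > 0")
    return [k ** -a for k in range(1, n_max + 1)]

def sample_truncated_zeta(a, n_max, size, rng):
    """Draw `size` integers from 1..n_max with P(k) proportional to k**(-a).

    Enumerates the support, so it is only practical for small n_max.
    """
    weights = truncated_zeta_weights(a, n_max)
    return rng.choices(range(1, n_max + 1), weights=weights, k=size)

rng = random.Random(12345)
for a in (0.5, 1.0, 1.000001):
    # The sum of weights is finite for every a > 0, so sampling succeeds
    # even where the untruncated series diverges (a <= 1).
    print(a, sample_truncated_zeta(a, n_max=1000, size=5, rng=rng))
```

With the truncation in place nothing special happens at `a = 1`: the weights are just `1/k` over a finite range.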
Because fixing these problems will change the output stream, I suggest
implementing a truncated zeta distribution, which is an alternative name
for the same distribution, and deprecating the zipf distribution.
Furthermore, rather than truncate at the maximum value of a C long, which
varies by platform,
truncate at max(int64), or some possibly smaller value, say 2**44, which
allows all integers up to that value to be realized with approximately
correct probabilities when using double precision for the intermediate
computations.
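
One way to see why a cutoff below max(int64) helps is to look at the gap between adjacent double precision values at different magnitudes. The sketch below is only my reading of the argument, not a statement of how the proposed sampler would work; `spacing_at` is a hypothetical helper, and `math.ulp` requires Python 3.9 or later.

```python
import math

def spacing_at(exponent):
    """Gap between adjacent doubles at the value 2**exponent."""
    return math.ulp(2.0 ** exponent)

for e in (44, 53, 63):
    print(e, spacing_at(e))
# 44 0.00390625   (2**-8: fractional headroom remains below 2**44)
# 53 2.0          (above 2**53 not every integer is representable)
# 63 2048.0       (near max(int64) only every 2048th integer is)

# max(int64) itself is not exactly representable as a double.
int64_max = 2**63 - 1
print(float(int64_max) == 2.0 ** 63)  # True
```

Near 2**63 a double intermediate can land on only one integer in 2048, so most values in that region could never be returned. At 2**44 every integer is representable with about eight fractional bits to spare.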