[Numpy-discussion] Random int64 and float64 numbers

Thu Nov 5 22:28:04 EST 2009

On Thu, Nov 5, 2009 at 7:53 PM, <josef.pktd at gmail.com> wrote:

> On Thu, Nov 5, 2009 at 9:23 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> >
> >
> > On Thu, Nov 5, 2009 at 7:04 PM, <josef.pktd at gmail.com> wrote:
> >>
> >> On Thu, Nov 5, 2009 at 6:36 PM, Charles R Harris
> >> <charlesr.harris at gmail.com> wrote:
> >> >
> >> >
> >> > On Thu, Nov 5, 2009 at 4:26 PM, David Warde-Farley
> >> > <dwf at cs.toronto.edu> wrote:
> >> >>
> >> >> On 5-Nov-09, at 4:54 PM, David Goldsmith wrote:
> >> >>
> >> >> > Interesting thread, which leaves me wondering two things: is it
> >> >> > documented
> >> >> > somewhere (e.g., at the IEEE site) precisely how many *decimal*
> >> >> > mantissae
> >> >> > are representable using the 64-bit IEEE standard for float
> >> >> > representation
> >> >> > (if that makes sense);
> >> >>
> >> >> IEEE-754 says nothing about decimal representations aside from how to
> >> >> round when converting to and from strings. You have to provide/accept
> >> >> *at least* 9 decimal digits in the significand for single-precision
> >> >> and 17 for double-precision (section 5.6). AFAIK implementations will
> >> >> vary in how they handle cases where a binary significand would yield
> >> >> more digits than that.
> >> >>
> >> >
> >> > I believe that was the argument for the extended precision formats.
> >> > The given number of decimal digits is sufficient to recover the same
> >> > float that produced them if a slightly higher precision is used in
> >> > the conversion.
> >> >
> >> > Chuck
> >>
> >> From the discussion of the floating point representation, it seems
> >> that a uniform random number generator would have a very coarse grid
> >> in the range, for example, -1e30 to +1e30 compared to the interval
> >> [-0.5, 0.5].
> >>
> >> How many points can be represented by a float in [-0.5,0.5] compared
> >> to [1e30, 1e30+1.]?
> >> If I interpret this correctly, then there are as many floating point
> >> numbers in [0,1] as in [1,inf), or am I misinterpreting this?
> >>
> >> So how does a PRNG handle a huge interval of uniform numbers?
> >>
> >
> > There are several implementations, but the ones I'm familiar with
> > reduce to scaling. If the rng produces random unsigned integers, then
> > the range of integers is scaled to the interval [0,1). The variations
> > involve explicit scaling (portable) or bit twiddling of the IEEE
> > formats. In straightforward scaling some ranges of the random integers
> > may not map 1-1, so the unused bits are masked off first; if you want
> > doubles you only need 52 bits, etc. For bit twiddling there is an
> > implicit 1 in the mantissa, so the basic range works out to [1,2), but
> > that can be fixed by subtracting 1 from the result. Handling larger
> > ranges than [0,1) just involves another scaling.
>
> So, since this is then a discrete distribution, what is the number of
> points in the support? (My guess would be 2**52, but I don't know much
> about numerical representations.)
>
> This would be the largest set of integers that could be generated
> without gaps in the distribution, and would determine the grid size
> for floating point random variables. (?)
>
>
Yes.
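
To make the two variants quoted above concrete, here is a minimal sketch
in plain Python. The function names are made up for illustration, and
Python's random.getrandbits stands in for the raw integer stream. The
scaling variant uses 53 bits, which is an assumption on my part (a double
holds 53 significand bits counting the implicit leading one); the bit
twiddling variant fills only the 52 stored mantissa bits, so its grid
has 2**52 points.

```python
import random
import struct


def uniform_by_scaling(bits53):
    """Map a 53-bit unsigned integer to a double in [0, 1) by scaling."""
    # Dividing by a power of two is exact, so every input maps to a
    # distinct double on a grid with step 2**-53.
    return bits53 / 2.0**53


def uniform_by_bit_twiddling(bits52):
    """Build a double in [1, 2) from 52 mantissa bits, then subtract 1."""
    # 0x3FF is the biased exponent of 2**0; the sign bit is left at 0.
    raw = (0x3FF << 52) | bits52
    (x,) = struct.unpack("<d", struct.pack("<Q", raw))
    return x - 1.0


rng = random.Random(0)  # hypothetical stand-in for the integer rng
a = uniform_by_scaling(rng.getrandbits(53))
b = uniform_by_bit_twiddling(rng.getrandbits(52))
print(a, b)
```

A larger range [low, high) would then be low + (high - low) * u, which
stretches the same grid rather than adding points to it.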

> for David's example:
> low, high = -1e307, 1e307
> np.random.uniform(low, high, 100) # much more reasonable
>
> this would imply a grid size of
> >>> 2*1e307/2.**52
> 4.4408920985006261e+291
>
> or something similar. (Floating point numbers are not very dense on
> the real line.)
>
>
Yes, or rather, they are more or less logarithmically distributed. Floats
are basically logarithms base 2 with a mantissa of fixed precision. Actual
logarithm code uses the exponent and does a correction to the mantissa,
i.e., takes the logarithm base two of numbers in the range [1,2).
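
A rough way to see the logarithmic spacing is to read the bit pattern of
a double as an integer: for non-negative doubles that integer increases
by one from each float to the next. The sketch below uses only the
standard struct module; the helper names are hypothetical, and it assumes
IEEE 754 doubles, which is what Python uses on common platforms.

```python
import math
import struct


def float_to_ordinal(x):
    """Position of a non-negative double among the non-negative doubles."""
    (n,) = struct.unpack("<Q", struct.pack("<d", x))
    return n


def ordinal_to_float(n):
    """Inverse of float_to_ordinal."""
    (x,) = struct.unpack("<d", struct.pack("<Q", n))
    return x


def spacing(x):
    """Gap between a positive finite double x and the next larger double."""
    return ordinal_to_float(float_to_ordinal(x) + 1) - x


# The gap grows with the magnitude of the number.
for x in (0.5, 1.0, 1e30, 1e307):
    print(x, spacing(x))

# Number of doubles in [0, 1) and in [1, inf), counted via bit patterns.
floats_below_one = float_to_ordinal(1.0) - float_to_ordinal(0.0)
floats_from_one = float_to_ordinal(math.inf) - float_to_ordinal(1.0)
print(floats_below_one, floats_from_one)
```

The two counts come out nearly equal (1023 * 2**52 against 1024 * 2**52),
and the gap near 1e30 is far larger than 1, so 1e30 + 1.0 == 1e30.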

Random integers are treated differently. To get random integers in a given
range, say [0,100], the bitstream produced by the rng would be broken up
into 7 bit chunks [0, 127] that are used in a sampling algorithm that loops
until a value in the range is produced. So any range of integers can be
produced. Code for that comes with the Mersenne Twister, but I don't know if
numpy uses it.
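
A minimal sketch of that loop, for illustration only: the function name
is made up, random.getrandbits stands in for the rng's bitstream, and
this is not a claim about what numpy or the Mersenne Twister reference
code actually does.

```python
import random


def randint_by_rejection(high, getrandbits):
    """Return a uniform integer in [0, high] by rejection sampling."""
    if high == 0:
        return 0
    # Smallest chunk width that covers [0, high]; 100 needs 7 bits,
    # so each candidate lies in [0, 127].
    nbits = high.bit_length()
    while True:
        candidate = getrandbits(nbits)
        if candidate <= high:  # otherwise discard and draw again
            return candidate


rng = random.Random(12345)  # hypothetical stand-in for the bit source
samples = [randint_by_rejection(100, rng.getrandbits) for _ in range(1000)]
print(min(samples), max(samples))
```

Because the chunk is never more than twice as wide as the target range,
each draw is accepted with probability above 1/2, so the loop needs fewer
than two draws on average.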

Chuck