[Python-Dev] Caching float(0.0)

Nick Maclaren nmm1 at cus.cam.ac.uk
Tue Oct 3 11:12:04 CEST 2006

"Terry Reedy" <tjreedy at udel.edu> wrote:
> For true floating point measurements (of temperature, for instance), 
> 'integral' measurements (which are an artifact of the scale used (degrees F 
> versus C versus K)) should generally be no more common than other realized 
> measurements.

Not quite, but close enough.  A lot of algorithms use a conversion to
integer, or some of the values are actually counts (e.g. in statistics),
which makes integral values a bit more likely.  Not enough to get
excited about, in general.

> Thirty years ago, a major stat package written in Fortran (BMDP) required 
> that all data be stored as (Fortran 4-byte) floats for analysis.  So a 
> column of yes/no or male/female data would be stored as 0.0/1.0 or perhaps 
> 1.0/2.0.  That skewed the distribution of floats.  But Python and, I hope, 
> Python apps, are more modern than that.

And SPSS and Genstat and others - now even Excel ....

> Float caching strikes me as a good subject for cookbook recipes, but not, 
> without real data and a willingness to slightly screw some users, for the 
> default core code.

Yes.  It is trivial (if tedious) to add analysis code - the problem
is finding suitable representative applications.  That was always
my difficulty when I was analysing this sort of thing - and still
is when I need to do it!
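
As a rough illustration of the sort of analysis code meant here, the
following Python sketch tallies how often each float value occurs in a
sample.  The name float_histogram and the sample data are hypothetical,
and it counts values in a list rather than instrumenting the
interpreter's float allocator, which is what a real measurement would
need.

```python
from collections import Counter
import math

def float_histogram(values):
    """Tally float values; NaNs and infinities are grouped under labels.

    Hypothetical helper for illustration only.  Note that -0.0 compares
    equal to 0.0, so both are counted under the same key here.
    """
    counts = Counter()
    for v in values:
        if not isinstance(v, float):
            continue  # ignore ints, strings, etc.
        if math.isnan(v):
            counts["nan"] += 1  # NaN != NaN, so it needs its own label
        elif math.isinf(v):
            counts["inf"] += 1
        else:
            counts[v] += 1
    return counts

# Made-up sample data, not taken from any real application.
sample = [0.0, 1.5, 0.0, 2.0, float("nan"), 0.0, -0.0, float("inf")]
hist = float_histogram(sample)
most_common_value, most_common_count = hist.most_common(1)[0]
```

The hard part remains the one described above: the tally is only as
meaningful as the application it is run against.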

Nick Craig-Wood <nick at craig-wood.com> wrote:
> For my application caching 0.0 is by far the most important. 0.0 has
> about 200,000 references - the next highest reference count is only
> about 200.

Yes.  Everything I have seen over the past four decades confirms that
this is the normal case, except for floating-point representations
that have a missing value indicator.

Even in IEEE 754, infinities and NaNs are rare unless the application
is up the spout.  There are claims that many important applications
produce a lot of NaNs and use them as missing values but, despite
repeated requests, none of the people making those claims has ever
provided an example.  There are some pretty solid grounds for
believing that those claims are not based in fact, but are polemic.

Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679
