[Python-Dev] Caching float(0.0)

Josiah Carlson jcarlson at uci.edu
Mon Oct 2 19:33:03 CEST 2006


Michael Hudson <mwh at python.net> wrote:
> "Martin v. Löwis" <martin at v.loewis.de> writes:
> > Kristján V. Jónsson schrieb:
> >> I can't see how this situation is any different from the re-use of
> >> low ints.  There is no fundamental law that says that ints below 100
> >> are more common than others, yet experience shows that this is so,
> >> and so they are reused.
> >
> > There are two important differences:
> > 1. it is possible to determine whether the value is "special" in
> >    constant time, and also fetch the singleton value in constant
> >    time for ints; the same isn't possible for floats.
> 
> I don't think you mean "constant time" here do you?  I think most of
> the code posted so far has been constant time, at least in terms of
> instruction count, though some might indeed be fairly slow on some
> processors -- conversion from double to integer on the PowerPC
> involves a trip off to memory for example.  Even so, everything should
> be fairly efficient compared to allocation, even with PyMalloc.
> 
> > 2. it may be that there is a loss of precision in reusing an existing
> >    value (although I'm not certain that this could really happen).
> >    For example, could it be that two values compare as equal under
> >    ==, yet are different values? I know this can't happen for
> >    integers, so I feel much more comfortable with that cache.
> 
> I think the only case is that the two zeros compare equal, which is
> unfortunate given that it's the most compelling value to cache...
> 
> I don't know a reliable and fast way to distinguish +0.0 and -0.0.
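
To make the problem concrete: the two zeros compare equal under ==
even though their sign bits differ, so an equality test alone can't
keep them apart.  A minimal C99 illustration, using signbit() from
math.h:

#include <math.h>   /* C99 signbit() */
#include <stdio.h>

int main(void)
{
    double pz = 0.0, nz = -0.0;
    printf("%d\n", pz == nz);                        /* prints 1: equal */
    printf("%d %d\n", !!signbit(pz), !!signbit(nz)); /* prints 0 1 */
    return 0;
}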

The same way one could handle the lookups quickly: cast the pointer to a
uint64 and dereference it.  For any double actually stored in memory as a
64-bit IEEE value (as opposed to extended-precision values, whose extra
bits live in processor registers rather than memory), this disambiguates
*which* value it is.  It may cause problems with NaNs and infinities, but
we aren't caching them, so we don't care.  The result of all this is that
we can do the following on Intel x86 platforms (replace the decimal
constants with hex if desired)...

/* fval is the double being created; uint64_t comes from <stdint.h> */
switch (*(uint64_t *)(&fval)) {
    case 13845191154443747328ULL:  /* -10.0 */
    case 13844628204490326016ULL:  /*  -9.0 */
    case 13844065254536904704ULL:  /*  -8.0 */
    case 13842939354630062080ULL:  /*  -7.0 */
    case 13841813454723219456ULL:  /*  -6.0 */
    case 13840687554816376832ULL:  /*  -5.0 */
    case 13839561654909534208ULL:  /*  -4.0 */
    case 13837309855095848960ULL:  /*  -3.0 */
    case 13835058055282163712ULL:  /*  -2.0 */
    case 13830554455654793216ULL:  /*  -1.0 */
    case 0ULL:                     /*   0.0 */
    case 4607182418800017408ULL:   /*   1.0 */
    case 4611686018427387904ULL:   /*   2.0 */
    case 4613937818241073152ULL:   /*   3.0 */
    case 4616189618054758400ULL:   /*   4.0 */
    case 4617315517961601024ULL:   /*   5.0 */
    case 4618441417868443648ULL:   /*   6.0 */
    case 4619567317775286272ULL:   /*   7.0 */
    case 4620693217682128896ULL:   /*   8.0 */
    case 4621256167635550208ULL:   /*   9.0 */
        /* lookup in the table */
    default: break;
}
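
(Aside: the cast above technically violates C's aliasing rules, even
though it works on the compilers we care about.  A sketch of a portable
alternative using memcpy, which mainstream compilers typically reduce to
a single move; the helper name is just for illustration:)

#include <stdint.h>
#include <string.h>

/* Extract the raw IEEE 754 bit pattern of a double without a
   type-punning pointer cast. */
static uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

This also shows why the two zeros are distinguishable even though they
compare equal: double_to_bits(0.0) is 0x0000000000000000, while
double_to_bits(-0.0) is 0x8000000000000000.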

Each platform would need its own block of case values, depending on how
(if at all) it mixes the endianness of floats and uint64s, and on its
double representation (as long as it conforms to IEEE 754 doubles for
these 20 values, it doesn't need a new one).
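
Since the cached values are consecutive integers, the whole switch could
also collapse into a range test plus a bit check for -0.0.  A
hypothetical sketch (the names and table layout are invented for
illustration, not taken from CPython):

#include <stdint.h>
#include <string.h>

#define CACHE_LO (-10)  /* smallest cached value, matching the switch */
#define CACHE_HI 9      /* largest cached value */

/* Hypothetical table of preallocated float objects, one per cached
   value, filled in at startup (PyFloatObject pointers in CPython). */
static void *float_cache[CACHE_HI - CACHE_LO + 1];

/* Return fval's slot in float_cache, or -1 if fval isn't cached. */
static int cache_index(double fval)
{
    uint64_t bits;
    memcpy(&bits, &fval, sizeof bits);
    if (bits == 0x8000000000000000ULL)       /* -0.0: never alias +0.0 */
        return -1;
    if (fval != fval)                        /* NaN: casting it is UB */
        return -1;
    if (fval < CACHE_LO || fval > CACHE_HI)  /* also rejects infinities */
        return -1;
    if (fval != (double)(long)fval)          /* not an integral value */
        return -1;
    return (int)((long)fval) - CACHE_LO;
}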

 - Josiah


