[Python-Dev] Optionally using GMP to implement long if available

Tim Peters tim.peters at gmail.com
Tue Nov 4 19:33:00 CET 2008

Hey, Mark -- let's establish some background here first.  It's a fact
that the product of two N-bit integers can require 2N bits, and also a
fact that lots of HW supports NxN->2N bit integer multiplication

However, it's unfortunately also a fact that standard C has no
corresponding concept:  "*" in C always means NxN->N multiplication
(single-width product, with possibility of overflow).  I don't know
whether C99 improved this situation -- I know it was proposed to add
some "double-width integer product" /functions/, but I don't know
whether that was adopted.  I do know that "*" remained "single-width".

[Tim Peters]
>> Note that while 32x32->64 multiply is supported by x86 HW, that
>> doesn't mean there's a uniform way to get at this HW capability across
>> C compilers.  In contrast, (at least) efficient HW 15x15->30 multiply
>> is universally spelled in C via "i*j" :-)

['Mark Dickinson]
> If we're talking about standards and portability, doesn't "i*j" fail
> on those (nonexistent?) platforms where the 'int' type is only 16-bits?
> Shouldn't this be something like "(long)i * j" instead?

Sorry, I should have made type declarations explicit.  In Python, the
basic long building block is "digit", which is typedef'ed to C
unsigned short.  C89 guarantees this holds at least 16 bits.  Whenever
two digits are multiplied, the code intends to cast (at least) one of
them to "twodigits" first (if you ever see a spot where this doesn't
happen, it's a bug).  "twodigits" is typedef'ed to C long.  C89
guarantees that a long holds at least 32 bits.

So C guarantees that we're doing (at least) 32x32->32 multiplication
whenever you see code like

    digit i, j;
    twodigits k;

    k = (twodigits)i * j;

In particular, the (at least) 32x32->32 C89 guarantees for that is at
least 15x15->30, and the latter is all that longobject.c intends to
rely on.  Along with the cast to twodigits, this is achieved across
all conforming C compilers simply by using infix "*".  The code from
1990 still works fine, on everything from cell phones to archaic Cray

> And for 32x32 -> 64, can't this simply be replaced by "(uint64_t)i * j",
> where uint64_t is as in C99?  I'd hope that most compilers would
> manage to turn this into the appropriate 32x32-bit hardware multiply.

1. That's C99, not C89, and therefore less portable.

2. On platforms that support it, this is at least 64x64->64 multiplication,
   potentially much more expensive than the 32x32->64 (or 31x31->62?)
   flavor you /intend/ to move to.

3. There's no way to know exactly what compilers will do with this short of
   staring at generated code.

FYI, in a previous life working in speech recognition, under
Microsoft's Visual Studio 6 the only way we found to get at the
Pentium's 32x32->64 HW ability efficiently was via using inline
assembler.  For example, using various MSVC spellings of "64-bit int"
instead for the inputs usually generated external calls to a
long-winded C library "piece by piece" multiplication routine (which,
at the time, emulated 64x64->128 multiplication, then threw away the
high 64 bits).

Again, the fundamental problem here is the disconnect between what
some HW is capable of and what C allows to express (at least through
C89).  That's why it's impossible here to write portable code in C
that's also efficient.  Even what Python does now is vulnerable on the
efficiency count:  on some weird platforms, "long" is 64 bits, and so
multiplying a pair of twodigits incurs the expense of  (usually
non-native) 64x64->64 multiplication.

> I agree that very-long-integer optimizations probably don't really belong in
> Python,

Depends in part on whether Python can attract as many obsessed
maintainers and porters for such gonzo algorithms as GMP attracts ;-)

> but this patch should also provide significant benefits for short
> and medium-sized integers.  I guess I need to go away and do some
> benchmarking...

If you can /get at/ HW 32x32->64 int multiply, of course that would be faster.

More information about the Python-Dev mailing list