[Python-Dev] Optionally using GMP to implement long if available

Mark Dickinson dickinsm at gmail.com
Tue Nov 4 21:59:11 CET 2008


On Tue, Nov 4, 2008 at 6:33 PM, Tim Peters <tim.peters at gmail.com> wrote:
> Whenever
> two digits are multiplied, the code intends to cast (at least) one of
> them to "twodigits" first (if you ever see a spot where this doesn't
> happen, it's a bug).  "twodigits" is typedef'ed to C long.  C89
> guarantees that a long holds at least 32 bits.

There are indeed one or two spots that seem to be missing a cast,
for example the line "rem -= hi*n" in inplace_divrem1.  I've fixed
all those I found, in the issue 4258 patch.

>> And for 32x32 -> 64, can't this simply be replaced by "(uint64_t)i * j",
> 1. That's C99, not C89, and therefore less portable.

Understood;  my thought was to use uint32_t and uint64_t for
digit and twodigits when available (with longs being stored in
base 2**30), falling back to the current ushort/ulong/base 2**15
otherwise.  On Unix, autoconf makes this easy by providing
macros like 'AC_TYPE_INT32_T', which makes sure that
int32_t is defined to be an integer type of exact width 32,
when available.

> 2. On platforms that support it, this is at least 64x64->64 multiplication,
>   potentially much more expensive than the 32x32->64 (or 31x31->62?)
>   flavor you /intend/ to move to.
>
> 3. There's no way to know exactly what compilers will do with this short of
>   staring at generated code.

Yes---I've done the staring for gcc, so I now have precisely *one*
data point, which is that various flavours of gcc on x86/x86_64
*are* clever enough to turn

  (uint64_t)my_uint32 * my_other_uint32

into the appropriate HW instruction.  Unfortunately I don't have
easy access to other compilers or platforms right now. :-(
Am still working on the benchmarking, but I'm definitely seeing
speedup on gcc/x86---between 10% and 100% depending
on the operations.

> FYI, in a previous life working in speech recognition, under
> Microsoft's Visual Studio 6 the only way we found to get at the
> Pentium's 32x32->64 HW ability efficiently was via using inline
> assembler.

Urk.  We definitely don't want to go there.  Though I guess this
is how packages like gmp and GP/Pari manage.

> C89).  That's why it's impossible here to write portable code in C
> that's also efficient.  Even what Python does now is vulnerable on the

But maybe it's possible to write portable code (by providing fallbacks)
that turns out to be efficient on the majority of mainstream systems?
The extent of the ifdef'ery in the patch is really rather small:  one
(compound) #ifdef in longintrepr.h for defining digit, twodigits, stwodigits
etc, and a couple more for the places where digits are read and written
in marshal.c.

>> I agree that very-long-integer optimizations probably don't really belong in
>> Python,
>
> Depends in part on whether Python can attract as many obsessed
> maintainers and porters for such gonzo algorithms as GMP attracts ;-)

Well, you can count me in. :)

Mark


More information about the Python-Dev mailing list