The current PyLong implementation represents arbitrary-precision integers in units of 15 or 30 bits. I presume the purpose is to avoid overflow in addition, subtraction and multiplication. But compilers these days offer intrinsics that allow one to access the overflow flag, and to obtain the result of a 64-bit multiplication as a 128-bit number. Or at least on x86-64, which is the dominant platform. Is there any reason why this is not done? If it is only because no one has bothered, I may be able to do it.
2017-07-03 6:52 GMT+02:00 Siyuan Ren netheril96@gmail.com:
The current PyLong implementation represents arbitrary-precision integers in units of 15 or 30 bits. I presume the purpose is to avoid overflow in addition, subtraction and multiplication. But compilers these days offer intrinsics that allow one to access the overflow flag, and to obtain the result of a 64-bit multiplication as a 128-bit number.
The question is the performance. Is it fast? :-)
You can try to write a patch and run a benchmark.
See for example http://pyperformance.readthedocs.io/ for benchmarks.
Victor
On Mon, Jul 3, 2017 at 5:52 AM, Siyuan Ren netheril96@gmail.com wrote:
The current PyLong implementation represents arbitrary-precision integers in units of 15 or 30 bits. I presume the purpose is to avoid overflow in addition, subtraction and multiplication. But compilers these days offer intrinsics that allow one to access the overflow flag, and to obtain the result of a 64-bit multiplication as a 128-bit number. Or at least on x86-64, which is the dominant platform. Is there any reason why this is not done?
Portability matters, so any use of these intrinsics would likely have to be accompanied by fallback code that doesn't depend on them, as well as some build-system complexity to figure out whether those intrinsics are supported. Objects/longobject.c would then suffer in terms of simplicity and readability, so there would have to be some clear gains to offset that. Note that the typical Python workload does not involve thousand-digit integers: what would matter is the performance of smaller integers. It seems conceivable that 64-bit limbs would speed up those operations simply because so many more integers would become single-limb, giving more opportunities to take fast paths, but there would need to be benchmarks demonstrating that.
Oh, and you'd have to rewrite the power algorithm, which currently depends on the size of a limb in bits being a multiple of 5. :-)
-- Mark
On 05/07/2017 20:05, Mark Dickinson wrote:
Oh, and you'd have to rewrite the power algorithm, which currently depends on the size of a limb in bits being a multiple of 5. :-)
What is a limb? My search-fu has let me down.
-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.
Mark Lawrence
On Thu, Jul 6, 2017 at 5:33 AM, Mark Lawrence via Python-Dev
python-dev@python.org wrote:
On 05/07/2017 20:05, Mark Dickinson wrote:
Oh, and you'd have to rewrite the power algorithm, which currently depends on the size of a limb in bits being a multiple of 5. :-)
What is a limb? My search-fu has let me down.
A thing that has a bunch of digits, but fits inside a machine word.
https://gmplib.org/manual/Nomenclature-and-Types.html
ChrisA
On Wed, Jul 5, 2017 at 12:05 PM Mark Dickinson dickinsm@gmail.com wrote:
On Mon, Jul 3, 2017 at 5:52 AM, Siyuan Ren netheril96@gmail.com wrote:
The current PyLong implementation represents arbitrary-precision integers in units of 15 or 30 bits. I presume the purpose is to avoid overflow in addition, subtraction and multiplication. But compilers these days offer intrinsics that allow one to access the overflow flag, and to obtain the result of a 64-bit multiplication as a 128-bit number. Or at least on x86-64, which is the dominant platform. Is there any reason why this is not done?
Portability matters, so any use of these intrinsics would likely have to be accompanied by fallback code that doesn't depend on them, as well as some build-system complexity to figure out whether those intrinsics are supported. Objects/longobject.c would then suffer in terms of simplicity and readability, so there would have to be some clear gains to offset that. Note that the typical Python workload does not involve thousand-digit integers: what would matter is the performance of smaller integers. It seems conceivable that 64-bit limbs would speed up those operations simply because so many more integers would become single-limb, giving more opportunities to take fast paths, but there would need to be benchmarks demonstrating that.
Oh, and you'd have to rewrite the power algorithm, which currently depends on the size of a limb in bits being a multiple of 5. :-)
-- Mark
When I pushed to get us to adopt 30-bit digits instead of 15-bit digits, I hoped it could also happen on 32-bit x86 builds, as the hardware has fast 32-bit multiply with a 64-bit result. But it made the configuration more difficult and, IIRC, would've increased the memory use of the PyLong type for single-digit numbers on 32-bit platforms, so we settled on moving to 30 bits only for 64-bit platforms (which were obviously going to become the norm).
A reasonable digit size for hardware that supports 128-bit results of 64-bit multiplies would be 60 bits (to keep our multiple-of-5-bits logic working). But I doubt you will see any notable performance gains in practical applications by doing so. Sure, numbers in the 1-4 billion range are slightly less efficient today, but those are not likely to appear in hot loops in a typical application. Microbenchmarks alone should not be used to make this decision.
Remember: we had native integer support in Python 1 and 2 via the old PyInt type. In Python 3 we ditched that in favor of PyLong everywhere. This was a performance hit for the sake of proper high-level language simplicity. We've since regained all of that in other areas of the interpreter.
-gps