hybrid implementation for PyLongObject (performance)

The idea is to mix `PyLongObject` with Python 2's `PyIntObject` implementation. For example, on a 64-bit platform, if an integer is >= -9223372036854775808 and <= 9223372036854775807, `PyLongObject` would use the native C type `signed long` to represent it. People mostly use the +, - and * operations, so a native int may be faster even after including the cost of the overflow check. If an operation would overflow, or for other operations like **, we can convert the native int to the current form and run in the current code path.

Regards
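As a rough illustration of the idea, here is a minimal C sketch; all of these names are hypothetical (not CPython's actual layout), and `__builtin_add_overflow` assumes GCC or Clang:

```c
#include <stdint.h>

/* Hypothetical hybrid layout: the value is either a native machine
 * word (fast path) or the current array-of-digits form (slow path).
 * None of these names are CPython's actual API. */
typedef struct {
    int is_small;             /* nonzero: value held in v.small */
    union {
        intptr_t small;       /* native machine integer */
        uint32_t *digits;     /* current PyLongObject-style digits */
    } v;
} HybridInt;

/* Fast-path add: succeeds only when both operands are small and the
 * sum fits; on failure the caller falls back to the digit code path. */
static int hybrid_add_fast(const HybridInt *a, const HybridInt *b,
                           intptr_t *out)
{
    if (!a->is_small || !b->is_small)
        return 0;
    /* GCC/Clang builtin; returns nonzero if the addition overflowed. */
    return !__builtin_add_overflow(a->v.small, b->v.small, out);
}
```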

On Aug 11, 2019, at 19:01, malincns@163.com wrote:
I’m assuming you don’t know C and/or CPython internals well enough to try it out yourself, and that’s fine. But if you want to get someone else motivated enough to try, you should at least provide some evidence that it’s potentially worth it, beyond the fact that someone who doesn’t know the internals thinks _maybe_ it’ll be faster.

Have you benchmarked 2.7 vs. 3.7 adding various sizes of integers? (From a quick test with Pythonista, 3.6 seems to actually be _faster_ if the total is under 1<<30, although it does get slower beyond that, and especially between 1<<60 and 1<<63.) Or looked up the history from when PyIntObject was removed in 3.0? (There must have been lots of discussion on the -dev or py3k lists about performance before deciding to do it, especially so soon after all the work to make int/long interoperate seamlessly for later 2.x.) Or looked through the implementation as far as you can understand it to spot things that look slow? (longintrepr.h isn’t that complicated; longobject.c is pretty hairy, but at least it’s well-commented.)

By the way, I suspect you wouldn’t want two separate structures, but rather just replace the int32 *digits with a union { int32 *digits; intptr_t smallnum; }, and size 1 or -1 means you use smallnum instead of digits. Although I’m not sure how you’d do your “convert to present format for other operations or detected overflow” thing (and presumably for mixed operations, like adding a small int to a big one) that way, but I think you’re better off just building the array of digits for the temporary rather than a whole int object anyway.
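For concreteness, a hedged sketch of the union layout described above, with size 1 or -1 as the tag; the struct and field names only loosely follow longintrepr.h and are illustrative, not CPython's actual definitions:

```c
#include <stddef.h>
#include <stdint.h>

/* Union layout as described above: a size of 1 or -1 tags smallnum;
 * any other size keeps its current meaning (sign and digit count).
 * Field names loosely follow longintrepr.h but are illustrative. */
typedef struct {
    /* PyObject_VAR_HEAD would precede this in real CPython. */
    ptrdiff_t ob_size;
    union {
        uint32_t *digits;     /* multi-digit representation */
        intptr_t smallnum;    /* proposed single-word fast path */
    } u;
} SketchLong;

static int uses_smallnum(const SketchLong *v)
{
    return v->ob_size == 1 || v->ob_size == -1;
}
```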

On 2019-08-13 01:27, Andrew Barnert via Python-ideas wrote:
Thanks for your guidance. To be honest, it's a challenge for me. It would be nice if an experienced person were willing to make this attempt.
> size 1 or -1 means you use smallnum instead of digits
Or use these values; this won't consume more memory than the current implementation:

Py_SIZE(a)   32-bit platform   64-bit platform
LONG_MAX     4 bytes signed    8 bytes signed
LONG_MAX-1   2 bytes signed    4 bytes signed
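A hedged sketch of this sentinel-tag variant; the macro names are made up, and the widths assume typical ILP32/LP64 platforms:

```c
#include <limits.h>

/* Sentinel Py_SIZE values tagging an inline native integer; real digit
 * counts never approach LONG_MAX, so no extra memory is consumed. */
#define TAG_NATIVE_FULL  LONG_MAX        /* 4-byte (32-bit) / 8-byte (64-bit) signed */
#define TAG_NATIVE_HALF  (LONG_MAX - 1)  /* 2-byte (32-bit) / 4-byte (64-bit) signed */

static int holds_native(long ob_size)
{
    return ob_size == TAG_NATIVE_FULL || ob_size == TAG_NATIVE_HALF;
}
```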

On Aug 13, 2019, at 06:45, malincns <malincns@163.com> wrote:
It would be a significant amount of work just to implement it far enough to benchmark. So again, I don’t think it’s likely anyone will volunteer if you don’t provide some reason to expect it to pan out. Even if you can’t edit hairy C code like longobject.c, you can write simple Python code and benchmark it, search the list archives, etc., just as well as anyone else. If that’s not worth your time to do, why do you expect it would be worth the time of anyone else? And without that, the only way anyone is likely to try this is if they get really bored and happen to want to play with the internals of PyLongObject this weekend or something.
On platforms where long is 32 bits even though it’s a 64-bit platform with 64-bit pointers (like 64-bit Windows), this would effectively waste 32 bits. You’d also need to implement a check in the autoconf scripts and then create a new typedef and new constants that are used only for this purpose. And why? This is exactly the reason C offers types like intptr_t.
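To see the LLP64 point concretely, this small standalone C program (not from the thread) prints the relevant widths; on 64-bit Windows it would show long at 4 bytes while pointers and intptr_t are 8:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* On LLP64 platforms (64-bit Windows), long stays 4 bytes while
     * pointers and intptr_t are 8 -- the wasted width noted above. */
    printf("sizeof(long)     = %zu\n", sizeof(long));
    printf("sizeof(void *)   = %zu\n", sizeof(void *));
    printf("sizeof(intptr_t) = %zu\n", sizeof(intptr_t));
    return 0;
}
```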

participants (4)
- Andrew Barnert
- Ma Lin
- malincns
- malincns@163.com