SIGCHECK() in longobject.c

Hello,
In Objects/longobject.c, there's the SIGCHECK() macro which periodically checks for signals when doing long integer computations (divisions, multiplications). It does so by messing with the _Py_Ticker variable.
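In Python terms the pattern is roughly the following (an illustrative sketch only; the real macro is a few lines of C, and check_pending_signals is a hypothetical stand-in for PyErr_CheckSignals()):

CHECK_INTERVAL = 100   # assumed default; CPython uses the value set by sys.setcheckinterval()

ticker = CHECK_INTERVAL

def check_pending_signals():
    # hypothetical stand-in: in CPython, PyErr_CheckSignals() runs any
    # pending signal handlers (raising KeyboardInterrupt for SIGINT)
    pass

def sigcheck():
    # called periodically from the long-integer inner loops; the signal
    # poll itself only happens once every CHECK_INTERVAL calls
    global ticker
    ticker -= 1
    if ticker < 0:
        ticker = CHECK_INTERVAL
        check_pending_signals()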
It was added in 1991 under the title "Many small changes", and I suppose it was useful back then.
However, nowadays long objects are ridiculously fast, witness for example:
$ ./py3k/python -m timeit -s "a=eval('3'*10000+'5');b=eval('8'*6000+'7')" "str(a//b)"
1000 loops, best of 3: 1.47 msec per loop
Can we remove this check, or are there people doing million-digits calculations they want to interrupt using Control-C?
Regards
Antoine.

On Sun, Oct 18, 2009 at 9:01 PM, Antoine Pitrou solipsis@pitrou.net wrote:
In Objects/longobject.c, there's the SIGCHECK() macro which periodically checks for signals when doing long integer computations (divisions, multiplications). It does so by messing with the _Py_Ticker variable.
It was added in 1991 under the title "Many small changes", and I suppose it was useful back then.
However, nowadays long objects are ridiculously fast, witness for example:
$ ./py3k/python -m timeit -s "a=eval('3'*10000+'5');b=eval('8'*6000+'7')" "str(a//b)"
1000 loops, best of 3: 1.47 msec per loop
Can we remove this check, or are there people doing million-digits calculations they want to interrupt using Control-C?
Yes, I suspect there are. Though you don't need millions of digits for a single operation to take a noticeable amount of time: try str(10**100000), for example.
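A rough illustration of that cost (a sketch; timings are machine-dependent, and the guard lifts the integer-to-string limit on interpreters that enforce one):

import sys, timeit

# lift the int-to-str digit limit where one exists (no-op otherwise)
if hasattr(sys, 'set_int_max_str_digits'):
    sys.set_int_max_str_digits(0)

# str() of a big int is a quadratic binary-to-decimal conversion:
# doubling the digit count roughly quadruples the time
for ndigits in (50000, 100000, 200000):
    print(ndigits, timeit.timeit('str(n)', setup='n = 10**%d' % ndigits, number=1))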
Is there a benefit to removing the check?
Mark

Mark Dickinson <dickinsm <at> gmail.com> writes:
Yes, I suspect there are. Though you don't need millions of digits for a single operation to take a noticeable amount of time: try str(10**100000), for example.
Is there a benefit to removing the check?
Apart from casual cleanup, the reason I'm asking is that I'm experimenting with removing _Py_Ticker, and I want to know what I need to emulate :-)
Antoine.

On Sun, Oct 18, 2009 at 3:01 PM, Antoine Pitrou solipsis@pitrou.net wrote:
Can we remove this check, or are there people doing million-digits calculations they want to interrupt using Control-C?
I sometimes do million-digits calculations that I want to interrupt using Control-C.
(particularly when I didn't *intend* to do a million-digits calculation... ;) )
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
http://stutzbachenterprises.com

Daniel Stutzbach <daniel <at> stutzbachenterprises.com> writes:
I sometimes do million-digits calculations that I want to interrupt using Control-C. (particularly when I didn't *intend* to do a million-digits calculation... ;) )
Sure, but it's no different than doing, e.g.: list(range(100000000)).sort()
(don't try this, it just made my computer slow down to a crawl and I had to kill -9 the Python interpreter)
The question is whether there is still any reason to special-case long objects and not, say, list.sort() or re.sub().
Regards
Antoine.

On Sun, Oct 18, 2009 at 5:46 PM, Antoine Pitrou solipsis@pitrou.net wrote:
Daniel Stutzbach <daniel <at> stutzbachenterprises.com> writes:
I sometimes do million-digits calculations that I want to interrupt using Control-C. (particularly when I didn't *intend* to do a million-digits calculation... ;) )
Sure, but it's no different than doing, e.g.: list(range(100000000)).sort()
That's a good point, although I can't recall the last time I accidentally created a painfully large list. I can recall the last time I started a painfully large integer computation.
Being able to stop the interpreter with Control-C instead of kill -9 is a minor convenience, though. I could live without it. (Although I can't speak for everyone, of course)
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
http://stutzbachenterprises.com

On Sun, Oct 18, 2009 at 11:46 PM, Antoine Pitrou solipsis@pitrou.net wrote:
Sure, but it's no different than doing, e.g.: list(range(100000000)).sort()
(don't try this, it just made my computer slow down to a crawl and I had to kill -9 the Python interpreter)
Maybe you were running out of RAM? On a 64-bit machine, list(range(10**8)) takes over 3 GB.
For me, x = list(range(10**8)) takes around 6 seconds, and then x.sort() takes around 2 seconds. This is on a machine with 16 GB of RAM. (Though I do seem to recall that timsort is extra fast for already sorted lists: O(n) rather than O(n log n)?)
So I'd say that it *is* different. A million digit integer takes less than half a megabyte of RAM, so a single operation can be horribly slow long before memory limitations are reached.
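Back-of-envelope arithmetic bears both figures out (a sketch; the per-object sizes are assumed 64-bit CPython values and vary by build):

import math
# list(range(10**8)): ~8 bytes per list slot plus ~28 bytes per distinct
# int object (assumed 64-bit figures)
print(10**8 * (8 + 28) / 2.0**30)              # ~3.4 GiB
# a million-digit integer: 10**6 * log2(10) bits of magnitude
print(10**6 * math.log(10, 2) / 8 / 2.0**20)   # ~0.4 MiB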
I'd prefer to keep the SIGCHECK unless there's a really compelling advantage to getting rid of it.
Mark

On Sun, Oct 18, 2009 at 9:01 PM, Antoine Pitrou solipsis@pitrou.net wrote:
Can we remove this check, or are there people doing million-digits calculations they want to interrupt using Control-C?
By the way, here's an example of an *almost* real-life use of million digit calculations.
For an elementary number theory course that I taught a while ago, there was an associated (optional) computer lab, where the students used Python to investigate various ideas, conjectures, examples, etc. One of the less open-ended questions might have been[1] something like:
"On August 23rd, 2008 a computer at UCLA found the first example of a prime with more than 10 million digits: p = 2**43112609-1. Find
(1) the exact number of digits in p, when written out in decimal
(2) the last 100 digits of p
(3) the first 100 digits of p."
It's wonderfully easy to get answers to these questions with Python:
import math
e = 43112609
p = 2**e - 1
ndigits = int(math.ceil(math.log10(p)))
last_100_digits = '{:0100d}'.format(p % 10**100)
first_100_digits = str(p // 10**(ndigits-100))
Getting the first 100 digits takes a good few seconds; the other operations are faster.
But if you (not unreasonably) try to compute str(p), you'll find it's impossibly slow, and it's very handy that it's possible to interrupt that calculation, attempt to understand *why* it's slow, and then try different methods.
(And yes, there are better ways to get both the first and last hundred digits.)
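For instance (a sketch; the decimal-based route in Tim's reply below is another): three-argument pow produces the trailing digits without ever materializing p, and the decimal module supplies the leading ones.

from decimal import Decimal, getcontext

e = 43112609
# last 100 digits: modular exponentiation never builds p in full; pad
# with zeros since the 100-digit window could start with some
last_100 = '%0100d' % (pow(2, e, 10**100) - 1)
# first 100 digits: evaluate 2**e to 120 significant digits; subtracting 1
# cannot change them, because 2**e does not end in 0
getcontext().prec = 120
first_100 = str(pow(Decimal(2), e)).replace('.', '')[:100]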
Mark
[1] Might have been, but wasn't. Hence the *almost*. If I'd been teaching that course this semester I'd almost certainly have included something like this.

[Mark Dickinson]
By the way, here's an example of an *almost* real-life use of million digit calculations.
For an elementary number theory course that I taught a while ago, there was an associated (optional) computer lab, where the students used Python to investigate various ideas, conjectures, examples, etc. One of the less open-ended questions might have been[1] something like:
"On August 23rd, 2008 a computer at UCLA found the first example of a prime with more than 10 million digits: p = 2**43112609-1. Find
(1) the exact number of digits in p, when written out in decimal
(2) the last 100 digits of p
(3) the first 100 digits of p."
It's wonderfully easy to get answers to these questions with Python:
...
But if you (not unreasonably) try to compute str(p), you'll find it's impossibly slow, and it's very handy that it's possible to interrupt that calculation, attempt to understand *why* it's slow, and then try different methods.
Don't want to hijack this thread, but this is the kind of use case justifying keeping the 3-argument pow in the decimal module. People "playing" with number theory questions can learn a bag of tricks to worm around the fact that non-decimal arithmetic can make it inconvenient & slow to get answers about decimal digits, but most people -- and especially beginners -- find life easier when doing calculations in decimal to begin with. Then they can use obvious methods, and not get sidetracked by wondering why they take forever to finish.
Although, to be fair, I started
decimal.getcontext().prec = 20000000
p10 = pow(decimal.Decimal(2), 43112609)
before I started typing this, and am still waiting for it to finish ;-)
OTOH,
decimal.getcontext().prec = 200
pow(decimal.Decimal(2), 43112609)   # get the first 100 digits (& then some)
and
pow(decimal.Decimal(2), 43112609, 10**100) - 1 # get the last 100 digits
both appear to work instantaneously.

On Mon, Oct 19, 2009 at 9:58 PM, Tim Peters tim.peters@gmail.com wrote:
Don't want to hijack this thread, but this is the kind of use case justifying keeping the 3-argument pow in the decimal module. People "playing" with number theory questions can learn a bag of tricks to worm around the fact that non-decimal arithmetic can make it inconvenient & slow to get answers about decimal digits, but most people -- and especially beginners -- find life easier when doing calculations in decimal to begin with. Then they can use obvious methods, and not get sidetracked by wondering why they take forever to finish.
That's true. I wouldn't relish the task of trying to teach the decimal module at the same time as introducing students to Python and to elementary number theory, though. It's not the most friendly module to get started with.
Although, to be fair, I started
decimal.getcontext().prec = 20000000
p10 = pow(decimal.Decimal(2), 43112609)
before I started typing this, and am still waiting for it to finish ;-)
Hmm, yes. The decimal module isn't (currently) well set up for this sort of calculation, mostly because almost every operation is implemented by converting to binary, invoking the appropriate int/long arithmetic operation, and converting back. Since the conversion is O(n^2), even addition and multiplication of Decimal instances end up being O(n^2) operations at the moment, instead of getting the linear and Karatsuba (resp.) running times that they deserve.
(The exception is rounding operations, which don't do any decimal <-> binary operations, but operate directly on the coefficient in its representation as a string of decimal digits.)
This high-precision inefficiency could easily be fixed by using a dedicated 'decimal natural number' extension type for the Decimal coefficient, stored internally in base a suitable power of 10. I think this may be worth considering seriously. I'm not proposing this as an alternative to the huge task of rewriting the entire decimal module in C, by the way; it would be more a stop-gap measure that would be easy to implement, would slightly increase efficiency for normal operations, and vastly increase efficiency for high-precision operations.
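A minimal pure-Python sketch of the idea (illustrative only; the names here are hypothetical, not the actual Deccoeff API): with limbs stored in base 10**9, converting to a decimal string is linear, since each limb contributes exactly nine digits.

BASE = 10**9   # one "decimal limb" holds nine digits

def from_int(n):
    # little-endian limbs of a nonnegative int
    limbs = []
    while n:
        n, r = divmod(n, BASE)
        limbs.append(r)
    return limbs or [0]

def add(a, b):
    # schoolbook addition, carrying between limbs
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        carry, d = divmod((a[i] if i < len(a) else 0) +
                          (b[i] if i < len(b) else 0) + carry, BASE)
        out.append(d)
    if carry:
        out.append(carry)
    return out

def to_str(limbs):
    # linear time: no repeated division by ten over the whole number
    return str(limbs[-1]) + ''.join('%09d' % d for d in reversed(limbs[:-1]))

For instance, to_str(add(from_int(10**20), from_int(5))) gives '100000000000000000005' without any binary-to-decimal conversion of the full value.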
OTOH,
decimal.getcontext().prec = 200
pow(decimal.Decimal(2), 43112609)   # get the first 100 digits (& then some)
and
pow(decimal.Decimal(2), 43112609, 10**100) - 1 # get the last 100 digits
both appear to work instantaneously.
Right. Low precision Decimal operations should be fine. I don't really see the advantage of the second operation over a simple
pow(2, 43112609, 10**100)
though.
By the way, despite the above use-case, and despite the fact that I use it frequently, I still don't believe 3-argument pow belongs in the core. But that's probably a discussion topic for Python 4000.
Mark

Mark Dickinson wrote:
This high-precision inefficiency could easily be fixed by using a dedicated 'decimal natural number' extension type for the Decimal coefficient, stored internally in base a suitable power of 10. I think this may be worth considering seriously. I'm not proposing this as an alternative to the huge task of rewriting the entire decimal module in C, by the way; it would be more a stop-gap measure that would be easy to implement, would slightly increase efficiency for normal operations, and vastly increase efficiency for high-precision operations.
Didn't you already start work on that concept with the deccoeff patch?
Cheers, Nick.

On Tue, Oct 20, 2009 at 11:49 AM, Nick Coghlan ncoghlan@gmail.com wrote:
Mark Dickinson wrote:
This high-precision inefficiency could easily be fixed by using a dedicated 'decimal natural number' extension type for the Decimal coefficient, stored internally in base a suitable power of 10. [...]
Didn't you already start work on that concept with the deccoeff patch?
I did: the code can be seen at:
http://svn.python.org/view/sandbox/trunk/decimal/decimal_in_c/
This code defines a Deccoeff type as above, providing base 10 natural number arithmetic, along with a skeletal _Decimal type. The work on Deccoeff is pretty much complete, but the _Decimal type is only just a beginning; the original intent was to slowly translate the Decimal code into C and move it into the _Decimal type.
The code was working a few months ago (with all Decimal tests passing), but there have been some changes and bugfixes since then. I might try to resurrect that code, dropping the _Decimal type and just concentrating on Deccoeff.
Mark

On Tue, Oct 20, 2009 at 11:49 AM, Nick Coghlan ncoghlan@gmail.com wrote:
Mark Dickinson wrote:
This high-precision inefficiency could easily be fixed by using a dedicated 'decimal natural number' extension type for the Decimal coefficient, stored internally in base a suitable power of 10. [...]
Didn't you already start work on that concept with the deccoeff patch?
I did: the code can be seen at:
http://svn.python.org/view/sandbox/trunk/decimal/decimal_in_c/
This code defines a Deccoeff type as above, providing base 10 natural number arithmetic, along with a skeletal _Decimal type. The work on Deccoeff is pretty much complete, but the _Decimal type is only just a beginning; the original intent was to slowly translate the Decimal code into C and move it into the _Decimal type.
The code was working a few months ago (with all Decimal tests passing), but there have been some changes and bugfixes since then. I might try to resurrect that code, dropping the _Decimal type and just concentrating on Deccoeff.
My only concern about this is the effect it would have on IronPython, Jython, PyPy, and other alternate implementations that use the stdlib.

On Tue, Oct 20, 2009 at 3:50 PM, Eric Smith eric@trueblade.com wrote:
The code was working a few months ago (with all Decimal tests passing), but there have been some changes and bugfixes since then. I might try to resurrect that code, dropping the _Decimal type and just concentrating on Deccoeff.
My only concern about this is the effect it would have on IronPython, Jython, PyPy, and other alternate implementations that use the stdlib.
Yes, that worries me a bit, too. I have the same worry with the idea of rewriting the entire decimal module in C.
The Deccoeff type is very simple, though. It would be easy to create a pure Python version of it, and then do something like:
try:
    from _decimal import Deccoeff   # try to get C version
except ImportError:
    from deccoeff import Deccoeff   # else use Python fallback code.
Mark

On Tue, Oct 20, 2009 at 07:57, Mark Dickinson dickinsm@gmail.com wrote:
On Tue, Oct 20, 2009 at 3:50 PM, Eric Smith eric@trueblade.com wrote:
The code was working a few months ago (with all Decimal tests passing), but there have been some changes and bugfixes since then. I might try to resurrect that code, dropping the _Decimal type and just concentrating on Deccoeff.
My only concern about this is the effect it would have on IronPython, Jython, PyPy, and other alternate implementations that use the stdlib.
Yes, that worries me a bit, too. I have the same worry with the idea of rewriting the entire decimal module in C.
The Deccoeff type is very simple, though. It would be easy to create a pure Python version of it, and then do something like:
try:
    from _decimal import Deccoeff   # try to get C version
except ImportError:
    from deccoeff import Deccoeff   # else use Python fallback code.
And this is why you shouldn't have to worry. As long as the tests exercise both the pure Python and C versions then the other VMs are covered.
-Brett

Brett Cannon wrote:
On Tue, Oct 20, 2009 at 07:57, Mark Dickinson dickinsm@gmail.com wrote:
The Deccoeff type is very simple, though. It would be easy to create a pure Python version of it, and then do something like:
try:
    from _decimal import Deccoeff   # try to get C version
except ImportError:
    from deccoeff import Deccoeff   # else use Python fallback code.
And this is why you shouldn't have to worry. As long as the tests exercise both the pure Python and C versions then the other VMs are covered.
We need to worry at least a little bit, as the pure Python version doesn't exist at this point in time.
Cheers, Nick.