What are your thoughts about adding a new imath module for integer mathematics? It could contain the following functions:

* factorial(n)

Is just moved from the math module, but non-integer types are rejected. Currently math.factorial() also accepts integral floats like 3.0. It looks to me like the rationale was that at the time math.factorial() was added, all functions in the math module worked with floats. But now we can revise this decision.

* gcd(n, m)

Is just moved from the math module.

* as_integer_ration(x)

Equivalent to:

    def as_integer_ration(x):
        if hasattr(x, 'as_integer_ration'):
            return x.as_integer_ration()
        else:
            return (x.numerator, x.denominator)

* binom(n, k)

Returns factorial(n) // (factorial(k) * factorial(n-k)), but uses a more efficient algorithm.

* sqrt(n)

Returns the largest integer r such that r**2 <= n and (r+1)**2 > n.

* isprime(n)

Tests if n is a prime number.

* primes()

Returns an iterator of prime numbers: 2, 3, 5, 7, 11, 13,...

Are there more ideas?
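For reference, the sqrt(n) described above (an integer square root) can be computed exactly with Newton's method on integers; a minimal sketch, with the name isqrt chosen here for illustration:

    def isqrt(n):
        # Largest r such that r**2 <= n, using only integer arithmetic,
        # so the result stays exact even for very large n.
        if n < 0:
            raise ValueError("isqrt is undefined for negative numbers")
        if n == 0:
            return 0
        x = 1 << ((n.bit_length() + 1) // 2)   # initial guess >= sqrt(n)
        while True:
            y = (x + n // x) // 2              # Newton step, rounded down
            if y >= x:
                return x
            x = y

For example, isqrt(10**100) gives 10**50 exactly, where going through math.sqrt would suffer float rounding.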
On Thu, Jul 12, 2018, 7:56 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
* isprime(n) Tests if n is a prime number.
How would you test this? The Miller-Rabin primality test? For large numbers, iterating through candidate prime divisors is pricey.

* primes() Returns an iterator of prime numbers: 2, 3, 5, 7, 11, 13,...
How would you implement this? The Sieve of Eratosthenes is really fun to show students as a Python generator function. But the cached primes DO grow unboundedly as you use the generator. Wheel factorization as a first pass? Do you cache the first N primes so that each instance of the iterator can provide its initial elements much faster? What is N?
12.07.18 15:15, David Mertz wrote:
On Thu, Jul 12, 2018, 7:56 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
* isprime(n) Tests if n is a prime number.
How would you test this? The Miller-Rabin primality test? For large numbers, iterating through candidate prime divisors is pricey.
* primes() Returns an iterator of prime numbers: 2, 3, 5, 7, 11, 13,...
How would you implement this? The Sieve of Eratosthenes is really fun to show students as a Python generator function. But the cached primes DO grow unboundedly as you use the generator. Wheel factorization as a first pass? Do you cache the first N primes so that each instance of the iterator can provide its initial elements much faster? What is N?
I didn't mean any concrete implementation. Surely there are implementations that are efficient and simple enough. I sometimes need a sequence of prime numbers for solving toy problems, and quickly write something like:

    from math import sqrt

    def primes(n):
        return (i for i in range(2, n)
                if all(i % j for j in primes(int(sqrt(i)) + 1)))

Having such a feature in the stdlib would be very convenient. Any reasonable implementation is efficient enough for my purposes.
On 12 July 2018 at 14:30, Serhiy Storchaka <storchaka@gmail.com> wrote:
12.07.18 15:15, David Mertz wrote:
On Thu, Jul 12, 2018, 7:56 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
* isprime(n) Tests if n is a prime number.
How would you test this? The Miller-Rabin primality test? For large numbers, iterating through candidate prime divisors is pricey.
* primes() Returns an iterator of prime numbers: 2, 3, 5, 7, 11, 13,...
How would you implement this? The Sieve of Eratosthenes is really fun to show students as a Python generator function. But the cached primes DO grow unboundedly as you use the generator. Wheel factorization as a first pass? Do you cache the first N primes so that each instance of the iterator can provide its initial elements much faster? What is N?
I didn't mean any concrete implementation. Surely there are implementations that are efficient and simple enough. I sometimes need a sequence of prime numbers for solving toy problems, and quickly write something like:

    from math import sqrt

    def primes(n):
        return (i for i in range(2, n)
                if all(i % j for j in primes(int(sqrt(i)) + 1)))

Having such a feature in the stdlib would be very convenient. Any reasonable implementation is efficient enough for my purposes.
Agreed, having these functions in the stdlib would be convenient, but I think it's important that we have at least reasonably performing implementations (they don't have to be suitable for specialist use, but they should be performant enough for production, non-specialist use). IMO, the statistics module got this balance right - good enough for general use, but specialists will want something better.

I'm -1 on having inefficient versions like your primes(n) implementation above in the stdlib. People will use them without realising they may not be suitable for anything other than toy problems.

I'd be interested in the following (which is mostly a duplicate of your list):

* factorial
* gcd
* binom
* isprime (it's important to be clear whether this is a guaranteed check, or a probabilistic one like Miller-Rabin)
* primes (an infinite generator of primes)
* factors (generates the prime factors of a number, in increasing order, so list(factors(12)) == [2, 2, 3])

Your divide_and_round might be nice, but there are multiple common ways of rounding 0.5 (up, down, towards 0, away from 0, to even, to odd, ...). Your implementation does round-to-even, but that's not necessarily the obvious one (schools often teach round up or down, so use of your implementation in education could be confusing).
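To make the factors() entry above concrete, a simple trial-division sketch (adequate for modest inputs; serious factoring needs far more machinery, as discussed later in the thread):

    def factors(n):
        # Yield the prime factors of n in increasing order, with
        # multiplicity, so that list(factors(12)) == [2, 2, 3].
        d = 2
        while d * d <= n:
            while n % d == 0:
                yield d
                n //= d
            d += 1 if d == 2 else 2   # after 2, test odd candidates only
        if n > 1:
            yield n                   # whatever remains is itself prime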
Because cmath. But if most core developers prefer intmath, I have no objections.
I prefer imath as a parallel with cmath, as you say. But it's not a big deal to me. Paul
12.07.18 17:30, Paul Moore wrote:
I'm -1 on having inefficient versions like your primes(n) implementation above in the stdlib. People will use them without realising they may not be suitable for anything other than toy problems.
This is just the shortest version, the one I write every time I need primes. For the stdlib implementation we don't have such a strong limit on size.
Your divide_and_round might be nice, but there are multiple common ways of rounding 0.5 (up, down, towards 0, away from 0, to even, to odd, ...). Your implementation does round-to-even, but that's not necessarily the obvious one (schools often teach round up or down, so use of your implementation in education could be confusing).
This is the one that can't be trivially written as an expression in Python, but is useful in many applications (date and time, Fraction, Decimal). Dividing and rounding down: n//m. Dividing and rounding up: (n+m-1)//m or -((-n)//m).
On Thu, Jul 12, 2018 at 06:28:19PM +0300, Serhiy Storchaka wrote:
This is the one that can't be trivially written as an expression in Python, but is useful in many applications (date and time, Fraction, Decimal).

Dividing and rounding down: n//m. Dividing and rounding up: (n+m-1)//m or -((-n)//m).
Sorry, I don't understand why you say that divide and round up can't be trivially written in Python. Don't you give two implementations yourself? I have this in my toolbox:

    def ceildiv(a, b):
        """Return the ceiling of a/b."""
        return -(-a//b)

and if it's buggy, I've never noticed it yet.

-- Steve
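A quick check of the -(-a//b) identity on a few cases (floor-dividing the negated numerator flips the rounding direction):

    >>> ceildiv(7, 2)     # ceil(3.5)
    4
    >>> ceildiv(-7, 2)    # ceil(-3.5)
    -3
    >>> ceildiv(8, 2)     # exact division is unaffected
    4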
On Thu, Jul 12, 2018 at 07:50:32PM +0300, Serhiy Storchaka wrote:
12.07.18 19:14, Steven D'Aprano wrote:
Sorry, I don't understand why you say that divide and round up can't be trivially written in Python. Don't you give two implementations yourself?
Sorry, perhaps I was unclear, but I said the opposite.
Oh, sorry, you were talking about round to even! I misunderstood. -- Steve
On Thu, Jul 12, 2018, 9:31 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
12.07.18 15:15, David Mertz wrote:
I didn't mean any concrete implementation. Surely there are implementations that are efficient and simple enough. I sometimes need a sequence of prime numbers for solving toy problems, and quickly write something like:

    from math import sqrt

    def primes(n):
        return (i for i in range(2, n)
                if all(i % j for j in primes(int(sqrt(i)) + 1)))

Having such a feature in the stdlib would be very convenient. Any reasonable implementation is efficient enough for my purposes.
That's the point I was getting at. "For your purposes" isn't general enough to include in the standard library. There are quite a few algorithms and efficiency trade-offs to decide on. It's not clear to me that any choice is sufficiently "one size fits all" to include. I'm not saying there definitely IS NOT a reasonable compromise approach to these things, but I'd have to judge concrete suggestions... Or better still, people with better knowledge of number theory than mine should make those judgements.
On Thu, Jul 12, 2018 at 01:26:45PM -0400, David Mertz wrote: [talking about prime numbers]
That's the point I was getting at. "For your purposes" isn't general enough to include in the standard library. There are quite a few algorithms and efficiency trade-offs to decide on. It's not clear to me that any choice is sufficiently "one size fits all" to include.
We're talking about "batteries included", not a nuclear reactor. (Thanks to Nick for that great analogy!)

This is a fairly narrow and deterministic set of requirements:

- test whether a number is prime;
- return some primes;

and maybe a couple of other related functions. Not a data structure, where we have to make serious tradeoffs in the type of values they support (hashable, orderable, or any arbitrary value, say) and what can be done with them. We already know what we have to support: ints, as large as will fit in memory.

While of course there are efficiency trade-offs, at the point you really care about them, you're in nuclear reactor territory and are probably using some library too specialised even for numpy *wink*

There is no reason why primality testing can't be deterministic up to 2**64, and probabilistic with a ludicrously small chance of false positives beyond that. The implementation I use can be expected to fail on average once every 18 thousand years if you did nothing but test primes every millisecond of the day, 24 hours a day. That's good enough for most purposes :-)
I'm not saying there definitely IS NOT a reasonable compromise approach to these things, but I'd have to judge concrete suggestions... Or better still, people with better knowledge of number theory than mine should make those judgements.
Miller-Rabin + trial division by the first few small primes is the work-horse of primality testing. There are "better, faster" algorithms, but they are more complex (more risk of bugs) and probably don't show any improvement until you're dealing with huge numbers. I mean *really* huge. My pure-Python version of Miller-Rabin takes:

- about 4.7 seconds to test 2**800 + 1;
- less than a tenth of a millisecond to test 2**900 + 1; and
- about 8.6 seconds to test 2**1000 + 1.

(None of which are prime.) On a less old-and-tired computer, you can probably divide those numbers by, oh, five or ten.

You probably won't find any record-breaking Mersenne Primes using my version (certainly not on *my* computer), but I think it would be good enough for the standard library if we decided to add an isprime() function.

As far as generating primes goes, there are a lot more choices, but since they all do the same thing (generate prime numbers) we can easily swap one implementation for another and nobody will even know unless they're timing it. *If* it is desirable to have this in the stdlib, we shouldn't be paralysed by implementation choice. We can start with something "good enough", and move to more complex implementations when and if somebody wants to do the work.

-- Steve
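A sketch of the scheme Steven describes - trial division by small primes, then Miller-Rabin with a fixed witness set that is known to be conclusive for all n below about 3.3 * 10**24 (comfortably covering 2**64); above that bound the same witnesses only give "probably prime":

    _WITNESSES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

    def _mr_round(n, a):
        # One Miller-Rabin round; True means n passes for witness a.
        d, s = n - 1, 0
        while d % 2 == 0:
            d //= 2
            s += 1
        x = pow(a, d, n)
        if x in (1, n - 1):
            return True
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                return True
        return False

    def isprime(n):
        if n < 2:
            return False
        for p in _WITNESSES:          # cheap trial division first
            if n % p == 0:
                return n == p
        return all(_mr_round(n, a) for a in _WITNESSES)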
On Fri, Jul 13, 2018 at 4:11 AM, Steven D'Aprano <steve@pearwood.info> wrote:
There is no reason why primality testing can't be deterministic up to 2**64, and probabilistic with a ludicrously small chance of false positives beyond that. The implementation I use can be expected to fail on average once every 18 thousand years if you did nothing but test primes every millisecond of the day, 24 hours a day. That's good enough for most purposes :-)
What about false negatives? Guaranteed none? The failure mode of the function should, IMO, be a defined and documented aspect of it. ChrisA
On Thu, Jul 12, 2018 at 2:22 PM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Jul 13, 2018 at 4:11 AM, Steven D'Aprano <steve@pearwood.info> wrote:
There is no reason why primality testing can't be deterministic up to 2**64, and probabilistic with a ludicrously small chance of false positives beyond that. The implementation I use can be expected to fail on average once every 18 thousand years if you did nothing but test primes every millisecond of the day, 24 hours a day. That's good enough for most purposes :-)
What about false negatives? Guaranteed none? The failure mode of the function should, IMO, be a defined and documented aspect of it.
Miller-Rabin or other pseudo-primality tests do not produce false negatives IIUC.

I'm more concerned about the space/time tradeoff in the primes() generator. I like this implementation for teaching purposes, but I'm well aware that it grows in memory usage rather quickly (the one Tim Peters posted is probably a better balance; but it depends on how many of these you create and how long you run them).

    from math import sqrt, ceil

    def up_to(seq, lim):
        for n in seq:
            if n < lim:
                yield n
            else:
                break

    def sieve_generator():
        "Pretty good Sieve; skip the even numbers, stop at sqrt(candidate)"
        yield 2
        candidate = 3
        found = []
        while True:
            lim = int(ceil(sqrt(candidate)))
            if all(candidate % prime != 0 for prime in up_to(found, lim)):
                yield candidate
                found.append(candidate)
            candidate += 2

So then how do you implement isprime()? One naive way is to compare it against elements of sieve_generator() until we are equal to or larger than the test element. But that's not super fast. A pseudo-primality test is much faster (except for the first few hundred thousand primes, maybe).

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
I take it back... Tim's (really Will Ness's) version is *always* faster and more memory friendly. I'm still trying to understand it though :-). On Thu, Jul 12, 2018 at 6:05 PM David Mertz <mertz@gnosis.cx> wrote:
On Thu, Jul 12, 2018 at 2:22 PM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Jul 13, 2018 at 4:11 AM, Steven D'Aprano <steve@pearwood.info> wrote:
There is no reason why primality testing can't be deterministic up to 2**64, and probabilistic with a ludicrously small chance of false positives beyond that. The implementation I use can be expected to fail on average once every 18 thousand years if you did nothing but test primes every millisecond of the day, 24 hours a day. That's good enough for most purposes :-)
What about false negatives? Guaranteed none? The failure mode of the function should, IMO, be a defined and documented aspect of it.
Miller-Rabin or other pseudo-primality tests do not produce false negatives IIUC.
I'm more concerned about the space/time tradeoff in the primes() generator. I like this implementation for teaching purposes, but I'm well aware that it grows in memory usage rather quickly (the one Tim Peters posted is probably a better balance; but it depends on how many of these you create and how long you run them).

    from math import sqrt, ceil

    def up_to(seq, lim):
        for n in seq:
            if n < lim:
                yield n
            else:
                break

    def sieve_generator():
        "Pretty good Sieve; skip the even numbers, stop at sqrt(candidate)"
        yield 2
        candidate = 3
        found = []
        while True:
            lim = int(ceil(sqrt(candidate)))
            if all(candidate % prime != 0 for prime in up_to(found, lim)):
                yield candidate
                found.append(candidate)
            candidate += 2

So then how do you implement isprime()? One naive way is to compare it against elements of sieve_generator() until we are equal to or larger than the test element. But that's not super fast. A pseudo-primality test is much faster (except for the first few hundred thousand primes, maybe).
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
[David Mertz]
Miller-Rabin or other pseudo-primality tests do not produce false negatives IIUC.
That's true if they're properly implemented ;-) If they say False, the input is certainly composite (but there isn't a clue as to what a factor may be); if they say True, the input is "probably" a prime.
I'm more concerned about the space/time tradeoff in the primes() generator. I like this implementation for teaching purposes, but I'm well aware that it grows in memory usage rather quickly (the one Tim Peters posted is probably a better balance; but it depends on how many of these you create and how long you run them).
As you noted later:

"I take it back... Tim's (really Will Ness's) version is *always* faster and more memory friendly."

The original version of this came from an ActiveState recipe that many on c.l.py contributed to (including me) years ago, and that's where all the speed came from. There is, for example, no division or square roots. Will's contribution was the unexpected way to slash memory use, via the generator calling itself recursively. Elegant and effective!
I'm still trying to understand it though :-).
Picture doing the Sieve of Eratosthenes by hand, crossing out multiples of "the next" prime you find. That's basically what it's doing, except doing the "crossing out" part incrementally, using a dict to remember, for each prime p, "the next" multiple of p you haven't yet gotten to.
    from math import sqrt, ceil

    def up_to(seq, lim):
        for n in seq:
            if n < lim:
                yield n
            else:
                break

    def sieve_generator():
        "Pretty good Sieve; skip the even numbers, stop at sqrt(candidate)"
        yield 2
        candidate = 3
        found = []
        while True:
            lim = int(ceil(sqrt(candidate)))
            if all(candidate % prime != 0 for prime in up_to(found, lim)):
                yield candidate
                found.append(candidate)
            candidate += 2
Note that this generates squares of odd primes too (9, 25, ...). `up_to` needs to be more like `up_to_and_including`.
Oops... yeah. I had fixed the "up to and including" previously, but somehow I copied the wrong version to the thread. On Thu, Jul 12, 2018 at 6:36 PM Tim Peters <tim.peters@gmail.com> wrote:
[David Mertz]
Miller-Rabin or other pseudo-primality tests do not produce false negatives IIUC.
That's true if they're properly implemented ;-) If they say False, the input is certainly composite (but there isn't a clue as to what a factor may be); if they say True, the input is "probably" a prime.
I'm more concerned about the space/time tradeoff in the primes() generator. I like this implementation for teaching purposes, but I'm well aware that it grows in memory usage rather quickly (the one Tim Peters posted is probably a better balance; but it depends on how many of these you create and how long you run them).
As you noted later:
"I take it back... Tim's (really Will Ness's) version is *always* faster and more memory friendly."
The original version of this came from an ActiveState recipe that many on c.l.py contributed to (including me) years ago, and that's where all the speed came from. There is, for example, no division or square roots. Will's contribution was the unexpected way to slash memory use, via the generator calling itself recursively. Elegant and effective!
I'm still trying to understand it though :-).
Picture doing the Sieve of Eratosthenes by hand, crossing out multiples of "the next" prime you find. That's basically what it's doing, except doing the "crossing out" part incrementally, using a dict to remember, for each prime p, "the next" multiple of p you haven't yet gotten to.
    from math import sqrt, ceil

    def up_to(seq, lim):
        for n in seq:
            if n < lim:
                yield n
            else:
                break

    def sieve_generator():
        "Pretty good Sieve; skip the even numbers, stop at sqrt(candidate)"
        yield 2
        candidate = 3
        found = []
        while True:
            lim = int(ceil(sqrt(candidate)))
            if all(candidate % prime != 0 for prime in up_to(found, lim)):
                yield candidate
                found.append(candidate)
            candidate += 2
Note that this generates squares of odd primes too (9, 25, ...). `up_to` needs to be more like `up_to_and_including`.
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
[David Mertz]
Oops... yeah. I had fixed the "up to and including" previously, but somehow I copied the wrong version to the thread.
Doesn't matter ;-) The point here, to me, is that the prime generator I pointed at is significantly faster, more memory-frugal, and even "more correct", than even experienced Python programmers are likely to come up with at first. That gives it real value as a candidate for a standard library.

But something much fancier than that? I don't think so, for reasons already given - the code is still understandable for non-experts if they give it some effort. Fancier stuff may not be. For example, in my own code I use a fancier version of that incorporating a wheel sieve too, and even the _setup_ function for that is harder to understand all on its own:

    def _setup_sieve(ps, show=True):
        from math import gcd
        assert ps[0] == 2
        prod = totient = 1
        for p in ps:
            assert pp(p)
            prod *= p
            totient *= p-1
        if show:
            print("ps", ps, "prod", prod)
        mod2i = [None] * prod
        mod2i[1] = 0
        deltas = []
        last = 1
        for i in range(3, prod, 2):
            if gcd(prod, i) == 1:
                deltas.append(i - last)
                mod2i[i] = len(deltas)
                last = i
        deltas.append(2)  # wrap around from prod-1 to 1
        assert len(deltas) == totient
        assert sum(deltas) == prod
        if show:
            print("deltas", deltas, len(deltas))
            print("mod2i", mod2i)
        return ps, deltas * 2, mod2i

    _siv_glob = _setup_sieve([2, 3, 5, 7], show=False)

I don't want that in the standard library - and twice as much don't want it if somebody else wrote it ;-)
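For the curious, a minimal rendition of the incremental dict-based sieve Tim describes (a paraphrase of Will Ness's recipe linked earlier in the thread, not Tim's actual code):

    from itertools import count

    def primes():
        # Incremental Sieve of Eratosthenes: `multiples` maps each upcoming
        # odd composite to 2*p for some prime p dividing it, so crossing-out
        # happens lazily, and only for primes <= sqrt(current candidate).
        yield 2; yield 3; yield 5; yield 7
        multiples = {}
        base = primes()                # recursive source of sieving primes
        p = next(base) and next(base)  # skip 2; p == 3
        square = p * p                 # 9
        for candidate in count(9, 2):  # odd numbers from 9 upward
            if candidate in multiples:
                step = multiples.pop(candidate)
            elif candidate < square:
                yield candidate        # never crossed out: prime
                continue
            else:                      # candidate == p*p: start sieving by p
                step = 2 * p
                p = next(base)
                square = p * p
            m = candidate + step
            while m in multiples:      # slide to a free odd slot
                m += step
            multiples[m] = step

The four literal yields keep the recursion lazy: each nested generator is only started once its parent has exhausted them, so the nesting depth grows roughly like log log P.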
On Thu, Jul 12, 2018 at 05:35:54PM -0500, Tim Peters wrote:
[David Mertz]
Miller-Rabin or other pseudo-primality tests do not produce false negatives IIUC.
That's true if they're properly implemented ;-) If they say False, the input is certainly composite (but there isn't a clue as to what a factor may be); if they say True, the input is "probably" a prime.
I'm sure Tim knows this, but that's only sometimes true. Since I had no formal education on primality testing and had to pick this up in dribs and drabs over many years, I was under the impression for the longest time that Miller-Rabin was always probabilistic, but that's not quite right.

Without going into the gory details, it turns out that for any N, you can do a brute-force primality test by checking against *every* candidate up to some maximum value. (Which is how trial division works, only Miller-Rabin tests are more expensive than a single division.) Such an exhaustive check is not practical, hence the need to fall back on randomly choosing candidates.

However, that's in the most general case. At least for small enough N, there are rather small sets of candidates which give a completely deterministic and conclusive result. For N <= 2**64, it never requires more than 12 Miller-Rabin tests to give a conclusive answer, and for some N, as few as a single test is enough. To be clear, these are specially chosen tests, not random tests. So in a good implementation, for N up to 2**64, there is never a need for a "probably prime" result. It either is, or isn't, prime, and there's no question which.

In principle, there could be similar small sets of conclusive tests for larger N too, but (as far as I know) nobody has discovered them yet. Hence we fall back on choosing Miller-Rabin tests at random. The chance of an arbitrary composite number passing k such tests is on average 1/4**k, so we can make the average probability of failure as small as we like by doing more tests. With a dozen or twenty tests, the probability of a false positive (a composite number wrongly reported as prime) is of the same order of magnitude as that of a passing cosmic ray flipping a bit in memory and causing the wrong answer to appear.

-- Steve
On Fri, Jul 13, 2018 at 04:20:55AM +1000, Chris Angelico wrote:
On Fri, Jul 13, 2018 at 4:11 AM, Steven D'Aprano <steve@pearwood.info> wrote:
There is no reason why primality testing can't be deterministic up to 2**64, and probabilistic with a ludicrously small chance of false positives beyond that. The implementation I use can be expected to fail on average once every 18 thousand years if you did nothing but test primes every millisecond of the day, 24 hours a day. That's good enough for most purposes :-)
What about false negatives? Guaranteed none? The failure mode of the function should, IMO, be a defined and documented aspect of it.
If a non-deterministic primality tester such as (but not limited to) Miller-Rabin says a number is composite, it is definitely composite (ignoring bugs in the implementation, or hardware failures). If it says it is prime, in the worst case it is merely "probably prime", with an error rate proportional to 1/4**k where we can choose the k. (The bigger the k the more work may be done in the worst case.) -- Steve
On Fri, Jul 13, 2018 at 10:40 AM, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Jul 13, 2018 at 04:20:55AM +1000, Chris Angelico wrote:
On Fri, Jul 13, 2018 at 4:11 AM, Steven D'Aprano <steve@pearwood.info> wrote:
There is no reason why primality testing can't be deterministic up to 2**64, and probabilistic with a ludicrously small chance of false positives beyond that. The implementation I use can be expected to fail on average once every 18 thousand years if you did nothing but test primes every millisecond of the day, 24 hours a day. That's good enough for most purposes :-)
What about false negatives? Guaranteed none? The failure mode of the function should, IMO, be a defined and documented aspect of it.
If a non-deterministic primality tester such as (but not limited to) Miller-Rabin says a number is composite, it is definitely composite (ignoring bugs in the implementation, or hardware failures). If it says it is prime, in the worst case it is merely "probably prime", with an error rate proportional to 1/4**k where we can choose the k.
(The bigger the k the more work may be done in the worst case.)
You can say that about algorithms easily enough. My point is that this ought to be a constraint on the function - implementations may choose other algorithms, but they MUST follow one pattern or the other, meaning that a Python script can depend on it without knowing the implementation. Like guaranteeing that list.sort() is stable without stipulating the actual sort algo used. ChrisA
On Fri, Jul 13, 2018 at 10:56:21AM +1000, Chris Angelico wrote:
You can say that about algorithms easily enough. My point is that this ought to be a constraint on the function - implementations may choose other algorithms, but they MUST follow one pattern or the other, meaning that a Python script can depend on it without knowing the implementation. Like guaranteeing that list.sort() is stable without stipulating the actual sort algo used.
I cannot imagine an algorithm that wasn't totally brain-dead (like "flip a coin") which could wrongly report a prime number as composite. Maybe I'm not imaginative enough :-)

But yeah, if you want to formally specify that any such isprime test will never have false negatives (never report an actual prime as composite), sure thing. I think that's stating the bleeding obvious, like demanding that any algorithm we use for factorial(n) must not return the sqrt of n by mistake, but whatever :-)

I suppose that if you cared more about running time than correctness, one might simply report anything bigger than (let's say) 2**20 as "composite". But I call that "brain dead" :-)

-- Steve
On Fri, Jul 13, 2018 at 11:48 AM, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Jul 13, 2018 at 10:56:21AM +1000, Chris Angelico wrote:
You can say that about algorithms easily enough. My point is that this ought to be a constraint on the function - implementations may choose other algorithms, but they MUST follow one pattern or the other, meaning that a Python script can depend on it without knowing the implementation. Like guaranteeing that list.sort() is stable without stipulating the actual sort algo used.
I cannot imagine an algorithm that wasn't totally brain-dead (like "flip a coin") which could wrongly report a prime number as composite. Maybe I'm not imaginative enough :-)
Haha. Okay. I'm not familiar with every possible primality test, so I had no idea that they ALL have the same failure mode :) ChrisA
[Chris Angelico, on "probable prime" testers]
You can say that about algorithms easily enough. My point is that this ought to be a constraint on the function - implementations may choose other algorithms, but they MUST follow one pattern or the other, meaning that a Python script can depend on it without knowing the implementation. Like guaranteeing that list.sort() is stable without stipulating the actual sort algo used.
I agree! Don't worry about it - if this module gets added, I'll make sure of it. My own "probable prime" function starts like so:

    def pp(n, k=10):
        """
        Return True iff n is a probable prime, using k Miller-Rabin tests.

        When False, n is definitely composite.
        """

In the context of algorithmic number theory, "no false negatives" is implied by that context, but it should be spelled out anyway. A "probable prime", by definition, is an integer that satisfies some property that _all_ primes satisfy, but that some composites may satisfy too. The variation among probable prime algorithms is in which specific property(ies) they test for. For example:

    def pp1(n):
        return True

meets the promise, but is useless ;-)

    def pp2(n):
        return n % 3 != 0 or n == 3

is slightly less useless. But

    def pp3(n):
        return n % 3 != 0

would be incorrect, because pp3(3) returns False - `n % 3 != 0` is not a property that all primes satisfy.
On Fri, Jul 13, 2018 at 2:58 PM, Tim Peters <tim.peters@gmail.com> wrote:
[Chris Angelico, on "probable prime" testers]
You can say that about algorithms easily enough. My point is that this ought to be a constraint on the function - implementations may choose other algorithms, but they MUST follow one pattern or the other, meaning that a Python script can depend on it without knowing the implementation. Like guaranteeing that list.sort() is stable without stipulating the actual sort algo used.
I agree! Don't worry about it - if this module gets added, I'll make sure of it. My own "probable prime" function starts like so:
    def pp(n, k=10):
        """
        Return True iff n is a probable prime, using k Miller-Rabin tests.

        When False, n is definitely composite.
        """

In the context of algorithmic number theory, "no false negatives" is implied by that context, but it should be spelled out anyway. A "probable prime", by definition, is an integer that satisfies some property that _all_ primes satisfy, but that some composites may satisfy too. The variation among probable prime algorithms is in which specific property(ies) they test for. For example:

    def pp1(n):
        return True

meets the promise, but is useless ;-)

    def pp2(n):
        return n % 3 != 0 or n == 3

is slightly less useless. But

    def pp3(n):
        return n % 3 != 0

would be incorrect, because pp3(3) returns False - `n % 3 != 0` is not a property that all primes satisfy.
Thank you. Great explanation. Much appreciated! ChrisA
* primefactors(n): an iterator over the primes that evenly divide n, repeated so that their product is n
* (I don't have a good name): something telling whether an integer can be represented exactly as a float

On 12 July 2018 at 12:55, Serhiy Storchaka <storchaka@gmail.com> wrote:
What are your thoughts about adding a new imath module for integer mathematics? It could contain the following functions:
* factorial(n)
Is just moved from the math module, but non-integer types are rejected. Currently math.factorial() also accepts integral floats like 3.0. It looks to me like the rationale was that at the time math.factorial() was added, all functions in the math module worked with floats. But now we can revise this decision.
* gcd(n, m)
Is just moved from the math module.
* as_integer_ration(x)
Equivalent to:
    def as_integer_ration(x):
        if hasattr(x, 'as_integer_ration'):
            return x.as_integer_ration()
        else:
            return (x.numerator, x.denominator)
* binom(n, k)
Returns factorial(n) // (factorial(k) * factorial(n-k)), but uses a more efficient algorithm.
* sqrt(n)
Returns the largest integer r such that r**2 <= n and (r+1)**2 > n.
* isprime(n)
Tests if n is a prime number.
* primes()
Returns an iterator of prime numbers: 2, 3, 5, 7, 11, 13,...
Are there more ideas?
-- Daniel Moisset, Technical Leader, Machinalis - dmoisset@machinalis.com
On Thu, Jul 12, 2018 at 5:20 AM Daniel Moisset <dmoisset@machinalis.com> wrote:
(I don't have a good name): something telling whether an integer can be represented exactly as a float
One might also ask that question of e.g. decimal.Decimal, fractions.Fraction, so maybe it's better as a method or somewhere else. (Is this any different than float(x).as_integer_ratio() == (x.numerator, x.denominator) for ints/fractions?) -- Devin
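Devin's test can be wrapped up, with the overflow case handled; a sketch (the name is invented here, nothing like it exists yet):

    def is_exact_float(x):
        # True when float(x) loses no information, for any x exposing
        # .numerator/.denominator (int, bool, fractions.Fraction, ...).
        try:
            return float(x).as_integer_ratio() == (x.numerator, x.denominator)
        except OverflowError:
            return False               # too large to be a float at all

    # e.g. is_exact_float(2**53) is True; is_exact_float(2**53 + 1) is False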
12.07.18 14:55, Serhiy Storchaka wrote:
What are your thoughts about adding a new imath module for integer mathematics? It could contain the following functions:
I forgot one more useful function:

    def divide_and_round(a, b):
        q, r = divmod(a, b)
        r *= 2
        greater_than_half = r > b if b > 0 else r < b
        if greater_than_half or r == b and q % 2 == 1:
            q += 1
        return q

(This is a copy of the implementation from datetime.py; there are a few more implementations in Python and C in other files.)
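To make the round-half-to-even behaviour concrete, a few values that follow from the code above:

    >>> divide_and_round(7, 2)    # 3.5 rounds to the even neighbour, 4
    4
    >>> divide_and_round(5, 2)    # 2.5 rounds to the even neighbour, 2
    2
    >>> divide_and_round(-5, 2)   # -2.5 rounds to -2
    -2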
About the name, why not intmath? And why not use sage? Or create a package on pypi?

On Thu, Jul 12, 2018 at 15:11, Serhiy Storchaka <storchaka@gmail.com> wrote:
12.07.18 14:55, Serhiy Storchaka wrote:
What are your thoughts about adding a new imath module for integer mathematics? It could contain the following functions:
I forgot one more useful function:

    def divide_and_round(a, b):
        q, r = divmod(a, b)
        r *= 2
        greater_than_half = r > b if b > 0 else r < b
        if greater_than_half or r == b and q % 2 == 1:
            q += 1
        return q

(This is a copy of the implementation from datetime.py; there are a few more implementations in Python and C in other files.)
12.07.18 16:15, Robert Vanden Eynde wrote:
About the name, why not intmath?
Because cmath. But if most core developers prefer intmath, I have no objections.
And why not use sage? Or create a package on pypi?
Because some of these functions could be useful in the stdlib. Because a special-purpose module is a better home for some math functions. Because it is convenient to have these simple batteries included. Sage is too heavyweight. The reasons are the same as for adding factorial() and gcd() to the math module. But after adding more such functions, it is better to move them into a special module.
On 12 July 2018 at 23:41, Serhiy Storchaka <storchaka@gmail.com> wrote:
12.07.18 16:15, Robert Vanden Eynde wrote:
About the name, why not intmath ?
Because cmath. But if most core developers prefer intmath, I have no objections.
My initial reaction from just the subject title was "But we already have cmath, why would we need imath?", based on the fact that mathematicians write complex numbers as "X + Yi", rather than the "X + Yj" that Python borrowed from electrical engineering (where "i" already had a different meaning as the symbol for AC current).

Calling the proposed module "intmath" instead would be clearer to me (and I agree with the rationale that as the number of int-specific functions increases, separating them out from the more float-centric math module makes sense).

Beyond that, I think the analogy with the statistics module is a good one:

1. There are a number of integer-specific algorithms where naive implementations are going to be slower than they need to be, subtly incorrect in some cases, or both. With a standard library module, we can provide a robust test suite, and pointers towards higher performance alternatives for folks that need them.

2. For educational purposes, being able to introduce the use cases for a capability before delving into the explanation of how that capability works can be quite a powerful technique (the standard implementations can also provide a cross-check on the accuracy of student implementations).

And in the intmath case, there are likely to be more opportunities to delete private implementations from other standard library modules in favour of the public intmath functions.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Thu, Jul 12, 2018 at 02:55:58PM +0300, Serhiy Storchaka wrote:
What are your thoughts about adding a new imath module for integer mathematics?
I'm not sure that we need a specific module for integer-valued maths. I think a more important change would be to allow math functions to be written in Python, not just C:

- move the current math library to a private _math library;
- add math.py, which calls "from _math import *".

I often miss having a good, fast, integer square root function in the stdlib. I have a slow implementation here:

http://code.activestate.com/recipes/577821-integer-square-root-function/

and as the commentary discusses, a lot of implementations on the internet are buggy. As well as isqrt(n), there's also nsqrt(n), to return the integer nearest to the square root of n. They're equivalent to

    isqrt(n) == floor(sqrt(n))
    nsqrt(n) == round(sqrt(n))

but without overflowing for large n. (Presumably there's a ceiling sqrt function too, but I've never needed it, or seen anyone else use it.)

As far as your other suggestions go:

* factorial(n)
* gcd(n, m)

Moving those seems like a fairly pedantic change. But if we did it, we could allow gcd to take an arbitrary number of values, gcd(*args), and add a least common multiple function to match.

* as_integer_ration(x)

I think that's mispelled "ratio" :-) There's currently a private function statistics._exact_ratio which does that.

* binom(n, k)

If we have number of combinations, I'd like to have number of permutations as well.

* isprime(n)
* primes()

Primality testing and generation of primes is a bigger job than it might seem, probably deserving of an entire library. (Not necessarily in the std lib.) But if we did provide a "batteries included" solution, it should also include next-prime and previous-prime functions.

For isprime(), I have a pure-Python solution which is not too shabby, capable of giving exact results for all integers up to 2**64 and probabilistic results above that. Despite being pure Python, it's reasonably fast. In the REPL, these are essentially instantaneous to the naked eye:

    py> pyprimes.prev_prime(2**64)
    18446744073709551557L
    py> pyprimes.is_prime(18446744073709551557L)
    True
    py> pyprimes.next_prime(2**64)
    pyprimes/__init__.py:278: MaybeComposite: 18446744073709551629 is only probably prime
      warnings.warn(message, MaybeComposite)
    18446744073709551629L

At the moment this is Python 2 only and frankly overkill for the stdlib. I have something like a dozen different implementations for generating or testing primes, some of which are (intentionally) just awful. One of my favourite so-awful-it's-good algorithms is this one for testing whether a number is prime:

    import re

    def isprime(n):
        # Kids: don't try this at home!
        return not re.match(r'^1?$|^(11+?)\1+$', '1'*n)

But if there is interest in a pure-Python solution, I might be able to find time to clean it up to stdlib quality. (Not the regex solution, obviously.)

-- Steve
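The multi-argument gcd and the matching lcm are easy to sketch on top of math.gcd (the names and signatures here are speculative, not an existing API):

    from functools import reduce
    from math import gcd

    def gcd_all(*args):
        # gcd(0, x) == x, so 0 is the identity element for gcd.
        return reduce(gcd, args, 0)

    def lcm(*args):
        # lcm(a, b) == abs(a*b) // gcd(a, b); 1 is the identity,
        # and any zero argument makes the result zero.
        def lcm2(a, b):
            return abs(a * b) // gcd(a, b) if a and b else 0
        return reduce(lcm2, args, 1)

    # e.g. gcd_all(12, 18, 27) == 3 and lcm(4, 6, 15) == 60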
On Thu, Jul 12, 2018 at 11:05 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I'm not sure that we need a specific module for integer-valued maths. I think a more important change would be to allow math functions to be written in Python, not just C:
- move the current math library to a private _math library;
- add math.py, which calls "from _math import *".
FWIW, Guido rejected that idea when we added math.isclose() -- Victor had even written it up already. Not that people never change their mind...

-CHB

-- Christopher Barker, Ph.D. Oceanographer, NOAA/NOS/OR&R Emergency Response Division, Seattle, WA - Chris.Barker@noaa.gov
[Serhiy Storchaka <storchaka@gmail.com>]
What are your thoughts about adding a new imath module for integer mathematics? It could contain the following functions:
I have mixed feelings :-(

There are a number of integer functions broadly useful to people attracted to such things, and "it would be nice" to have basic implementations ship with the language. But the "basic" is important to me. Algorithms for unbounded integers can grow in complexity seemingly without bound too, and by the time there are 10 special cases, 8 of which have been coded in C, it becomes an ongoing maintenance headache - and the code has become so complex that users can't easily look at the source to judge whether the implementation is _still_ too naive for their application.

For example, for generating primes, I'd balk at anything fancier than my recoding in Python 3 of Will Ness's beautiful algorithm:

https://stackoverflow.com/a/19391111/2705542

That generates primes in increasing order, with no need (or even possibility) to specify an upper bound in advance. But unlike nearly all other reasonably efficient approaches to that, the memory burden grows (very roughly) with the square root of the largest prime generated so far. Most sieve algorithms require instead remembering all the ~P/log(P) primes <= P, which grows much faster than sqrt(P). That's a real practical difference.

There are faster ways to generate the primes, and even that way can be sped significantly by complicating the code significantly to incorporate a multiple-small-prime wheel. But I'd much rather leave significantly more complicated code to _specialized_, non-core packages.

Anyway, in addition to your list, there are at least four others I view as fundamental:

    def iroot(n, k):  # the obvious generalization of integer square root
        """Return the floor of the k'th root of n."""

    def egcd(a, b):  # extended gcd
        """Return (g, ga, gb) s.t. ga*a + gb*b == g == gcd(a, b)."""

    def minv(p, q):
        """
        Return multiplicative modular inverse of p with respect to
        modulus q:  (p * minv(p, q)) % q == 1
        """

    def factor(n):
        """Return a list containing n's prime factors."""

State-of-the-art code for `factor()` alone could double the size of the Python source distribution ;-)
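Of the four, egcd and minv at least have short, well-known implementations; a sketch matching the docstrings above (iterative extended Euclid):

    def egcd(a, b):
        """Return (g, ga, gb) s.t. ga*a + gb*b == g == gcd(a, b)."""
        old_r, r = a, b
        old_s, s = 1, 0
        old_t, t = 0, 1
        while r:
            q = old_r // r
            old_r, r = r, old_r - q * r
            old_s, s = s, old_s - q * s
            old_t, t = t, old_t - q * t
        return old_r, old_s, old_t

    def minv(p, q):
        """Return the multiplicative inverse of p modulo q."""
        g, gp, _ = egcd(p, q)
        if g != 1:
            raise ValueError("p is not invertible modulo q")
        return gp % q

    assert (17 * minv(17, 3120)) % 3120 == 1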
I support this proposal. This would fit nicely with my recent enhancement proposal to introduce a method for finding the prime factors of a non-negative integer n.
Hello Serhiy, I support this proposal, thanks for sharing this thread in my enhancement proposal (issue #40028). I would be more than happy to help you where time permits in implementing this module if approval is granted. Ross
Hi

In https://bugs.python.org/issue40028, Ross Rhodes suggested adding to the standard library a function that finds the prime factors of a non-negative integer. So far as I know, any general algorithm for doing this requires a list of prime numbers. To this issue I've just added https://bugs.python.org/issue40028#msg364757, which reads:

A pre-computed table of primes might be better. Of course, how long should the table be? There's an infinity of primes.
Consider:

    >>> 2**32
    4294967296
This number is approximately 4 * (10**9). According to https://en.wikipedia.org/wiki/Prime_number_theorem, there are 50,847,534 primes less than 10**9. So, very roughly, there are 200,000,000 primes less than 2**32.
Thus, storing a list of all these prime numbers as 32-bit unsigned integers would occupy about

    >>> 200_000_000 / (1024**3) * 4
    0.7450580596923828

or in other words about 3/4 of a gigabyte on disk.
A binary search into this list, using as starting point the expected location provided by the prime number theorem, might very well require on average less than two block reads into the file that holds the prime number list on disk. And if someone needs to find primes of this size, they've probably got a spare gigabyte or two.
I'm naturally inclined to this approach because my mathematical research involves spending gigahertz days computing tables. I then use the tables to examine hypotheses. See https://arxiv.org/abs/1011.4269. This involves subsets of the vertices of the 5-dimensional cube. There are of course 2**32 such subsets.
-- Jonathan
On Sat, Mar 21, 2020 at 04:18:32PM +0000, Jonathan Fine wrote:
Hi
In https://bugs.python.org/issue40028, Ross Rhodes suggested adding to the standard library a function that finds the prime factors of a non-negative integer. So far as I know, any general algorithm for doing this requires a list of prime numbers.
No, that's incorrect. Having a precomputed list of prime numbers is impractical for heavy-duty factorization of even moderately largish numbers, let alone genuinely large numbers, although trial division by the first few primes is often useful to eliminate the easy cases. But trying to hold onto a table of millions of primes is wasteful and, most importantly, unnecessary. Primes can be cheaply generated on the fly, as needed, and probably faster than they can be read off a hard drive:

https://primes.utm.edu/notes/faq/LongestList.html

There are 16,352,460,426,841,680,446,427,399 primes below 10**27 (https://oeis.org/A006880), so even if we used a table of compact 32-bit integers, rather than int objects, that would take over 50 thousand million petabytes of storage, if my calculations are correct. There may be more compact ways to store it, but I doubt you could get it under a few million petabytes.

And 10**27 is not an especially large number. It has only 27 digits. (2*17*31*101*157*199)**100 has over 900 digits, and sympy can factorise it effectively instantly, without needing to pre-compute a multi-petabyte table of primes :-)

(In fairness, I did kinda cheat by generating a number I knew was easily factorisable. Working on an arbitrary 900+ digit number isn't so fast.)

[...]
Consider:

    >>> 2**32
    4294967296
And if someone needs to find primes of this size, they've probably got a spare gigabyte or two.
For what it's worth, I just found the above prime factorization on a computer with 145 MB available in my home directory. (I *really* need a new computer.) In your research, you may consider 2**32 to be a large number, but to people working with primes (for fun, or for research), it's tiny. Thousands of digits is getting large; the largest known prime, as of January 2020, has over 24 million digits. -- Steven
I really like this idea. It fixes the weirdness whereby cmath is for complex numbers, and math is for real numbers and integers. I like separating them into three categories.

One suggestion: consider generalizing binom to the multinomial coefficient:

    def binom(n, *ks):
        # Returns n! / prod(ki! for ki in ks)

Best,

Neil

On Thursday, July 12, 2018 at 7:57:08 AM UTC-4, Serhiy Storchaka wrote:
What are your thoughts about adding a new imath module for integer mathematics? It could contain the following functions:
* factorial(n)
Is just moved from the math module, but non-integer types are rejected. Currently math.factorial() also accepts integral floats like 3.0. It looks to me like the rationale was that at the time math.factorial() was added, all functions in the math module worked with floats. But now we can revise this decision.
* gcd(n, m)
Is just moved from the math module.
* as_integer_ration(x)
Equivalent to:
    def as_integer_ration(x):
        if hasattr(x, 'as_integer_ration'):
            return x.as_integer_ration()
        else:
            return (x.numerator, x.denominator)
* binom(n, k)
Returns factorial(n) // (factorial(k) * factorial(n-k)), but uses a more efficient algorithm.
* sqrt(n)
Returns the largest integer r such that r**2 <= n and (r+1)**2 > n.
* isprime(n)
Tests if n is a prime number.
* primes()
Returns an iterator of prime numbers: 2, 3, 5, 7, 11, 13,...
Are there more ideas?
I mean:

    def binom(n, *ks):
        # Check that there is at least one ki, that their sum is at most n,
        # and that they are all nonnegative.
        # Returns n! / (prod(ki! for ki in ks) * (n - sum(ks))!)

This would still work for binom(n, k), but would also work for the multinomial case.

On Sunday, March 22, 2020 at 2:01:18 PM UTC-4, Neil Girdhar wrote:
I really like this idea. It fixes the weirdness whereby cmath is for complex numbers, and math is for real numbers and integers. I like separating them into three categories.
One suggestion: consider generalizing binom to the multinomial coefficient:
    def binom(n, *ks):
        # Returns n! / prod(ki! for ki in ks)
Best,
Neil
On Thursday, July 12, 2018 at 7:57:08 AM UTC-4, Serhiy Storchaka wrote:
What are your thoughts about adding a new imath module for integer mathematics? It could contain the following functions:
* factorial(n)
Is just moved from the math module, but non-integer types are rejected. Currently math.factorial() also accepts integral floats like 3.0. It looks to me like the rationale was that at the time math.factorial() was added, all functions in the math module worked with floats. But now we can revise this decision.
* gcd(n, m)
Is just moved from the math module.
* as_integer_ration(x)
Equivalent to:
    def as_integer_ration(x):
        if hasattr(x, 'as_integer_ration'):
            return x.as_integer_ration()
        else:
            return (x.numerator, x.denominator)
* binom(n, k)
Returns factorial(n) // (factorial(k) * factorial(n-k)), but uses a more efficient algorithm.
* sqrt(n)
Returns the largest integer r such that r**2 <= n and (r+1)**2 > n.
* isprime(n)
Tests if n is a prime number.
* primes()
Returns an iterator of prime numbers: 2, 3, 5, 7, 11, 13,...
Are there more ideas?
On Mon, Mar 23, 2020 at 5:13 AM Neil Girdhar <mistersheik@gmail.com> wrote:
I mean:
    def binom(n, *ks):
        # Check that there is at least one ki, that their sum is at most n,
        # and that they are all nonnegative.
        # Returns n! / (prod(ki! for ki in ks) * (n - sum(ks))!)
This would still work for binom(n, k), but would also work for the multinomial case.
Thanks for pulling this up again. I actually would have very much liked to have an efficient and accurate binom(n,k) function available - everything I could find was either floating-point (with the limitations thereof) or too inefficient to be used for ridiculously large values of n and/or k. +1 on moving forward with adding an imath module. ChrisA
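A working version of Neil's multinomial spec, reduced to a product of ordinary binomial coefficients so the intermediate values stay modest (a sketch, not a proposed final API):

    from math import factorial

    def binom(n, *ks):
        # n! / (prod(ki! for ki in ks) * (n - sum(ks))!), checked as above.
        if not ks or any(k < 0 for k in ks) or sum(ks) > n:
            raise ValueError("need at least one ki, all ki >= 0, sum(ks) <= n")
        result, remaining = 1, n
        for k in ks:
            # multiply by C(remaining, k)
            result *= factorial(remaining) // (factorial(k) * factorial(remaining - k))
            remaining -= k
        return result

    # binom(5, 2) == 10 (the ordinary binomial); binom(6, 2, 2) == 90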
participants (14)

- Chris Angelico
- Chris Barker
- Daniel Moisset
- David Mertz
- Devin Jeanpierre
- Jonathan Fine
- Neil Girdhar
- Nick Coghlan
- Paul Moore
- Robert Vanden Eynde
- Ross
- Serhiy Storchaka
- Steven D'Aprano
- Tim Peters