On Thu, Jul 12, 2018 at 01:26:45PM -0400, David Mertz wrote:
> [talking about prime numbers]
> That's the point I was getting at. "For your purposes" isn't general enough to include in the standard library. There are quite a few algorithms and efficiency trade-offs to decide on. It's not clear to me that any choice is sufficiently "one size fits all" to include.
We're talking about "batteries included", not a nuclear reactor. (Thanks to Nick for that great analogy!)
This is a fairly narrow and deterministic set of requirements:
- test whether a number is prime;
- return some primes;
and maybe a couple of other related functions. Not a data structure, where we have to make serious trade-offs in the type of values supported (hashable, orderable, or any arbitrary value, say) and what can be done with them. We already know what we have to support: ints, as large as will fit in memory.
While of course there are efficiency trade-offs, at the point you really care about them, you're in nuclear reactor territory and are probably using some library too specialised even for numpy *wink*
There is no reason why primality testing can't be deterministic up to 2**64, and probabilistic with a ludicrously small chance of false positives beyond that. The implementation I use can be expected to fail on average once every 18 thousand years if you did nothing but test primes every millisecond of the day, 24 hours a day. That's good enough for most purposes :-)
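That failure-rate claim is easy to sanity-check with back-of-the-envelope arithmetic. Assuming the standard worst-case bound that each Miller-Rabin round lets a composite slip through with probability at most 1/4 (so k independent rounds give at most 4**-k), and picking 25 rounds purely for illustration:

```python
# Back-of-the-envelope: how long between false positives if we run one
# primality test per millisecond, non-stop?  Assumes the worst-case
# Miller-Rabin bound of 4**-k for k rounds; k=25 is an illustrative
# choice, not necessarily what any particular implementation uses.
k = 25
false_positive_rate = 4.0 ** -k              # worst-case per-test bound
tests_per_year = 1000 * 3600 * 24 * 365.25   # one test per millisecond
years_between_failures = 1 / (false_positive_rate * tests_per_year)
print(f"~{years_between_failures:,.0f} years between expected failures")
```

which lands in the tens of thousands of years, the same order of magnitude as the figure above. (In practice the bound is very pessimistic: for random composites the per-round failure probability is far below 1/4.)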
> I'm not saying there definitely IS NOT a reasonable compromise approach to these things, but I'd have to judge concrete suggestions... Or better still, people with better knowledge of number theory than mine should make those judgements.
Miller-Rabin + trial division by the first few small primes is the work-horse of primality testing. There are "better, faster" algorithms, but they are more complex (more risk of bugs) and probably don't show any improvement until you're dealing with huge numbers. I mean *really* huge. My pure-Python version of Miller-Rabin takes:
- about 4.7 seconds to test 2**800 + 1;
- less than a tenth of a millisecond to test 2**900 + 1;
- about 8.6 seconds to test 2**1000 + 1.
(None of which are prime. The middle one is so much faster because trial division catches it immediately: 2**900 + 1 is (2**4)**225 + 1, which is divisible by 2**4 + 1 = 17.) On a less old-and-tired computer, you can probably divide those numbers by, oh, five or ten.
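For the curious, here is a minimal sketch of that work-horse, combining trial division by small primes with Miller-Rabin. This is an illustrative sketch, not my actual implementation: the fixed witness set below is known to be sufficient for all n up to about 3.3 * 10**24 (comfortably past 2**64), and random bases are used beyond that.

```python
import random

# The first 12 primes are a proven-sufficient witness set for all
# n below roughly 3.3e24, which covers everything up to 2**64.
_SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def isprime(n, rounds=25):
    """Trial division + Miller-Rabin.

    Deterministic below 2**64; probabilistic (error < 4**-rounds)
    above it.  A sketch, not a tuned implementation.
    """
    if n < 2:
        return False
    for p in _SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    if n < 2**64:
        witnesses = _SMALL_PRIMES          # deterministic
    else:
        witnesses = [random.randrange(2, n - 1) for _ in range(rounds)]
    for a in witnesses:
        x = pow(a, d, n)                   # modular exponentiation
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                   # a witnesses that n is composite
    return True
```

Note that almost all the heavy lifting is done by the built-in three-argument `pow()`, which is already C speed; that's why even a pure-Python version is usable on numbers this size.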
You probably won't find any record-breaking Mersenne Primes using my version (certainly not on *my* computer), but I think it would be good enough for the standard library if we decided to add an isprime() function.
As far as generating primes goes, there are a lot more choices, but since they all do the same thing (generate prime numbers) we can easily swap one implementation for another and nobody will even know unless they're timing it.
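To illustrate what a "good enough" starting point might look like (a sketch, not a proposal for the actual stdlib code), here is an unbounded generator that trial-divides each candidate by the primes found so far:

```python
def primes():
    """Yield 2, 3, 5, 7, 11, ... indefinitely.

    Trial division by previously found primes, stopping at
    sqrt(candidate).  A baseline implementation; a segmented
    sieve would be the obvious faster drop-in replacement.
    """
    found = []
    candidate = 2
    while True:
        if all(candidate % p != 0 for p in found if p * p <= candidate):
            found.append(candidate)
            yield candidate
        # After 2, only odd numbers need checking.
        candidate += 1 if candidate == 2 else 2
```

Exactly the point above: any faster generator with the same interface could replace this later and callers would never notice, except in their timings.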
*If* it is desirable to have this in the stdlib, we shouldn't be paralysed by implementation choice. We can start with something "good enough", and move to more complex implementations when and if somebody wants to do the work.