On Wed, 4 Dec 2019 at 05:41, Chris Angelico <rosuav@gmail.com> wrote:
On Wed, Dec 4, 2019 at 3:16 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Dec 04, 2019 at 01:47:53PM +1100, Chris Angelico wrote:
Integer sizes are a classic example of this. Is it acceptable to limit your integers to 2^16? 2^32? 2^64? Python made the choice to NOT limit its integers, and I haven't heard of any non-toy examples where an attacker causes you to evaluate 2**2**100, eating up all your RAM.
Do self-inflicted attacks count? I've managed to bring down a production machine, causing data loss, *twice* by thoughtlessly running something like 10**100**100 at the interactive interpreter. (Neither case was a server, just a desktop machine, but the data loss was still very real.)
Hmm, and you couldn't Ctrl-C it? I tried and was able to.
I don't know if this is OS-dependent, but I think recent CPython (3.8?) may have improved the handling of Ctrl-C in these cases. Certainly in the past I've seen situations where creating an absurdly large integer could not be interrupted before it was too late and the system needed a hard reboot. This is a common source of bugs in SymPy, e.g.: https://github.com/sympy/sympy/issues/17609#issuecomment-531327039

Those bugs can be fixed in SymPy itself, which is uniquely positioned to represent large exponent operations symbolically without evaluating them in dense integer format. On the spectrum of Python usage, though, I would have thought SymPy sat very much at the end that really wants enormous integers, so the fact that even SymPy needs to limit them makes me wonder who does really want to evaluate them.

Note that CPython's implementation of large integers is not as optimised as gmp, so anyone using Python for incredibly large integer calculations would be well advised not to use plain int anyway (SymPy will try to use gmpy/gmpy2 if available).
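Just to illustrate, application code can already guard against this kind of accident by estimating the size of the result before evaluating it. This is only a sketch; safe_pow and max_bits are invented names, not anything CPython provides:

    import math

    def safe_pow(base, exp, max_bits=10**9):
        # For base > 1 the result of base**exp has roughly
        # exp * log2(base) bits, which we can compute cheaply
        # without ever materialising the huge integer itself.
        if base > 1 and exp * math.log2(base) > max_bits:
            raise OverflowError(
                f"{base}**{exp} would need more than {max_bits} bits")
        return base ** exp

    safe_pow(10, 100)        # fine: the result is only ~333 bits
    safe_pow(10, 100**100)   # raises OverflowError instead of hanging

The point is that the cost of a power is predictable up front, so a limit does not have to be discovered by running out of memory.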
There ARE a few situations where I'd rather get a simple and clean MemoryError than have it drive my system into the swapper, but there are at least as many situations where you'd rather be able to use virtual memory instead of being forced to manually break a job up. But even there, you can't enshrine a limit in the language definition, since the actual threshold depends on the running system. (And can be far better enforced externally, at least on a Unix-like OS.)
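Agreed that this is better enforced externally. For what it's worth, on a Unix-like OS a process can also cap itself via the resource module, so the runaway case at least fails cleanly (the 2 GiB figure below is arbitrary):

    import resource

    # Cap this process's address space at 2 GiB (Unix only).  A runaway
    # 10**100**100 then fails with a MemoryError rather than driving the
    # whole machine into swap.
    gib = 1024**3
    resource.setrlimit(resource.RLIMIT_AS, (2 * gib, 2 * gib))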
Another possibility is a configurable limit, like the recursion limit, so that users can increase it when they want to. The default could be larger than most people would ever want but small enough that a single arithmetic operation can't bork the system on typical hardware. The default level and the configurability of the limit could then be implementation-defined.
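Concretely, I'm imagining something that mirrors the existing recursion limit API. The integer variants below are purely hypothetical names, nothing that exists today:

    import sys

    # The existing precedent:
    sys.setrecursionlimit(10000)

    # A hypothetical analogue for integer sizes:
    # sys.set_int_bit_limit(10**7)   # OverflowError above ~10 million bits
    # sys.get_int_bit_limit()

-- Oscar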