On Tue, Dec 3, 2019 at 8:20 AM Mark Shannon <mark@hotpy.org> wrote:
The Python language does not specify limits for many of its features. Not having any limit to these values seems to enhance programmer freedom, at least superficially, but in practice the CPython VM and other Python virtual machines have implicit limits or are forced to assume that the limits are astronomical, which is expensive.
The basic idea makes sense to me. Well-defined limits that can be supported properly are better than vague limits that are supported by wishful thinking.
This PEP lists a number of features which are to have a limit of one million. If a language feature is not listed but appears unlimited and must be finite, for physical reasons if no other, then a limit of one million should be assumed.
This language is probably too broad... for example, there's certainly a limit on how many objects can be alive at the same time due to the physical limits of memory, but that limit is way higher than a million.
This PEP proposes that the following language features and runtime values be limited to one million.
* The number of source code lines in a module.
* The number of bytecode instructions in a code object.
* The sum of local variables and stack usage for a code object.
* The number of distinct names in a code object.
* The number of constants in a code object.
These are all attributes of source files, so sure, a million is plenty, and the interpreter spends a ton of time manipulating tables of these things.
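For concreteness, these are the per-code-object tables being counted -- a quick look at a toy function using the standard code object attributes and the dis module (my example, not from the PEP draft):

    import dis

    def f(a, b):
        c = a + b
        return [c, "spam", 42]

    code = f.__code__
    print(len(list(dis.get_instructions(code))))        # bytecode instructions
    print(code.co_nlocals + code.co_stacksize)          # locals + stack usage
    print(len(code.co_varnames) + len(code.co_names))   # local + global/attribute names
    print(len(code.co_consts))                          # constants

All of those counts are tiny for normal code; the point is just to give them a hard ceiling the VM can rely on.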
* The number of classes in a running interpreter.
This one isn't as obvious to me... classes are basically just objects of type 'type', and there is definitely code out there that creates classes dynamically. A million still seems like a lot, and I'm not saying I'd *recommend* a design that involves creating millions of different type objects, but it might exist already.
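To be concrete about the pattern I mean (a made-up sketch, but it's the kind of thing ORMs, serializers, and test frameworks actually do):

    # Each three-argument type() call produces a brand-new class object, so
    # code that stamps out a class per schema / test case / plugin can
    # accumulate a surprising number of them over a process's lifetime.
    schemas = {f"Record_{i}": ("id", "payload") for i in range(10_000)}
    record_types = {
        name: type(name, (), {"__slots__": fields})
        for name, fields in schemas.items()
    }
    print(len(record_types), "dynamically created classes")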
* The number of live coroutines in a running interpreter.
I don't get this one. I can't think of any motivation (the interpreter doesn't track live coroutines differently from any other object), and the limit seems dangerously low. A million coroutines only requires a few gigabytes of RAM, and there are definitely people who run single-process systems with >1e6 concurrent tasks (random example: https://goroutines.com/10m). I don't know if there's anyone doing this in Python right now, due to Python's performance limitations, but it's nowhere near as silly as a function with a million local variables.
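If you want to sanity-check the memory claim, here's a back-of-the-envelope measurement with a million bare coroutine objects (no asyncio Task machinery on top, so real workloads would cost more per task; exact numbers depend on the build):

    import tracemalloc

    async def worker():
        return None

    tracemalloc.start()
    coros = [worker() for _ in range(1_000_000)]
    current, _peak = tracemalloc.get_traced_memory()
    print(f"{current / 2**30:.2f} GiB for {len(coros):,} bare coroutine objects")

    # Close them so we don't get a million "never awaited" warnings at GC time.
    for c in coros:
        c.close()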
Total number of classes in a running interpreter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This limit has the potential to reduce the size of object headers considerably.
Currently, objects have a two-word header for objects without references (int, float, str, etc.), or a four-word header for objects with references. By reducing the maximum number of classes, the space for the class reference can be reduced from 64 bits to fewer than 32 bits, allowing a much more compact header.
For example, a super-compact header format might look like this:
.. code-block::

    struct header {
        uint32_t gc_flags:6;    /* Needs finalisation, might be part of a cycle, etc. */
        uint32_t class_id:26;   /* Can be efficiently mapped to address by ensuring suitable alignment of classes */
        uint32_t refcount;      /* Limited memory or saturating */
    };
This format would reduce the size of a Python object without slots, on a 64 bit machine, from 40 to 16 bytes.
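To sanity-check that arithmetic, here's an illustrative ctypes mock-up of the two layouts (I'm assuming the remaining space in both cases is a single __dict__ pointer; these aren't CPython's real structs):

    import ctypes

    class CurrentObject(ctypes.Structure):
        # Roughly today's layout for a GC-tracked instance: two-pointer GC
        # header, 64-bit refcount, 64-bit class pointer, __dict__ pointer.
        _fields_ = [
            ("gc_prev", ctypes.c_void_p),
            ("gc_next", ctypes.c_void_p),
            ("refcount", ctypes.c_uint64),
            ("type_ptr", ctypes.c_void_p),
            ("dict_ptr", ctypes.c_void_p),
        ]

    class CompactObject(ctypes.Structure):
        # The proposed compact header, plus the same __dict__ pointer.
        _fields_ = [
            ("gc_flags", ctypes.c_uint32, 6),
            ("class_id", ctypes.c_uint32, 26),
            ("refcount", ctypes.c_uint32),
            ("dict_ptr", ctypes.c_void_p),
        ]

    print(ctypes.sizeof(CurrentObject))   # 40 on a typical 64-bit build
    print(ctypes.sizeof(CompactObject))   # 16 on a typical 64-bit build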
In this example, I can't figure out how you'd map your 26-bit class_id to a class object. On a 32-bit system it would be fine, you just need 64-byte alignment, but you're talking about 64-bit systems, so... I know you aren't suggesting classes should have 2**(64 - 26) = ~3x10**11 byte alignment :-)

-n

--
Nathaniel J. Smith -- https://vorpus.org