Assuming a code base of 50M loc, *and* that all the code would be loaded into a single application (I sincerely hope that isn't the case) *and* that each class is only 100 lines, even then there would only be 500,000 classes. If a single application has 500k classes, I don't think that a limit of 1M classes would be its biggest problem :)
It is more like 1 million calls to `type`, each adding some combination of attributes to a base class. Think of a persistently running server that creates dynamic named tuples lazily. (I am working on code that does that, currently with 5-6 attributes, which gives me up to 64 (2^6) classes; with 20 attributes (2^20 = 1,048,576 combinations) this code would hit that limit, if one used the lib in a persistent server, that is :-) )
Anyway, not happening soon. I am just writing to say that one million classes does not mean 1 million hard-coded 100 LoC classes; rather, it is 1 million calls to "namedtuple".

On Thu, 5 Dec 2019 at 11:30, Mark Shannon <mark@hotpy.org> wrote:
Hi Guido,
On 04/12/2019 3:51 pm, Guido van Rossum wrote:
I am overwhelmed by this thread (and a few other things in real life) but here are some thoughts.
1. It seems the PEP doesn't sufficiently show that there is a problem to be solved. There are claims of inefficiency but these aren't substantiated and I kind of doubt that e.g. representing line numbers in 32 bits rather than 20 bits is a problem.
Fundamentally this is not about the immediate performance gains, but about the potential gains from not having to support huge, vaguely defined limits that are never needed in practice.
Regarding line numbers, decoding the line number table for exception tracebacks, profiling and debugging is expensive and the cost is linear in the size of the code object. So, the performance benefit would be largest for the code that is nearest to the limits.
2. I have handled complaints in the past about existing (accidental) limits that caused problems for generated code. People occasionally generate *really* wacky code (IIRC the most recent case was a team that was generating Python code from machine learning models they had developed using other software) and as long as it works I don't want to limit such applications.
The key word here is "occasionally". How much do we want to increase the costs of every Python user for the very rare code generator that might bump into a limit?
3. Is it easy to work around a limit? Even if it is, it may be a huge pain. I've heard of a limit of 65,000 methods in Java on Android, and my understanding was that it was actually a huge pain for both the toolchain maintainers and app developers (IIRC the toolchain had special tricks to work around it, but those required app developers to change their workflow). Yes, 65,000 is a lot smaller than a million, but in a different context the same concern applies.
64k *methods* is much, much less than 1M *classes*. At 6 methods per class, it is 100 times less.
The largest Python code bases that I am aware of are at JP Morgan, with something like 36M LOC, and Bank of America, with a similar number.
Assuming a code base of 50M loc, *and* that all the code would be loaded into a single application (I sincerely hope that isn't the case) *and* that each class is only 100 lines, even then there would only be 500,000 classes. If a single application has 500k classes, I don't think that a limit of 1M classes would be its biggest problem :)
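The back-of-envelope numbers in this reply can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the constant names are made up for illustration, and "64k" is assumed here to mean 65,536.

```python
# Figures taken from the message above; names are illustrative only.
ANDROID_METHOD_LIMIT = 64 * 1024   # assumption: "64k methods" read as 65,536
PROPOSED_CLASS_LIMIT = 1_000_000   # the proposed limit on classes
METHODS_PER_CLASS = 6              # the message's own assumption

# How many methods would fit under a 1M-class limit, relative to 64k methods?
methods_under_class_limit = PROPOSED_CLASS_LIMIT * METHODS_PER_CLASS
ratio = methods_under_class_limit / ANDROID_METHOD_LIMIT  # a little over 90

# How many classes would a 50M LOC code base contain at 100 lines per class?
CODE_BASE_LOC = 50_000_000
LINES_PER_CLASS = 100
classes_in_code_base = CODE_BASE_LOC // LINES_PER_CLASS   # 500,000

print(f"ratio: about {ratio:.0f}x, classes: {classes_in_code_base:,}")
```

With these assumptions the ratio comes out slightly under the "100 times" quoted, which is consistent with it being a rounded estimate.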
4. What does Python currently do if you approach or exceed one of these limits? I tried a simple experiment, eval(str(list(range(2000000)))), and this completes in a few seconds, even though the source code is a single 16 Mbyte-long line.
You can have lines as long as you like :)
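For anyone who wants to repeat the experiment from point 4, here is a minimal sketch. The helper names `build_source` and `run_experiment` are hypothetical, not from the thread; the only thing taken from the message is the expression eval(str(list(range(n)))).

```python
def build_source(n):
    """Return the source text of a list literal holding 0..n-1 on one line."""
    return str(list(range(n)))


def run_experiment(n):
    """Evaluate the one-line literal and report (source length, list length)."""
    source = build_source(n)
    assert "\n" not in source  # the whole literal really is a single line
    value = eval(source)
    return len(source), len(value)


# A small n keeps this quick; pass 2_000_000 to match the message.
print(run_experiment(1000))
```

For n = 2,000,000 the source string is roughly 16.9 million characters, in line with the "16 Mbyte" figure above. Timing and memory use will depend on the Python version and the machine.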
5. On the other hand, the current parser cannot handle more than 100 nested parentheses, and I've not heard complaints about this. I suspect the number of nested indent levels is similarly constrained by the parser. The default function call recursion limit is set to 1000 and bumping it significantly risks segfaults. So clearly some limits exist and are apparently acceptable.
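The recursion limit mentioned in point 5 can be inspected at runtime. The sketch below uses only `sys.getrecursionlimit()` and the built-in `RecursionError`; the helper name `max_depth` is made up for illustration, and the exact depth reached depends on how many frames are already on the stack.

```python
import sys


def max_depth(depth=0):
    """Recurse until the interpreter refuses, then return the depth reached."""
    try:
        return max_depth(depth + 1)
    except RecursionError:
        # The innermost frame that still exists reports its own depth;
        # every outer frame simply passes that value back up.
        return depth


limit = sys.getrecursionlimit()   # 1000 by default, as noted above
reached = max_depth()
print(f"limit={limit}, depth reached={reached}")
```

The depth reached is a little below the limit because the calling frames count against it too. The limit can be changed with `sys.setrecursionlimit()`, with the segfault risk described above if it is raised too far.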
6. In Linux and other UNIX-y systems, there are many per-process or per-user limits, and they can be tuned -- the user (using sudo) can change many of those limits, the sysadmin can change the defaults within some range, and sometimes the kernel can be recompiled with different absolute limits (not an option for most users or even sysadmins). These limits are also quite varied -- the maximum number of open file descriptors is different than the maximum pipe buffer size. This is of course as it should be -- the limits exist to protect the OS and other users/processes from runaway code and intentional attacks on resources. (And yet, fork bombs exist, and it's easy to fill up a filesystem...) I take from this that limits are useful, may have to be overridable, and should have values that make sense given the resource they guard.
Being able to dynamically *reduce* a limit from one million seems like a good idea.
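As an illustration of the OS-level analogue from point 6 (not of any API proposed in the PEP), the Unix-only `resource` module in the standard library lets a process read its limits and lower its own soft limit. The helper `reduce_soft_limit` is a hypothetical name for this sketch.

```python
import resource  # standard library, available on Unix-like systems only


def reduce_soft_limit(which, new_soft):
    """Lower the soft limit for `which` to new_soft, never raising it.

    Returns (old_soft, applied_soft).
    """
    soft, hard = resource.getrlimit(which)
    if soft != resource.RLIM_INFINITY and new_soft >= soft:
        return soft, soft  # already at or below the requested value
    resource.setrlimit(which, (new_soft, hard))
    return soft, new_soft


# Read the current limit on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft}, hard={hard}")
```

An unprivileged process can lower its soft limit and raise it again up to the hard limit; raising the hard limit requires privileges, which matches the tiers described in point 6.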
--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here? <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>)
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/Z4QO3SJD...
Code of Conduct: http://python.org/psf/codeofconduct/