Oddly, I did not get Mark's original e-mail, but am seeing replies here. Piggybacking off of James' email here...

On 03/12/2019 16:15, Mark Shannon wrote:
> Hi Everyone,
>
> I am proposing a new PEP, still in draft form, to impose a limit of one
> million on various aspects of Python programs, such as the lines of code
> per module.

My main concern about this PEP is that it doesn't specify the behavior when a given limit is exceeded. Whether you choose 10 lines or 10 billion lines as the rule, someone annoying (like me) is going to want to know what happens if I break the rule. Non-exhaustively, you could:

1. Say the behavior is implementation-defined
2. Physically prohibit the limit from being exceeded (limited by construction/physics)
3. Generate a warning
4. Raise an exception early (during parse/analysis/bytecode generation)
5. Raise an exception during runtime

The first two will keep people who hate limits happy, but essentially give the limit no teeth. The last three are meaningful, but will upset people when a previously valid program breaks.

1. The C and C++ standards are littered with limits (many of which you have to violate to create a real-world program) that ultimately specify that the resulting behavior is "implementation defined." Most general-purpose compilers have reasonable implementations (e.g. I can actually end my file without a newline and not have it call abort() or execve("/usr/bin/nethack"), behaviors both allowed by the C99 standard). You could go this route, but the end result isn't much better than not having done the PEP in the first place (beyond having an Ivory Tower to sit upon and taunt the unwashed masses, "I told you so," when you do decide to break their code). Don't go this route unless absolutely necessary. Of course, the C/C++ standards have to cover every implementation; this PEP has the luxury of addressing a single one (CPython).

2. Many of Java's limits are by construction. You can't exceed 2**16 bytes of bytecode for a method because they only allocated a uint16_t (u2 in the classfile spec) for the program counter in various places. (Bizarrely, the size of the method itself is stored as a uint32_t/u4.) I believe these limits are less useful because you'll never hit them in a running program; you simply can't create an invalid program. This would be like saying the size of Python bytecode is limited to the number of particles in the universe (~10**80). You don't have to specify the consequences because physics won't let you violate them. This is more useful for documenting format limits, but probably doesn't achieve what you're trying to achieve.

3. Realistically, a warning is probably what you'd have to do in the first version to get non-readers of python-dev@ ready for the PEP's adoption, but, again, it doesn't achieve what you're setting out to do. We'd still accept programs that exceed these limits, and whatever optimizations depend on those limits being in place wouldn't work.

Which brings us to the real meat, 4 & 5. Some limits don't really distinguish between these two cases. Exceeding the total bytecode size for a module, for example, would have to fail at bytecode generation time (ignoring truly irrational behavior like silently truncating the bytecode). But others aren't so cut and dried: for example, a module that is compliant except for a single function that contains too many local variables.
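To make the 4-vs-5 distinction concrete, here's a sketch of where each option would raise. The limit and all names here are invented for illustration, and today's CPython accepts this program (slowly, at this size):

    # Generate a function whose local-variable count exceeds some
    # hypothetical limit. Shrink LIMIT to experiment.
    LIMIT = 1_000_000  # illustrative; not a number taken from the PEP

    body = "\n".join(f"    v{i} = {i}" for i in range(LIMIT + 1))
    src = f"def too_many_locals():\n{body}\n"

    # Option 4 (load time): compile() would raise here, before any of
    # the module's code runs, whether or not the function is ever used.
    code = compile(src, "<generated>", "exec")
    ns = {}
    exec(code, ns)

    # Option 5 (run time): the error would surface only here, and only
    # if the offending function is actually called.
    ns["too_many_locals"]()

Whichever of those two points the PEP picks becomes the failure point that tools, tests, and code generators get written against, so it needs to be spelled out.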
Whether you do 4 or 5 isn't so obvious.

Pros of choosing 4 (exception at load):
* I'm alerted to errors early, before I start a 90-hour compute job only to have it crash in the write_output() function.
* You don't have to keep around a poisoned function that your optimizers have to special-case.

Pros of choosing 5 (exception at runtime):
* If I never call that function (maybe it's something in a library I don't use), I don't get penalized.
* It's in line with other Python (mis-)behaviors, e.g. raising NameError at runtime if you typo a variable name.

On Tue 12/03/19, 10:05 AM, "Rhodri James" <rhodri@kynesim.co.uk> wrote:
> On 03/12/2019 16:15, Mark Shannon wrote:
>> Isn't this "640K ought to be enough for anybody" again?
>> -------------------------------------------------------
>>
>> The infamous 640K memory limit was a limit on machine usable resources.
>> The proposed one million limit is a limit on human generated code.
>>
>> While it is possible that generated code could exceed the limit,
>> it is easy for a code generator to modify its output to conform.
>> The author has hit the 64K limit in the JVM on at least two occasions
>> when generating Java code.
>> The workarounds were relatively straightforward and
>> probably wouldn't have been necessary with a limit of one million
>> bytecodes or lines of code.
>
> I can absolutely guarantee that this will come back and bite you.
> Someone out there will be doing something more complicated than you
> think is plausible, and eventually someone will hit your limits. It
> may not take as long as you think, either.

I'm in between Rhodri and Mark here. I've also been bitten by the 64K JVM bytecode limit when generating code, but I did *not* find it so easy to work around. What was a dumb translator suddenly had to get a lot more smarts.

Having predictable behavior *is* important, though, and having limits with specified behavior when those limits are exceeded helps. Keep in mind that I'm going to be annoyed when I hit those limits, so having an engineering justification for why the limit was set to a certain value will go a long way toward buying you credibility. One million does not feel credible -- that's "we're setting a limit because we couldn't be bothered to figure out what the limit should be." OTOH, 16,777,215 (2**24 - 1) does feel credible -- that's "no processor is capable of holding this many TLB entries in the level 2 cache with retpolines active without introducing extreme swapping on write-limited SSDs, but you can get around it if you're willing to adjust this constant and recompile." Or whatever. (Ok, don't BS us like I just did, but you get the idea. :-) )

Dave
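P.S. For the curious, the usual shape of the "relatively straightforward" workaround Mark mentions is something like the sketch below: chunk the generated statements into helper functions and chain them from a driver. All names and the chunk size are invented for illustration. The reason my dumb translator had to get smarter is visible right in the sketch: locals are no longer shared across chunks, so state has to be threaded through explicitly.

    # Illustrative only: split generated statements across helper
    # functions so no single function exceeds a per-function limit.
    def emit_chunked(statements, chunk_size=1000):
        chunks = [statements[i:i + chunk_size]
                  for i in range(0, len(statements), chunk_size)]
        out = []
        for n, chunk in enumerate(chunks):
            body = "\n".join("    " + s for s in chunk) or "    pass"
            # Chunks can't see each other's locals, so each one reads
            # and writes a shared 'state' dict instead.
            out.append(f"def _chunk_{n}(state):\n{body}\n")
        calls = "\n".join(f"    _chunk_{n}(state)" for n in range(len(chunks)))
        out.append("def generated_main(state):\n" + (calls or "    pass") + "\n")
        return "\n".join(out)

    # e.g. emit_chunked(["state['x'] = 1", "state['x'] += 1"], chunk_size=1)
    # yields two one-statement helpers plus a generated_main() driver.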