Make Python code read-only

Hi,

I'm trying to find the best option to make CPython faster. I would like to discuss here a first idea: making Python code read-only to allow new optimizations.

Make Python code read-only
==========================

I propose to add an option to Python to make the code read-only. In this mode, module namespaces, class namespaces and function attributes become read-only. It is still possible to add a "__readonly__ = False" marker to keep a module, a class and/or a function modifiable.

I chose to make the code read-only by default instead of the opposite. In my tests, almost all code can be made read-only without major issues; only a little code requires the "__readonly__ = False" marker.

A module is only made read-only by importlib after the module is loaded. The module is still modifiable while its code is executed, until importlib has set all its attributes (ex: __loader__).

I have a proof of concept: a fork of Python 3.5 that makes code read-only if the PYTHONREADONLY environment variable is set to 1. Commands to try it:

    hg clone http://hg.python.org/sandbox/readonly
    cd readonly && ./configure && make
    PYTHONREADONLY=1 ./python -c 'import os; os.x = 1'
    # ValueError: read-only dictionary

Status of the standard library (Lib/*.py): 139 modules are read-only, 25 are modifiable. Except for the sys module, all modules written in C are read-only.

I'm surprised that so little code relies on the ability to modify everything. Most of the code can be read-only.

Optimizations possible when the code is read-only
=================================================

* Inline calls to functions.
* Replace calls to pure functions (without side effects) with the result. For example, len("abc") can be replaced with 3.
* Constants can be replaced with their values (at least for simple types like bytes, int and str).

It is, for example, possible to implement these optimizations by manipulating the Abstract Syntax Tree (AST) during compilation from source code to bytecode.
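As a rough illustration of the AST route, here is a minimal sketch (not code from astoptimizer) that folds len("abc") into 3. It assumes Python 3.9+ for ast.unparse, and it is only correct if the name len is guaranteed to be the builtin, which is exactly the guarantee read-only mode is meant to provide.

```python
import ast


class FoldLen(ast.NodeTransformer):
    """Replace len(<str or bytes literal>) with the literal's length.

    Only valid if the name `len` cannot be rebound (in the module or in
    builtins), i.e. the guarantee that read-only mode would provide.
    """

    def visit_Call(self, node):
        self.generic_visit(node)  # fold nested calls first
        if (isinstance(node.func, ast.Name) and node.func.id == "len"
                and len(node.args) == 1 and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, (str, bytes))):
            folded = ast.Constant(len(node.args[0].value))
            return ast.copy_location(folded, node)
        return node


def optimize(source):
    """Parse `source`, fold constant len() calls, and return the new tree."""
    tree = FoldLen().visit(ast.parse(source))
    return ast.fix_missing_locations(tree)


print(ast.unparse(optimize('x = len("abc")')))  # x = 3
```

Calls on anything other than a literal (len(z), say) are left untouched, so the transformation is safe to run over a whole module.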
See my astoptimizer project, which already implements similar optimizations: https://bitbucket.org/haypo/astoptimizer

More optimizations
==================

My main motivation for making code read-only is to specialize functions: optimize a function for a specific environment (types of parameters, external symbols like other functions, etc.).

Checking the types of parameters can be fast (especially when implemented in C), but it would be expensive to check that none of the global variables used by the function were modified since the function was "specialized". For example, if os.path.isabs(path) is called, you have to check that the "os.path" and "os.path.isabs" attributes were not modified and that the isabs() function itself was not modified. If we know that globals are read-only, these checks are no longer needed, so it becomes cheap to decide whether the specialized function can be used or not.

It becomes possible to "learn" types (trace the execution of the application, and then compile for the recorded types). Knowing the types of function parameters, results and local variables opens an interesting class of new optimizations, but I prefer to discuss this later, after discussing the idea of making the code read-only.

One point remains unclear to me. There is a short time window between when a module is loaded and when it is made read-only. During this window, we cannot rely on the read-only property of the code. Specialized code cannot be used safely before the module is known to be read-only. I don't know yet how the switch from "slow" code to optimized code should be implemented.

Issues with read-only code
==========================

* Currently, to keep my implementation simple, it's not possible to make a module, class or function modifiable again. With a registry of callbacks, it may be possible to allow modification again and call code to disable optimizations.
* PyPy implements this, but thanks to its JIT, it can re-optimize the modified code during execution.
  Writing a JIT is very complex; I'm trying to find a compromise between the fast PyPy and the slow CPython. Adding a JIT to CPython is out of my scope: it requires too many modifications of the code.
* With read-only code, monkey-patching cannot be used anymore. That's annoying for running tests. An obvious solution is to disable read-only mode to run tests, which can be seen as unsafe since tests are usually used to trust the code.
* The sys module cannot be made read-only because modifying sys.stdout and sys.ps1 is a common use case.
* The warnings module tries to add a __warningregistry__ global variable to the module where the warning was emitted, so that warnings that should only be emitted once are not repeated. The problem is that the module namespace is made read-only before this variable is added. A workaround would be to maintain these dictionaries in the warnings module directly, but then it becomes harder to clear the dictionary when a module is unloaded or reloaded. Another workaround is to add __warningregistry__ before making a module read-only.
* Lazy initialization of module variables does not work anymore. A workaround is to use a mutable type. It can be a dict used as a namespace for the module's modifiable variables.
* The interactive interpreter sets a "_" variable in the builtins namespace. I have no workaround for this: the "_" variable is no longer created in read-only mode. Don't run the interactive interpreter in read-only mode.
* It is not possible yet to make the namespace of packages read-only. For example, "import encodings.utf_8" adds the symbol "utf_8" to the encodings namespace. A workaround is to load all submodules before making the namespace read-only, but this cannot be done for some large packages. For example, the encodings package has a lot of submodules, and only a few are needed.
Read the documentation for more information: http://hg.python.org/sandbox/readonly/file/tip/READONLY.txt

More optimizations
==================

See my notes for all ideas to optimize CPython: http://haypo-notes.readthedocs.org/faster_cpython.html

I explain there why I prefer to optimize CPython instead of working on PyPy or another Python implementation like Pyston, Numba or similar projects.

Victor

On Tue, May 20, 2014 at 06:57:53PM +0200, Victor Stinner wrote:
At least for me, this represents a material change to the philosophy of the language. While frowned upon, monkey patching is extremely useful while debugging, and occasionally in emergencies. :) Definitely not worth it for a few extra % IMHO David

On Tue, May 20, 2014 at 10:22 AM, <dw+python-ideas@hmmz.org> wrote:
I think part of the point was that this read-only mode would be entirely optional. One of the main reasons that I don't use Python for all of my projects is the speed issue, so anything that's a "free" speedup seems like a great thing. The main cost I can see here is in maintaining the readonly mode and the perhaps subtle bugs that would arise in many people's code when run in readonly mode. As an official feature, there would be a documentation and maintenance cost to the community, but I do think that there's substantial benefit, and especially as an opt-in feature, if the optimizations really speed things up, this seems quite useful. I guess the question is: How does this compare to other "drop-in" speedup solutions, like PyPy? Is it applicable to more existing code? Is it easier to apply? Does it provide a better speed increase? If there's a niche for it in one of those three areas and it's an opt-in system, I see the issue being a cost-benefit analysis of what is gained (whatever that niche is) vs. the maintenance cost in terms of bug reports etc. -Peter Mawhorter

On Wed, May 21, 2014 at 3:36 AM, Peter Mawhorter <pmawhorter@gmail.com> wrote:
Here's a stupid-crazy idea to chew on. (Fortunately this is not python-good-ideas@python.org - I wouldn't have much to contribute there!) Make the per-module flag opt-in-only, but the overall per-application flag active by default. Then, read-only mode applies to a small number of standard library modules (plus any user modules that specifically request it), and will thus be less surprising; and a small rewording of the error message (eg "... - run Python with the -O0 parameter to disable this check") would mean the monkey-patchers could still do their stuff, at the cost of this optimization. It's less likely to be surprising, because development would be done with read-only mode active, rather than "Okay, let's try this in optimized mode now - no asserts and read-only dicts... oh dear, it's not working". Big downside: Time machine policy prevents us from going back to 2.0 and implementing it there. There's going to be an even worse boundary when people upgrade to a Python with this active by default. So it's probably better to NOT make either half active by default, but to recommend that new projects be developed with read-only mode active. ChrisA

On Wed, May 21, 2014 at 2:57 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
* The sys module cannot be made read-only because modifying sys.stdout and sys.ps1 is a common use case.
I think this highlights the biggest concern with defaulting to read-only. Currently, most Python code won't reach into another module and change anything, but any program could monkey-patch any module at any time. You've noted that modifying sys's attributes is common enough to prevent its being made read-only; how do you know what else will be broken if this change goes through? For that reason, even though the read-only state would be the more common one, I would strongly recommend flagging those modules which _are_ read-only, rather than those which aren't. Then it becomes a documentable part of the module's interface: "This module will be frozen when Python is run in read-only mode". Setting that flag and then modifying your own state would be a mistake on par with using assert for crucial checks; monkey-patching someone else's read-only module makes your app incompatible with read-only mode. Any problems would come from use of *both* read-only mode *and* the __readonly__ flag, rather than unexpectedly cropping up when someone loads up a module from PyPI and it turns out to depend on mutability. Also, flagging the ones that have the changed behaviour means it's quite easy to get partial benefit from this, with no risk. In fact, you could probably turn this on for arbitrary Python programs, as long as only the standard library uses __readonly__; going the other way, having a single module that doesn't have the flag and requires mutability would prevent the whole app from being run in read-only mode. With that (rather big, and yet quite trivial) caveat, though: Looks interesting. Optimizing for the >99% of code that doesn't do weird things makes very good sense, just as long as the <1% can be catered for. ChrisA

2014-05-20 19:37 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
Hum, maybe my email was unclear: the read-only mode is disabled by default. When you enable the read-only mode, all modules are read-only except the modules explicitly configured to be modifiable. I don't have a strong opinion on this choice. We may only make modules read-only when the read-only mode is activated and the module is explicitly configured to be read-only. Another option is to have a list of modules which should be made read-only, configurable by the application.
Yeah, the whole stdlib doesn't need to be read-only to make an application faster. Victor

On 05/20/2014 01:32 PM, Victor Stinner wrote:
Ah, that's good.
Another option is to have a list of modules which should be made read-only, configurable by the application.
Or a list of modules that should remain read/write. As an application dev I should know which modules I am going to be modifying after initial load, so as long as I can easily add them to a read/write list I would be happy (especially when it came time to debug something). -- ~Ethan~

On Wed, May 21, 2014 at 6:32 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
There are two read-only states: 1) Is this application running in read-only mode? (You give an example of setting this by an env var.) 2) Is this module read-only? (You give an example of setting this to False.) It's the second one that I'm talking about. If, once you turn on read-only mode (the first state), every module is read-only except those marked __readonly__=False, you're going to have major backward incompatibility problems. All it takes is one single module that ought to be marked __readonly__=False and isn't, and read-only mode is broken. Yes, it may be that most of the standard library can be made read-only; but I believe it would still be better to explicitly say __readonly__=True on each of those modules, than __readonly__=False on the others - because of all the *non* stdlib modules. ChrisA
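The two states, and the difference between opt-out (the proof of concept) and opt-in (ChrisA's suggestion) per-module flags, can be written as a small decision function. This is a hypothetical sketch: should_freeze and its parameters are illustrative names, not part of the proof of concept.

```python
def should_freeze(module_dict, app_readonly, opt_in):
    """Decide whether a freshly imported module gets frozen (hypothetical helper).

    module_dict  -- the module's namespace after its code has run
    app_readonly -- state 1: is the application running in read-only mode?
    opt_in       -- True: freeze only modules that set __readonly__ = True
                    False: freeze every module unless it sets __readonly__ = False
    """
    if not app_readonly:
        return False                          # read-only mode off: nothing is frozen
    flag = module_dict.get("__readonly__")    # state 2, if the module set it
    if opt_in:
        return flag is True                   # unmarked modules stay mutable
    return flag is not False                  # unmarked modules are frozen


unmarked = {"__name__": "thirdparty"}
print(should_freeze(unmarked, app_readonly=True, opt_in=False))  # True
print(should_freeze(unmarked, app_readonly=True, opt_in=True))   # False
```

With opt_in=True, an unmarked third-party module that relies on mutability is simply left alone, which is the backward-compatibility argument above.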

On Wed, May 21, 2014 at 03:37:42AM +1000, Chris Angelico wrote:
"99% of Python code doesn't do weird things..." It seems to me that this is a myth, or at least unjustifiable by the facts as we have seen it. Victor's experiment shows 25 modules from the standard library are modifiable, with 139 read-only. That's more like 15% than 1% "weird". I don't consider setting sys.ps1 and sys.stdout to be "weird", which is why Victor has to leave sys unlocked. Read-only by default would play havok with such simple idioms as global variables. (Globals may be considered harmful, but they're not considered "weird", and they're also more intuitive to many beginners than function arguments and return values. Strange but true.) As much as I wish to discourage people from using the global statement to rebind globals, I consider it completely unacceptable to have to teach beginners how to disable read-only mode before they've even mastered writing simple functions. I applaud Victor for his experiment, and would like to suggest a couple of approaches he might like to think about. I assume that read-only mode can be set on a per-module basis. * For simplicity, read-only mode is all-or-nothing on a per module basis. If the module is locked, so are the functions and classes defined by that module. If the module is not locked, neither are the functions and classes. (By locked, I mean Victor's read-only mode where globals and class attributes cannot be re-bound, etc.) * For backwards compatibility, any (eventual) production use of this would have to default to off. Perhaps in Python 4 or 5 we can consider defaulting to on. * Define an (optional) module global, say, __readonly__, which defaults to False. The module author must explicitly set it to True if they wish to lock the module in read-only mode. There's no way to enable the read-only optimizations by accident, you have to explicitly turn them on. * However there are ways to auto-detect when *not* to enable them. E.g. 
if a module uses the global statement in any function or method, read-only mode is disabled for that module. * Similarly, a Python switch to enable/disable read-only mode. I don't mind if the switch --enable-readonly is true by defalt, so long as individual modules default to unlocked. How about testing? It's very common, useful, and very much non-weird to reach into a module and monkey-patch it for the purposes of testing. I don't have a good solution to that, but a couple of stream of consciousness suggestions: - Would it help if there was a statement "import unlocked mymodule" that forces mymodule to remain unlocked rather than read-only? - Would it help if you could make a copy of a readonly module in an unlocked state? - Obviously the "best" (most obvious) solution would be if there was a way to unlock modules on the fly, but Victor suggests that's hard. -- Steven

On Wed, May 21, 2014 at 11:42 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Allow me to clarify. A module mutating its own globals is not at all weird; the only thing I'm calling weird is reaching into another module's globals and changing things. In a few rare cases (like sys.ps1 and sys.stdout), this is part of the documented interface of the module; but if (in a parallel universe) Python were designed such that this sort of thing is impossible, it wouldn't be illogical to have a "sys.set_ps1()" function, because the author(s) of the sys module *expect* ps1 to be changed. In contrast, the random module makes use of a bunch of stuff from math (importing them all with underscores, presumably to keep them out of "from random import *", although __all__ is also set), and it is NOT normal to reach in and change them. And before you say "Well, that has an underscore, of course you don't fiddle with it", other modules like imaplib will happily import without underscores - is it expected that you should be able to change imaplib.random to have it use a different random number generator? Or, for that matter, to replace some of its helper functions like imaplib.Int2AP? That, I think, would be considered weird. So there are 15% that change their own globals, which is fine. In this particular instance, we can't optimize for the whole of the 99%, but I maintain that the 15% is not all "weird" just because it's not optimizable. How many modules actually expect that their globals will be externally changed? ChrisA

Steven D'Aprano wrote:
Read-only by default would play havok with such simple idioms as global variables.
I don't see why there couldn't be a way to exempt selected names in a module from read-only status. An exemption could be inferred whenever a name is referenced by a 'global' statement. There should also be a way to explicitly mark a name as exempt, to take care of sys.stdout etc., and cases where the only mutations are done from a different module, so there is no global statement. For modules implemented in Python, the explicit marker could consist of a global statement at the top level, which is currently allowed but redundant. -- Greg
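Inferring the exemptions Greg describes (or Steven's stricter rule of disabling read-only mode for any module that uses the global statement) only needs the AST. A minimal sketch, with global_names as a hypothetical helper name; it also picks up a top-level global statement used as an explicit marker.

```python
import ast


def global_names(source):
    """Return every name that appears in a `global` statement in `source`.

    Under Greg's suggestion these names would be exempt from read-only
    status; under Steven's variant, a non-empty result would disable
    read-only mode for the whole module.
    """
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Global):
            names.update(node.names)
    return names


src = '''
counter = 0
global verbose          # top-level marker: redundant today, but allowed
verbose = False

def bump():
    global counter
    counter += 1
'''
print(sorted(global_names(src)))  # ['counter', 'verbose']
```

Names that are only ever rebound from another module (sys.stdout) produce no global statement, which is why an explicit marker is still needed for them.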

On 21 May 2014 11:48, "Steven D'Aprano" <steve@pearwood.info> wrote:
It also misses the big reason I am a Python programmer rather than a Java programmer. For me, Python is primarily an orchestration language. It is the language for the code that is telling everything else what to do. If my Python code is an overall performance bottleneck, then "Huzzah!", as it means I have finally engineered all the other structural bottlenecks out of the system. For this use case, monkey patching is not an incidental feature to be tolerated merely for backwards compatibility reasons: it is a key capability that makes Python an ideal language for me, as it takes ultimate control of what dependencies do away from the original author and places it in my hands as the system integrator. This is a dangerous power, not to be used lightly, but it also grants me the ability to work around critical bugs in dependencies at run time, rather than having to fork and patch the source the way Java developers tend to do. Victor's proposal is to make Python more complicated and a worse orchestration language, for the sake of making it a better applications programming language. In isolation, it might be possible to make that case, but in the presence of PyPy for a full dynamically optimised runtime and tools like Cython and Numba for selective optimisation within CPython, no. Regards, Nick.

Another +inf from me. Mind if I quote you on this next time I'm trying to convince C# developers to take Python seriously? :)

From: Ethan Furman <ethan@stoneleaf.us>
Sent: 5/21/2014 7:37
To: python-ideas@python.org
Subject: Re: [Python-ideas] Make Python code read-only

On 05/21/2014 02:43 AM, Nick Coghlan wrote:
+inf

-- 
~Ethan~

On 22 May 2014 00:56, "Steve Dower" <Steve.Dower@microsoft.com> wrote:
Another +inf from me.
Mind if I quote you on this next time I'm trying to convince C#
developers to take Python seriously? :) Sure - I expect your conversations with C# devs resemble some of mine with Java devs :) Cheers, Nick.

2014-05-21 3:42 GMT+02:00 Steven D'Aprano <steve@pearwood.info>:
- Obviously the "best" (most obvious) solution would be if there was a way to unlock modules on the fly, but Victor suggests that's hard.
The problem is reacting to such an event. If a function has a specialized version for a set of read-only objects, the specialized version should not be used anymore.

OK, I created a new branch, "readonly_cb", where it is possible to make modules, types and functions modifiable again. *But* when the read-only state is modified, a callback is called. It can be used to disable optimizations relying on it.

So all the issues listed in this thread go away. It's possible again to use monkey-patching, lazy initialization of module variables and class variables, etc. I hope that such a callback is enough to make optimizations efficient.

Victor
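A pure-Python sketch of the callback idea: a specialized version is used until one of the objects it depends on is made modifiable again, at which point a callback switches the function back to the generic version. Watched, on_unlock, set_readonly and specialize are hypothetical names, not the API of the readonly_cb branch. The per-call guard is a single dict lookup instead of re-checking every global the function uses.

```python
class Watched:
    """Hypothetical stand-in for a module or type: writes are refused while
    read-only, and registered callbacks run when it is unlocked."""

    def __init__(self, **attrs):
        object.__setattr__(self, "_readonly", True)
        object.__setattr__(self, "_callbacks", [])
        for name, value in attrs.items():
            object.__setattr__(self, name, value)

    def on_unlock(self, callback):
        self._callbacks.append(callback)

    def set_readonly(self, flag):
        object.__setattr__(self, "_readonly", flag)
        if not flag:
            for callback in self._callbacks:
                callback(self)

    def __setattr__(self, name, value):
        if self._readonly:
            raise AttributeError(f"read-only: cannot set {name!r}")
        object.__setattr__(self, name, value)


def specialize(generic, specialized, deps):
    """Use `specialized` until any object in `deps` is unlocked, then fall
    back to `generic` for good (no re-specialization in this sketch)."""
    state = {"fast": True}

    def disable(_obj):
        state["fast"] = False

    for dep in deps:
        dep.on_unlock(disable)

    def dispatch(*args, **kwargs):
        if state["fast"]:
            return specialized(*args, **kwargs)
        return generic(*args, **kwargs)

    return dispatch


mod = Watched(scale=2)
f = specialize(lambda x: x * mod.scale,  # generic: looks up mod.scale each call
               lambda x: x * 2,          # specialized: scale "inlined"
               [mod])
print(f(10))             # 20, specialized path
mod.set_readonly(False)  # callback disables the specialized version
mod.scale = 3            # monkey-patching works again
print(f(10))             # 30, generic path
```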

An interesting idea. Comments below. On May 20, 2014 10:58 AM, "Victor Stinner" <victor.stinner@gmail.com> wrote:
Make __readonly__ a data descriptor (getset in the C-API) on ModuleType, type, and FunctionType and people could toggle it as needed. The descriptor could look something like this (in pure Python):

    import os

    class ReadonlyDescriptor:
        # Read once, i.e. ignore later changes to PYTHONREADONLY.
        # os.environ keys are str, and "0" would be truthy, so compare explicitly.
        DEFAULT = os.environ.get('PYTHONREADONLY', '') == '1'

        def __init__(self, *, default=None):
            if default is None:
                default = type(self).DEFAULT
            self.default = default

        def __get__(self, obj, cls):
            if obj is None:
                return self
            try:
                return obj.__dict__['__readonly__']
            except KeyError:
                readonly = bool(self.default)
                obj.__dict__['__readonly__'] = readonly
                return readonly

        def __set__(self, obj, value):
            obj.__dict__['__readonly__'] = value

Alternately, the object structs for the 3 types (e.g. PyModuleObject) could each grow a "readonly" field (or an extra flag option if there is an appropriate flag). The descriptor (in C) would use that instead of obj.__dict__['__readonly__']. However, I'd prefer going through __dict__.

Either way, the 3 types would share a tp_setattro implementation that checked the read-only flag. That way there's no need to make sweeping changes to the 3 types, nor to the dict type.

    def __setattr__(self, name, value):
        # Always allow setting the flag itself; otherwise a read-only
        # object could never be toggled back to writable.
        if self.__readonly__ and name != '__readonly__':
            raise AttributeError('readonly')
        super().__setattr__(name, value)

FWIW, the idea of a flag for read-only could be applied to objects in general, particularly in a future language addition. "__readonly__" is a good name for the flag, so the precedent set by the three types in this proposal would be a good one.
Read-only by default would be backwards-incompatible, but having a commandline flag (and/or env var) to enable it would be useful. For classes a decorator could be nice, though it should wait until it was more obviously worth doing. I'm not sure it would matter for functions, though the same decorator would probably work.
With a data descriptor and __setattr__ like I described above, there is no need to make any changes to importlib.
+1
How big a problem would this be in practice?
With the data descriptor approach toggling read-only would work. Enabling/disabling optimizations at that point would depend on how they were implemented.
What do you mean by "lazy initialization of module variables"?
If read-only is only enforced via __setattr__ then the workaround is to bind the submodule directly via pkg.__dict__. -eric
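A minimal sketch of the __setattr__-only approach for modules, including the pkg.__dict__ loophole. ReadOnlyModule and freeze are hypothetical names; swapping a module's __class__ for a ModuleType subclass is supported since Python 3.5. Note that functions defined inside the module rebind globals through the module's __dict__ (their globals()), so they bypass this check as well.

```python
import types


class ReadOnlyModule(types.ModuleType):
    """Hypothetical module type that rejects attribute writes, unless the
    module set __readonly__ = False before being frozen."""

    def __setattr__(self, name, value):
        if self.__dict__.get("__readonly__", True):
            raise AttributeError(f"module {self.__name__!r} is read-only")
        super().__setattr__(name, value)

    def __delattr__(self, name):
        if self.__dict__.get("__readonly__", True):
            raise AttributeError(f"module {self.__name__!r} is read-only")
        super().__delattr__(name)


def freeze(module):
    # Assigning __class__ on a module object is allowed since Python 3.5.
    module.__class__ = ReadOnlyModule
    return module


pkg = types.ModuleType("pkg")
pkg.helper = len
freeze(pkg)

try:
    pkg.helper = max
except AttributeError as exc:
    print(exc)                 # module 'pkg' is read-only

# Writing to __dict__ skips __setattr__ entirely, so an import system could
# still bind a submodule this way.
pkg.__dict__["submodule"] = types.ModuleType("pkg.submodule")
print(pkg.submodule.__name__)  # pkg.submodule
```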

2014-05-21 0:04 GMT+02:00 Eric Snow <ericsnowcurrently@gmail.com>:
In my PoC, I chose to directly modify the builtin "dict" type. I don't think that I will keep this solution because I would prefer not to touch such a critical Python type. I may use a subclass instead. I added a dict.setreadonly() method which can be used to make a dict read-only, but a read-only dict cannot be made modifiable again. I added a type.setreadonly() method which calls type.__dict__.setreadonly(). I did this to access the underlying dict: type.setreadonly() also works on builtin types like str, where str.__dict__ is a mappingproxy, not the real dictionary.
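The mappingproxy point can be checked from pure Python. A small sketch; exception messages differ between CPython versions, so only the exception types are relied on here.

```python
import operator
import types

# Classes expose their namespace as a read-only mappingproxy, not the real dict.
assert isinstance(str.__dict__, types.MappingProxyType)


def raises_type_error(action):
    """Return True if calling action() raises TypeError."""
    try:
        action()
    except TypeError:
        return True
    return False


# Item assignment through the proxy is refused...
print(raises_type_error(lambda: operator.setitem(str.__dict__, "upper", None)))  # True
# ...and so is setting an attribute on the builtin type itself.
print(raises_type_error(lambda: setattr(str, "upper", None)))  # True


class Plain:
    pass


# A user-defined class also exposes a mappingproxy, yet setattr still works:
# whether writes are allowed is decided by the type, not by the proxy.
assert isinstance(Plain.__dict__, types.MappingProxyType)
Plain.x = 1
```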
There is already a function.__readonly__ property (I just modified its name, it was called __modifiable__ before, the opposite). It is used to make a function read-only by importlib.
Are you sure that it's not possible to retrieve the underlying dictionary somehow? For example, functions have a func.__dict__ attribute.
Read-only by default would be backwards-incompatible, but having a commandline flag (and/or env var) to enable it would be useful.
My PoC had a PYTHONREADONLY env var to enable the read-only mode. I just added a -r command line option for the same purpose. It's disabled by default for backward compatibility. Only enable it if you want to try my optimizations :-)
I just pushed a change to make classes read-only by default, so that nested classes are read-only too. I modified the builtin __build_class__ function for that. A decorator is called after the class is defined, which is too late. That's why I chose a class attribute.
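Without touching __build_class__, a metaclass gets close to "read-only by default with a class attribute opt-out". This is a hypothetical sketch (ReadOnlyMeta is an illustrative name). The class body still runs normally because it fills a namespace dict rather than calling setattr. Unlike the __build_class__ change, nested classes are only covered if they use the metaclass themselves, and calling type.__setattr__ directly bypasses the check.

```python
class ReadOnlyMeta(type):
    """Hypothetical metaclass: after the class is created, rebinding or
    deleting class attributes raises, unless the class body set
    __readonly__ = False."""

    def __setattr__(cls, name, value):
        if cls.__dict__.get("__readonly__", True):
            raise AttributeError(f"class {cls.__name__} is read-only")
        super().__setattr__(name, value)

    def __delattr__(cls, name):
        if cls.__dict__.get("__readonly__", True):
            raise AttributeError(f"class {cls.__name__} is read-only")
        super().__delattr__(name)


class Point(metaclass=ReadOnlyMeta):
    x = 0


class Mutable(metaclass=ReadOnlyMeta):
    __readonly__ = False
    x = 0


Mutable.x = 1  # allowed: the class opted out
try:
    Point.x = 1
except AttributeError as exc:
    print(exc)  # class Point is read-only
```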
I have no idea right now :)
Hum, I should try to use your descriptor. I'm not sure that it works for modules and classes. (Functions already have a __readonly__ property.)
To reduce the memory footprint, "large" precomputed tables of the base64 module are only filled at the first call of the function needing the tables. I also saw in other modules that a module is only imported the first time it is needed. Example: "def _lazy_import_sys(): global sys; import sys" and then "if sys is None: _lazy_import_sys(); # use sys".
I don't like the idea of an "almost" read-only module object. In one of my projects, I would like to emit machine code. If a module is modified while the machine code relies on the module's read-only property, Python may crash.

Victor

2014-05-21 0:44 GMT+02:00 Greg Ewing <greg.ewing@canterbury.ac.nz>:
If the class is read-only and has a __slots__ class attribute, methods cannot be modified anymore. If you are able to get (compute) the type of an object, you can optimize the call to the method. Dummy example:

    chars = []
    for ch in range(32, 126):
        chars.append(chr(ch))
    print(''.join(chars))

Here you can guess that the type of chars in "chars.append" is list. The list.append() method is well known (and it is read-only even if my global read-only mode is disabled, because list is a builtin type). You may inline the call in the body of the loop, or you can at least move the lookup of the append method out of the loop.

Victor
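The "move the lookup out of the loop" transformation, written by hand for comparison. An optimizer could only do this automatically if it knew chars is a list and list.append cannot be replaced; the timings are printed rather than predicted, and the gap may be small on recent CPython versions, whose interpreter already specializes attribute lookups.

```python
import timeit


def printable_naive():
    chars = []
    for ch in range(32, 126):
        chars.append(chr(ch))  # looks up .append on every iteration
    return "".join(chars)


def printable_hoisted():
    chars = []
    append = chars.append      # lookup moved out of the loop
    for ch in range(32, 126):
        append(chr(ch))
    return "".join(chars)


assert printable_naive() == printable_hoisted()
for func in (printable_naive, printable_hoisted):
    print(func.__name__, timeit.timeit(func, number=10_000))
```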

2014-05-21 2:00 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
I don't want to optimize a single function, I want to optimize a whole application. If possible, I would prefer to not have to modify the application to run it faster. Numba plays very well with numbers and arrays, but I'm not sure that it is able to inline arbitrary Python function for example. Victor

On 21/05/14 02:16, Victor Stinner wrote:
I don't want to optimize a single function, I want to optimize a whole application.
Right. Even Java does not do that. (Hence the name 'Hotspot')
Numba will compile the Python overhead out of function calls, if that is what you mean. Numba will also accelerate Python objects (method calls and attribute access). LLVM knows how to do simple optimisations like function inlining. When a Python function is JIT compiled to LLVM bytecode by Numba, LLVM knows what to do with it. If the function body is small enough, LLVM will inline it completely. Numba is still under development, so it might not be considered "production ready" yet. Currently it will give you performance comparable to -O2 in C for most algorithmic Python code.

Sturla

On Tue, May 20, 2014 at 10:57 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
Issues with read-only code ==========================
Other things to consider:

* reload() will no longer work (it loads into the existing module ns)
* the module-replaces-self-in-sys-modules hack will be weird
* class decorators that modify the class will no longer work
* caching class attrs that are lazily set by instances will no longer work (similar to modules)
* singletons stored on the class will break

-eric
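Two of the patterns in this list, written as ordinary Python that works today. If classes were frozen as soon as they were created, both commented assignments would raise: the decorator runs after the class object exists, and the singleton rebinds a class attribute long after creation.

```python
def register(cls):
    # A class decorator runs *after* the class object exists, so this is an
    # ordinary setattr on the class.
    cls.registered = True
    return cls


@register
class Service:
    _instance = None  # singleton slot stored on the class

    @classmethod
    def get(cls):
        if cls._instance is None:
            cls._instance = cls()  # rebinds a class attribute after creation
        return cls._instance


print(Service.registered, Service.get() is Service.get())  # True True
```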

On May 20, 2014, at 12:57 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
I did two passes on read-only functionality for PyParallel. The first attempt was similar to yours: I instrumented various core Python objects such that mutations could be detected against read-only objects (and subsequently raised as an exception). That didn’t pan out the way I wanted it to, especially in the PyParallel multiple-interpreter-threads-running-in-parallel environment.

Second attempt: use memory protection. CPUs and OSes are really good at enforcing memory protection; leverage that. Don’t try to do it yourself in userspace. This worked much better. That work is described starting here: https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploite...

Relevant bits of implementation:

obmalloc.c: http://hg.python.org/sandbox/trent/rev/0e70a0caa1c0#l6.299
ceval.c: http://hg.python.org/sandbox/trent/rev/0e70a0caa1c0#l9.30

On POSIX you’d achieve the same effect via mprotect and a SIGSEGV trap.

Just FYI.

Regards,
Trent.

2014-06-05 14:02 GMT+02:00 Trent Nelson <trent@snakebite.org>:
My first attempt to "make the code read-only" was a big fail. Lots of errors and complaints :-) I'm now moving to a different approach: "notify changes of the code". In PyParallel, you raise an error if something is modified. I don't need such a restriction; I "just" want to disable optimizations if the code changed.
On POSIX you’d achieve the same effect via mprotect and a SIGSEGV trap.
I don't think that relying on SIGSEGV is safe :-( Such a signal can be emitted for various reasons, and you have to use sigsetjmp/siglongjmp, which is unsafe: you cannot clean up state when an error occurs. Or did you implement it differently?

Victor

On Tue, May 20, 2014 at 06:57:53PM +0200, Victor Stinner wrote:
At least for me, this represents a material change to the philosophy of the language. While frowned upon, monkey patching is extremely useful while debugging, and occasionally in emergencies. :) Definitely not worth it for a few extra % IMHO David

On Tue, May 20, 2014 at 10:22 AM, <dw+python-ideas@hmmz.org> wrote:
I think part of the point was that this read-only mode would be entirely optional. One of the main reasons that I don't use Python for all of my projects is the speed issue, so anything that's a "free" speedup seems like a great thing. The main cost I can see here is in maintaining the readonly mode and the perhaps subtle bugs that would arise in many people's code when run in readonly mode. As an official feature, there would be a documentation and maintenance cost to the community, but I do think that there's substantial benefit, and especially as an opt-in feature, if the optimizations really speed things up, this seems quite useful. I guess the question is: How does this compare to other "drop-in" speedup solutions, like PyPy. Is it applicable to more existing code? Is it easier to apply? Does it provide a better speed increase? If there's a niche for it in one of those three areas and it's an opt-in system, I see the issue being a cost-benefit analysis of what is gained (whatever that niche is) vs. the maintenance cost in terms of bug reports etc. -Peter Mawhorter

On Wed, May 21, 2014 at 3:36 AM, Peter Mawhorter <pmawhorter@gmail.com> wrote:
Here's a stupid-crazy idea to chew on. (Fortunately this is not python-good-ideas@python.org - I wouldn't have much to contribute there!) Make the per-module flag opt-in-only, but the overall per-application flag active by default. Then, read-only mode applies to a small number of standard library modules (plus any user modules that specifically request it), and will thus be less surprising; and a small rewording of the error message (eg "... - run Python with the -O0 parameter to disable this check") would mean the monkey-patchers could still do their stuff, at the cost of this optimization. It's less likely to be surprising, because development would be done with read-only mode active, rather than "Okay, let's try this in optimized mode now - no asserts and read-only dicts... oh dear, it's not working". Big downside: Time machine policy prevents us from going back to 2.0 and implementing it there. There's going to be an even worse boundary when people upgrade to a Python with this active by default. So it's probably better to NOT make either half active by default, but to recommend that new projects be developed with read-only mode active. ChrisA

On Wed, May 21, 2014 at 2:57 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
* The sys module cannot be made read-only because modifying sys.stdout and sys.ps1 is a common use case.
I think this highlights the biggest concern with defaulting to read-only. Currently, most Python code won't reach into another module and change anything, but any program could monkey-patch any module at any time. You've noted that modifying sys's attributes is common enough to prevent its being made read-only; how do you know what else will be broken if this change goes through? For that reason, even though the read-only state would be the more common one, I would strongly recommend flagging those modules which _are_ read-only, rather than those which aren't. Then it becomes a documentable part of the module's interface: "This module will be frozen when Python is run in read-only mode". Setting that flag and then modifying your own state would be a mistake on par with using assert for crucial checks; monkey-patching someone else's read-only module makes your app incompatible with read-only mode. Any problems would come from use of *both* read-only mode *and* the __readonly__ flag, rather than unexpectedly cropping up when someone loads up a module from PyPI and it turns out to depend on mutability. Also, flagging the ones that have the changed behaviour means it's quite easy to get partial benefit from this, with no risk. In fact, you could probably turn this on for arbitrary Python programs, as long as only the standard library uses __readonly__; going the other way, having a single module that doesn't have the flag and requires mutability would prevent the whole app from being run in read-only mode. With that (rather big, and yet quite trivial) caveat, though: Looks interesting. Optimizing for the >99% of code that doesn't do weird things makes very good sense, just as long as the <1% can be catered for. ChrisA

2014-05-20 19:37 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
Hum, maybe my email was unclear: the read-only mode is disabled by default. When you enable the read-only mode, all modules are read-only except the modules explicitly configured to be modifiable. I don't have a strong opinion on this choice. Alternatively, we could make a module read-only only when the read-only mode is activated and the module is explicitly configured to be read-only. Another option is to have a list of modules which should be made read-only, configurable by the application.
Yeah, the whole stdlib doesn't need to be read-only to make an application faster. Victor

On 05/20/2014 01:32 PM, Victor Stinner wrote:
Ah, that's good.
Another option is to have a list of modules which should be made read-only, configurable by the application.
Or a list of modules that should remain read/write. As an application dev I should know which modules I am going to be modifying after initial load, so as long as I can easily add them to a read/write list I would be happy (especially when it came time to debug something). -- ~Ethan~

On Wed, May 21, 2014 at 6:32 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
There are two read-only states: 1) Is this application running in read-only mode? (You give an example of setting this by an env var.) 2) Is this module read-only? (You give an example of setting this to False.) It's the second one that I'm talking about. If, once you turn on read-only mode (the first state), every module is read-only except those marked __readonly__=False, you're going to have major backward incompatibility problems. All it takes is one single module that ought to be marked __readonly__=False and isn't, and read-only mode is broken. Yes, it may be that most of the standard library can be made read-only; but I believe it would still be better to explicitly say __readonly__=True on each of those modules, than __readonly__=False on the others - because of all the *non* stdlib modules. ChrisA

On Wed, May 21, 2014 at 03:37:42AM +1000, Chris Angelico wrote:
"99% of Python code doesn't do weird things..." It seems to me that this is a myth, or at least unjustifiable by the facts as we have seen it. Victor's experiment shows 25 modules from the standard library are modifiable, with 139 read-only. That's more like 15% than 1% "weird". I don't consider setting sys.ps1 and sys.stdout to be "weird", which is why Victor has to leave sys unlocked. Read-only by default would play havoc with such simple idioms as global variables. (Globals may be considered harmful, but they're not considered "weird", and they're also more intuitive to many beginners than function arguments and return values. Strange but true.) As much as I wish to discourage people from using the global statement to rebind globals, I consider it completely unacceptable to have to teach beginners how to disable read-only mode before they've even mastered writing simple functions. I applaud Victor for his experiment, and would like to suggest a couple of approaches he might like to think about. I assume that read-only mode can be set on a per-module basis.
* For simplicity, read-only mode is all-or-nothing on a per module basis. If the module is locked, so are the functions and classes defined by that module. If the module is not locked, neither are the functions and classes. (By locked, I mean Victor's read-only mode where globals and class attributes cannot be re-bound, etc.)
* For backwards compatibility, any (eventual) production use of this would have to default to off. Perhaps in Python 4 or 5 we can consider defaulting to on.
* Define an (optional) module global, say, __readonly__, which defaults to False. The module author must explicitly set it to True if they wish to lock the module in read-only mode. There's no way to enable the read-only optimizations by accident; you have to explicitly turn them on.
* However there are ways to auto-detect when *not* to enable them. E.g. if a module uses the global statement in any function or method, read-only mode is disabled for that module.
* Similarly, a Python switch to enable/disable read-only mode. I don't mind if the switch --enable-readonly is true by default, so long as individual modules default to unlocked.
How about testing? It's very common, useful, and very much non-weird to reach into a module and monkey-patch it for the purposes of testing. I don't have a good solution to that, but a couple of stream of consciousness suggestions:
- Would it help if there was a statement "import unlocked mymodule" that forces mymodule to remain unlocked rather than read-only?
- Would it help if you could make a copy of a readonly module in an unlocked state?
- Obviously the "best" (most obvious) solution would be if there was a way to unlock modules on the fly, but Victor suggests that's hard.
-- Steven
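Steven's auto-detection rule (a module that uses `global` in any function or method stays unlocked) is easy to prototype with the standard `ast` module. This is a minimal sketch; `uses_global_statement` is a hypothetical helper, and because it only inspects the module's own source, it cannot see rebinding done from other modules, such as monkey-patching in tests.

```python
import ast


def uses_global_statement(source: str) -> bool:
    """Return True if any function or method in `source` contains a
    `global` statement, i.e. the module rebinds its own globals at run
    time and should stay unlocked under the proposed rule."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.walk also visits nested functions and methods.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Global):
                    return True
    # A top-level `global x` is legal but redundant, so it is ignored here.
    return False


counter_module = "counter = 0\ndef bump():\n    global counter\n    counter += 1\n"
pure_module = "def double(x):\n    return 2 * x\n"
print(uses_global_statement(counter_module), uses_global_statement(pure_module))
```

A loader following Steven's rule would keep `counter_module` unlocked even if it set `__readonly__ = True`.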

On Wed, May 21, 2014 at 11:42 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Allow me to clarify. A module mutating its own globals is not at all weird; the only thing I'm calling weird is reaching into another module's globals and changing things. In a few rare cases (like sys.ps1 and sys.stdout), this is part of the documented interface of the module; but if (in a parallel universe) Python were designed such that this sort of thing is impossible, it wouldn't be illogical to have a "sys.set_ps1()" function, because the author(s) of the sys module *expect* ps1 to be changed. In contrast, the random module makes use of a bunch of stuff from math (importing them all with underscores, presumably to keep them out of "from random import *", although __all__ is also set), and it is NOT normal to reach in and change them. And before you say "Well, that has an underscore, of course you don't fiddle with it", other modules like imaplib will happily import without underscores - is it expected that you should be able to change imaplib.random to have it use a different random number generator? Or, for that matter, to replace some of its helper functions like imaplib.Int2AP? That, I think, would be considered weird. So there are 15% that change their own globals, which is fine. In this particular instance, we can't optimize for the whole of the 99%, but I maintain that the 15% is not all "weird" just because it's not optimizable. How many modules actually expect that their globals will be externally changed? ChrisA

Steven D'Aprano wrote:
Read-only by default would play havok with such simple idioms as global variables.
I don't see why there couldn't be a way to exempt selected names in a module from read-only status. An exemption could be inferred whenever a name is referenced by a 'global' statement. There should also be a way to explicitly mark a name as exempt, to take care of sys.stdout etc., and cases where the only mutations are done from a different module, so there is no global statement. For modules implemented in Python, the explicit marker could consist of a global statement at the top level, which is currently allowed but redundant. -- Greg
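Greg's explicit exemption can be sketched in pure Python with a `types.ModuleType` subclass. Everything here (`ReadOnlyModule`, `lock()`, the `exempt` parameter) is hypothetical and not part of Victor's PoC. Note that it only guards attribute assignment from outside (`mod.name = value`): code inside the module that rebinds a global writes to the module's `__dict__` directly, and so does anyone who writes to `vars(mod)`.

```python
import types


class ReadOnlyModule(types.ModuleType):
    """Hypothetical module type: once locked, attribute assignment from
    outside is refused, except for names explicitly marked as exempt."""

    def __init__(self, name, exempt=()):
        super().__init__(name)
        # Configure via ModuleType's own __setattr__ so our check is
        # bypassed. These helper names live in the module namespace.
        super().__setattr__("_exempt", frozenset(exempt))
        super().__setattr__("_locked", False)

    def lock(self):
        super().__setattr__("_locked", True)

    def _check(self, name):
        if self._locked and name not in self._exempt:
            raise AttributeError(f"module {self.__name__!r} is read-only: {name!r}")

    def __setattr__(self, name, value):
        self._check(name)
        super().__setattr__(name, value)

    def __delattr__(self, name):
        self._check(name)
        super().__delattr__(name)


# A sys-like module whose "stdout" and "ps1" may still be replaced.
fake_sys = ReadOnlyModule("fake_sys", exempt={"stdout", "ps1"})
fake_sys.stdout = "original"
fake_sys.maxsize = 2**63 - 1
fake_sys.lock()
fake_sys.stdout = "redirected"      # allowed: exempt name
try:
    fake_sys.maxsize = 0            # refused: not exempt
except AttributeError as exc:
    print(exc)
```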

On 21 May 2014 11:48, "Steven D'Aprano" <steve@pearwood.info> wrote:
It also misses the big reason I am a Python programmer rather than a Java programmer. For me, Python is primarily an orchestration language. It is the language for the code that is telling everything else what to do. If my Python code is an overall performance bottleneck, then "Huzzah!", as it means I have finally engineered all the other structural bottlenecks out of the system. For this use case, monkey patching is not an incidental feature to be tolerated merely for backwards compatibility reasons: it is a key capability that makes Python an ideal language for me, as it takes ultimate control of what dependencies do away from the original author and places it in my hands as the system integrator. This is a dangerous power, not to be used lightly, but it also grants me the ability to work around critical bugs in dependencies at run time, rather than having to fork and patch the source the way Java developers tend to do. Victor's proposal is to make Python more complicated and a worse orchestration language, for the sake of making it a better applications programming language. In isolation, it might be possible to make that case, but in the presence of PyPy for a full dynamically optimised runtime and tools like Cython and Numba for selective optimisation within CPython, no. Regards, Nick.

Another +inf from me. Mind if I quote you on this next time I'm trying to convince C# developers to take Python seriously? :) From: Ethan Furman <ethan@stoneleaf.us> Sent: 5/21/2014 7:37 To: python-ideas@python.org Subject: Re: [Python-ideas] Make Python code read-only On 05/21/2014 02:43 AM, Nick Coghlan wrote:
+inf -- ~Ethan~

On 22 May 2014 00:56, "Steve Dower" <Steve.Dower@microsoft.com> wrote:
Another +inf from me.
Mind if I quote you on this next time I'm trying to convince C# developers to take Python seriously? :)
Sure - I expect your conversations with C# devs resemble some of mine with Java devs :) Cheers, Nick.

2014-05-21 3:42 GMT+02:00 Steven D'Aprano <steve@pearwood.info>:
- Obviously the "best" (most obvious) solution would be if there was a way to unlock modules on the fly, but Victor suggests that's hard.
The problem is reacting to such an event. If a function has a specialized version for a set of read-only objects, the specialized version must no longer be used once those objects become modifiable. OK, I created a new branch, "readonly_cb", where modules, types and functions can be made modifiable again. *But* when the read-only state is modified, a callback is called. It can be used to disable optimizations relying on it. So all the issues listed in this thread go away: it's possible again to use monkey-patching, lazy initialization of module variables and class variables, etc. I hope that such a callback is enough to make the optimizations efficient. Victor
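A minimal sketch of the callback idea, assuming a namespace whose read-only flag can be toggled and which notifies subscribers on every toggle. `Namespace`, `on_readonly_change()` and `SpecializedCall` are hypothetical names, not the API of the readonly_cb branch; the point is only that unlocking triggers a callback that discards the specialized version.

```python
class Namespace:
    """Hypothetical namespace with a toggleable read-only flag; every
    change of the flag is reported to the registered callbacks."""

    def __init__(self, **values):
        self._values = dict(values)
        self._readonly = False
        self._callbacks = []

    @property
    def readonly(self):
        return self._readonly

    def on_readonly_change(self, callback):
        self._callbacks.append(callback)

    def set_readonly(self, flag):
        if flag == self._readonly:
            return
        self._readonly = flag
        for callback in list(self._callbacks):
            callback(self, flag)

    def get(self, name):
        return self._values[name]

    def set(self, name, value):
        if self._readonly:
            raise AttributeError(f"read-only name: {name!r}")
        self._values[name] = value


class SpecializedCall:
    """Use a specialized implementation only while `ns` stays read-only."""

    def __init__(self, ns, generic, specialized):
        self.generic = generic
        # Only trust the fast path if the namespace is already locked.
        self.specialized = specialized if ns.readonly else None
        ns.on_readonly_change(self._on_change)

    def _on_change(self, ns, readonly):
        if not readonly:
            # Unlocked: the assumptions baked into the specialized version
            # may no longer hold, so fall back to the generic code.
            self.specialized = None

    def __call__(self, *args):
        if self.specialized is not None:
            return self.specialized(*args)
        return self.generic(*args)


ns = Namespace(scale=2)
ns.set_readonly(True)
# The "specialized" version has the constant scale folded in.
call = SpecializedCall(ns, lambda x: x * ns.get("scale"), lambda x: x * 2)
print(call(5))          # fast path
ns.set_readonly(False)  # callback drops the fast path
ns.set("scale", 3)
print(call(5))          # generic path sees the new value
```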

An interesting idea. Comments below. On May 20, 2014 10:58 AM, "Victor Stinner" <victor.stinner@gmail.com> wrote:
Make __readonly__ a data descriptor (getset in the C-API) on ModuleType, type, and FunctionType and people could toggle it as needed. The descriptor could look something like this (in pure Python):

    import os

    class ReadonlyDescriptor:
        # Read once at startup, i.e. ignore later changes to PYTHONREADONLY.
        # Environment keys are str in Python 3, and bool("0") would be True,
        # so compare against "1" explicitly.
        DEFAULT = os.environ.get('PYTHONREADONLY', '0') == '1'

        def __init__(self, *, default=None):
            if default is None:
                default = self.DEFAULT
            self.default = default

        def __get__(self, obj, cls):
            if obj is None:
                return self
            # Works as written for modules and functions; a class's __dict__
            # is a read-only mappingproxy, so for types the C version would
            # have to write the real dict.
            try:
                return obj.__dict__['__readonly__']
            except KeyError:
                readonly = bool(self.default)
                obj.__dict__['__readonly__'] = readonly
                return readonly

        def __set__(self, obj, value):
            obj.__dict__['__readonly__'] = value

Alternately, the object structs for the 3 types (e.g. PyModuleObject) could each grow a "readonly" field (or an extra flag option if there is an appropriate flag). The descriptor (in C) would use that instead of obj.__dict__['__readonly__']. However, I'd prefer going through __dict__. Either way, the 3 types would share a tp_setattro implementation that checked the read-only flag. That way there's no need to make sweeping changes to the 3 types, nor to the dict type.

    def __setattr__(self, name, value):
        # The flag itself must stay settable, otherwise it could never be
        # toggled back once it is True.
        if name != '__readonly__' and self.__readonly__:
            raise AttributeError('readonly')
        super().__setattr__(name, value)

FWIW, the idea of a flag for read-only could be applied to objects in general, particularly in a future language addition. "__readonly__" is a good name for the flag so the precedent set by the three types in this proposal would be a good one.
Read-only by default would be backwards-incompatible, but having a commandline flag (and/or env var) to enable it would be useful. For classes a decorator could be nice, though it should wait until it was more obviously worth doing. I'm not sure it would matter for functions, though the same decorator would probably work.
With a data descriptor and __setattr__ like I described above, there is no need to make any changes to importlib.
+1
How big a problem would this be in practice?
With the data descriptor approach toggling read-only would work. Enabling/disabling optimizations at that point would depend on how they were implemented.
What do you mean by "lazy initialization of module variables"?
If read-only is only enforced via __setattr__ then the workaround is to bind the submodule directly via pkg.__dict__. -eric

2014-05-21 0:04 GMT+02:00 Eric Snow <ericsnowcurrently@gmail.com>:
In my PoC, I chose to directly modify the builtin "dict" type. I don't think that I will keep this solution, because I would prefer not to touch such a critical Python type. I may use a subclass instead. I added a dict.setreadonly() method which can be used to make a dict read-only, but a read-only dict cannot be made modifiable again. I added a type.setreadonly() method which calls setreadonly() on the type's underlying dict. I needed a method to reach that dict: for example, str.__dict__ is a mappingproxy, not the real dictionary. type.setreadonly() also works on builtin types like str.
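The mappingproxy distinction can be seen from pure Python: a class's `__dict__` is exposed as a `types.MappingProxyType`, a read-only view over the real namespace, so writes through the view fail even though the underlying dict is still mutable. This only shows standard CPython behaviour; `dict.setreadonly()` and `type.setreadonly()` exist only in the PoC branch and are not used here. `Config` is a made-up class.

```python
import types


class Config:
    debug = False


proxy = Config.__dict__
# Class namespaces are exposed through a read-only view, not the dict itself.
print(type(proxy) is types.MappingProxyType)

try:
    proxy["debug"] = True       # writing through the view is refused
except TypeError as exc:
    print("refused:", exc)

# Ordinary attribute assignment still reaches the real (mutable) dict,
# and the view reflects the change.
Config.debug = True
print(proxy["debug"])

# Builtin types such as str also expose a mappingproxy, and CPython
# additionally refuses attribute assignment on them.
print(isinstance(str.__dict__, types.MappingProxyType))
try:
    str.shout = lambda self: self.upper()
except TypeError as exc:
    print("refused:", exc)
```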
There is already a function.__readonly__ property (I just renamed it; it was previously called __modifiable__, with the opposite meaning). It is used by importlib to make a function read-only.
Are you sure that it's not possible to retrieve the underlying dictionary somehow? For example, functions have a func.__dict__ attribute.
Read-only by default would be backwards-incompatible, but having a commandline flag (and/or env var) to enable it would be useful.
My PoC had a PYTHONREADONLY env var to enable the read-only mode. I just added a -r command line option for the same purpose. It's disabled by default for backward compatibility. Only enable it if you want to try my optimizations :-)
I just pushed a change to make classes read-only by default, so that nested classes are also made read-only. I modified the builtin __build_class__ function for that. A decorator is called after the class is defined, which is too late. That's why I chose a class attribute.
I have no idea right now :)
Hum, I should try to use your descriptor. I'm not sure that it works for modules and classes. (Functions already have a __readonly__ property.)
To reduce the memory footprint, the "large" precomputed tables of the base64 module are only filled at the first call of the function needing them. I also saw in other modules that a module is only imported the first time it is needed. Example: "def _lazy_import_sys(): global sys; import sys" and then "if sys is None: _lazy_import_sys(); # use sys".
I don't like the idea of an "almost" read-only module object. In one of my projects, I would like to emit machine code. If a module is modified while the machine code relies on the module's read-only property, Python may crash. Victor

2014-05-21 0:44 GMT+02:00 Greg Ewing <greg.ewing@canterbury.ac.nz>:
If the class is read-only and has a __slots__ class attribute, methods cannot be modified anymore. If you are able to get (compute) the type of an object, you can optimize the call to the method. Dummy example:

    chars = []
    for ch in range(32, 126):
        chars.append(chr(ch))
    print(''.join(chars))

Here you can guess that the type of chars in "chars.append" is list. The list.append() method is well known (and it is read-only, even if my global read-only mode is disabled, because list.append is a builtin type). You may inline the call in the body of the loop. Or you can at least move the lookup of the append method out of the loop. Victor
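Moving the method lookup out of the loop can already be done by hand. This sketch (the function names are made up for illustration) shows the transformation an optimizer could apply automatically once it knows `chars` is a list, whose `append` method cannot be replaced.

```python
def printable_ascii_generic():
    chars = []
    for ch in range(32, 126):
        # `chars.append` is looked up again on every iteration.
        chars.append(chr(ch))
    return ''.join(chars)


def printable_ascii_hoisted():
    chars = []
    # Look the bound method up once. Safe because list is a builtin type:
    # its attributes cannot be reassigned, so the lookup cannot change.
    append = chars.append
    for ch in range(32, 126):
        append(chr(ch))
    return ''.join(chars)


print(printable_ascii_hoisted())
```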

2014-05-21 2:00 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
I don't want to optimize a single function, I want to optimize a whole application. If possible, I would prefer to not have to modify the application to run it faster. Numba plays very well with numbers and arrays, but I'm not sure that it is able to inline arbitrary Python functions, for example. Victor

On 21/05/14 02:16, Victor Stinner wrote:
I don't want to optimize a single function, I want to optimize a whole application.
Right. Even Java does not do that. (Hence the name 'Hotspot')
Numba will compile the Python overhead out of function calls, if that is what you mean. Numba will also accelerate Python objects (method calls and attribute access). LLVM knows how to do simple optimisations like function inlining. When a Python function is JIT compiled to LLVM IR by Numba, LLVM knows what to do with it. If the function body is small enough, LLVM will inline it completely. Numba is still under development, so it might not be considered "production ready" yet. Currently it will give you performance comparable to -O2 in C for most algorithmic Python code. Sturla

On Tue, May 20, 2014 at 10:57 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
Issues with read-only code ==========================
Other things to consider:
* reload() will no longer work (it loads into the existing module ns)
* the module-replaces-self-in-sys-modules hack will be weird
* class decorators that modify the class will no longer work
* caching class attrs that are lazily set by instances will no longer work (similar to modules)
* singletons stored on the class will break
-eric
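The class decorator problem can be reproduced with a hypothetical metaclass that freezes class attributes once the class statement has finished. This is only an approximation of read-only classes (Victor's PoC hooks `__build_class__` instead), but it shows why a decorator, which runs after the class object exists, can no longer modify it. `ReadOnlyMeta`, `register` and `Point` are made-up names.

```python
class ReadOnlyMeta(type):
    """Hypothetical metaclass: class attributes are frozen once the class
    statement has finished executing."""

    def __init__(cls, name, bases, namespace, **kwargs):
        # type.__new__ already consumed any class keywords.
        super().__init__(name, bases, namespace)
        # type.__setattr__ bypasses our own check for this one flag.
        type.__setattr__(cls, "_frozen", True)

    def __setattr__(cls, name, value):
        # Check only this class's own dict, so a subclass under
        # construction is not treated as frozen by its parent's flag.
        if cls.__dict__.get("_frozen", False):
            raise AttributeError(f"read-only class attribute: {name!r}")
        super().__setattr__(name, value)

    def __delattr__(cls, name):
        if cls.__dict__.get("_frozen", False):
            raise AttributeError(f"read-only class attribute: {name!r}")
        super().__delattr__(name)


def register(cls):
    """A typical class decorator that annotates the class it receives."""
    cls.registered = True
    return cls


class Point(metaclass=ReadOnlyMeta):
    x = 0
    y = 0


try:
    register(Point)     # equivalent to decorating Point with @register
except AttributeError as exc:
    print("decorator failed:", exc)

p = Point()
p.x = 3                 # instance attributes are unaffected
```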

On May 20, 2014, at 12:57 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
I did two passes on read-only functionality for PyParallel. First attempt was similar to yours; I instrumented various core Python objects such that mutations could be detected against read-only objects (and subsequently raised as an exception). That didn’t pan out the way I wanted it to, especially in the PyParallel multiple-interpreter-threads-running-in-parallel environment. Second attempt: use memory protection. CPUs and OSes are really good at enforcing memory protection: leverage that. Don’t try and do it yourself in userspace. This worked much better. That work is described starting here: https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploite... Relevant bits of implementation: obmalloc.c: http://hg.python.org/sandbox/trent/rev/0e70a0caa1c0#l6.299 ceval.c: http://hg.python.org/sandbox/trent/rev/0e70a0caa1c0#l9.30 On POSIX you’d achieve the same effect via mprotect and a SIGSEGV trap. Just FYI. Regards, Trent.

2014-06-05 14:02 GMT+02:00 Trent Nelson <trent@snakebite.org>:
My first attempt to "make the code read-only" was a big failure. Lots of errors and complaints :-) I'm now moving to a different approach: "notify changes of the code". In PyParallel, you raise an error if something is modified. I don't need such a restriction; I "just" want to disable optimizations if the code changed.
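One possible way to "notify changes" without callbacks is a version counter that optimized code checks before taking its fast path. This is a hypothetical sketch: `WatchedNamespace` and `guarded` are made-up names, only `__setitem__` and `__delitem__` are tracked (methods such as `update()` or `pop()` bypass them), and a real implementation would have to live inside the interpreter, since in CPython the `global` stores bypass a dict subclass's `__setitem__`.

```python
class WatchedNamespace(dict):
    """Hypothetical namespace that counts modifications so optimized code
    can cheaply check whether its assumptions still hold."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # initial contents: version 0
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


def guarded(ns, specialized, generic):
    """Call `specialized` only while `ns` is unchanged since this point."""
    expected = ns.version

    def call(*args):
        if ns.version == expected:
            return specialized(*args)
        return generic(*args)

    return call


ns = WatchedNamespace(factor=2)
generic = lambda x: x * ns["factor"]
fast = lambda x: x * 2      # "factor" folded in as a constant
f = guarded(ns, fast, generic)
print(f(4))                 # fast path
ns["factor"] = 10           # bumps the version
print(f(4))                 # guard fails, generic path used
```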
On POSIX you’d achieve the same effect via mprotect and a SIGSEGV trap.
I don't think that relying on SIGSEGV is reliable :-( Such a signal can be raised for various reasons, and you have to use sigsetjmp/siglongjmp, which is unsafe: you cannot clean up state when an error occurs. Or did you implement it differently? Victor
participants (14)
- Charles-François Natali
- Chris Angelico
- dw+python-ideas@hmmz.org
- Eric Snow
- Ethan Furman
- Greg Ewing
- Nick Coghlan
- Peter Mawhorter
- Steve Dower
- Steven D'Aprano
- Sturla Molden
- Trent Nelson
- Victor Stinner
- Wolfgang Maier