From ncoghlan at gmail.com Sat Jan 1 01:43:48 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 1 Jan 2011 10:43:48 +1000 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References:

Message-ID: On Sat, Jan 1, 2011 at 7:51 AM, Guido van Rossum wrote: > and of course for more fun you can make it more dynamic (think > obfuscated code contests). Not to mention the champions of obfuscation for CPython: doing the same things from an extension module, or by using ctypes to invoke the C API (although such mechanisms are obviously outside the language definition itself, they're still technically legal for non-portable CPython code) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jan 1 01:50:09 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 1 Jan 2011 10:50:09 +1000 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References: Message-ID: On Sat, Jan 1, 2011 at 4:49 AM, Guido van Rossum wrote: > (FWIW, optimizing "x[i] = i" would be much simpler -- I don't really > care about the argument that a debugger might interfere. But again, > apart from the simplest cases, it requires a sophisticated parser to > determine that it is really safe to do so.) Back on topic, we've certainly made much bigger bytecode changes that would appear differently in a debugger. Collapsing most of the with statement entry overhead into the single SETUP_WITH opcode is the biggest recent(-ish) example that comes to mind. A more general peephole optimisation that picks up a repeated load operation in a sequence of load commands and replaces it with a single load and some stack rotations may be feasible, but I'm not entirely sure that would actually be an optimisation (especially for LOAD_FAST) - reordering the stack may be slower than the load operation. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Sat Jan 1 02:17:20 2011 From: guido at python.org (Guido van Rossum) Date: Fri, 31 Dec 2010 17:17:20 -0800 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References:

Message-ID: On Fri, Dec 31, 2010 at 4:43 PM, Nick Coghlan wrote: > On Sat, Jan 1, 2011 at 7:51 AM, Guido van Rossum wrote: >> and of course for more fun you can make it more dynamic (think >> obfuscated code contests). > > Not to mention the champions of obfuscation for CPython: doing the > same things from an extension module, or by using ctypes to invoke the > C API (although such mechanisms are obviously outside the language > definition itself, they're still technically legal for non-portable > CPython code) Hm. I wouldn't even call such things "legal" -- rather accidents of the implementation. If someone depended on such an effect, and we changed things to make that no longer work, good luck arguing that we violated a compatibility promise. -- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Sat Jan 1 02:52:54 2011 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 01 Jan 2011 12:52:54 +1100 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References: Message-ID: <4D1E88F6.9000701@pearwood.info> Guido van Rossum wrote: > [Changed subject *and* list] > >> 2010/12/31 Maciej Fijalkowski >>> How do you know that range is a builtin you're thinking >>> about and not some other object? > > On Fri, Dec 31, 2010 at 7:02 AM, Cesare Di Mauro > wrote: >> By a special opcode which could do this work. ]:-) > > That can't be the answer, because then the question would become "how > does the compiler know it can use the special opcode". This particular > issue (generating special opcodes for certain builtins) has actually > been discussed many times before. Alas, given Python's extremely > dynamic promises it is very hard to do it in a way that is > *guaranteed* not to change the semantics. Just tossing ideas out here... pardon me if they've been discussed before, but I read the three PEPs you mentioned later (266, 267 and 280) and they didn't cover any of this. I wonder whether we need to make that guarantee? 
Perhaps we should distinguish between "safe" optimizations, like constant folding which can't change behaviour, and "unsafe" optimizations which can go wrong under (presumably) rare circumstances. The compiler can continue to apply whatever safe optimizations it likes, but unsafe optimizations must be explicitly asked for by the user. If subtle or not subtle bugs occur, well, Python does allow people to shoot themselves in the foot. There's precedent for this. Both -O and -OO optimization switches potentially change behaviour. -O *should* be safe if code only uses asserts for assertions, but many people (especially beginners) use assert for input checking. If their code breaks under -O they have nobody to blame but themselves. Might we not say that -OO will optimize access to builtins, and if things break, the solution is not to use -OO? [...] > Now, *in practice* such manipulations are rare (with the possible > exception of people replacing open() with something providing hooks > for e.g. a virtual filesystem) and there is probably some benefit to > be had. (I expect that the biggest benefit might well be from > replacing len() with an opcode.) I have in the past proposed to change > the official semantics of the language subtly to allow such > optimizations (i.e. recognizing builtins and replacing them with > dedicated opcodes). There should also be a simple way to disable them, > e.g. by setting "len = len" at the top of a module, one would be > signalling that len() is not to be replaced by an opcode. But it > remains messy and nobody has really gotten very far with implementing > this. It is certainly not "low-hanging fruit" to do it properly. Here's another thought... suppose (say) "builtin" became a reserved word. builtin.range (for example) would always refer to the built-in range, and could be optimized by the compiler.
It wouldn't do much for the general case of wanting to optimize non-built-in globals, but this could be optimized safely:

    def f():
        for i in builtin.range(10): builtin.print(i)

while this would keep the current semantics:

    def f():
        for i in range(10): print(i)

-- Steven From cesare.di.mauro at gmail.com Sat Jan 1 10:32:30 2011 From: cesare.di.mauro at gmail.com (Cesare Di Mauro) Date: Sat, 1 Jan 2011 10:32:30 +0100 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References: Message-ID: 2010/12/31 Guido van Rossum > [Changed subject *and* list] > > > 2010/12/31 Maciej Fijalkowski > >> How do you know that range is a builtin you're thinking > >> about and not some other object? > > On Fri, Dec 31, 2010 at 7:02 AM, Cesare Di Mauro > wrote: > > By a special opcode which could do this work. ]:-) > > That can't be the answer, because then the question would become "how > does the compiler know it can use the special opcode". This particular > issue (generating special opcodes for certain builtins) has actually > been discussed many times before. Alas, given Python's extremely > dynamic promises it is very hard to do it in a way that is > *guaranteed* not to change the semantics. For example, I could have > replaced builtins['range'] with something else; or I could have > inserted a variable named 'range' into the module's __dict__. (Note > that I am not talking about just creating a global variable named > 'range' in the module; those the compiler could recognize. I am > talking about interceptions that a compiler cannot see, assuming it > compiles each module independently, i.e. without whole-program > optimizations.) > Yes, I know it, but the special opcode which I was talking about has a very different usage. The primary goal was to speed up for loops, generating specialized code when the proper range builtin is found at runtime, and it's convenient to have such optimized code.
As you stated, the compiler doesn't know if range is a builtin until runtime (at the precise moment the for loop executes), so it'll generate two different code paths. The function's bytecode will look like this:

    0 SETUP_LOOP 62
    2 JUMP_IF_BUILTIN_OR_LOAD_GLOBAL 'range', 40
    # Usual, slow, code starts here
    40 LOAD_CONSTS (4, 3, 2, 1, 0)   # Loads the tuple on the stack
    44 LOAD_FAST_MORE_TIMES x, 5     # Loads x 5 times on the stack
    46 LOAD_CONSTS (4, 3, 2, 1, 0)   # Loads the tuple on the stack
    48 STACK_ZIP 3, 5                # "zips" 3 sequences of 5 elements each on the stack
    52 STORE_SUBSCR
    54 STORE_SUBSCR
    56 STORE_SUBSCR
    58 STORE_SUBSCR
    60 STORE_SUBSCR
    62 POP_BLOCK
    64 RETURN_CONST 'None'

It's just an example; the code can be different based on the compiler optimizations and opcodes available in the virtual machine. The most important thing is that the semantics will be preserved (I never intended to drop them! ;) > Now, *in practice* such manipulations are rare (with the possible > exception of people replacing open() with something providing hooks > for e.g. a virtual filesystem) and there is probably some benefit to > be had. (I expect that the biggest benefit might well be from > replacing len() with an opcode.) I have in the past proposed to change > the official semantics of the language subtly to allow such > optimizations (i.e. recognizing builtins and replacing them with > dedicated opcodes). There should also be a simple way to disable them, > e.g. by setting "len = len" at the top of a module, one would be > signalling that len() is not to be replaced by an opcode. But it > remains messy and nobody has really gotten very far with implementing > this. It is certainly not "low-hanging fruit" to do it properly. > > I should also refer people interested in this subject to at least > three PEPs that were written about this topic: PEP 266, PEP 267 and > PEP 280.
All three have been deferred, since nobody was bold enough to > implement at least one of them well enough to be able to tell if it > was even worth the trouble. I read them, and they are interesting, but my case is quite different. > I haven't read either of those in a long > time, and they may well be outdated by current JIT technology. I just > want to warn folks that it's not such a simple matter to replace "for > i in range(....):" with a special opcode. > Maybe trying to optimize the current non-JITed Python implementation is a dead end. JITs are evolving so fast that all the things we have discussed here have already been taken into account. > (FWIW, optimizing "x[i] = i" would be much simpler -- I don't really > care about the argument that a debugger might interfere. But again, > apart from the simplest cases, it requires a sophisticated parser to > determine that it is really safe to do so.) > > -- > --Guido van Rossum (python.org/~guido) > It depends strictly on the goals we want to reach. A more advanced parser with a simple type-inference system can be implemented without requiring a complete parser rebuild, and can give good (albeit not optimal) results. At least operations on lists, dictionaries, tuples, and sets, which are very common in Python, can benefit; something can be done for ints, doubles and complexes, too. But looking at the JITs, it may just be wasted time... Cesare From cesare.di.mauro at gmail.com Sat Jan 1 10:52:37 2011 From: cesare.di.mauro at gmail.com (Cesare Di Mauro) Date: Sat, 1 Jan 2011 10:52:37 +0100 Subject: [Python-ideas] Optimizing builtins In-Reply-To: <4D1E88F6.9000701@pearwood.info> References: <4D1E88F6.9000701@pearwood.info> Message-ID: 2011/1/1 Steven D'Aprano > I wonder whether we need to make that guarantee?
Perhaps we should > distinguish between "safe" optimizations, like constant folding which can't > change behaviour, and "unsafe" optimizations which can go wrong under > (presumably) rare circumstances. The compiler can continue to apply whatever > safe optimizations it likes, but unsafe optimizations must be explicitly > asked for by the user. If subtle or not subtle bugs occur, well, Python does > allow people to shoot themselves in the foot. > Do we consider local variable removal (due to internal optimizations) a safe or unsafe operation? Do we consider local variable values "untouchable"? Think about a locals() call that returns a list for a variable; lists are mutable objects, so they can be changed by the caller, but the internally generated bytecode can work on a "private" (on stack) copy which doesn't "see" the changes made due to the locals() call. Also, there's tracing to consider. When tracing is enabled, the "handler" cannot find some variables due to some optimizations. Another funny thing that can happen is that if I "group together" some assignment operations into a single "multiassignment" one (it's another optimization I have been thinking about for a long time) and you are tracing it, only one tracing event will be generated instead of n. Are such optimizations "legal" / "safe"? For me the answer is yes, because I think that they must be implementation-specific. > > Now, *in practice* such manipulations are rare (with the possible >> exception of people replacing open() with something providing hooks >> for e.g. a virtual filesystem) and there is probably some benefit to >> be had. (I expect that the biggest benefit might well be from >> replacing len() with an opcode.) I have in the past proposed to change >> the official semantics of the language subtly to allow such >> optimizations (i.e. recognizing builtins and replacing them with >> dedicated opcodes). There should also be a simple way to disable them, >> e.g.
by setting "len = len" at the top of a module, one would be >> signalling that len() is not to be replaced by an opcode. But it >> remains messy and nobody has really gotten very far with implementing >> this. It is certainly not "low-hanging fruit" to do it properly. >> > > Here's another thought... suppose (say) "builtin" became a reserved word. > builtin.range (for example) would always refer to the built-in range, and > could be optimized by the compiler. It wouldn't do much for the general case > of wanting to optimize non-built-in globals, but this could be optimized > safely: > > def f(): > for i in builtin.range(10): builtin.print(i) > > while this would keep the current semantics: > > > def f(): > for i in range(10): print(i) > > -- > Steven I think that it's not needed. Optimizations must stay behind the scenes. We can speed up code that makes use of builtins without resorting to language changes. JITs already do this, but some approaches are possible even on non-JITed VMs. However, they require a longer parse / compile time, which can be undesirable. Cesare From mrts.pydev at gmail.com Sat Jan 1 13:57:12 2011 From: mrts.pydev at gmail.com (Mart Sõmermaa) Date: Sat, 1 Jan 2011 14:57:12 +0200 Subject: [Python-ideas] object construction (was: Re: ML Style Pattern Matching for Python) In-Reply-To: <20101219204424.3630e62e@o> References: <201012180021.37232.eike.welk@gmx.net> <201012181223.46790.eike.welk@gmx.net> <4D0D59EF.3070900@pearwood.info> <201012191952.30430.eike.welk@gmx.net> <20101219204424.3630e62e@o> Message-ID: On Sun, Dec 19, 2010 at 9:44 PM, spir wrote: > I find those loads of "self.x=x" in constructors sooo stupid --I want the machine to do it for me. __init__ should only define the essential part of obj construction; while the final constructor would do some mechanical job in addition.
Automating that is quite easy with keyword arguments:

    >>> class Foo(object):
    ...     def __init__(self, **kwargs):
    ...         self.__dict__.update(kwargs)
    ...
    >>> f = Foo(a=1, b=2)
    >>> f.a
    1
    >>> f.b
    2

If you want to play it safe, filter out keys that start with '__'. Best regards and a happy new year! Mart Sõmermaa From denis.spir at gmail.com Sat Jan 1 15:43:21 2011 From: denis.spir at gmail.com (spir) Date: Sat, 1 Jan 2011 15:43:21 +0100 Subject: [Python-ideas] Optimizing builtins In-Reply-To: <4D1E88F6.9000701@pearwood.info> References: <4D1E88F6.9000701@pearwood.info> Message-ID: <20110101154321.5aadde38@o> On Sat, 01 Jan 2011 12:52:54 +1100 Steven D'Aprano wrote: > > Now, *in practice* such manipulations are rare (with the possible > > exception of people replacing open() with something providing hooks > > for e.g. a virtual filesystem) and there is probably some benefit to > > be had. (I expect that the biggest benefit might well be from > > replacing len() with an opcode.) I have in the past proposed to change > > the official semantics of the language subtly to allow such > > optimizations (i.e. recognizing builtins and replacing them with > > dedicated opcodes). There should also be a simple way to disable them, > > e.g. by setting "len = len" at the top of a module, one would be > > signalling that len() is not to be replaced by an opcode. But it > > remains messy and nobody has really gotten very far with implementing > > this. It is certainly not "low-hanging fruit" to do it properly. > > Here's another thought... suppose (say) "builtin" became a reserved > word. builtin.range (for example) would always refer to the built-in > range, and could be optimized by the compiler.
It wouldn't do much for > the general case of wanting to optimize non-built-in globals, but this > could be optimized safely: > > def f(): > for i in builtin.range(10): builtin.print(i) > > while this would keep the current semantics: > > def f(): > for i in range(10): print(i) I had a similar question in a different context (non-Python). The point was to prevent core semantic changes in a "pedagogic" mode, such as redefining an operator on a builtin type. E.g. have Real.sum 'untouchable' so that 1.1+2.2 returns what is expected. Instead of protected keywords, my thought went toward read-only (immutable?) containers, where 'container' is a very loose notion including types and the scopes that hold them (and even individual objects that can be generated without a type). Denis -- -- -- -- -- -- -- vit esse estrany spir.wikidot.com From guido at python.org Sat Jan 1 17:41:35 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 1 Jan 2011 08:41:35 -0800 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References:

Message-ID: On Sat, Jan 1, 2011 at 5:11 AM, Maciej Fijalkowski wrote: > On Sat, Jan 1, 2011 at 11:32 AM, Cesare Di Mauro wrote: >> Yes, I know it, but the special opcode which I was talking about has a very >> different usage. >> The primary goal was to speed up for loops, generating specialized code when the >> proper range builtin is found at runtime, and it's convenient to have such >> optimized code. >> As you stated, the compiler doesn't know if range is a builtin until runtime >> (at the precise moment the for loop executes), so it'll generate two different >> code paths. The function's bytecode will look like this: >> 0 SETUP_LOOP 62 >> 2 JUMP_IF_BUILTIN_OR_LOAD_GLOBAL 'range', 40 >> # Usual, slow, code starts here >> 40 LOAD_CONSTS (4, 3, 2, 1, 0) # Loads the tuple on the stack >> 44 LOAD_FAST_MORE_TIMES x, 5 # Loads x 5 times on the stack >> 46 LOAD_CONSTS (4, 3, 2, 1, 0) # Loads the tuple on the stack >> 48 STACK_ZIP 3, 5 # "zips" 3 sequences of 5 elements each on the stack >> 52 STORE_SUBSCR >> 54 STORE_SUBSCR >> 56 STORE_SUBSCR >> 58 STORE_SUBSCR >> 60 STORE_SUBSCR >> 62 POP_BLOCK >> 64 RETURN_CONST 'None' >> It's just an example; the code can be different based on the compiler >> optimizations and opcodes available in the virtual machine. >> The most important thing is that the semantics will be preserved (I never >> intended to drop them! ;) > > The thing is, having a JIT, this all is completely trivial (as well as > a bunch of other stuff like avoiding allocating ints at all). Right. That's a much saner solution than trying to generate bulky bytecode as Cesare proposed. The advantage of a JIT is also that it allows doing these optimizations only in those places where it matters. In general I am not much in favor of trying to optimize Python's bytecode. I prefer the bytecode to be dead simple.
This probably makes it an easy target for CS majors interested in code generation, and it probably is a great exercise trying to do something like that, but let's please not confuse that with actual speed improvements to Python -- those come from careful observation (& instrumentation) of real programs, not from looking at toy bytecode samples. (Most of the bytecode improvements that actually made a difference were done in the first 5 years of Python's existence.) > Generating two different code paths has a tendency to lead to code > explosion (even exponential if you're not careful enough), which has > its own set of problems (including cache locality, because code > executed is no longer a small contiguous chunk of memory). What we > (PyPy) do is to compile only the common path (using JIT) and then > have the unlikely path fall back to the interpreter. This generally solves > all of the nasty problems you can possibly encounter. Great observation! -- --Guido van Rossum (python.org/~guido) From guido at python.org Sat Jan 1 17:50:08 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 1 Jan 2011 08:50:08 -0800 Subject: [Python-ideas] Optimizing builtins In-Reply-To: <4D1E88F6.9000701@pearwood.info> References: <4D1E88F6.9000701@pearwood.info> Message-ID: On Fri, Dec 31, 2010 at 5:52 PM, Steven D'Aprano wrote: > Guido van Rossum wrote: >> That can't be the answer, because then the question would become "how >> does the compiler know it can use the special opcode". This particular >> issue (generating special opcodes for certain builtins) has actually >> been discussed many times before. Alas, given Python's extremely >> dynamic promises it is very hard to do it in a way that is >> *guaranteed* not to change the semantics. > > Just tossing ideas out here... pardon me if they've been discussed before, > but I read the three PEPs you mentioned later (266, 267 and 280) and they > didn't cover any of this. > > I wonder whether we need to make that guarantee?
Perhaps we should > distinguish between "safe" optimizations, like constant folding which can't > change behaviour, (Though notice that our historic track record indicates that they are very dangerous -- we've introduced subtle bugs several times in "trivial" constant folding optimizations.) > and "unsafe" optimizations which can go wrong under > (presumably) rare circumstances. The compiler can continue to apply whatever > safe optimizations it likes, but unsafe optimizations must be explicitly > asked for by the user. If subtle or not subtle bugs occur, well, Python does > allow people to shoot themselves in the foot. > > There's precedent for this. Both -O and -OO optimization switches > potentially change behaviour. -O *should* be safe if code only uses asserts > for assertions, but many people (especially beginners) use assert for input > checking. If their code breaks under -O they have nobody to blame but > themselves. Might we not say that -OO will optimize access to builtins, and > if things break, the solution is not to use -OO? Maybe. But that means it will probably rarely be used -- realistically, who uses -O or -OO? I don't ever. Even so, there would have to be a way to turn the optimization off even under -OO for a particular module or function or code location, or for a particular builtin (again, open() comes to mind). > Here's another thought... suppose (say) "builtin" became a reserved word. > builtin.range (for example) would always refer to the built-in range, and > could be optimized by the compiler. It wouldn't do much for the general case > of wanting to optimize non-built-in globals, but this could be optimized > safely: > > def f(): > for i in builtin.range(10): builtin.print(i) > > while this would keep the current semantics: > > def f(): > for i in range(10): print(i) That defaults the wrong way.
You want the optimization to work (if the compiler does not see explicit reasons to avoid it) except in rare cases where the compiler cannot know that you're dynamically modifying the environment (globals or builtins). Also I would very much worry that people would start putting this in everywhere out of a mistaken defensive attitude. (Like what has happened to certain micro-optimization patterns, which are being overused, making code less readable.) -- --Guido van Rossum (python.org/~guido) From guido at python.org Sat Jan 1 17:59:57 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 1 Jan 2011 08:59:57 -0800 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References: <4D1E88F6.9000701@pearwood.info> Message-ID: On Sat, Jan 1, 2011 at 1:52 AM, Cesare Di Mauro wrote: > 2011/1/1 Steven D'Aprano >> >> I wonder whether we need to make that guarantee? Perhaps we should >> distinguish between "safe" optimizations, like constant folding which can't >> change behaviour, and "unsafe" optimizations which can go wrong under >> (presumably) rare circumstances. The compiler can continue to apply whatever >> safe optimizations it likes, but unsafe optimizations must be explicitly >> asked for by the user. If subtle or not subtle bugs occur, well, Python does >> allow people to shoot themselves in the foot. > > Do we consider local variable removal (due to internal optimizations) a > safe or unsafe operation? I would consider it safe unless the function locals() is called directly in the function -- always assuming the compiler does not see obvious other evidence (like a local being used by a nested function). Locals are "special" in many ways already. There should be a way to disable this (globally) in case you want to step through the code with a debugger though -- IDEs like WingIDE and PyCharm make stepping through code very easy to set up, and variable inspection is a big part of the process of debugging this way.
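Guido's point about debuggers can be made concrete with the standard `sys.settrace` hook, which pure-Python debuggers build on. In this sketch, `snapshot_locals` and `example` are made-up names: the tracer records which local names are bound at each line event of the traced function. If an optimizer eliminated the dead temporary `scratch`, a debugger-style tracer like this one would never see it.

```python
import sys


def snapshot_locals(func, *args):
    """Run func(*args) under a line tracer and record, for each 'line'
    event in func's own frame, the sorted names of its bound locals."""
    seen = []

    def tracer(frame, event, arg):
        if frame.f_code is func.__code__ and event == "line":
            seen.append(sorted(frame.f_locals))
        return tracer                 # returning the tracer keeps line events coming

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)            # always uninstall the tracer
    return seen


def example(n):
    total = n * 2
    scratch = total + 1               # dead temporary an optimizer might drop
    return total


for names in snapshot_locals(example, 5):
    print(names)
```

The last snapshot, taken just before `return total` runs, lists `n`, `scratch` and `total`; that is exactly the view a stepping debugger would lose if `scratch` were optimized away.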
It's probably fine if such optimizations are only enabled by -O. Still, I wonder if it isn't much better to try to do this using a JIT instead of messing with the bytecode. You'll find that the current compiler just really doesn't have the data structures needed to do these kinds of optimizations right. > Do we consider local variable values "untouchable"? Think about a locals() > call that returns a list for a variable; lists are mutable objects, so they > can be changed by the caller, but the internally generated bytecode can work > on a "private" (on stack) copy which doesn't "see" the changes made due to > the locals() call. Are you sure? locals() makes only a shallow copy, so changes to the list's contents made via the list returned by locals() should be completely visible to the bytecode. > Also, there's tracing to consider. When tracing is enabled, the "handler" > cannot find some variables due to some optimizations. Tracing is a special case of debugging. > Another funny thing that can happen is that if I "group together" some > assignment operations into a single "multiassignment" one (it's another > optimization I have been thinking about for a long time) and you are tracing it, > only one tracing event will be generated instead of n. > Are such optimizations "legal" / "safe"? For me the answer is yes, because I > think that they must be implementation-specific. I've traced through C code generated by gcc with an optimization setting. It can be a bit confusing to be jumping around in the optimized code, and it's definitely easier to trace through unoptimized code, but if you have the choice between tracing the (optimized) binary you have, or not tracing at all, I'll take what I can get. Still, when you're planning to trace/debug it's better to have a flag to disable it, and not using -O sounds like the right thing to me.
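Steven's -O precedent, and the idea that not using -O is the way to opt out, can be tried without restarting the interpreter. In Python 3 the built-in `compile()` takes an `optimize` argument: 0 keeps asserts, 1 strips them as -O does. The function `checked_sqrt` and the helper `build` are invented for illustration; the sketch shows the same source behaving differently once an assert used for input checking is compiled away.

```python
SOURCE = '''
def checked_sqrt(x):
    assert x >= 0, "x must be non-negative"   # input checking via assert
    return x ** 0.5
'''


def build(optimize):
    """Compile SOURCE at the given optimization level and return checked_sqrt."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace["checked_sqrt"]


normal = build(0)       # like running without -O: asserts are kept
optimized = build(1)    # like running with -O: asserts are removed

try:
    normal(-4)
except AssertionError as exc:
    print("without -O:", exc)          # without -O: x must be non-negative

print("with -O:", optimized(-4))       # no error; a complex number comes back
```

The optimized version silently returns a complex result instead of rejecting the input, which is the kind of behaviour change Steven argues users opt into when they pass -O.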
-- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Sat Jan 1 21:30:42 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 01 Jan 2011 15:30:42 -0500 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References:

Message-ID: On 1/1/2011 11:41 AM, Guido van Rossum wrote: > In general I am not much in favor of trying to optimize Python's > bytecode. I prefer the bytecode to be dead simple. I think people constantly underestimate the virtue of Python and CPython simplicity. Projects that depend on a couple of genius ubergeeks die when the ubergeeks leave. The executable-pseudocode simplicity of the language makes it a favorite for scientific programming, spilling over into financial programming. The simplicity of the code allows competent students (and non-CS major adults) to become developers. -- Terry Jan Reedy From benjamin at python.org Sat Jan 1 21:35:28 2011 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 1 Jan 2011 20:35:28 +0000 (UTC) Subject: [Python-ideas] Optimizing builtins References:

Message-ID: Guido van Rossum writes: > The compiler has no way to notice this when a.py is being compiled. You could still optimize it if you insert a runtime "guard" before the range usage and see if it's been overridden. From guido at python.org Sat Jan 1 21:37:05 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 1 Jan 2011 12:37:05 -0800 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References:

Message-ID: On Sat, Jan 1, 2011 at 12:30 PM, Terry Reedy wrote: > On 1/1/2011 11:41 AM, Guido van Rossum wrote: > >> In general I am not much in favor of trying to optimize Python's >> bytecode. I prefer the bytecode to be dead simple. > > I think people constantly underestimate the virtue of Python and CPython > simplicity. Projects that depend on a couple of genius ubergeeks die when > the ubergeeks leave. The executable-pseudocode simplicity of the language > makes it a favorite for scientific programming, spilling over into financial > programming. The simplicity of the code allows competent students (and > non-CS major adults) to become developers. And, of course, the (relative) simplicity of the implementation will always draw CS students looking for compiler optimization projects (just as the simplicity of the language draws CS students looking to write a complete compiler). But it's one thing to get a degree out of some clever optimization; it's another thing to actually make it stick in the context of CPython, with the concerns you mention (and others I only have in my guts :-). -- --Guido van Rossum (python.org/~guido) From guido at python.org Sat Jan 1 21:37:59 2011 From: guido at python.org (Guido van Rossum) Date: Sat, 1 Jan 2011 12:37:59 -0800 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References:

Message-ID: On Sat, Jan 1, 2011 at 12:35 PM, Benjamin Peterson wrote: > Guido van Rossum writes: >> The compiler has no way to notice this when a.py is being compiled. > > You could still optimize it if you insert a runtime "guard" before the range > usage and see if it's been overridden. Yeah, that was Cesare's idea. I think that's a great strategy for a JIT compiler, but not appropriate for bytecode (IMO). -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Sat Jan 1 23:16:16 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 01 Jan 2011 17:16:16 -0500 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References:

Message-ID: On 1/1/2011 3:37 PM, Guido van Rossum wrote: > And, of course, the (relative) simplicity of the implementation will > always draw CS students looking for compiler optimization projects And, ironically, slightly reduce the simplicity that attracted them. No one thinks that their straw will break the camel's back (or cause him to drop to his knees), and they are usually right. But when the camel sags, all added straws are equally responsible. > (just as the simplicity of the language draws CS students looking to > write a complete compiler). But it's one thing to get a degree out of > some clever optimization; it's another thing to actually make it stick > in the context of CPython, with the concerns you mention (and others I > only have in my guts :-). For one thing, you have your eye on the camel ;-). And your current job keeps you grounded in the needs of real code. (In a current python-list discussion, someone demonstrated with timeit that in late 2.x, each iteration of 'while 1: pass' takes about a microsecond less than for 'while True: pass'. The reason for that, and the disappearance of the difference in 3.x, is mildly interesting, but the practical import for any real code that does anything inside the loop is essentially 0.) -- Terry Jan Reedy From cesare.di.mauro at gmail.com Sun Jan 2 07:07:21 2011 From: cesare.di.mauro at gmail.com (Cesare Di Mauro) Date: Sun, 2 Jan 2011 07:07:21 +0100 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References:

Message-ID: 2011/1/1 Guido van Rossum > Right. That's a much saner solution than trying to generate bulky > bytecode as Cesare proposed. The advantage of a JIT is also that it > allows doing these optimizations only in those places where it > matters. > > In general I am not much in favor of trying to optimize Python's > bytecode. I prefer the bytecode to be dead simple. If Python's direction is to embrace some JIT technology, I fully agree with you: it is best to make VM & compiler simpler. Anyway, as I already said, mine were just examples of possible things that can happen with optimizations. > This probably makes > it an easy target for CS majors interested in code generation, and it > probably is a great exercise trying to do something like that, but > let's please not confuse that with actual speed improvements to Python > -- those come from careful observation (& instrumentation) of real > programs, not from looking at toy bytecode samples. (Most of the > bytecode improvements that actually made a difference were done in the > first 5 years of Python's existence.) > > --Guido van Rossum (python.org/~guido) > But research never stops. SETUP_WITH is just a recent example. Also, sometimes completely different ideas can bring some innovation. ;) Cesare From cesare.di.mauro at gmail.com Sun Jan 2 07:22:36 2011 From: cesare.di.mauro at gmail.com (Cesare Di Mauro) Date: Sun, 2 Jan 2011 07:22:36 +0100 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References: <4D1E88F6.9000701@pearwood.info>

Message-ID: 2011/1/1 Guido van Rossum > On Sat, Jan 1, 2011 at 1:52 AM, Cesare Di Mauro > > Do we consider local variable removing (due to internal optimizations) a > > safe or unsafe operation? > > I would consider it safe unless the function locals() is called > directly in the function -- always assuming the compiler does not see > obvious other evidence (like a local being used by a nested function). > The problem here is that locals is a builtin function, not a keyword, so the compiler must resort to something like the "code fork" that I showed before, if we want to keep the correct language semantics. > Still, I wonder if it isn't much better to try to do this using a JIT > instead of messing with the bytecode. Ditto for me, if a good (and not resource-hungry) JIT comes along. > You'll find that the current > compiler just really doesn't have the datastructures needed to do > these kind of optimizations right. > Right. Not now, but something can be made if and only if it makes sense. A JIT can make it nonsense, of course. > Do we consider local variable values "untouchable"? Think about a locals() > > call that return a list for a variable; lists are mutable objects, so > they > > can be changed by the caller, but the internally generated bytecode can > work > > on a "private" (on stack) copy which doesn't "see" the changes made due > to > > the locals() call. > > Are you sure? locals() makes only a shallow copy, so changes to the > list's contents made via the list returned by locals() should be > completely visible by the bytecode. > > --Guido van Rossum (python.org/~guido) > Nice to know it. Reading from the doc ( http://docs.python.org/library/functions.html#locals ) it was not clear to me. Thanks. Cesare
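Guido's point about `locals()` can be checked with a short sketch (the function name is hypothetical, chosen only for illustration). The dictionary returned by `locals()` holds references to the same objects the local names are bound to, so mutating a list through that dictionary is visible through the local variable. The sketch only covers mutation of a shared object; it does not try to rebind the local name through the dictionary.

```python
def shares_objects_with_locals():
    # Hypothetical example function, not taken from the thread.
    items = [1, 2]
    snapshot = locals()  # dict: local name -> the object it is bound to
    # The dict holds references, not copies of the values, so appending
    # through it mutates the very list that the local name `items` refers to.
    snapshot["items"].append(3)
    return items, snapshot["items"] is items


items, same_object = shares_objects_with_locals()
print(items, same_object)  # [1, 2, 3] True
```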
From stefan_ml at behnel.de Sun Jan 2 10:10:51 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 02 Jan 2011 10:10:51 +0100 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References:

Message-ID: Benjamin Peterson, 01.01.2011 21:35: > Guido van Rossum writes: >> The compiler has no way to notice this when a.py is being compiled. > > You could still optimize it if you insert a runtime "guard" before the range > usage and see if its been overridden. The problem here is that you wouldn't save the lookup. So you'd still pay a high price to find out that the builtin has not been overridden. There can be substantial savings for builtins that can be optimised away or replaced by a tighter/adapted implementation. We do that a lot in Cython where builtins are (by default) considered static unless redefined inside of the module. Important examples are generator expressions like "any(genexpr)". If the function was known to be builtin at compile time, CPython could generate much simpler byte code for these, dropping the need for a generator and its closure. But as long as you have to check for an override at each call, you end up with the duplicated code (optimised and fall-back version) and an increased entry overhead that may well kill the savings. Stefan From stefan_ml at behnel.de Sun Jan 2 10:38:06 2011 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 02 Jan 2011 10:38:06 +0100 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References: <4D1E88F6.9000701@pearwood.info> Message-ID: Guido van Rossum, 01.01.2011 17:50: > On Fri, Dec 31, 2010 at 5:52 PM, Steven D'Aprano wrote: >> and "unsafe" optimizations which can go wrong under >> (presumably) rare circumstances. The compiler can continue to apply whatever >> safe optimizations it likes, but unsafe optimizations must be explicitly >> asked for by the user. If subtle or not subtle bugs occur, well, Python does >> allow people to shoot themselves in the foot. >> >> There's precedence for this. Both -O and -OO optimization switches >> potentially change behaviour.
-O *should* be safe if code only uses asserts >> for assertions, but many people (especially beginners) use assert for input >> checking. If their code breaks under -O they have nobody to blame but >> themselves. Might we not say that -OO will optimize access to builtins, and >> if things break, the solution is not to use -OO? > > Maybe. But that means it will probably rarely be used -- > realistically, who uses -O or -OO? I don't ever. Even so, there would > have to be a way to turn the optimization off even under -OO for a > particular module or function or code location, or for a particular > builtin (again, open() comes to mind). If this ever happens, -O and -OO will no longer be expressive enough (IMHO, -OO currently isn't anyway). There would be a need to support options like "-Ostatic-builtins" and the like. The problem then is how to keep users from applying a particular optimisation to a particular module. New settings in distutils could help to enable optimisations and maybe even to explicitly forbid optimisations, but life would certainly become more error-prone for distributors and users. It's hard to keep track of the number of bug reports and help requests we get from (mostly new) Cython users about a missing "-fno-strict-aliasing" when compiled modules don't work in Python 2. I expect the same to come up when users start to install Python modules with all sorts of great CPython optimisations. Test suites may well fail to catch the one bug that an optimisation triggers. >> Here's another thought... suppose (say) "builtin" became a reserved word. >> builtin.range (for example) would always refer to the built-in range, and >> could be optimized by the compiler.
It wouldn't do much for the general case >> of wanting to optimize non-built-in globals, but this could be optimized >> safely: >> >> def f(): >> for i in builtin.range(10): builtin.print(i) >> >> while this would keep the current semantics: >> >> def f(): >> for i in range(10): print(i) > > That defaults the wrong way. My impression exactly, so I'm -1. But the trade-off behind this is: complicating new code versus breaking old code (Python 3 classic). Stefan From cmjohnson.mailinglist at gmail.com Sun Jan 2 10:46:53 2011 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Sat, 1 Jan 2011 23:46:53 -1000 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References: <4D1E88F6.9000701@pearwood.info> Message-ID: On Sat, Jan 1, 2011 at 11:38 PM, Stefan Behnel wrote: > If this ever happes, -O and -OO will no longer be expressive enough (IMHO, > -OO currently isn't anyway). There would be a need to support options like > "-Ostatic-builtins" and the like. The problem then is how to keep users from > applying a particular optimisation to a particular module. New settings in > distutils could help to enable optimisations and maybe even to explicitly > forbid optimisations, but life would certainly become more error prone for > distributors and users. It's hard to keep track of the amount of bug reports > and help requests we get from (mostly new) Cython users about a missing > "-fno-strict-aliasing" when compiled modules don't work in Python 2. I > expect the same to come up when users start to install Python modules with > all sorts of great CPython optimisations. Test suites may well fail to catch > the one bug that an optimisation triggers. If a flag wouldn't work, what about a pragma? Pragma smell a bit unpythonic to me, but we did have a pragma for source encoding and unicode literals in Python 2, so it's not unprecedented. 
How much would it solve efficiency-wise if you could just write at the top of a particular module ##ORIGINAL BUILTINS ONLY PLEASE ? Or is this the first step on the dark path to Perl 6? -- Carl Johnson From guido at python.org Sun Jan 2 17:10:44 2011 From: guido at python.org (Guido van Rossum) Date: Sun, 2 Jan 2011 08:10:44 -0800 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References: <4D1E88F6.9000701@pearwood.info> Message-ID: > If a flag wouldn't work, what about a pragma? Pragma smell a bit > unpythonic to me, but we did have a pragma for source encoding and > unicode literals in Python 2, so it's not unprecedented. How much > would it solve efficiency-wise if you could just write at the top of a > particular module ##ORIGINAL BUILTINS ONLY PLEASE ? Or is this the > first step on the dark path to Perl 6? Again, this would encourage people to put such junk in every module they write, so it would lose its value. At this point in the thread I am tempted to propose an optimization moratorium, just to stop the flood of poorly-thought-through proposals. If you really want to make Python faster, don't waste your time in this thread. Go contribute to PyPy or Unladen Swallow. Or go fix the GIL, so we can use multiple cores. -- --Guido van Rossum (python.org/~guido) From fuzzyman at voidspace.org.uk Mon Jan 3 15:33:06 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 03 Jan 2011 14:33:06 +0000 Subject: [Python-ideas] Optimizing builtins In-Reply-To: References:

Message-ID: <4D21DE22.3040409@voidspace.org.uk> On 31/12/2010 21:51, Guido van Rossum wrote: > On Fri, Dec 31, 2010 at 11:59 AM, Michael Foord > wrote: >> >> On 31 December 2010 18:49, Guido van Rossum wrote: >>> [Changed subject *and* list] >>> >>>> 2010/12/31 Maciej Fijalkowski >>>>> How do you know that range is a builtin you're thinking >>>>> about and not some other object? >>> On Fri, Dec 31, 2010 at 7:02 AM, Cesare Di Mauro >>> wrote: >>>> By a special opcode which could do this work. ]:-) >>> That can't be the answer, because then the question would become "how >>> does the compiler know it can use the special opcode". This particular >>> issue (generating special opcodes for certain builtins) has actually >>> been discussed many times before. Alas, given Python's extremely >>> dynamic promises it is very hard to do it in a way that is >>> *guaranteed* not to change the semantics. For example, I could have >>> replaced builtins['range'] with something else; or I could have >>> inserted a variable named 'range' into the module's __dict__. (Note >>> that I am not talking about just creating a global variable named >>> 'range' in the module; those the compiler could recognize. I am >>> talking about interceptions that a compiler cannot see, assuming it >>> compiles each module independently, i.e. without whole-program >>> optimizations.) >>> >>> Now, *in practice* such manipulations are rare >> Actually range is the one I've seen *most* overridden, not in order to >> replace functionality but because range is such a useful (or relevant) >> variable name in all sorts of circumstances... > No, you're misunderstanding. I was not referring to the overriding a > name using Python's regular syntax for defining names. If you set a > (global or local) variable named 'range', the compiler is perfectly > capable of noticing. 
E.g.: > > range = 42 > def foo(): > for i in range(10): print(i) > Right, in the same way the compiler notices local and global variable use and compiles different bytecode for lookups. It's just that accidentally overriding range is the source of my favourite "confusing Python error messages" story and I look for any opportunity to repeat it. A few years ago I worked for a company where most of the (very talented) developers were new to Python. They called me over to explain what "UnboundLocalError" meant and why they were getting it in what looked (to them) like perfectly valid code. The code looked something like: def something(start, stop): positions = range(start, stop) # more code here... range = process(positions) All the best, Michael Foord > While this will of course fail with a TypeError if you try to execute > it, a (hypothetical) optimizing compiler would have no trouble > noticing that the 'range' in the for-loop must refer to the global > variable of that name, not to the builtin of the same name. > > I was referring to an innocent module containing a use of the builtin > range function, e.g. > > # a.py > def f(): > for i in range(10): print(i) > > which is imported by another module which manipulates a's globals, for example: > > # b.py > import a > a.range = 42 > a.f() > > The compiler has no way to notice this when a.py is being compiled. > > Variants of "hiding" a mutation like this include: > > a.__dict__['range'] = 42 > > or > > import builtins > builtins.range = 42 > > and of course for more fun you can make it more dynamic (think > obfuscated code contests). > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. 
-- the sqlite blessing http://www.sqlite.org/different.html From guido at python.org Mon Jan 3 19:09:07 2011 From: guido at python.org (Guido van Rossum) Date: Mon, 3 Jan 2011 10:09:07 -0800 Subject: [Python-ideas] Optimizing builtins In-Reply-To: <4D21DE22.3040409@voidspace.org.uk> References:

<4D21DE22.3040409@voidspace.org.uk> Message-ID: On Mon, Jan 3, 2011 at 6:33 AM, Michael Foord wrote: > A few years ago I worked for a company where most of the (very talented) > developers were new to Python. They called me over to explain what > "UnboundLocalError" meant and why they were getting it in what looked (to > them) like perfectly valid code. The code looked something like: > > def something(start, stop): >     positions = range(start, stop) > >     # more code here... > >     range = process(positions) Yeah, and the really annoying thing for us old-timers is that this used to work (in Python 1.0 or so :-). Once upon a time, looking up locals was as dynamic as looking up globals and builtins is today. Still, I think for optimizing builtins we can do slightly better. -- --Guido van Rossum (python.org/~guido) From rich at noir.com Mon Jan 3 22:09:14 2011 From: rich at noir.com (K. Richard Pixley) Date: Mon, 03 Jan 2011 13:09:14 -0800 Subject: [Python-ideas] @classproperty, @abc.abstractclasspropery, etc. Message-ID: <4D223AFA.1010802@noir.com> There's a whole matrix of these and I'm wondering why the matrix is currently sparse rather than implementing them all. Or rather, why we can't stack them as: class foo(object): @classmethod @property def bar(cls, ...): ... Essentially the permutations are, I think: {'unadorned'|abc.abstract}{'normal'|static|class}{method|property|non-callable attribute}.
concreteness  implicit first arg  type                    name                                                comments
------------  ------------------  ----------------------  --------------------------------------------------  -----------
{unadorned}   {unadorned}         method                  def foo():                                          exists now
{unadorned}   {unadorned}         property                @property                                           exists now
{unadorned}   {unadorned}         non-callable attribute  x = 2                                               exists now
{unadorned}   static              method                  @staticmethod                                       exists now
{unadorned}   static              property                @staticproperty                                     proposing
{unadorned}   static              non-callable attribute  {degenerate case - variables don't have arguments}  unnecessary
{unadorned}   class               method                  @classmethod                                        exists now
{unadorned}   class               property                @classproperty or @classmethod;@property            proposing
{unadorned}   class               non-callable attribute  {degenerate case - variables don't have arguments}  unnecessary
abc.abstract  {unadorned}         method                  @abc.abstractmethod                                 exists now
abc.abstract  {unadorned}         property                @abc.abstractproperty                               exists now
abc.abstract  {unadorned}         non-callable attribute  @abc.abstractattribute or @abc.abstract;@attribute  proposing
abc.abstract  static              method                  @abc.abstractstaticmethod                           exists now
abc.abstract  static              property                @abc.abstractstaticproperty                         proposing
abc.abstract  static              non-callable attribute  {degenerate case - variables don't have arguments}  unnecessary
abc.abstract  class               method                  @abc.abstractclassmethod                            exists now
abc.abstract  class               property                @abc.abstractclassproperty                          proposing
abc.abstract  class               non-callable attribute  {degenerate case - variables don't have arguments}  unnecessary

I think the meanings of the new ones are pretty straightforward, but in case they are not...
@staticproperty - like @property only without an implicit first argument. Allows the property to be called directly from the class without requiring a throw-away instance.
@classproperty - like @property, only the implicit first argument to the method is the class. Allows the property to be called directly from the class without requiring a throw-away instance.
@abc.abstractattribute - a simple, non-callable variable that must be overridden in subclasses
@abc.abstractstaticproperty - like @abc.abstractproperty only for @staticproperty
@abc.abstractclassproperty - like @abc.abstractproperty only for @classproperty
--rich From dirkjan at ochtman.nl Mon Jan 3 22:22:14 2011 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Mon, 3 Jan 2011 22:22:14 +0100 Subject: [Python-ideas] @classproperty, @abc.abstractclasspropery, etc. In-Reply-To: <4D223AFA.1010802@noir.com> References: <4D223AFA.1010802@noir.com> Message-ID: On Mon, Jan 3, 2011 at 22:09, K. Richard Pixley wrote: > I think the meanings of the new ones are pretty straightforward, but in case they are not... > > @staticproperty - like @property only without an implicit first argument. Allows the property to be called directly from the class without requiring a throw-away instance. > > @classproperty - like @property, only the implicit first argument to the method is the class. Allows the property to be called directly from the class without requiring a throw-away instance. > > @abc.abstractattribute - a simple, non-callable variable that must be overridden in subclasses > > @abc.abstractstaticproperty - like @abc.abstractproperty only for @staticproperty > > @abc.abstractclassproperty - like @abc.abstractproperty only for @classproperty Do you have actual use cases for these? Cheers, Dirkjan From rich at noir.com Mon Jan 3 23:06:35 2011 From: rich at noir.com (K. Richard Pixley) Date: Mon, 03 Jan 2011 14:06:35 -0800 Subject: [Python-ideas] @classproperty, @abc.abstractclasspropery, etc. In-Reply-To: References: <4D223AFA.1010802@noir.com> Message-ID: <4D22486B.1090607@noir.com> On 20110103 13:22, Dirkjan Ochtman wrote: > On Mon, Jan 3, 2011 at 22:09, K. Richard Pixley wrote: >> I think the meanings of the new ones are pretty straightforward, but in case they are not...
>> >> @staticproperty - like @property only without an implicit first argument. Allows the property to be called directly from the class without requiring a throw-away instance. >> >> @classproperty - like @property, only the implicit first argument to the method is the class. Allows the property to be called directly from the class without requiring a throw-away instance. >> >> @abc.abstractattribute - a simple, non-callable variable that must be overridden in subclasses >> >> @abc.abstractstaticproperty - like @abc.abstractproperty only for @staticproperty >> >> @abc.abstractclassproperty - like @abc.abstractproperty only for @classproperty > Do you have actual use cases for these? Yes. Here's a toy example for abstractclassproperty: class InstanceKeeper(object): __metaclass__ = abc.ABCMeta @abc.abstractclassproperty def all_instances(cls): raise NotImplementedError class InstancesByList(InstanceKeeper): instances = [] @classproperty def all_instances(cls): return cls.instances class InstancesByDict(InstanceKeeper): instances = {} @classproperty def all_instances(cls): return list(cls.instances) class WhateversByList(InstancesByList): instances = [] ... class OthersByList(InstancesByList): instances = [] ... class StillMoreByDict(InstancesByDict): instances = {} ... class MoreAgainByDict(InstancesByDict): instances = {} ... I'm working on a library for reading and writing ELF format object files. I have a bunch of classes representing various structs. And the structs have, or point to, other structs. I'm using different subclasses to describe different byte ordering, (endianness), and word size, (32 vs 64 bit). Here are examples for the others. class Codable(object): __metaclass__ = abc.ABCMeta @abc.abstractattribute coder = None @classproperty def size(cls): return cls.coder.size class FileHeader(Codable): __metaclass__ = abc.ABCMeta @abc.abstractattribute sectionHeaderClass = None """ Used to create new instances. 
"""" sectionHeader = None @abc.abstractstaticproperty def word_size(): raise NotImplementedError def __new__(...): """ factory function reading the first few bytes and returning an instance of one of the subclasses """ ... def __init__(self, ...): ... self.sectionHeader = self.sectionHeaderClass(...) class Bit64(object): @staticproperty def word_size(): return 64 class Bit32(object): @staticproperty def word_size(): return 32 class FileHeader64l(FileHeader, Bit64): coder = struct.Struct(...) sectionHeaderClass = SectionHeader64l class FileHeader64b(FileHeader, Bit64): coder = struct.Struct(...) sectionHeaderClass = SectionHeader64b class FileHeader32l(FileHeader, Bit32): coder = struct.Struct(...) sectionHeaderClass = SectionHeader32l class FileHeader32b(FileHeader, Bit32): coder = struct.Struct(...) sectionHeaderClass = SectionHeader32b class SectionHeader(Codable): __metaclass__ = ABCMeta @abc.abstractattribute subsectionHeaderClass = None ... class SectionHeader64l(SectionHeader, Bit64): coder = struct.Struct(...) .... --rich From rich at noir.com Mon Jan 3 23:18:42 2011 From: rich at noir.com (K. Richard Pixley) Date: Mon, 03 Jan 2011 14:18:42 -0800 Subject: [Python-ideas] read-only attributes Message-ID: <4D224B42.5070204@noir.com> It seems to me that one of the more common reasons for using @property is to create a read-only attribute. I wonder if it would make sense to simply create a read-only decorator. 
Compare: class Foo(object): _size = 4 @property def size(self): return self._size against: class Foo(object): @read-only size = 4 This gets more interesting if decorators nest: class Foo(object): __metaclass__ = abc.ABCMeta @abstract @classattribute @read-only size = None --rich From solipsis at pitrou.net Mon Jan 3 23:31:11 2011 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 3 Jan 2011 23:31:11 +0100 Subject: [Python-ideas] read-only attributes References: <4D224B42.5070204@noir.com> Message-ID: <20110103233111.034660c3@pitrou.net> On Mon, 03 Jan 2011 14:18:42 -0800 "K. Richard Pixley" wrote: > > class Foo(object): > @read-only > size = 4 What's wrong with: >>> class Foo: ... @property ... def foo(self): ... return 4 ... >>> Foo().foo 4 >>> Foo().foo = 5 Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: can't set attribute ? From guido at python.org Tue Jan 4 01:56:05 2011 From: guido at python.org (Guido van Rossum) Date: Mon, 3 Jan 2011 16:56:05 -0800 Subject: [Python-ideas] @classproperty, @abc.abstractclasspropery, etc. In-Reply-To: <4D22486B.1090607@noir.com> References: <4D223AFA.1010802@noir.com> <4D22486B.1090607@noir.com> Message-ID: > On 20110103 13:22, Dirkjan Ochtman wrote: >> Do you have actual use cases for these? On Mon, Jan 3, 2011 at 2:06 PM, K. Richard Pixley wrote: > Yes. Here's a toy example for abstractclassproperty: Um, a toy example is pretty much the opposite of a use case. :-( That said, I am sure there are use cases for static property and class property -- I've run into them myself. An example use case for class property: in App Engine, we have a Model class (it's similar to Django's Model class). A model has a "kind" which is a string (it's the equivalent of an SQL table name). The kind is usually the class name but sometimes there's a need to override it. However once the class is defined it should be considered read-only.
Currently our choices are to make this an instance property (but there are some situations where we don't have an instance, e.g. when creating a new instance using a class method); or to make it a class attribute (but this isn't read-only); or to make it a class method (which requires the user to write M.kind() instead of M.kind). If I had class properties I'd use one here. -- --Guido van Rossum (python.org/~guido) From cool-rr at cool-rr.com Tue Jan 4 03:28:25 2011 From: cool-rr at cool-rr.com (cool-RR) Date: Tue, 4 Jan 2011 04:28:25 +0200 Subject: [Python-ideas] An improved `ContextManager` Message-ID: Hello folks. Ever since Michael Foord talked about `ContextDecorator` in python-ideas I've been kicking around an idea for my own take on it. It's a `ContextManager` class which provides the same thing that Foord's `ContextDecorator` does, but also provides a few more goodies, chief of which being the `manage_context` method. I've been working on this for a few days and I think it's ready for review. It's well-tested and extensively documented. I started using it wherever I have context managers in GarlicSim. I'll be happy to get your opinions on my approach and any critiques you may have. If there are no problems with this approach, I'll probably release it with GarlicSim 0.6.1 and blog about it. Here is my `context_manager` module. Here are its tests . Following is the module's docstring which explains the module in more detail. Ram. Defines the `ContextManager` and `ContextManagerType` classes. These classes allow for greater freedom both when (a) defining context managers and when (b) using them. Inherit all your context managers from `ContextManager` (or decorate your generator functions with `ContextManagerType`) to enjoy all the benefits described below. Defining context managers ------------------------- There are 3 different ways in which context managers can be defined, and each has their own advantages and disadvantages over the others. 1. 
The classic way to define a context manager is to define a class with `__enter__` and `__exit__` methods. This is allowed, and if you do this you should still inherit from `ContextManager`. Example: class MyContextManager(ContextManager): def __enter__(self): pass # preparation def __exit__(self, type_=None, value=None, traceback=None): pass # cleanup 2. As a decorated generator, like so: @ContextManagerType def MyContextManager(): try: yield finally: pass # clean-up This usage is nothing new; it's also available when using the standard library's `contextlib.contextmanager` decorator. One thing that is allowed here that `contextlib` doesn't allow is to yield the context manager itself by doing `yield SelfHook`. 3. The third and novel way is by defining a class with a `manage_context` method which returns a generator. Example: class MyContextManager(ContextManager): def manage_context(self): do_some_preparation() try: with some_lock: yield self finally: do_some_cleanup() This approach is sometimes cleaner than defining `__enter__` and `__exit__`, especially when using another context manager inside `manage_context`. In our example we did `with some_lock` in our `manage_context`, which is shorter and more idiomatic than calling `some_lock.__enter__` in an `__enter__` method and `some_lock.__exit__` in an `__exit__` method. These were the different ways of *defining* a context manager. Now let's see the different ways of *using* a context manager: Using context managers ---------------------- There are 2 different ways in which context managers can be used: 1. The plain old honest-to-Guido `with` keyword: with MyContextManager() as my_context_manager: do_stuff() 2. As a decorator to a function @MyContextManager() def do_stuff(): pass # doing stuff When the `do_stuff` function is called, the context manager will be used.
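A rough sketch of how the `manage_context` style of definition and decorator-style use could be combined on top of the standard library, assuming Python 3.2+ for `contextlib.ContextDecorator`. The names `ContextManagerBase` and `Tracker` are hypothetical; this is not Ram's actual implementation, and it leaves out `ContextManagerType` and `SelfHook` entirely.

```python
import contextlib


class ContextManagerBase(contextlib.ContextDecorator):
    """Hypothetical base class: subclasses write manage_context() as a
    generator that yields exactly once; __enter__/__exit__ drive it."""

    def manage_context(self):
        # Default behaviour: no setup or cleanup, just yield self.
        yield self

    def __enter__(self):
        # Reuse the machinery behind @contextlib.contextmanager, so exceptions
        # from the with-body are thrown into the generator the same way.
        cm = contextlib.contextmanager(self.manage_context)()
        value = cm.__enter__()
        # Keep a stack so the same instance can be entered more than once.
        self.__dict__.setdefault("_active_cms", []).append(cm)
        return value

    def __exit__(self, exc_type, exc_value, traceback):
        cm = self._active_cms.pop()
        return cm.__exit__(exc_type, exc_value, traceback)


class Tracker(ContextManagerBase):
    """Example subclass that records when its context is entered and left."""

    def __init__(self):
        self.log = []

    def manage_context(self):
        self.log.append("enter")
        try:
            yield self
        finally:
            self.log.append("exit")


tracker = Tracker()

# 1. Used with the `with` statement.
with tracker as t:
    tracker.log.append("body")


# 2. Used as a function decorator (this part comes from ContextDecorator).
@tracker
def do_stuff():
    tracker.log.append("call")


do_stuff()
print(t is tracker)  # True
print(tracker.log)   # ['enter', 'body', 'exit', 'enter', 'call', 'exit']
```

Delegating to `contextlib.contextmanager` instead of calling `next()` on the generator by hand keeps the fiddly parts (re-raising exceptions, detecting a generator that yields twice) in the standard library.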
This functionality is also available in the standard library of Python 3.2+ by using `contextlib.ContextDecorator`, but here it is combined with all the other goodies given by `ContextManager`. That's it. Inherit all your context managers from `ContextManager` (or decorate your generator functions with `ContextManagerType`) to enjoy all these benefits. -- Sincerely, Ram Rachum From scott+python-ideas at scottdial.com Tue Jan 4 03:13:02 2011 From: scott+python-ideas at scottdial.com (Scott Dial) Date: Mon, 03 Jan 2011 21:13:02 -0500 Subject: [Python-ideas] read-only attributes In-Reply-To: <20110103233111.034660c3@pitrou.net> References: <4D224B42.5070204@noir.com> <20110103233111.034660c3@pitrou.net> Message-ID: <4D22822E.8020009@scottdial.com> On 1/3/2011 5:31 PM, Antoine Pitrou wrote: > On Mon, 03 Jan 2011 14:18:42 -0800 > "K. Richard Pixley" wrote: >> >> class Foo(object): >> @read-only >> size = 4 > > What's wrong with: > >>>> class Foo: > ... @property > ... def foo(self): > ... return 4 > ... >>>> Foo().foo > 4 >>>> Foo().foo = 5 > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > AttributeError: can't set attribute > > ? > s/4/Bar()/: >>> class Foo: ... @property ... def foo(self): ... return Bar() -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From fuzzyman at voidspace.org.uk Tue Jan 4 12:00:16 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 4 Jan 2011 11:00:16 +0000 Subject: [Python-ideas] @classproperty, @abc.abstractclasspropery, etc. In-Reply-To: References: <4D223AFA.1010802@noir.com> <4D22486B.1090607@noir.com> Message-ID: On 4 January 2011 00:56, Guido van Rossum wrote: > > On 20110103 13:22, Dirkjan Ochtman wrote: > >> Do you have actual use cases for these? > > On Mon, Jan 3, 2011 at 2:06 PM, K. Richard Pixley wrote: > > Yes.
Here's a toy example for abstractclassproperty: > > Um, a toy example is pretty much the opposite of a use case. :-( > > That said, I am sure there are use cases for static property and class > property -- I've run into them myself. > > An example use case for class property: in App Engine, we have a Model > class (it's similar to Django's Model class). A model has a "kind" > which is a string (it's the equivalent of an SQL table name). The kind > is usually the class name but sometimes there's a need to override it. > However once the class is defined it should be considered read-only. > Currently our choices are to make this an instance property (but there > are some situations where we don't have an instance, e.g. when > creating a new instance using a class method); or to make it a class > attribute (but this isn't read-only); or to make it a class method > (which requires the user to write M.kind() instead of M.kind). If I > had class properties I'd use one here. > A class property that can be fetched is very easy to implement. Because of asymmetry in the descriptor protocol I don't think you can create a class property with behaviour on set though (unless you use a metaclass I guess). class classproperty(object): def __init__(self, function): self._function = function def __get__(self, instance, owner): return self._function(owner) All the best, Michael > -- > --Guido van Rossum (python.org/~guido ) -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html
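A runnable version of Michael's getter-only `classproperty`, exercised on a toy `Model`/`Person` hierarchy loosely modelled on Guido's App Engine example (both class names are made up). As an assumption beyond Michael's code, a `__set__` that raises is added. It blocks assignment on instances, but assignment through the class still bypasses it and simply replaces the descriptor, which is the asymmetry in the descriptor protocol Michael refers to.

```python
class classproperty(object):
    """Read-only property computed from the class. Michael Foord's sketch,
    plus a __set__ added here for illustration."""

    def __init__(self, function):
        self._function = function

    def __get__(self, instance, owner):
        # Called for both Model.kind and Model().kind; owner is the class.
        return self._function(owner)

    def __set__(self, instance, value):
        # Only ever reached for assignments on *instances*.
        raise AttributeError("can't set class property")


class Model(object):
    @classproperty
    def kind(cls):
        # By default the kind is the class name.
        return cls.__name__


class Person(Model):
    pass


print(Person.kind, Person().kind)  # Person Person

try:
    Person().kind = "Human"
except AttributeError as exc:
    print("instance assignment blocked:", exc)

# Assignment through the class goes through type.__setattr__, which does not
# consult descriptors stored in the class itself: this just rebinds the name
# in Model.__dict__, silently discarding the classproperty.
Model.kind = "oops"
print(Model.kind, Person.kind)  # oops oops
```

The metaclass route Michael mentions works the other way round: a plain `property` defined on a metaclass does make `Model.kind = ...` raise, but that attribute is then not reachable through instances of `Model`.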
URL: From ncoghlan at gmail.com Tue Jan 4 15:16:55 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 5 Jan 2011 00:16:55 +1000 Subject: [Python-ideas] @classproperty, @abc.abstractclasspropery, etc. In-Reply-To: <4D223AFA.1010802@noir.com> References: <4D223AFA.1010802@noir.com> Message-ID: On Tue, Jan 4, 2011 at 7:09 AM, K. Richard Pixley wrote: > I think the meanings of the new ones are pretty straightforward, but in > case they are not... > > @staticproperty - like @property only without an implicit first argument. > Allows the property to be called directly from the class without requiring a > throw-away instance. > > @classproperty - like @property, only the implicit first argument to the > method is the class. Allows the property to be called directly from the > class without requiring a throw-away instance. > As Michael mentions later in the thread, these can't really work due to the asymmetry in the descriptor protocol: if you retrieve a descriptor object directly from a class, the interpreter will consult the __get__ method of that descriptor, but if you set or delete it through the class, it will just perform the set or delete - the descriptor has no say in the matter, even if it defines __set__ or __delete__ methods. (See the example interpreter session at http://pastebin.com/1M7KYB9d). The only way to get static or class properties to work correctly is to define them on the metaclass, in which case you can just use the existing property descriptor (although you have to then jump through additional hoops to make access via instances work properly - off the top of my head, I'm actually not sure how to make that happen). @abc.abstractattribute - a simple, non-callable variable that must be > overridden in subclasses > You can't decorate attributes, only functions. > @abc.abstractstaticproperty - like @abc.abstractproperty only for > @staticproperty > > @abc.abstractclassproperty - like @abc.abstractproperty only for > @classproperty > See above. 
These don't exist because staticproperty and classproperty don't work. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikegraham at gmail.com Tue Jan 4 16:23:52 2011 From: mikegraham at gmail.com (Mike Graham) Date: Tue, 4 Jan 2011 10:23:52 -0500 Subject: [Python-ideas] @classproperty, @abc.abstractclasspropery, etc. In-Reply-To: <4D223AFA.1010802@noir.com> References: <4D223AFA.1010802@noir.com> Message-ID: On Mon, Jan 3, 2011 at 4:09 PM, K. Richard Pixley wrote: > Essentially the permutation are, I think: > {'unadorned'|abc.abstract}{'normal'|static|class}{method|property|non-callable > attribute}. > > At the abstract level, a property and a normal, non-callable attribute are the same thing. Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikegraham at gmail.com Tue Jan 4 16:31:21 2011 From: mikegraham at gmail.com (Mike Graham) Date: Tue, 4 Jan 2011 10:31:21 -0500 Subject: [Python-ideas] @classproperty, @abc.abstractclasspropery, etc. In-Reply-To: References: <4D223AFA.1010802@noir.com> <4D22486B.1090607@noir.com> Message-ID: On Mon, Jan 3, 2011 at 7:56 PM, Guido van Rossum wrote: > That said, I am sure there are use cases for static property and class > property -- I've run into them myself. > > An example use case for class property: in App Engine, we have a Model > class (it's similar to Django's Model class). A model has a "kind" > which is a string (it's the equivalent of an SQL table name). The kind > is usually the class name but sometimes there's a need to override it. > However once the class is defined it should be considered read-only. > Currently our choices are to make this an instance property (but there > are some situations where we don't have an instance, e.g. 
when > creating a new instance using a class method); or to make it a class > attribute (but this isn't read-only); or to make it a class method > (which requires the user to write M.kind() instead of M.kind). If I > had class properties I'd use one here. > > -- > --Guido van Rossum (python.org/~guido) This attitude seems to go against the we're-all-adults-here attitude that Python, for better or worse, really wants us to take. If we want to turn "there's no good reason to do X; it's pointless and you'd have to be insane to try, and I even documented that you can't do it" into "it's programmability enforced that you can't do X", it seems like we should be migrating to a language that enforces this with nicer syntax, more fidelity, and less overhead. Mike From fuzzyman at voidspace.org.uk Tue Jan 4 18:27:26 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 4 Jan 2011 17:27:26 +0000 Subject: [Python-ideas] @classproperty, @abc.abstractclasspropery, etc. In-Reply-To: References: <4D223AFA.1010802@noir.com> <4D22486B.1090607@noir.com>

Message-ID: On 4 January 2011 15:31, Mike Graham wrote: > On Mon, Jan 3, 2011 at 7:56 PM, Guido van Rossum wrote: > > That said, I am sure there are use cases for static property and class > > property -- I've run into them myself. > > > > An example use case for class property: in App Engine, we have a Model > > class (it's similar to Django's Model class). A model has a "kind" > > which is a string (it's the equivalent of an SQL table name). The kind > > is usually the class name but sometimes there's a need to override it. > > However once the class is defined it should be considered read-only. > > Currently our choices are to make this an instance property (but there > > are some situations where we don't have an instance, e.g. when > > creating a new instance using a class method); or to make it a class > > attribute (but this isn't read-only); or to make it a class method > > (which requires the user to write M.kind() instead of M.kind). If I > > had class properties I'd use one here. > > > I'm not entirely sure what you're referring to here - but if you're referring to the desire to make an attribute read-only then there is a different principle at work. If setting something is a programmer error, then it is better that the error become an exception at the point the error is made rather than become a different exception somewhere else later on. Michael Foord > > -- > > --Guido van Rossum (python.org/~guido ) > > This attitude seems to go against the we're-all-adults-here attitude > that Python, for better or worse, really wants us to take. If we want > to turn "there's no good reason to do X; it's pointless and you'd have > to be insane to try, and I even documented that you can't do it" into > "it's programmability enforced that you can't do X", it seems like we > should be migrating to a language that enforces this with nicer > syntax, more fidelity, and less overhead. 
> > Mike > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From rich at noir.com Tue Jan 4 18:44:30 2011 From: rich at noir.com (K. Richard Pixley) Date: Tue, 04 Jan 2011 09:44:30 -0800 Subject: [Python-ideas] @classproperty, @abc.abstractclasspropery, etc. In-Reply-To: References: <4D223AFA.1010802@noir.com> Message-ID: <4D235C7E.7030401@noir.com> On 1/4/11 07:23 , Mike Graham wrote: > On Mon, Jan 3, 2011 at 4:09 PM, K. Richard Pixley > wrote: > > Essentially the permutation are, I think: > {'unadorned'|abc.abstract}{'normal'|static|class}{method|property|non-callable > attribute}. > > > At the abstract level, a property and a normal, non-callable attribute > are the same thing. They are from the instantiation perspective but not from the subclassing perspective. From the subclassing perspective, it's the difference between: class Foo(object): @property def bar(self): ... and: class Foo(object): bar = ... If an abstract property were to be answered by a simple assignment, then the "read-only" trait would be lost. --rich -------------- next part -------------- An HTML attachment was scrubbed... URL: From rich at noir.com Tue Jan 4 18:50:42 2011 From: rich at noir.com (K. Richard Pixley) Date: Tue, 04 Jan 2011 09:50:42 -0800 Subject: [Python-ideas] @classproperty, @abc.abstractclasspropery, etc. In-Reply-To: References: <4D223AFA.1010802@noir.com> <4D22486B.1090607@noir.com>

Message-ID: <4D235DF2.5060000@noir.com> On 1/4/11 07:31 , Mike Graham wrote: > On Mon, Jan 3, 2011 at 7:56 PM, Guido van Rossum wrote: >> That said, I am sure there are use cases for static property and class >> property -- I've run into them myself. >> >> An example use case for class property: in App Engine, we have a Model >> class (it's similar to Django's Model class). A model has a "kind" >> which is a string (it's the equivalent of an SQL table name). The kind >> is usually the class name but sometimes there's a need to override it. >> However once the class is defined it should be considered read-only. >> Currently our choices are to make this an instance property (but there >> are some situations where we don't have an instance, e.g. when >> creating a new instance using a class method); or to make it a class >> attribute (but this isn't read-only); or to make it a class method >> (which requires the user to write M.kind() instead of M.kind). If I >> had class properties I'd use one here. >> >> -- >> --Guido van Rossum (python.org/~guido) > This attitude seems to go against the we're-all-adults-here attitude > that Python, for better or worse, really wants us to take. If we want > to turn "there's no good reason to do X; it's pointless and you'd have > to be insane to try, and I even documented that you can't do it" into > "it's programmability enforced that you can't do X", it seems like we > should be migrating to a language that enforces this with nicer > syntax, more fidelity, and less overhead. It's not the restriction that I'm looking for - it's the expressive grace. These concepts are pretty straightforward given the beginnings of them that we have now. Filling out the matrix is a pretty obvious concept. 
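A minimal sketch of the mechanics debated in this thread, in Python 3 syntax with hypothetical class names. The first half uses Michael Foord's fetch-only `classproperty` descriptor and shows the set asymmetry Nick describes: assigning through the class never consults the descriptor. The second half puts an ordinary `property` on a metaclass, which gives the read-only class-level `kind` from Guido's App Engine example but, as Nick notes, is not visible from instances. The `_kind` override hook is an assumption for illustration, not App Engine's actual API.

```python
class classproperty(object):
    """Michael Foord's fetch-only class property."""

    def __init__(self, function):
        self._function = function

    def __get__(self, instance, owner):
        # Called for both Model.kind and Model().kind; owner is the class.
        return self._function(owner)


class Model(object):
    @classproperty
    def kind(cls):
        return cls.__name__


class Person(Model):
    pass


print(Person.kind)    # Person
print(Person().kind)  # Person

# The asymmetry: assigning through the class does not go through the
# descriptor at all; it just stores a new attribute in Person.__dict__.
Person.kind = "people"
print(Person.kind)    # people


# Read-only at class level: an ordinary property defined on a metaclass.
class ModelMeta(type):
    @property
    def kind(cls):
        # Hypothetical override hook: a "_kind" entry in the class body.
        return cls.__dict__.get("_kind", cls.__name__)


class Entity(metaclass=ModelMeta):
    _kind = "entities"


print(Entity.kind)    # entities
try:
    Entity.kind = "other"  # the metaclass property has no setter
except AttributeError:
    print("read-only")  # read-only

# Instances do not see attributes defined on the metaclass.
print(hasattr(Entity(), "kind"))  # False
```

The metaclass version gets the read-only behaviour for free because `type.__setattr__` does honour data descriptors found on the metaclass; the cost is the extra hoop Nick mentions for instance access.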
The idea that while the concepts are available to anyone, straightforward, and recur, but can only be implemented by someone with extremely current and advanced knowledge of the interpreter, resulting in code which is less transparent rather than more, is the restrictive idea. --rich From guido at python.org Tue Jan 4 23:52:39 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 4 Jan 2011 14:52:39 -0800 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com> Message-ID: Hmm... I starred this and am finally dug out enough to comment. Would it be sufficient if the __module__ attribute of classes and functions got set to the "canonical" name rather than the "physical" name? You can currently get a crude version of this by simply assigning to __name__ at the top of the module. That sounds like it would be too confusing, however, so perhaps we could make it so that, when the __module__ attribute is initialized, it first looks for __canonical__ and then for __name__? This may still be too crude though -- I looked at the one example I could think of where this might be useful, the unittest package, and realized that it would set __module__ to 'unittest' even for classes that are not actually re-exported via the unittest namespace. So maybe it would be better in that case to just patch the __module__ attribute of all the public classes in unittest/__import__.py? OTOH for things named __main__, setting __canonical__ (automatically, by -m or whatever other mechanism starts execution, like "python " might actually work. On the third hand, maybe you've finally hit upon a reason why the "if __name__ == '__main__': main()" idiom is bad... --Guido On Thu, Dec 30, 2010 at 6:52 PM, Nick Coghlan wrote: > On Thu, Dec 30, 2010 at 11:48 AM, Ron Adam wrote: >> This sounds like two different separate issues to me. >> >> One is the leaking-out of lower level details. 
> >> >> The other is abstracting a framework with the minimal amount of details >> needed. > > Yeah, sort of. Really, the core issue is that some objects live in two places: > - where they came from right now, in the current interpreter > - where they should be retrieved from "officially" (e.g. since another > interpreter may not provide an accelerated version, or because the > appropriate submodule may be selected at runtime based on the current > platform) > > There's currently no systematic way of flagging objects or modules > where the latter location differs from the former, so the components > that leak the low level details (such as pickling and pydoc) have no > way to avoid it. Once a system is in place to identify such objects > (or perhaps just the affected modules), then the places that leak that > information can be updated to deal with the situation appropriately > (e.g. pickling would likely just use the official names, while pydoc > would display both, indicating which one was the 'official' location, > and which one reflected the current interpreter behaviour). > > So it's really one core problem (non-portable module details), which > then leads to an assortment of smaller problems when other parts of > the standard library are forced to rely on those non-portable details > because that's the only information available. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Wed Jan 5 02:55:04 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 5 Jan 2011 11:55:04 +1000 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com> Message-ID: On Wed, Jan 5, 2011 at 8:52 AM, Guido van Rossum wrote: > Hmm...
I starred this and am finally dug out enough to comment. > > Would it be sufficient if the __module__ attribute of classes and > functions got set to the "canonical" name rather than the "physical" > name? > > You can currently get a crude version of this by simply assigning to > __name__ at the top of the module. > > That sounds like it would be too confusing, however, so perhaps we > could make it so that, when the __module__ attribute is initialized, > it first looks for __canonical__ and then for __name__? > > This may still be too crude though -- I looked at the one example I > could think of where this might be useful, the unittest package, and > realized that it would set __module__ to 'unittest' even for classes > that are not actually re-exported via the unittest namespace. > > So maybe it would be better in that case to just patch the __module__ > attribute of all the public classes in unittest/__import__.py? I did think about that - for classes, it would probably be sufficient, but for functions the fact that we'd be breaking the identity that "f.__globals__ is sys.modules[f.__module__]" scares me. Then again, the fact that "f.__module__ != f.__globals__['__name__']" would provide exactly the indicator of "two names" that I am talking about (at least where functions are concerned) - things like pydoc and the inspect module could definitely be updated to check both module names. On the gripping hand, there would still be problems with things like methods and nested classes and functions (unless tools were provided to recurse down through a class to update the subcomponents as well as the class itself). So perhaps the granularity on my initial suggestion wasn't fine enough - if the "__canonical__" idea was extended to all objects with a __module__ attribute, then objects could either be relocated at creation time (by setting __canonical__ in the module globals), or after the fact by assigning to the __canonical__ attribute on the object. 
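The "two names" indicator Nick describes can be sketched as follows (Python 3; the module names and the `has_two_names` helper are hypothetical). Rewriting a function's `__module__` breaks the identity between the module registered under that name and the function's `__globals__` (with Guido's correction that it is the module's `__dict__`, not the module object), and that mismatch is exactly what a tool such as pydoc or inspect could test for.

```python
import sys
import types

# Hypothetical private module standing in for an implementation module.
impl = types.ModuleType("demo_private")
exec("def helper():\n    return 42\n", impl.__dict__)
sys.modules["demo_private"] = impl

f = impl.helper
# Before relocation the identity holds (via the module's __dict__).
print(sys.modules[f.__module__].__dict__ is f.__globals__)  # True

# Relocate to a hypothetical public name that is not registered anywhere.
f.__module__ = "demo_official"
print(f.__module__ in sys.modules)  # False: the identity no longer holds


def has_two_names(func):
    """Hypothetical check: was __module__ moved away from the defining module?"""
    return func.__module__ != func.__globals__["__name__"]


print(has_two_names(f))  # True
```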
> OTOH for things named __main__, setting __canonical__ (automatically, > by -m or whatever other mechanism starts execution, like "python > " might actually work. Yes, although a related modification is needed in those cases (to actually insert the module being executed into sys.modules under its module name as well as under __main__). > On the third hand, maybe you've finally hit upon a reason why the "if > __name__ == '__main__': main()" idiom is bad... I can't take credit for that particular observation - I've certainly heard others complain about that in the context of pickling objects over the years. It is one of the main things that got me thinking along these lines in the first place. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Jan 5 03:00:05 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 5 Jan 2011 12:00:05 +1000 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: <4D23CDD0.6080601@ronadam.com> References: <4D1BE506.2070806@ronadam.com> <4D23CDD0.6080601@ronadam.com> Message-ID: On Wed, Jan 5, 2011 at 11:48 AM, Ron Adam wrote: > (This is probably something that was suggested more than a few times > before.) > > Would it help if global name space acquired a __main__ name? Then the > standard if line becomes only a slightly different "if __name__ == __main__: > main()". I think that would make more sense to beginners also and it is a > bit less magical. > > For now, both ways could work, __main__ would be "__main__" or None, but > down the road, (long enough to be sure everyone knows to drop the quotes), > both __main__ and __name__ could be switched to the actual module name so > that __name__ and __module__ attributes would always be correct. If we decided to actually change the way the main module was executed, the most likely result would be to resurrect PEP 299. Changing that particular idiom is probably a Py4k scale of change though :P Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rrr at ronadam.com Wed Jan 5 02:48:00 2011 From: rrr at ronadam.com (Ron Adam) Date: Tue, 04 Jan 2011 19:48:00 -0600 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com> Message-ID: <4D23CDD0.6080601@ronadam.com> On 01/04/2011 04:52 PM, Guido van Rossum wrote: > Hmm... I starred this and am finally dug out enough to comment. > > Would it be sufficient if the __module__ attribute of classes and > functions got set to the "canonical" name rather than the "physical" > name? > > You can currently get a crude version of this by simply assigning to > __name__ at the top of the module. > > That sounds like it would be too confusing, however, so perhaps we > could make it so that, when the __module__ attribute is initialized, > it first looks for __canonical__ and then for __name__? > > This may still be too crude though -- I looked at the one example I > could think of where this might be useful, the unittest package, and > realized that it would set __module__ to 'unittest' even for classes > that are not actually re-exported via the unittest namespace. > > So maybe it would be better in that case to just patch the __module__ > attribute of all the public classes in unittest/__import__.py? > > OTOH for things named __main__, setting __canonical__ (automatically, > by -m or whatever other mechanism starts execution, like "python > " might actually work. > > On the third hand, maybe you've finally hit upon a reason why the "if > __name__ == '__main__': main()" idiom is bad... (This is probably something that was suggested more than a few times before.) Would it help if global name space acquired a __main__ name? Then the standard if line becomes only a slightly different "if __name__ == __main__: main()". I think that would make more sense to beginners also and it is a bit less magical.
For now, both ways could work, __main__ would be "__main__" or None, but down the road, (long enough to be sure everyone knows to drop the quotes), both __main__ and __name__ could be switched to the actual module name so that __name__ and __module__ attributes would always be correct. Cheers, Ron From rrr at ronadam.com Wed Jan 5 05:18:16 2011 From: rrr at ronadam.com (Ron Adam) Date: Tue, 04 Jan 2011 22:18:16 -0600 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com> <4D23CDD0.6080601@ronadam.com> Message-ID: <4D23F108.6060704@ronadam.com> On 01/04/2011 08:00 PM, Nick Coghlan wrote: > On Wed, Jan 5, 2011 at 11:48 AM, Ron Adam wrote: >> (This is probably something that was suggested more than a few times >> before.) >> >> Would it help if global name space acquired a __main__ name? Then the >> standard if line becomes only a slightly different "if __name__ == __main__: >> main()". I think that would make more sense to beginners also and it is a >> bit less magical. >> >> For now, both ways could work, __main__ would be "__main__" or None, but >> down the road, (long enough to be sure everyone knows to drop the quotes), >> both __main__ and __name__ could be switched to the actual module name so >> that __name__ and __module__ attributes would always be correct. > > If we decided to actually change the way the main module was executed, > the most likely result would be to resurrect PEP 299. Changing that > particular idiom is probably a Py4k scale of change though :P Well, changing it in the way PEP 299 suggests is probably even a Py5k change. Which is why I didn't suggest that. ;-) Also PEP 299 main motivation is different than what is being discussed here. 
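For illustration only, here is a hypothetical launcher (`run_as_main` is not an existing stdlib API) that runs a script under its real module name, so that `__name__` and every `__module__` defined in it are correct, and then calls a `main()` function instead of relying on the `if __name__ == '__main__'` test, loosely in the spirit of PEP 299. To stay side-effect free it deliberately does not also register the module as `__main__` in `sys.modules`, which Nick notes a real implementation would need.

```python
import importlib.util
import os
import sys
import tempfile


def run_as_main(path, real_name):
    """Hypothetical launcher: execute a script under its real module name.

    The module is registered in sys.modules under real_name before its code
    runs, so classes defined in it get __module__ == real_name. Instead of
    the 'if __name__ == "__main__"' test, a main() function is called if the
    script defines one.
    """
    spec = importlib.util.spec_from_file_location(real_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[real_name] = module
    spec.loader.exec_module(module)
    entry = getattr(module, "main", None)
    return entry() if callable(entry) else None


SCRIPT = (
    "class Widget(object):\n"
    "    pass\n"
    "\n"
    "def main():\n"
    "    return Widget.__module__\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_script.py")
    with open(path, "w") as handle:
        handle.write(SCRIPT)
    result = run_as_main(path, "demo_script")

print(result)  # demo_script
```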
Cheers, Ron From guido at python.org Wed Jan 5 05:47:01 2011 From: guido at python.org (Guido van Rossum) Date: Tue, 4 Jan 2011 20:47:01 -0800 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com> Message-ID: On Tue, Jan 4, 2011 at 5:55 PM, Nick Coghlan wrote: > On Wed, Jan 5, 2011 at 8:52 AM, Guido van Rossum wrote: >> Hmm... I starred this and am finally dug out enough to comment. >> >> Would it be sufficient if the __module__ attribute of classes and >> functions got set to the "canonical" name rather than the "physical" >> name? >> >> You can currently get a crude version of this by simply assigning to >> __name__ at the top of the module. >> >> That sounds like it would be too confusing, however, so perhaps we >> could make it so that, when the __module__ attribute is initialized, >> it first looks for __canonical__ and then for __name__? >> >> This may still be too crude though -- I looked at the one example I >> could think of where this might be useful, the unittest package, and >> realized that it would set __module__ to 'unittest' even for classes >> that are not actually re-exported via the unittest namespace. >> >> So maybe it would be better in that case to just patch the __module__ >> attribute of all the public classes in unittest/__import__.py? > > I did think about that - for classes, it would probably be sufficient, > but for functions the fact that we'd be breaking the identity that > "f.__globals__ is sys.modules[f.__module__]" scares me. Really? Why? Who would ever depend on that? (You also probably meant sys.modules[...].__dict__ -- f.__globals__ is a dict, not a module object.) Note that for classes you'd have the same issue, since each method references the module globals in its f.__globals__. 
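Guido's point about classes can be checked directly: assigning to a class's `__module__` leaves the functions in its body reporting the old module, and their `__globals__` are unchanged. The `relocate` helper below is a hypothetical sketch of the kind of recursive fix-up tool Nick mentions. It only rewrites `__module__` and cannot change `__globals__`. Module and class names are illustrative.

```python
import types


def relocate(cls, name):
    """Hypothetical helper: set __module__ on a class, on the functions
    (including staticmethod/classmethod wrappers) in its body, and on
    classes nested inside it."""
    cls.__module__ = name
    for value in vars(cls).values():
        if isinstance(value, (staticmethod, classmethod)):
            value = value.__func__
        if isinstance(value, types.FunctionType):
            value.__module__ = name
        elif isinstance(value, type) and value.__qualname__.startswith(
            cls.__qualname__ + "."
        ):
            relocate(value, name)  # nested class defined in this body
    return cls


impl = types.ModuleType("demo_impl")
exec(
    "class Outer(object):\n"
    "    def method(self):\n"
    "        return 1\n"
    "    class Inner(object):\n"
    "        pass\n",
    impl.__dict__,
)
Outer = impl.Outer

Outer.__module__ = "demo_public"
print(Outer.method.__module__)  # demo_impl: methods are not updated

relocate(Outer, "demo_public")
print(Outer.method.__module__)  # demo_public
print(Outer.Inner.__module__)   # demo_public
print(Outer.method.__globals__["__name__"])  # demo_impl: globals unchanged
```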
> Then again, > the fact that "f.__module__ != f.__globals__['__name__']" would > provide exactly the indicator of "two names" that I am talking about > (at least where functions are concerned) - things like pydoc and the > inspect module could definitely be updated to check both module names. I think the more important question to answer first would be what you'd want pydoc and inspect to do. > On the gripping hand, there would still be problems with things like > methods and nested classes and functions (unless tools were provided > to recurse down through a class to update the subcomponents as well as > the class itself). Well, method references (even unbound) are not picklable anyway. > So perhaps the granularity on my initial suggestion wasn't fine enough > - if the "__canonical__" idea was extended to all objects with a > __module__ attribute, then objects could either be relocated at > creation time (by setting __canonical__ in the module globals), or > after the fact by assigning to the __canonical__ attribute on the > object. BTW, I think we need to come up with a better word than __canonical__. In general I don't like using adjectives as attribute names. >> OTOH for things named __main__, setting __canonical__ (automatically, >> by -m or whatever other mechanism starts execution, like "python >> " might actually work. > > Yes, although a related modification is needed in those cases (to > actual insert the module being executed into sys.modules under its > module name as well as under __main__). That's the easy part. The hard part is to make the "real name" (i.e. not __main__) the name used by the classes and functions it defines, without breaking the "if __name__ == '__main__': main()" idiom... >> On the third hand, maybe you've finally hit upon a reason why the "if >> __name__ == '__main__': main()" idiom is bad... 
> > I can't take credit for that particular observation - I've certainly > heard others complain about that in the context of pickling objects > over the years. It is one of the main things that got me thinking > along these lines in the first place. Why didn't you say so in the first place? :-) I think it's easier to come up with a solution for just this case; the issue with e.g. unittest doesn't seem quite as hard (after all, "unittest.case" will always exist). We could just call it __real_name__ and use that in preference over __name__ for all __module__ attributes whenever it's set. (Or we could always set both...) -- --Guido van Rossum (python.org/~guido) From jimjjewett at gmail.com Wed Jan 5 06:58:32 2011 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 5 Jan 2011 00:58:32 -0500 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com> Message-ID: On Tue, Jan 4, 2011 at 5:52 PM, Guido van Rossum wrote: > Would it be sufficient if the __module__ attribute of classes and > functions got set to the "canonical" name rather than the "physical" > name? Not unless it were documented as an acceptable practice supported by the introspection libraries, with examples pointing to stdlib usage in places like elementTree. Even then it may not work out, but that is the rest of the thread; I just wanted to emphasize that this is a case where "yup, it works" isn't good enough, because of confusion over specification vs implementation vs accidentally worked this time. 
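Jim's concern about the introspection libraries can be seen with `inspect.getmodule()`. For a function it trusts `__module__`, so after relocation it reports the official home rather than where the code actually runs. The `defining_module` helper is a hypothetical sketch of how a tool could recover the other name from `__globals__`. The module names are illustrative.

```python
import inspect
import sys
import types

accel = types.ModuleType("demo_accel")  # hypothetical accelerated module
api = types.ModuleType("demo_api")      # hypothetical official home
exec("def parse(text):\n    return text.split()\n", accel.__dict__)
api.parse = accel.parse
sys.modules.update({"demo_accel": accel, "demo_api": api})

accel.parse.__module__ = "demo_api"
# inspect.getmodule() follows __module__, so it reports the official home...
print(inspect.getmodule(accel.parse) is api)  # True


def defining_module(func):
    """Hypothetical helper: the module whose globals func actually uses."""
    return sys.modules.get(func.__globals__["__name__"])


# ...while the function's globals still belong to the defining module.
print(defining_module(accel.parse) is accel)  # True
```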
-jJ From tjreedy at udel.edu Wed Jan 5 11:07:58 2011 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 05 Jan 2011 05:07:58 -0500 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com> Message-ID: On 1/4/2011 5:52 PM, Guido van Rossum wrote: Nick's concern does not affect me, > On the third hand, maybe you've finally hit upon a reason why the "if > __name__ == '__main__': main()" idiom is bad... but I use this all the time. A suggested alternative and possible eventual replacement: give *every* module an attribute __main__ set to either True or False. Then the idiom would be much simpler and easier to learn and write: 'if __main__: ...'. If there were no other use of the fake '__main__' name, the simple and unconditional replacement would be much less disruptive than, say, the int division change. But the first 10 pages of codesearch on '__main__' shows things like django/test/_doctest.py - 107 identical elif module.__name__ == '__main__': 1850: m = sys.modules.get('__main__') another sys.modules.get(), a sys.modules(), and Formulator/tests/framework.py - many identical 57: if p0 and __name__ == '__main__': 58: os.chdir(p0) The variant conditionals are easy to patch (by hand). The sys.modules lookup suggests that the main module should continue to be keyed under '__main__', even if also keyed under its 'real' name. [Keying modules under a canonical name would eliminate duplicate import bugs, but that is another issue.] -- Terry Jan Reedy From ncoghlan at gmail.com Wed Jan 5 13:15:28 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 5 Jan 2011 22:15:28 +1000 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com>

Message-ID: On Wed, Jan 5, 2011 at 2:47 PM, Guido van Rossum wrote: > On Tue, Jan 4, 2011 at 5:55 PM, Nick Coghlan wrote: >> I can't take credit for that particular observation - I've certainly >> heard others complain about that in the context of pickling objects >> over the years. It is one of the main things that got me thinking >> along these lines in the first place. > > Why didn't you say so in the first place? :-) Well, I did put that "half-baked" disclaimer in for a reason... I'm still trying to figure out exactly what I think the real problem here is, so my expression of it is probably as clear as mud :) > I think it's easier to come up with a solution for just this case; the > issue with e.g. unittest doesn't seem quite as hard (after all, > "unittest.case" will always exist). Perhaps it would focus the discussion if we picked one or two modules (in addition to __main__) as example cases. functools comes in two pieces - partial and reduce are implemented in C in the _functools module, everything else is implemented in Python in functools itself. datetime, on the other hand, is a case of a pure acceleration module - if _datetime is available, it is expected to completely implement the datetime API. _functools.partial and the classes in datetime all adopt the strategy of lying about their original location in __module__. This is probably the best available choice, as it makes pickling do the right thing. The main downside with this approach is the way it confuses things like inspect.getsource (for datetime, it reports the pure Python versions as the source code for the C accelerated versions, for functools.partial it gives a technically accurate, but potentially misleading error message. If inspect could easily *tell* that the accelerated versions were in use, then it could handle the situation a bit more gracefully). To eliminate that issue, what if, whenever we're setting a __module__ attribute (e.g. 
during class creation), we also set a "__real_module__" attribute? Then code could happily adjust __module__ to point to the official location (as it already does), but tools like inspect wouldn't be fooled regarding the state of the *current* interpreter. Most of the time, __module__ and __real_module__ will point to the same place, but the cases where they're different will be handled far more gracefully. (I suspect that is significantly easier said than done though - I expect it would be a very manual process getting an extension module to do this correctly.) > We could just call it __real_name__ and use that in preference over > __name__ for all __module__ attributes whenever it's set. (Or we could > always set both...) The stuff I wrote above applies to pretty much everything *except* the __main__ module. For the __main__ module, I'm inclined to revisit Brett's idea from PEP 3122: put the real name of the __main__ module in a sys.main attribute. However, unlike that PEP, we would continue to set __name__ to "__main__" in the main module. The new attribute would be a transition step allowing manual reversal of the name mangling: # Near top of module running_as_main = False if __name__ == "__main__": running_as_main = True import sys __name__ = sys.main # Rest of module # Near end of module if running_as_main: # Actually do "main" type stuff. Alternatively, we could just do nothing about the problem with __main__ and continue to encourage people to separate their "main" modules from the modules that define classes. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From fuzzyman at voidspace.org.uk Wed Jan 5 14:28:30 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 5 Jan 2011 13:28:30 +0000 Subject: [Python-ideas] python -c "..." should output result like the interpreter In-Reply-To: References:

Message-ID: On 29 December 2010 15:45, Michael Foord wrote: > > > On 29 December 2010 15:18, Georg Brandl wrote: > >> On 29.12.2010 15:46, Michael Foord wrote: >> >> > I like the idea, but that's a fairly big semantic change. What about >> > adding an -e option that takes an expression, and prints its value? >> So >> > you'd have >> > >> > python -e "12 / 4.1" >> > >> > (AFAICT, -e is unused at present). >> > >> > That would be great. I did worry that changing the output would be >> backwards >> > incompatible with code that shells out to Python using "-c", so a >> different >> > command line option would be great. So long as it works with multiple >> statements >> > (semi-colon separated) like the current "-c" behaviour. >> >> Hey, what about this little module: >> >> import sys >> for x in sys.argv[1:]: >> exec compile(x, '', 'single') >> >> Then: >> >> $ python -me '1+1; 2+2' >> 2 >> 4 >> > > So now you can `pip install e` and then `python -me`... > Just as a follow up, for which we should still be blaming Georg, you can now do `pip install oo` followed by `python -moo`. (Requires pygame; tested on Linux and Mac, and should be cross platform.) All the best, Michael Foord > > Michael > > >> >> Georg >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > > > -- > > http://www.voidspace.org.uk/ > > May you do good and not evil > May you find forgiveness for yourself and forgive others > > May you share freely, never taking more than you give. > -- the sqlite blessing http://www.sqlite.org/different.html > > > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html
From fuzzyman at voidspace.org.uk Wed Jan 5 14:42:00 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 5 Jan 2011 13:42:00 +0000 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com> Message-ID: On 4 January 2011 22:52, Guido van Rossum wrote: > Hmm... I starred this and am finally dug out enough to comment. > > Would it be sufficient if the __module__ attribute of classes and > functions got set to the "canonical" name rather than the "physical" > name? > > You can currently get a crude version of this by simply assigning to > __name__ at the top of the module. > > That sounds like it would be too confusing, however, so perhaps we > could make it so that, when the __module__ attribute is initialized, > it first looks for __canonical__ and then for __name__? > > This may still be too crude though -- I looked at the one example I > could think of where this might be useful, the unittest package, and > realized that it would set __module__ to 'unittest' even for classes > that are not actually re-exported via the unittest namespace. > > So maybe it would be better in that case to just patch the __module__ > attribute of all the public classes in unittest/__import__.py? > > So should I do this in unittest for Python 2.7 / 3.2? The problem this *would* solve is that pickled unittest objects from 2.7 / 3.2 can't be unpickled on earlier versions of Python. I don't know how *real* a problem it is or whether it is worth losing / faking the __module__ information on these classes to solve it. Sure it's a problem that is likely to bite *someone* at some point, but not very many people. If someone is using __module__ information to find source code (or anything else) for a class then changing __module__ will break that, so I'm not convinced it's a worthwhile tradeoff.
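To make the pickling side of this tradeoff concrete, here is a minimal sketch in Python 3. The module names `_demo_pkg_impl` and `_demo_pkg` are hypothetical stand-ins for `unittest.case` and `unittest`. It illustrates that a pickle records whatever `__module__` says, so relabelling a class with its public location changes which module an unpickler has to be able to import.

```python
import pickle
import sys
import types

# Hypothetical stand-ins: "_demo_pkg_impl" plays the role of an implementation
# module such as unittest.case, "_demo_pkg" the role of the public package.
impl = types.ModuleType("_demo_pkg_impl")
exec("class Widget:\n    pass\n", impl.__dict__)  # Widget.__module__ is "_demo_pkg_impl"
public = types.ModuleType("_demo_pkg")
public.Widget = impl.Widget  # re-export it, as a package __init__ would
sys.modules[impl.__name__] = impl
sys.modules[public.__name__] = public
Widget = impl.Widget

# Protocol 0 records a class reference as readable text: b"c<module>\n<name>\n".
by_impl_name = pickle.dumps(Widget, protocol=0)

# Relabel the class with its public home, as suggested above for unittest.
Widget.__module__ = "_demo_pkg"
by_public_name = pickle.dumps(Widget, protocol=0)
print(by_impl_name)
print(by_public_name)

# Simulate an older interpreter that has the public name but not the
# implementation module: only the relabelled pickle can still be loaded.
del sys.modules["_demo_pkg_impl"]
assert pickle.loads(by_public_name) is Widget
try:
    pickle.loads(by_impl_name)
    impl_pickle_loads = True
except ImportError:
    impl_pickle_loads = False
print("pickle naming _demo_pkg_impl still loads:", impl_pickle_loads)  # False
```

The same relabelling is what makes finding the source harder, which is the cost weighed in the paragraph above.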
All the best, Michael > OTOH for things named __main__, setting __canonical__ (automatically, > by -m or whatever other mechanism starts execution, like "python > " might actually work. > > On the third hand, maybe you've finally hit upon a reason why the "if > __name__ == '__main__': main()" idiom is bad... > > --Guido > > On Thu, Dec 30, 2010 at 6:52 PM, Nick Coghlan wrote: > > On Thu, Dec 30, 2010 at 11:48 AM, Ron Adam wrote: > >> This sounds like two different separate issues to me. > >> > >> One is the leaking-out of lower level details. > >> > >> The other is abstracting a framework with the minimal amount of details > >> needed. > > > > Yeah, sort of. Really, the core issue is that some objects live in two > places: > > - where they came from right now, in the current interpreter > > - where they should be retrieved from "officially" (e.g. since another > > interpreter may not provide an accelerated version, or because the > > appropriate submodule may be selected at runtime based on the current > > platform) > > > > There's currently no systematic way of flagging objects or modules > > where the latter location differs from the former, so the components > > that leak the low level details (such as pickling and pydoc) have no > > way to avoid it. Once a system is in place to identify such objects > > (or perhaps just the affected modules), then the places that leak that > > information can be updated to deal with the situation appropriately > > (e.g. pickling would likely just use the official names, while pydoc > > would display both, indicating which one was the 'official' location, > > and which one reflected the current interpreter behaviour). > > > > So it's really one core problem (non-portable module details), which > > then leads to an assortment of smaller problems when other parts of > > the standard library are forced to rely on those non-portable details > > because that's the only information available. > > > > Cheers, > > Nick. 
> > > > -- > > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > > > > > -- > --Guido van Rossum (python.org/~guido ) > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From ncoghlan at gmail.com Wed Jan 5 16:57:12 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 6 Jan 2011 01:57:12 +1000 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com> Message-ID: On Wed, Jan 5, 2011 at 11:42 PM, Michael Foord wrote: > So should I do this in unittest for Python 2.7 / 3.2? > > The problem this *would* solve is that pickled unittest objects from 2.7 / > 3.2 can't be unpickled on earlier versions of Python. > > I don't know how *real* a problem it is or whether it is worth losing / > faking the __module__ information on these classes to solve it. Sure it's a > problem that is likely to bite *someone* at some point, but not very many > people. If someone is using __module__ information to find source code (or > anything else) for a class then changing __module__ will break that, so I'm > not convinced it's a worthwhile tradeoff. The two examples I looked at (functools and datetime) favoured hiding the implementation details at the cost of causing introspection problems. Despite my comments in the opening post of the thread, I think that is the better trade-off to make. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
From rrr at ronadam.com Wed Jan 5 18:32:39 2011 From: rrr at ronadam.com (Ron Adam) Date: Wed, 05 Jan 2011 11:32:39 -0600 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com>

Message-ID: <4D24AB37.7050900@ronadam.com> On 01/05/2011 06:15 AM, Nick Coghlan wrote: > Perhaps it would focus the discussion if we picked one or two modules > (in addition to __main__) as example cases. > > functools comes in two pieces - partial and reduce are implemented in > C in the _functools module, everything else is implemented in Python > in functools itself. > datetime, on the other hand, is a case of a pure acceleration module - > if _datetime is available, it is expected to completely implement the > datetime API. > > _functools.partial and the classes in datetime all adopt the strategy > of lying about their original location in __module__. This is probably > the best available choice, as it makes pickling do the right thing. > > The main downside with this approach is the way it confuses things > like inspect.getsource (for datetime, it reports the pure Python > versions as the source code for the C accelerated versions, for > functools.partial it gives a technically accurate, but potentially > misleading error message. If inspect could easily *tell* that the > accelerated versions were in use, then it could handle the situation a > bit more gracefully). It seems Python tries pretty hard to hide external calls (the cause of the confusion you mention above). It makes me wonder why Python doesn't have an extern type (or types). Then instead of them being a source of confusion, they would be recognisable for what they are. They could have extra attributes to enable pickle and other tools to work in a nice way. Ron From fuzzyman at voidspace.org.uk Wed Jan 5 18:45:46 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 5 Jan 2011 13:45:46 +0000 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com>

Message-ID: On 5 January 2011 15:57, Nick Coghlan wrote: > On Wed, Jan 5, 2011 at 11:42 PM, Michael Foord wrote: > > So should I do this in unittest for Python 2.7 / 3.2? > > > > The problem this *would* solve is that pickled unittest objects from 2.7 / > > 3.2 can't be unpickled on earlier versions of Python. > > > > I don't know how *real* a problem it is or whether it is worth losing / > > faking the __module__ information on these classes to solve it. Sure it's a > > problem that is likely to bite *someone* at some point, but not very many > > people. If someone is using __module__ information to find source code (or > > anything else) for a class then changing __module__ will break that, so I'm > > not convinced it's a worthwhile tradeoff. > > The two examples I looked at (functools and datetime) favoured hiding > the implementation details at the cost of causing introspection > problems. Despite my comments in the opening post of the thread, I > think that is the better trade-off to make. > Both of those are because of underlying C implementations where introspection problems would be the default anyway, which isn't quite the same situation for unittest. Michael > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html
From guido at python.org Wed Jan 5 20:10:18 2011 From: guido at python.org (Guido van Rossum) Date: Wed, 5 Jan 2011 11:10:18 -0800 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com>

Message-ID: I'm going to have to leave this thread to you all; my main goal was to tease out a better problem description. I think that's been taken care of now. The solution will then follow. -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Thu Jan 6 02:52:54 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 6 Jan 2011 11:52:54 +1000 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com>

Message-ID: On Thu, Jan 6, 2011 at 3:45 AM, Michael Foord wrote: > On 5 January 2011 15:57, Nick Coghlan wrote: >> The two examples I looked at (functools and datetime) favoured hiding >> the implementation details at the cost of causing introspection >> problems. Despite my comments in the opening post of the thread, I >> think that is the better trade-off to make. > > Both of those are because of underlying C implementations where > introspection problems would be the default anyway, which isn't quite the > same situation for unittest. True, but it means the precedent of using __module__ to refer to the official location rather than the actual location has already been set. That suggests to me our best way forward is to bless that as a recommended practice, then find a way to deal with the negative impact it currently has on introspection (such as a "__real_module__" attribute, as I suggested in another post). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From fuzzyman at voidspace.org.uk Thu Jan 6 13:21:15 2011 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 06 Jan 2011 12:21:15 +0000 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com>

Message-ID: <4D25B3BB.8070800@voidspace.org.uk> On 06/01/2011 01:52, Nick Coghlan wrote: > On Thu, Jan 6, 2011 at 3:45 AM, Michael Foord wrote: >> On 5 January 2011 15:57, Nick Coghlan wrote: >>> The two examples I looked at (functools and datetime) favoured hiding >>> the implementation details at the cost of causing introspection >>> problems. Despite my comments in the opening post of the thread, I >>> think that is the better trade-off to make. >> Both of those are because of underlying C implementations where >> introspection problems would be the default anyway, which isn't quite the >> same situation for unittest. > True, but it means the precedent of using __module__ to refer to the > official location rather than the actual location has already > been set. That suggests to me our best way forward is to bless that as > a recommended practice, then find a way to deal with the negative > impact it currently has on introspection (such as a "__real_module__" > attribute, as I suggested in another post). > Well, I would say set __module__ to the official location *when* we have "__real_module__" (or whatever) in place. Changing __module__ breaks inspect.getsource:

    .>>> import inspect
    .>>> from unittest import TestCase
    .>>> TestCase.__module__
    'unittest.case'
    .>>> TestCase.__module__ = 'unittest'
    .>>> inspect.getsource(TestCase)
    Traceback (most recent call last):
    ...
    IOError: could not find class definition

As the only problem this solves is a theoretical one (so far, for unittest anyway), I'm not keen to do this until the introspection issue is resolved. Once this is resolved I'm fine with it. All the best, Michael > Cheers, > Nick. > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html From rrr at ronadam.com Fri Jan 7 03:38:07 2011 From: rrr at ronadam.com (Ron Adam) Date: Thu, 06 Jan 2011 20:38:07 -0600 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com>

Message-ID: <4D267C8F.3050601@ronadam.com> On 01/05/2011 07:52 PM, Nick Coghlan wrote: > On Thu, Jan 6, 2011 at 3:45 AM, Michael Foord wrote: >> On 5 January 2011 15:57, Nick Coghlan wrote: >>> The two examples I looked at (functools and datetime) favoured hiding >>> the implementation details at the cost of causing introspection >>> problems. Despite my comments in the opening post of the thread, I >>> think that is the better trade-off to make. >> >> Both of those are because of underlying C implementations where >> introspection problems would be the default anyway, which isn't quite the >> same situation for unittest. > > True, but it means the precedent of using __module__ to refer to the > official location rather than the actual location has already > been set. That suggests to me our best way forward is to bless that as > a recommended practice, then find a way to deal with the negative > impact it currently has on introspection (such as a "__real_module__" > attribute, as I suggested in another post). You could add a private dictionary to sys, that is updated along with sys.modules, which maps module names to real names. And have a function in inspect to retrieve the real name for an object. That sounds like it would do pretty much what you need and doesn't add a top level builtin or global, or change "if __name__ == '__main__': main()". Cheers, Ron From ncoghlan at gmail.com Fri Jan 7 04:28:46 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 7 Jan 2011 13:28:46 +1000 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: <4D267C8F.3050601@ronadam.com> References: <4D1BE506.2070806@ronadam.com>

<4D267C8F.3050601@ronadam.com> Message-ID: On Fri, Jan 7, 2011 at 12:38 PM, Ron Adam wrote: > You could add a private dictionary to sys, that is updated along with > sys.modules, which maps module names to real names. And have a function in > inspect to retrieve the real name for an object. > > That sounds like it would do pretty much what you need and doesn't add a top > level builtin or global, or change "if __name__ == '__main__': main()". My original suggestion was along those lines, but I've come to the conclusion that it isn't sufficiently granular - when existing code tinkers with "__module__" it tends to do it at the object level rather than by modifying __name__ in the module globals. To turn this into a concrete proposal, here is what I am thinking of specifying in a PEP for 3.3: 1. Implicit configuration of __module__ attributes is updated to check for a definition of "__import_name__" at the module level. If found, then this is used as the value for the __module__ attribute. Otherwise, __module__ is set to __name__ as usual. 2. Any code that currently sets a __module__ attribute (i.e. function and class definitions) will also set an __impl_module__ attribute. This attribute will always be set to the value of __name__. 3. Update and/or augment the relevant C APIs to make it easy to do this for affected extension modules. 4. Update inspect.getsource() (and possibly some other introspection functions) to look at __impl_module__ rather than __module__. 5. Update all acceleration modules (such as _datetime) and "implementation packages" (such as unittest) to set __module__ and __impl_module__ appropriately on exported objects. 6. Update the __main__ execution logic (including both the builtin logic and runpy) to insert the __main__ module into sys.modules as both "__main__" and the module's real name (i.e. the name that would result in a second copy of the module ending up in sys.modules if you imported it). 7.
Update the __main__ execution logic to set __import_name__ to the actual name of the module. So we end up with two new magic attributes: __import_name__: optional module level attribute that indicates a preferred alternative to __name__ for accessing the module contents. Alters the value of __module__ for classes and functions defined in the module. Implicitly set for the __main__ module. __impl_module__: implicitly set on objects with a __module__ attribute to allow __module__ to be altered to refer to an object's preferred import location without losing the actual implementation location of the object. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mmanns at gmx.net Fri Jan 7 11:24:35 2011 From: mmanns at gmx.net (Martin Manns) Date: Fri, 7 Jan 2011 11:24:35 +0100 Subject: [Python-ideas] Add irange with large integer step support to itertools Message-ID: <20110107112435.2ae46c89@Knock> Hi, I would like to propose an addition of an "irange" function to itertools. This addition could reduce testing effort when developing applications in which large integers show up. Both xrange (Python 2.x) and range (Python 3.x) have limited support for large integer step values, for example:

    Python 3.1.3 (r313:86834, Nov 28 2010, 10:01:07)
    [GCC 4.4.5] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> range(10**10000, 10**10000+10**1000, 10**900)[5]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C ssize_t

The code below is untested and for clarification only. It has been taken and modified from [issue7721] http://bugs.python.org/issue7721 With irange, no OverflowError is raised:

    >>> from itertools import islice
    >>> from irange import irange
    >>> def nth(iterable, n, default=None):
    ...     "Returns the nth item or a default value"
    ...     return next(islice(iterable, n, None), default)
    ...
    >>> nth(irange(10**10000, 10**10000+10**1000, 10**900), 5)
    100000000000000 ...

    ## Code snippet (untested)
    from itertools import count, takewhile

    def irange(start, stop=None, step=1):
        """Range for long integers

        Usage: irange([start], stop, [step])

        Parameters
        ----------
        start: Integer
        stop: Integer
        step: Integer, defaults to 1
        """
        if start is None:
            raise TypeError("range() integer argument expected, got NoneType")
        if stop is None:
            # Single-argument form: irange(stop) counts up from 0
            stop = start
            start = 0
        if step is None:
            step = 1
        if step > 0:
            if stop < start:
                return iter(())  # empty range
            cond = lambda x: x < stop
        elif step < 0:
            if stop > start:
                return iter(())  # empty range
            cond = lambda x: x > stop
        else:
            raise ValueError("irange() step argument must not be zero")
        # Each term is computed from start with plain int arithmetic,
        # so arbitrarily large values never hit a C ssize_t limit
        return takewhile(cond, (start + i * step for i in count()))
    ## End code snippet

Does such an addition make sense in your eyes? Regards Martin From rrr at ronadam.com Sat Jan 8 10:06:31 2011 From: rrr at ronadam.com (Ron Adam) Date: Sat, 08 Jan 2011 03:06:31 -0600 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: References: <4D1BE506.2070806@ronadam.com>

<4D267C8F.3050601@ronadam.com> Message-ID: <4D282917.3020606@ronadam.com> On 01/06/2011 09:28 PM, Nick Coghlan wrote: > On Fri, Jan 7, 2011 at 12:38 PM, Ron Adam wrote: >> You could add a private dictionary to sys, that is updated along with >> sys.modules, which maps module names to real names. And have a function in >> inspect to retrieve the real name for an object. >> >> That sounds like it would do pretty much what you need and doesn't add a top >> level builtin or global, or change "if __name__ == '__main__': main()". > > My original suggestion was along those lines, but I've come to the > conclusion that it isn't sufficiently granular - when existing code > tinkers with "__module__" it tends to do it at the object level rather > than by modifying __name__ in the module globals. What do you mean by *tinkers with "__module__"*? Do you have an example where/when that is needed? > To turn this into a concrete proposal, here is what I am thinking of > specifying in a PEP for 3.3: > > 1. Implicit configuration of __module__ attributes is updated to check > for a definition of "__import_name__" at the module level. If found, > then this is used as the value for the __module__ attribute. > Otherwise, __module__ is set to __name__ as usual. If __import_name__ is going to match __module__ everywhere else, why not just call it __module__ everywhere? Would __package__ be changed in any way? > 2. Any code that currently sets a __module__ attribute (i.e. function > and class definitions) will also set an __impl_module__ attribute. > This attribute will always be set to the value of __name__. So we will have: __package__, __module__, __import_name__, __impl_name__, and if you also include __file__ and __path__, that makes six different attributes for describing where something came from. I don't know about you, but this bothers me a bit. :-/ How about reconsidering going the other direction: 1. Add __module__ to module level name space. +1 2.
Add a module registry that uses the __module__ attribute to get a module_location_info object, which would have all the useful location info in it (including the real name of "__main__"). If __name__ and __module__ are not changed, programs that use those won't break. Also consider having virtual modules, where objects in it may have come from different *other* locations. A virtual module would need a way to keep track of that. (I'm not sure this is a good idea.) Does this fit some of the problems you are thinking of where the granularity may matter? It would take two functions to do this. One to create the virtual module, and another to pre-load its initial objects. For those objects, the loader would set obj.__module__ to the virtual module name, and also set obj.__original_module__ to the original module name. These would only be seen on objects in virtual modules. A lookup on obj.__module__ will tell you it's in a virtual module. Then a lookup with obj.__original_module__ would give you the actual location info it came from. By doing it that way, most people will never need to know how these things work or even see them. ie... It's advanced/expert Python foo. ;-) Anyway, I hope this gives you some ideas; I know you can figure out the details much better than I can. Cheers, Ron From rrr at ronadam.com Sun Jan 9 02:20:22 2011 From: rrr at ronadam.com (Ron Adam) Date: Sat, 08 Jan 2011 19:20:22 -0600 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: <4D282917.3020606@ronadam.com> References: <4D1BE506.2070806@ronadam.com>

<4D267C8F.3050601@ronadam.com> <4D282917.3020606@ronadam.com> Message-ID: <4D290D56.8020406@ronadam.com> On 01/08/2011 03:06 AM, Ron Adam wrote: > So we will have: __package__, __module__, __import_name__, __impl_name__, > and if you also include __file__ and __path__, that makes six different > attributes for describing where something came from. And also add __cached__ to that list. > I don't know about you, but this bothers me a bit. :-/ From ncoghlan at gmail.com Sun Jan 9 07:39:24 2011 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 9 Jan 2011 16:39:24 +1000 Subject: [Python-ideas] Module aliases and/or "real names" In-Reply-To: <4D282917.3020606@ronadam.com> References: <4D1BE506.2070806@ronadam.com>

<4D267C8F.3050601@ronadam.com> <4D282917.3020606@ronadam.com> Message-ID: On Sat, Jan 8, 2011 at 7:06 PM, Ron Adam wrote: > On 01/06/2011 09:28 PM, Nick Coghlan wrote: >> My original suggestion was along those lines, but I've come to the >> conclusion that it isn't sufficiently granular - when existing code >> tinkers with "__module__" it tends to do it at the object level rather >> than by modifying __name__ in the module globals. > > What do you mean by *tinkers with "__module__"*? > > Do you have an example where/when that is needed?

    >>> from inspect import getsource
    >>> from functools import partial
    >>> partial.__module__
    'functools'
    >>> getsource(partial)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.6/inspect.py", line 689, in getsource
        lines, lnum = getsourcelines(object)
      File "/usr/lib/python2.6/inspect.py", line 678, in getsourcelines
        lines, lnum = findsource(object)
      File "/usr/lib/python2.6/inspect.py", line 552, in findsource
        raise IOError('could not find class definition')
    IOError: could not find class definition

partial is actually implemented in C in the _functools module, hence the failure of the getsource call. However, it officially lives in functools for pickling purposes (other implementations aren't obliged to provide _functools at all), so __module__ is adjusted appropriately. The other examples I have been using are the _datetime C acceleration module and the unittest pseudo-package. >> 1. Implicit configuration of __module__ attributes is updated to check >> for a definition of "__import_name__" at the module level. If found, >> then this is used as the value for the __module__ attribute. >> Otherwise, __module__ is set to __name__ as usual. > > If __import_name__ is going to match __module__ everywhere else, why not > just call it __module__ everywhere?
Because the module level attributes for identifying the module don't serve the same purpose as the attributes identifying where functions and classes are defined. That said, calling it "__module__" would probably work, and make the naming logic a bit more intuitive. The precedent for that attribute name to refer to a string rather than a module object was set a long time ago, after all. > Would __package__ be changed in any way? To look for __module__ before checking __name__? No, since doing that would make it unnecessarily difficult to use relative imports inside pseudo-packages. >> 2. Any code that currently sets a __module__ attribute (i.e. function >> and class definitions) will also set an __impl_module__ attribute. >> This attribute will always be set to the value of __name__. > > So we will have: __package__, __module__, __import_name__, __impl_name__, > and if you also include __file__ and __path__, that makes six different > attributes for describing where something came from. > > I don't know about you, but this bothers me a bit. :-/ It bothers me a lot, since I probably could have avoided at least some of it by expanding the scope of PEP 366. However, it does help to split them out into the different contexts and look at how each of them is used, since it makes it clear that there are a lot of attributes because there is a fair bit of information that is used in different ways.
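For reference, here is a small sketch (Python 3; `location_info` and `LOCATION_ATTRS` are hypothetical names used only for illustration) that reports which of the module-level location attributes discussed here a given module actually defines. It also checks the `__name__.rpartition('.')[0]` relationship for `__package__` on an ordinary submodule; a package's own `__package__` is its full name, so that rule applies to plain modules only.

```python
import importlib

# Module-level location attributes discussed in this thread. Not every module
# defines all of them: only packages have __path__, for example.
LOCATION_ATTRS = ("__name__", "__package__", "__file__", "__path__",
                  "__loader__", "__cached__")


def location_info(module_name):
    """Hypothetical helper: map each location attribute the module defines
    to its value."""
    module = importlib.import_module(module_name)
    return {attr: getattr(module, attr)
            for attr in LOCATION_ATTRS if hasattr(module, attr)}


for name in ("json", "json.decoder"):
    print(name, sorted(location_info(name)))

# For an ordinary submodule, __package__ follows the rpartition rule.
decoder = location_info("json.decoder")
print(decoder["__package__"] == decoder["__name__"].rpartition(".")[0])  # True
```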
Module level attributes relating to location in the external environment:

  __file__: typically refers to a source file, but is not required to (see PEP 302)
  __path__: package attribute used to identify the directory (or directories) searched for submodules
  __loader__: PEP 302 loader reference (may not exist for ordinary filesystem imports)
  __cached__: if it exists, refers to a compiled bytecode file (see PEP 3147)

It is important to understand that ever since PEP 302, *there is no loader-independent mapping* between any of these external environment related attributes and the module namespace. Some Python standard library code (e.g. multiprocessing) currently assumes such a mapping exists, and it is broken on Windows right now as a direct result of that incorrect assumption (other code explicitly disclaims support for PEP 302 loaded modules and only works with actual files and directories).

Module level attributes relating to location within the module namespace:

  __name__: actual name of current module in the current interpreter instance. Best choice for introspection of the current interpreter.
  __module__ (*new*): "official" portable name for module contents (components should never include leading underscores). Best choice for information that should be portable to other interpreters (e.g. for pickling and other serialisation formats).
  __package__: optional attribute used specifically to control handling of relative imports. May be explicitly set (e.g. by runpy), otherwise implicitly set to "__name__.rpartition('.')[0]" by the first relative import.

Most of the time, __name__ is consistent across all 3 use cases, in which case __package__ and __import_name__ are redundant. However, when __name__ is wrong for some reason (e.g.
including an implementation detail, or adjusted to "__main__" for execution as a script), then __package__ allows relative imports to be fixed, while __import_name__ will allow pickling and other operations that should hide implementation details to be fixed.

Object level attributes relating to location of class and function definitions:

  __module__ (*updated*): refers to __module__ from the originating module (if defined) and to __name__ otherwise
  __impl_module__ (*new*): refers to __name__ from the originating module

Looking at that write-up, I do quite like the idea of reusing __module__ for the new module level attribute. > Also consider having virtual modules, where objects in it may have come from > different *other* locations. A virtual module would need a way to keep track > of that. (I'm not sure this is a good idea.) It's too late; code already does that. This is precisely the use case I am trying to fix (objects like functools.partial that deliberately lie in their __module__ attribute), so that this can be done *right* (i.e. without having to choose which use cases to support and which ones to break). The basic problem is that __module__ currently tries to serve two masters:

  1. use cases like inspect.getsource, where we want to know where the object came from in the current interpreter
  2. use cases like pickle, where we want the "official" portable location, with any implementation details (like the _functools module) hidden.

Currently, the default behaviour of the interpreter is to support use case 1 and break use case 2 if any objects are defined in a different module from where they claim to live (e.g. see the pickle compatibility breakage with the 3.2 unittest implementation layout changes).
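A minimal sketch of how the two use cases could be served separately if objects carried both attributes. Everything here is an assumption for illustration: `export_as`, `portable_location` and `implementation_location` are hypothetical helpers, `"mypkg"` is a made-up package name, and `__impl_module__` is only the attribute name proposed in this thread, not something any current interpreter sets.

```python
def export_as(official_module):
    """Hypothetical decorator: advertise `official_module` as the public home
    of a class or function while remembering where it was really defined."""
    def decorate(obj):
        # __impl_module__ is the attribute proposed in this thread; nothing in
        # the interpreter sets it automatically today.
        obj.__impl_module__ = obj.__module__
        obj.__module__ = official_module
        return obj
    return decorate


def portable_location(obj):
    """Use case 2 (pickle and other serialisation): the official name."""
    return obj.__module__


def implementation_location(obj):
    """Use case 1 (inspect.getsource and friends): where the code really
    lives, falling back to __module__ for objects that were never relabelled."""
    return getattr(obj, "__impl_module__", obj.__module__)


@export_as("mypkg")  # illustrative package name
class Widget:
    pass


print(portable_location(Widget))        # mypkg
print(implementation_location(Widget))  # the module this code runs in
print(implementation_location(len))     # builtins
```

Under the proposal the interpreter itself would record `__impl_module__` when classes and functions are created (and extension modules would do so through C APIs), so a decorator like this would only stand in for that machinery.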
The only tool currently available to module authors is to override __module__ (as functools.partial and the datetime acceleration module do), which is correct for use case 2, but breaks use case 1 (leading to misleading error messages in the C acceleration module case, and breaking otherwise valid introspection in the unittest case).

My proposed changes will:
a) make overriding __module__ significantly easier to do
b) allow the introspection use cases access to the information they need so they can do the right thing when confronted with an overridden __module__ attribute

> Does this fit some of the problems you are thinking of where the granularity may
> matter?
>
> It would take two functions to do this. One to create the virtual module,
> and another to pre-load its initial objects. For those objects, the loader
> would set obj.__module__ to the virtual module name, and also set
> obj.__original_module__ to the original module name. These would only be
> seen on objects in virtual modules. A lookup on obj.__module__ will tell
> you it's in a virtual module. Then a lookup with obj.__original_module__
> would give you the actual location info it came from.

That adds a lot of complexity though - far simpler to define a new __impl_module__ attribute on every object, retroactively fixing introspection of existing code that adjusts __module__ to make pickling work properly across different versions and implementations.

> By doing it that way, most people will never need to know how these things
> work or even see them. ie... It's advanced/expert Python foo. ;-)

Most people will never need to care or worry about the difference between __module__ and __impl_module__ either - it will be hidden inside libraries like inspect, pydoc and pickle.

> Anyway, I hope this gives you some ideas; I know you can figure out the
> details much better than I can.

Yeah, the idea of reusing the __module__ attribute name at the top level is an excellent one.

Cheers,
Nick.
--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From rrr at ronadam.com Sun Jan 9 18:56:24 2011
From: rrr at ronadam.com (Ron Adam)
Date: Sun, 09 Jan 2011 11:56:24 -0600
Subject: [Python-ideas] Module aliases and/or "real names"
In-Reply-To:
References: <4D1BE506.2070806@ronadam.com>
 <4D267C8F.3050601@ronadam.com> <4D282917.3020606@ronadam.com>
Message-ID: <4D29F6C8.2010505@ronadam.com>

On 01/09/2011 12:39 AM, Nick Coghlan wrote:
>> Also consider having virtual modules, where objects in it may have come from
>> different *other* locations. A virtual module would need a way to keep track
>> of that. (I'm not sure this is a good idea.)
>
> It's too late, code already does that. This is precisely the use case
> I am trying to fix (objects like functools.partial that deliberately
> lie in their __module__ attribute), so that this can be done *right*
> (i.e. without having to choose which use cases to support and which
> ones to break).

Yes, __builtins__ is a virtual module.

Creating a module in memory...

>>> import imp
>>> new = imp.new_module("new")
>>> new
<module 'new' (built-in)>

The term "(built-in)" doesn't quite fit in this case. But I can get used to it.

>>> sys.modules[new.__name__]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'new'

And it's not in sys.modules yet. That's ok: other things can be loaded into it before it's added to sys.modules. It's this loading part that can be improved.

> The basic problem is that __module__ currently tries to serve two masters:
> 1. use cases like inspect.getsource, where we want to know where the
> object came from in the current interpreter
> 2. use cases like pickle, where we want the "official" portable
> location, with any implementation details (like the _functools module)
> hidden.

Most C extensions are written as modules, to be imported and imported from. A tool to load objects rather than import them may be better in these situations.

    partial = imp.load_extern_object("_functools.partial")

A loaded object would have its __module__ attribute set to the module it's loaded into instead of where it came from. By doing it this way, it doesn't complicate the import semantics.

It may also be useful to make it a special type, so that other tools can decide how to handle them.
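Ron's proposed loader can be approximated in a few lines. This is a hypothetical sketch: `load_extern_object()` is not a real `imp` function (and `imp` itself was later removed in Python 3.12), `types.ModuleType` stands in for `imp.new_module()`, and the in-memory `_extern_src` module stands in for an extension module. It also records `__original_module__`, the attribute name floated earlier in the thread for virtual modules.

```python
import importlib
import sys
import types


def load_extern_object(dotted_name, target):
    """Hypothetical helper modelled on the proposed imp.load_extern_object().

    Looks up the object named by dotted_name, binds it into the target
    module, points its __module__ at the target, and records the real
    origin in __original_module__.
    """
    source_name, _, attr = dotted_name.rpartition(".")
    obj = getattr(importlib.import_module(source_name), attr)
    # This mutates the shared object itself. Builtin/extension types such
    # as functools.partial reject the assignment with TypeError.
    obj.__original_module__ = obj.__module__
    obj.__module__ = target.__name__
    setattr(target, attr, obj)
    return obj


# Hypothetical source module standing in for an extension module.
src = types.ModuleType("_extern_src")
exec("def greet(name):\n    return 'hello ' + name\n", src.__dict__)
sys.modules["_extern_src"] = src

# types.ModuleType(...) is the modern spelling of imp.new_module(...).
virtual = types.ModuleType("virtual")
load_extern_object("_extern_src.greet", virtual)

print(virtual.greet("ron"))               # hello ron
print(virtual.greet.__module__)           # virtual
print(virtual.greet.__original_module__)  # _extern_src
print("virtual" in sys.modules)           # False
```

As in Ron's session, the target module is populated before it is ever registered in sys.modules. Because the helper rebinds `__module__` on the shared object, it has the same drawback as overriding `__module__` by hand; keeping the original name alongside is what preserves introspection.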
> Currently, the default behaviour of the interpreter is to support use
> case 1 and break use case 2 if any objects are defined in a different
> module from where they claim to live (e.g. see the pickle
> compatibility breakage with the 3.2 unittest implementation layout
> changes). The only tool currently available to module authors is to
> override __module__ (as functools.partial and the datetime
> acceleration module do), which is correct for use case 2, but breaks
> use case 1 (leading to misleading error messages in the C acceleration
> module case, and breaking otherwise valid introspection in the
> unittest case).
>
> My proposed changes will:
> a) make overriding __module__ significantly easier to do
> b) allow the introspection use cases access to the information they
> need so they can do the right thing when confronted with an overridden
> __module__ attribute

It would be better to find solutions that don't override __module__ after it has been imported or loaded.

>> Does this fit some of the problems you are thinking of where the granularity may
>> matter?
>>
>> It would take two functions to do this. One to create the virtual module,
>> and another to pre-load its initial objects. For those objects, the loader
>> would set obj.__module__ to the virtual module name, and also set
>> obj.__original_module__ to the original module name. These would only be
>> seen on objects in virtual modules. A lookup on obj.__module__ will tell
>> you it's in a virtual module. Then a lookup with obj.__original_module__
>> would give you the actual location info it came from.
>
> That adds a lot of complexity though - far simpler to define a new
> __impl_module__ attribute on every object, retroactively fixing
> introspection of existing code that adjusts __module__ to make
> pickling work properly across different versions and implementations.
>
>> By doing it that way, most people will never need to know how these things
>> work or even see them. ie... It's advanced/expert Python foo. ;-)
>
> Most people will never need to care or worry about the difference
> between __module__ and __impl_module__ either - it will be hidden
> inside libraries like inspect, pydoc and pickle.

I think __impl_module__ should only be on objects where it would be different from __module__.

>> Anyway, I hope this gives you some ideas; I know you can figure out the
>> details much better than I can.
>
> Yeah, the idea of reusing the __module__ attribute name at the top
> level is an excellent one.

The hard part of all of this is separating the good, doable ideas from the good ideas that unfortunately can't be done because they would break something.

Cheers,
Ron

From ncoghlan at gmail.com Sun Jan 9 19:18:42 2011
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 10 Jan 2011 04:18:42 +1000
Subject: [Python-ideas] Module aliases and/or "real names"
In-Reply-To: <4D29F6C8.2010505@ronadam.com>
References: <4D1BE506.2070806@ronadam.com>
<4D267C8F.3050601@ronadam.com> <4D282917.3020606@ronadam.com>