From rrr at ronadam.com  Wed Sep  5 08:59:42 2007
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 05 Sep 2007 01:59:42 -0500
Subject: [Python-ideas] FInd first tuple argument for str.find and str.index
Message-ID: <46DE53DE.7070803@ronadam.com>


Could we add the ability for str.index and str.find to accept a tuple as the
first argument, returning the index of the first item found?

This is similar to how str.startswith and str.endswith already work.

  |  startswith(...)
  |      S.startswith(prefix[, start[, end]]) -> bool
  |
  |      Return True if S starts with the specified prefix, False otherwise.
  |      With optional start, test S beginning at that position.
  |      With optional end, stop comparing S at that position.
  |      prefix can also be a tuple of strings to try.


This would speed up cases of filtering and searching when more than one 
item is being searched for.  It would also simplify building iterators that 
filter and yield multiple items in order.
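
For illustration, the proposed behavior could be spelled in pure Python
like this (a rough sketch; 'find_first' is a hypothetical name, and the
real method would of course be implemented in C):

    def find_first(s, subs, start=0, end=None):
        """Return the lowest index in s[start:end] at which any string
        in subs occurs, or -1 if none of them occur (mirrors str.find)."""
        if end is None:
            end = len(s)
        best = -1
        for sub in subs:
            i = s.find(sub, start, end)
            if i != -1 and (best == -1 or i < best):
                best = i
        return best

    print(find_first("abc{def}ghi", ('{', '}')))   # -> 3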


A Google Code search suggests it's a generally useful operation.

http://www.google.com/codesearch?hl=en&lr=&q=%22findfirst%22+string&btnG=Search


(Searching for Python-specific code doesn't show much, because Python
doesn't have a findfirst function of any kind.)


Cheers,
    Ron



From guido at python.org  Wed Sep  5 17:05:43 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 5 Sep 2007 08:05:43 -0700
Subject: [Python-ideas] FInd first tuple argument for str.find and
	str.index
In-Reply-To: <46DE53DE.7070803@ronadam.com>
References: <46DE53DE.7070803@ronadam.com>
Message-ID: <ca471dc20709050805t357d9732r379ca57c510708c6@mail.gmail.com>

I was surprised to find that startswith and endswith support this, but
it does make sense. Adding a patch to 2.6 would cause it to be merged
into 3.0 soon enough.

On 9/4/07, Ron Adam <rrr at ronadam.com> wrote:
>
> Could we add the ability of str.index and str.find to accept a tuple as the
> first argument and return the index of the first item found in it.
>
> This is similar to how str.startswith and str.endswith already works.
>
>   |  startswith(...)
>   |      S.startswith(prefix[, start[, end]]) -> bool
>   |
>   |      Return True if S starts with the specified prefix, False otherwise.
>   |      With optional start, test S beginning at that position.
>   |      With optional end, stop comparing S at that position.
>   |      prefix can also be a tuple of strings to try.
>
>
> This would speed up cases of filtering and searching when more than one
> item is being searched for.  It would also simplify building iterators that
> filter and yield multiple items in order.
>
>
> A general google code search seems to show it's a generally useful thing to
> do.
>
> http://www.google.com/codesearch?hl=en&lr=&q=%22findfirst%22+string&btnG=Search
>
>
> (searching for python specific code doesn't show much because python
> doesn't have a findfirst function of any type.)
>
>
> Cheers,
>     Ron
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


From terry at jon.es  Wed Sep  5 18:14:30 2007
From: terry at jon.es (Terry Jones)
Date: Wed, 5 Sep 2007 18:14:30 +0200
Subject: [Python-ideas] FInd first tuple argument for str.find
	and	str.index
In-Reply-To: Your message at 08:05:43 on Wednesday, 5 September 2007
References: <46DE53DE.7070803@ronadam.com>
	<ca471dc20709050805t357d9732r379ca57c510708c6@mail.gmail.com>
Message-ID: <18142.54758.609458.513647@terry-jones-computer.local>

>>>>> "Guido" == Guido van Rossum <guido at python.org> writes:
Guido> I was surprised to find that startswith and endswith support this,
Guido> but it does make sense. Adding a patch to 2.6 would cause it to be
Guido> merged into 3.0 soon enough.

Guido> On 9/4/07, Ron Adam <rrr at ronadam.com> wrote:
>> Could we add the ability of str.index and str.find to accept a tuple as the
>> first argument and return the index of the first item found in it.

Hi

If someone is going to head down this path, it might be better to implement
a more general algorithm and provide the above as a special case via an
argument.

There's a fast and beautiful algorithm due to Aho & Corasick (CACM, 1975)
that finds _all_ matches of a set of patterns. It runs in time that's
linear in max(sum of the lengths of the patterns to be matched, length of
to-be-matched text). The algorithm was the basis of fgrep.

To provide the above, a special case could have it return as soon as it
found a first match (i.e., of any pattern).

One general way to write it would be to have it return a dict of patterns
and result indices in the case that the pattern argument is a tuple.

So

  "Hey, look, look, look at these patterns".find(('look', 'for', 'these', 'patterns'))

might return

    {
      'look'     : [ 5, 11, 17 ],
      'for'      : [ ],  # or arguably [ -1 ],
      'these'    : [ 25 ],
      'patterns' : [ 31 ],
    }

OK, that's a bit of a departure from the normal behavior of find, but so is
passing a tuple of patterns. Alternatively, you could get back a tuple
of (tuples of) matching indices.

The ideal calling interface and result depend on what you need to do -
check whether a specific string matched? just know the first match offset? etc.

I don't know the best solution, but the algorithm rocks. Raymond - you'll
love it :-)

Terry


From terry at jon.es  Wed Sep  5 18:37:34 2007
From: terry at jon.es (Terry Jones)
Date: Wed, 5 Sep 2007 18:37:34 +0200
Subject: [Python-ideas] FInd first tuple argument for
	str.find	and	str.index
In-Reply-To: Your message at 18:14:30 on Wednesday, 5 September 2007
References: <46DE53DE.7070803@ronadam.com>
	<ca471dc20709050805t357d9732r379ca57c510708c6@mail.gmail.com>
	<18142.54758.609458.513647@terry-jones-computer.local>
Message-ID: <18142.56142.262681.951910@terry-jones-computer.local>

>>>>> "Terry" == Terry Jones <terry at jon.es> writes:
>>>>> "Guido" == Guido van Rossum <guido at python.org> writes:
Guido> I was surprised to find that startswith and endswith support this,
Guido> but it does make sense. Adding a patch to 2.6 would cause it to be
Guido> merged into 3.0 soon enough.

Guido> On 9/4/07, Ron Adam <rrr at ronadam.com> wrote:
>>> Could we add the ability of str.index and str.find to accept a tuple as the
>>> first argument and return the index of the first item found in it.

I should have added a few more comments.

If you're going to implement the original desired functionality and make it
run quickly, you're probably going to dream up something along the lines of
what Aho & Corasick did so beautifully.

It's tricky to get it right. As you walk the text string, several patterns
may be currently matching. But the next char you consider might cause one
or more of the current matches to fail, or a currently non-matching pattern
to begin to match. The A&C algorithm builds a trie with failure arcs, so
the matching is linear (both linear time to build the trie and the failure
arcs, and then linear to walk the trie with the text). It has accepting
states, so you know as soon as something matches, and can quit early.
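
To make that concrete, here is a rough pure-Python sketch of the
algorithm (illustrative only - a real implementation would be in C and
rather more careful; this one assumes non-empty, distinct patterns):

    from collections import deque

    def find_all(text, patterns):
        """Aho-Corasick sketch: {pattern: [match start indices]}."""
        # Build the trie: goto[s] maps a char to the next state, out[s]
        # lists the patterns accepted at state s, fail[s] is the failure arc.
        goto, fail, out = [{}], [0], [[]]
        for pat in patterns:
            s = 0
            for ch in pat:
                if ch not in goto[s]:
                    goto.append({})
                    fail.append(0)
                    out.append([])
                    goto[s][ch] = len(goto) - 1
                s = goto[s][ch]
            out[s].append(pat)
        # Breadth-first pass to fill in the failure arcs.
        pending = deque(goto[0].values())
        while pending:
            r = pending.popleft()
            for ch, u in goto[r].items():
                pending.append(u)
                f = fail[r]
                while f and ch not in goto[f]:
                    f = fail[f]
                fail[u] = goto[f].get(ch, 0)
                out[u] += out[fail[u]]   # inherit accepting states
        # Walk the text once; the whole thing is linear, as advertised.
        results = dict((pat, []) for pat in patterns)
        s = 0
        for i, ch in enumerate(text):
            while s and ch not in goto[s]:
                s = fail[s]
            s = goto[s].get(ch, 0)
            for pat in out[s]:
                results[pat].append(i - len(pat) + 1)
        return results

Running find_all("Hey, look, look, look at these patterns", ('look',
'for', 'these', 'patterns')) returns exactly the dict shown in my
previous message.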

If this is going to be implemented you may as well do it right the first time.

You could also return a dict in which (pattern) keys are absent if they
didn't match at all. Then it would be fast to tell which, if any, patterns
matched - no need to step through all passed patterns, just use
result.keys() to get them.

Terry


From grosser.meister.morti at gmx.net  Wed Sep  5 18:50:21 2007
From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=)
Date: Wed, 05 Sep 2007 18:50:21 +0200
Subject: [Python-ideas] FInd first tuple argument for
	str.find	and	str.index
In-Reply-To: <18142.54758.609458.513647@terry-jones-computer.local>
References: <46DE53DE.7070803@ronadam.com>	<ca471dc20709050805t357d9732r379ca57c510708c6@mail.gmail.com>
	<18142.54758.609458.513647@terry-jones-computer.local>
Message-ID: <46DEDE4D.4010709@gmx.net>

Terry Jones wrote:
>>>>>> "Guido" == Guido van Rossum <guido at python.org> writes:
> Guido> I was surprised to find that startswith and endswith support this,
> Guido> but it does make sense. Adding a patch to 2.6 would cause it to be
> Guido> merged into 3.0 soon enough.
> 
> Guido> On 9/4/07, Ron Adam <rrr at ronadam.com> wrote:
>>> Could we add the ability of str.index and str.find to accept a tuple as the
>>> first argument and return the index of the first item found in it.
> 
> Hi
> 
> If someone is going to head down this path, it might be better to implement
> a more general algorithm and provide the above as a special case via an
> argument.
> 
> There's a fast and beautiful algorithm due to Aho & Corasick (CACM, 1975)
> that finds _all_ matches of a set of patterns. It runs in time that's
> linear in max(sum of the lengths of the patterns to be matched, length of
> to-be-matched text). The algorithm was the basis of fgrep.
> 
> To provide the above, a special case could have it return as soon as it
> found a first match (i.e., of any pattern).
> 
> One general way to write it would be to have it return a dict of patterns
> and result indices in the case that the pattern argument is a tuple.
> 
> So
> 
>   "Hey, look, look, look at these patterns".find(('look', 'for', 'these', 'patterns'))
> 
> might return
> 
>     {
>       'look'     : [ 5, 11, 17 ],
>       'for'      : [ ],  # or arguably [ -1 ],
>       'these'    : [ 25 ],
>       'patterns' : [ 31 ],
>     }
> 
> OK, that's a bit of a departure from the normal behavior of find, but so is
> passing a tuple of patterns. Alternately, you could also get back a tuple
> of (tuples of) matching indices.
> 
> The ideal calling interface and result depends on what you need to do -
> check if a specific string matched? Just know the first match offset, etc.
> 
> I don't know the best solution, but the algorithm rocks. Raymond - you'll
> love it :-)
> 
> Terry

I would expect such a method to return the index where one of the given strings was
found. Or maybe a tuple: (start, end) or a tuple: (start, searchstring).

	-panzi


From terry at jon.es  Wed Sep  5 19:01:23 2007
From: terry at jon.es (Terry Jones)
Date: Wed, 5 Sep 2007 19:01:23 +0200
Subject: [Python-ideas] FInd first tuple argument
	for	str.find	and	str.index
In-Reply-To: Your message at 18:50:21 on Wednesday, 5 September 2007
References: <46DE53DE.7070803@ronadam.com>
	<ca471dc20709050805t357d9732r379ca57c510708c6@mail.gmail.com>
	<18142.54758.609458.513647@terry-jones-computer.local>
	<46DEDE4D.4010709@gmx.net>
Message-ID: <18142.57571.484035.769749@terry-jones-computer.local>

>>>>> "Mathias" == Mathias Panzenb?ck <grosser.meister.morti at gmx.net> writes:
Mathias> I would expect such a method to return the index where one of the
Mathias> given strings was found. Or maybe a tuple: (start, end) or a
Mathias> tuple: (start, searchstring).

It could do something like that if you passed an argument telling it to
quit on the first match. But that makes the return type depend on the
passed arg, which I guess is not good. We'd already be doing that if we
returned a dict, but this would return either a tuple or a dict.

You could drop the dict idea altogether, but you need to consider what to
do if many (probably different) patterns match, all starting at the same
location in the string. For this reason alone I don't think returning a
(start, searchstring) tuple is sufficient.

Given that Aho & Corasick find everything you could want to know (all
matches of all patterns), and that they do it in linear time, it doesn't
seem right to throw this information away - especially after going to the
trouble of building and walking the trie.

Terry


From rrr at ronadam.com  Wed Sep  5 20:19:36 2007
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 05 Sep 2007 13:19:36 -0500
Subject: [Python-ideas] FInd first tuple
	argument	for	str.find	and	str.index
In-Reply-To: <18142.57571.484035.769749@terry-jones-computer.local>
References: <46DE53DE.7070803@ronadam.com>	<ca471dc20709050805t357d9732r379ca57c510708c6@mail.gmail.com>	<18142.54758.609458.513647@terry-jones-computer.local>	<46DEDE4D.4010709@gmx.net>
	<18142.57571.484035.769749@terry-jones-computer.local>
Message-ID: <46DEF338.10407@ronadam.com>



Terry Jones wrote:
>>>>>> "Mathias" == Mathias Panzenb?ck <grosser.meister.morti at gmx.net> writes:
> Mathias> I would expect such a method to return the index where one of the
> Mathias> given strings was found. Or maybe a tuple: (start, end) or a
> Mathias> tuple: (start, searchstring).
> 
> It could do something like that if you passed an argument telling it to
> quit on the first match. But that makes the return type depend on the
> passed arg, which I guess is not good. We'd already be doing that if we
> returned a dict, but this would return either a tuple or a dict.
> 
> You could drop the dict idea altogether, but you need to consider what to
> do if many (probably different) patterns match, all starting at the same
> location in the string. For this reason alone I don't think returning a
> (start searchstring) tuple is sufficient.

I was thinking of something a bit more lightweight.

For more complex stuff I think the 're' module already does pretty much 
what you are describing.  It may even already take advantage of the 
algorithms you referred to.  If not, that would be an important improvement 
to the re module. :-)
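
For instance, a single-pass scan over several fixed terms can already be
written with re today (a quick sketch):

    import re

    terms = ('{', '}')
    pattern = re.compile('|'.join(re.escape(t) for t in terms))
    for m in pattern.finditer("a {b {c} d} e"):
        print("%d %s" % (m.start(), m.group()))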

The use case I had in mind was finding starting and ending delimiters, and
avoiding the following kind of awkward code.  (This would work for finding
other things as well, of course.)

    start = 0
    while start < len(s):

        i1 = s.find('{', start)
        if i1 == -1:
            i1 = len(s)

        i2 = s.find('}', start)
        if i2 == -1:
            i2 = len(s)

        # etc... for as many search terms as you have...
        # or use a loop to locate each one.

        start = min(i1, i2)
        if start == len(s):
            break

        ...
        # do something with s[start], then advance start past
        # the match so the next find() makes progress
        ...

That works, but it has to go through the string once for each item.  Of
course I would use 're' for anything more complex than a few fixed-length
terms.

The above could be simplified greatly to the following, which would be much
quicker than what we have now and still not overly complex.

    start = 0
    while start < len(s):
        try:
            start = s.index(('{', '}'), start)
        except ValueError:
            break
        ...
        # do something with s[start], then advance start
        ...


> Given that Aho & Corasick find everything you could want to know (all
> matches of all patterns), and that they do it in linear time, it doesn't
> seem right to throw this information away - especially after going to the
> trouble of building and walking the trie.

Thanks for the reference, I'll look into it.  :-)

If the function returns something other than a simple index, then I think 
it will need to be a new function or method and not just an alteration of 
str.index and str.find.  In that case it may also need a PEP.

Cheers,
    Ron



From terry at jon.es  Wed Sep  5 21:55:44 2007
From: terry at jon.es (Terry Jones)
Date: Wed, 5 Sep 2007 21:55:44 +0200
Subject: [Python-ideas] FInd first
	tuple	argument	for	str.find	and	str.index
In-Reply-To: Your message at 13:19:36 on Wednesday, 5 September 2007
References: <46DE53DE.7070803@ronadam.com>
	<ca471dc20709050805t357d9732r379ca57c510708c6@mail.gmail.com>
	<18142.54758.609458.513647@terry-jones-computer.local>
	<46DEDE4D.4010709@gmx.net>
	<18142.57571.484035.769749@terry-jones-computer.local>
	<46DEF338.10407@ronadam.com>
Message-ID: <18143.2496.630054.862695@terry-jones-computer.local>

Hi Ron

>>>>> "Ron" == Ron Adam <rrr at ronadam.com> writes:
Ron> I was thinking of something a bit more light weight.

Ah, now we get to what you actually want to do :-)

Ron> For more complex stuff I think the 're' module already does pretty
Ron> much what you are describing.  It may even already take advantage of
Ron> the algorithms you referred to.  If not, that would be an important
Ron> improvement to the re module. :-)

Yes, that would make a good SoC project. But as you say it may already be
done that way.

Ron> The use case I had in mind was to find starting and ending delimiters.
Ron> And to avoid the following type of awkward code.  (This would work for
Ron> finding other things as well of course.)

    start = 0
    while start < len(s):

        i1 = s.find('{', start)
        if i1 == -1:
            i1 = len(s)

        i2 = s.find('}', start)
        if i2 == -1:
            i2 = len(s)

        # etc... for as many search terms as you have...
        # or use a loop to locate each one.

        start = min(i1, i2)
        if start == len(s):
            break

        ...
        # do something with s[start]
        ...

Ron> That works but it has to go through the string once for each item.

It's worse than that. _Each time around the loop_ it tests all candidates
against all of the remaining text. Imagine matching L patterns that were
each 'a' * M against a text of 'a' * N. You're going to do (roughly) O(N *
L * M) comparisons, using naive string matching.

You could completely drop patterns that have already returned -1. You can
short-circuit a loop if you ever got a 0 index back.  Sorry if this seems
picky - I guess you're just writing quick pseudo-code.
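
For instance, both tweaks rolled into a generator might look like this
(a sketch; 'iter_first_matches' is a made-up name, and I've generalized
"a 0 index" to a hit exactly at the current start):

    def iter_first_matches(s, terms, start=0):
        # Yield successive first-match positions across all terms.
        terms = list(terms)
        while terms and start < len(s):
            best = -1
            for t in list(terms):
                i = s.find(t, start)
                if i == -1:
                    terms.remove(t)   # this term can never match again
                elif i == start:
                    best = i          # no earlier hit is possible;
                    break             # short-circuit the remaining terms
                elif best == -1 or i < best:
                    best = i
            if best == -1:
                return
            yield best
            start = best + 1

    print(list(iter_first_matches("a {b {c} d} e", ('{', '}'))))
    # -> [2, 5, 7, 10]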

Ron> Of course I would use 're' for anything more complex than a few fixed
Ron> length terms.

Yes. It depends partly on what you want back. You could write a super-fast
iterator based on A&C that told you everything you need to know, with
guaranteed linear worst case behavior, but that seems like overkill here
(and it may be in re, as you say).

Ron> If the function returns something other than a simple index, then I
Ron> think it will need to be a new function or method and not just an
Ron> alteration of str.index and str.find.  In that case it may also need a
Ron> PEP.

Be my guest :-)

Regards,
Terry


From rrr at ronadam.com  Thu Sep  6 00:59:41 2007
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 05 Sep 2007 17:59:41 -0500
Subject: [Python-ideas] FInd first
	tuple	argument	for	str.find	and	str.index
In-Reply-To: <18143.2496.630054.862695@terry-jones-computer.local>
References: <46DE53DE.7070803@ronadam.com>	<ca471dc20709050805t357d9732r379ca57c510708c6@mail.gmail.com>	<18142.54758.609458.513647@terry-jones-computer.local>	<46DEDE4D.4010709@gmx.net>	<18142.57571.484035.769749@terry-jones-computer.local>	<46DEF338.10407@ronadam.com>
	<18143.2496.630054.862695@terry-jones-computer.local>
Message-ID: <46DF34DD.4000500@ronadam.com>



Terry Jones wrote:

 > It's worse than that. _Each time around the loop_ it tests all candidates
 > against all of the remaining text. Imagine matching L patterns that were
 > each 'a' * M against a text of 'a' * N. You're going to do (roughly) O(N *
 > L * M) comparisons, using naive string matching.

Yep, it's bad, which is why I don't want to do it this way.  Of course it's 
much better than ...

     for char in string:
         ... etc ...


 > You could completely drop patterns that have already returned -1. You can
 > short-circuit a loop if you ever got a 0 index back.  Sorry if this seems
 > picky - I guess you're just writing quick pseudo-code.

Well, it's a bit of both: pseudo-code, and from an actual problem I was
working on.  It was a first unoptimized version.  The first rule is to get
something that works, then make it fast after it's tested.  Right?  :-)

No, you aren't being any more picky than I am.  I didn't like that solution
either, which is why I suggested improving index and find just a bit.

And I'm not really keen on this next one, although it will probably work better.

(not tested)

length = len(s)
start = -1
cases = dict((term, -1) for term in terms)
while True:
    for term, i in cases.items():
        if i <= start:   # this term's last hit was consumed; search again
            i = s.find(term, start + 1)
            cases[term] = i if i > -1 else length
    start = min(cases.values())
    if start == length:
        break
    ...
    # do something with s[start]

Return some result we constructed or found along the way.

That's an improvement, but it's still way more work than I think the 
problem should need.  It's also complex enough that it's no longer obvious 
just what it's doing.


I still like the tuple versions of index and find better.  They're much
easier to read and understand.

    ...
    try:
        start = s.index(('{', '}'), start)
    except ValueError:
        break
    ...

    ...
    start = s.find(('{', '}'), start)
    if start == -1:
        break
    ...


 > Ron> Of course I would use 're' for anything more complex than a few fixed
 > Ron> length terms.
 >
 > Yes. It depends partly on what you want back. You could write a super-fast
 > iterator based on A&C that told you everything you need to know, with
 > guaranteed linear worst case behavior, but that seems like overkill here
 > (and it may be in re, as you say).

Regular expressions can be slower for small things because they have more
overhead.  They're not always the fastest choice.
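
That's easy to check for any particular case with timeit (a rough
sketch; the absolute numbers vary by version and platform):

    import timeit

    setup = "s = 'a' * 100 + '{'; import re; pat = re.compile('[{}]')"
    print(timeit.timeit("s.find('{')", setup=setup, number=100000))
    print(timeit.timeit("pat.search(s)", setup=setup, number=100000))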

And I've read that they are not good at parsing nested delimiters.

Cheers,
    Ron


 > Ron> If the function returns something other than a simple index, then I
 > Ron> think it will need to be a new function or method and not just an
 > Ron> alteration of str.index and str.find.  In that case it may also need a
 > Ron> PEP.
 >
 > Be my guest :-)
 >
 > Regards,
 > Terry


From rrr at ronadam.com  Thu Sep  6 13:58:40 2007
From: rrr at ronadam.com (Ron Adam)
Date: Thu, 06 Sep 2007 06:58:40 -0500
Subject: [Python-ideas] FInd first tuple argument for str.find and
	str.index
In-Reply-To: <ca471dc20709050805t357d9732r379ca57c510708c6@mail.gmail.com>
References: <46DE53DE.7070803@ronadam.com>
	<ca471dc20709050805t357d9732r379ca57c510708c6@mail.gmail.com>
Message-ID: <46DFEB70.90201@ronadam.com>



Guido van Rossum wrote:
> I was surprised to find that startswith and endswith support this, but
> it does make sense. Adding a patch to 2.6 would cause it to be merged
> into 3.0 soon enough.


I'll give it a try, but it may take me a while.  If someone else who is
more familiar with the string and unicode objects wants to do this, that
would be good.

Cheers,
    Ron





> On 9/4/07, Ron Adam <rrr at ronadam.com> wrote:
>> Could we add the ability of str.index and str.find to accept a tuple as the
>> first argument and return the index of the first item found in it.
>>
>> This is similar to how str.startswith and str.endswith already works.
>>
>>   |  startswith(...)
>>   |      S.startswith(prefix[, start[, end]]) -> bool
>>   |
>>   |      Return True if S starts with the specified prefix, False otherwise.
>>   |      With optional start, test S beginning at that position.
>>   |      With optional end, stop comparing S at that position.
>>   |      prefix can also be a tuple of strings to try.
>>
>>
>> This would speed up cases of filtering and searching when more than one
>> item is being searched for.  It would also simplify building iterators that
>> filter and yield multiple items in order.
>>
>>
>> A general google code search seems to show it's a generally useful thing to
>> do.
>>
>> http://www.google.com/codesearch?hl=en&lr=&q=%22findfirst%22+string&btnG=Search
>>
>>
>> (searching for python specific code doesn't show much because python
>> doesn't have a findfirst function of any type.)
>>
>>
>> Cheers,
>>     Ron
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
>>
> 
> 


From jim_hill_au-24 at yahoo.com.au  Sun Sep  9 09:24:47 2007
From: jim_hill_au-24 at yahoo.com.au (Jim Hill)
Date: Sun, 09 Sep 2007 17:24:47 +1000
Subject: [Python-ideas] loop, breakif, skip
Message-ID: <46E39FBF.9010703@yahoo.com.au>


These 4 proposals are somewhat inter-dependent,
so I include them in a single message.

They are simple ideas, childish almost.
Think of non-programmers writing simple scripts,
and children learning basic coding in grade school.

Hope I'm not wasting your time with nonsense.
(I'm not an advanced programmer, so can't be sure.)

------------

Proposal 1

Abolish 'continue' in loops, and use 'skip' instead.

In normal English 'continue' means 'carry on at the next line'.

In most programming languages 'continue' means
'jump back UP the page to the start of this loop'.

This is OK for programmers accustomed to C, but
I find it very counter-intuitive, even though
I know it means 'continue with the next iteration'.

On the other hand, 'skip', meaning
'skip the rest of this iteration',
feels more intuitive to me.

'continue' is too long, 'skip' is short.

One new keyword.
Breaks existing code.

------------

Proposal 2

An alternative to PEP 315, and a simpler way
to write while loops of various flavours.

part A

'loop:' is exactly equivalent to 'while True:'

part B

(1)
'*breakif <condition>'
is exactly equivalent to
'if <condition>: break'

(2)
'*skipif <condition>'
is exactly equivalent to
'if <condition>: skip'

(assuming 'skip' replaces 'continue')

The * is to make the word easier to find by eye.
Would some other character do it better?

Parts A and B together allow while loops to optionally
look something like this:

loop:
    <statements>
    *breakif <condition>
    <statements>
    *skipif <condition>
    <statements>


*breakif and *skipif can be used in for loops too, of course (see the
sketch below).

3 new keywords.
Existing code would not be affected, unless it was
already using loop, *breakif or *skipif as names.
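
For comparison, a sketch of what that shape corresponds to in today's
Python (the loop body is an arbitrary stand-in):

    import random

    while True:                # 'loop:'
        n = random.randrange(10)
        if n == 0:             # '*breakif n == 0'
            break
        if n % 2:              # '*skipif n % 2'
            continue
        print("even: %d" % n)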

------------

Proposal 3

Mainly for young students learning to program.

the keyword 'loop' can be placed in front of the keyword 'while'
the keyword 'loop' can be placed in front of the keyword 'for'
without changing the meaning of 'while' or 'for'.

It looks like this:

loop while <condition>:
    <statements>

loop for <iteration expression>:
    <statements>


Allows beginner students the satisfaction of thinking that
every kind of loop begins with the word 'loop',
which also makes learning a little easier.
(Later they will learn that 'loop' can be left out.)

Existing code would not be affected.

------------

Proposal 4

If 'continue' is not used in loops, it can have
a more meaningful role in switch/case blocks.

'continue' in a Python switch block would have a meaning
opposite to that of 'break' in a C switch block,
allowing you to do 'fall-through'.

Here 'continue' would have its intuitive meaning of
'carry on at the next line'.


switch <expression>:
    case <values>:
       <statements>
       [continue]
    case <values>:
       <statements>
       [continue]
    case <values>:
       <statements>


Existing code would not be affected,
as switch/case is not implemented yet.

------------

Jim Hill



From greg.ewing at canterbury.ac.nz  Sun Sep  9 10:21:06 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 09 Sep 2007 20:21:06 +1200
Subject: [Python-ideas] loop, breakif, skip
In-Reply-To: <46E39FBF.9010703@yahoo.com.au>
References: <46E39FBF.9010703@yahoo.com.au>
Message-ID: <46E3ACF2.9050808@canterbury.ac.nz>

Jim Hill wrote:
> Think of non-programmers writing simple scripts,
> and children learning basic coding in grade school.
>  
> Abolish 'continue' in loops, and use 'skip' instead.

I wouldn't recommend teaching beginning programmers
about continue at all, whatever it's called. It's
an unnecessary complication when learning the
fundamentals.

> 'loop:' is exactly equivalent to 'while True:'

> '*breakif <condition>'
> '*skipif <condition>'

These just look ugly and unpythonic.

> the keyword 'loop' can be placed in front of the keyword 'while'
> the keyword 'loop' can be placed in front of the keyword 'for'

'Loop' is a piece of programming jargon, not
something that would occur readily to someone thinking
in everyday terms. Python's way of phrasing its
loops is closer to natural English, therefore,
one would expect, easier for beginning programmers
to get the meaning of.

> Allows beginner students the satisfaction of thinking that
> every kind of loop begins with the word 'loop',

All two of them? I don't think that's a big enough
burden on the memory to be worth introducing
Another Way To Do It.

> 'continue' in a Python switch block would have a meaning
> opposite to that of 'break' in a C switch block,
> allowing you to do 'fall-through'.

I don't think that's any more intuitive than its
current meaning. The word 'continue' on its own
doesn't really say anything at all about *what*
to continue with, except in a context where you're
stopped in some way, which is not the case here.
So whatever meaning is chosen, it's something
that has to be learned.

--
Greg


From ksankar at doubleclix.net  Sun Sep 16 22:59:32 2007
From: ksankar at doubleclix.net (Krishna Sankar)
Date: Sun, 16 Sep 2007 13:59:32 -0700
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately massive
 (4 to 32 cores) multi-core architectures
Message-ID: <46ED9934.4070201@doubleclix.net>

PEP: xxxxxxxx
Title: Concurrency for moderately massive (4 to 32 cores) multi-core architectures
Version: $Revision$
Last-Modified: $Date$
Author: Krishna Sankar <ksankar (at) doubleclix.net>,
Status: Wandering ! (as in "Not all those who wander are lost ..." -J.R.R.Tolkien)
Type: Process
Content-Type: text/x-rst
Created: 15-Sep-2007

Abstract
--------
This proposal aims at leveraging multi-core capability as an embedded mechanism in Python. The question is not whether Python is slow or fast, but one of performance and control of parallelism/concurrency in a moderately massive parallelism world. The aim is 4 to 32 cores. The proposal advocates two mechanisms - one for task parallelism and another for data-intensive parallelism. Scientific computing and web 2.0 frameworks are the forefront users for this proposal; other applications would benefit as well.

Rationale
---------
Multi-core architectures need no introduction, and their ubiquity is evident. It is imperative that Python have one or more standard ways of leveraging multi-core architectures. OTOH, traditional thread-based concurrency and lock-based exclusion are becoming more and more difficult to program correctly.

First of all, the question is not whether py is slow or fast but the performance of a system written in py - which means the ability to leverage multi-core architectures, as well as control: control in terms of things like the ability to pin one process/task to a core, to pin one or more homogeneous tasks to specific cores, and to not wait on a global lock and similar primitives. (Before anybody jumps to a conclusion, this is not about the GIL by any means ;o))

Second, it is clear that we need a good solution (not THE solution) for moderately massive parallelism in multi-core architectures (i.e. 8-32 cores). Share-nothing might not be optimal; we need some form of memory sharing, not just copying all data via messages. Maybe functional programming based on the blackboard pattern would work - who knows.

I have seen saturated systems with only ~25% CPU utilization (on a 4-core system!). That is because we didn't leverage multiple cores and parallelism. So while py3k will not be slow, the lack of a cohesive multi-core strategy will show up in system performance and byte us later (pun intended!).

At least in my mind, this is not an exercise about exposing locks, mutexes or threads in Python. I do believe that the GIL will be refactored to finer granularity in the coming months (similar to the global locks in Linux) and most probably we will get microThreads et al. As we all know, architecture is constraining as well as liberating. The language primitives greatly influence how we think about a problem.

In the discussions, Guido is right in insisting on speed, and Bruce is right in asking for language constructs. Without pragmatic speed, folks won't use it; the same is the case without the required constructs. Both are barriers to adoption. We have an opportunity to offer a solution for multi-core architectures - let us seize it; we will rush in where angels fear to tread!

Programming Models
------------------
There are at least four possible paradigms:

A. Conventional threading model
B. Functional model, Erlang being the most appropriate
C. Some form of limited shared-memory model (message passing, but passing pointers; blackboard model)
D. Others, like Transactional Memory [2]

There is enough literature out there, so I do not plan to explain these here. (<KS> Do we need more explanation? </KS>)

Pragmatic proposal
------------------
May I suggest we embed two primitives in Python 3K:
A)	A functional-style share-nothing set of interfaces (and implementations thereof) - this provides the task parallelism/concurrency capability: "small messages, big computations", as Joe Armstrong calls it [3]
B)	A limited shared-memory-based model for data-intensive parallelism

Most probably this would be part of the stdlib. While Guido is almost right in saying that this is a (std)library problem, it is not fully so; we would need a few primitives from the underlying PVM substrate. Possibly one reason for Guido's position is the lack of clarity as to what needs to be changed and why. IMHO, just saying "take the GIL off" does not solve the problem either.

The Zen of Python parallelism
-----------------------------
I draw inspiration from the very timely article by James Reinders in DDJ [1]. It embodies what we should be doing, viz.:
1. Refactor the problem into parallel tasks. We cannot help it if the domain is sequential.
2. Program to abstractions; program chores, not cores. Writing correct programs using raw threads et al. is difficult. Let the underlying substrate decide how best to optimize.
3. Design for scale.
4. Have an option to turn concurrency off, for debugging.
5. Declarative parallelism-based mechanisms (?)

Related Efforts
---------------
The good news is there are at least 2 or 3 paradigms with implementations and rough benchmarks. Hopefully we can leverage the implementations and mature them into the stdlib (with the required primitives in the PVM):
Parallel python http://www.artima.com/weblogs/viewpost.jsp?thread=214303
http://cheeseshop.python.org/pypi/parallel
Processing http://cheeseshop.python.org/pypi/processing
http://code.google.com/p/papyros/

Discussions
-----------
There are at least four thread sets (pardon the pun!) I am aware of:
1. The GIL discussions in python-dev and Guido's blog on GIL http://www.artima.com/weblogs/viewpost.jsp?thread=214235
2. The py3k topics started by Bruce http://www.artima.com/weblogs/viewpost.jsp?thread=214112, response by Guido http://www.artima.com/weblogs/viewpost.jsp?thread=214325 and reply to reply by Bruce http://www.artima.com/weblogs/viewpost.jsp?thread=214480
3. Python and concurrency http://mail.python.org/pipermail/python-ideas/2007-March/000338.html

References
----------
[1] http://www.ddj.com/architect/201804248
[2] Transactional memory: http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=444
[3] Programming Erlang, by Joe Armstrong




From ksankar at doubleclix.net  Sun Sep 16 23:34:10 2007
From: ksankar at doubleclix.net (Krishna Sankar)
Date: Sun, 16 Sep 2007 14:34:10 -0700
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
 massive (4 to 32 cores) multi-core architectures
In-Reply-To: <46ED9934.4070201@doubleclix.net>
References: <46ED9934.4070201@doubleclix.net>
Message-ID: <46EDA152.8040601@doubleclix.net>

Folks,
    For some reason (fat fingers ;o() I missed the introduction to the 
proposal. Here is the full mail (pardon me for the spam):

    As a follow-up to the py3k discussions started by Bruce and Guido, I 
pinged Brett and he suggested I submit an exploratory proposal. Would 
appreciate insights, wisdom, the good, the bad and the ugly.
 
A)    Does it make sense?
B)    Which application sets should we consider in designing the 
interfaces and implementations?
C)    In this proposal, parallelism and concurrency are used in an 
interchangeable fashion. Thoughts?
D)    Please suggest pertinent links, discussions and insights.
E)    I have kept the proposal to a minimum to start the discussions and 
to explore if this is the right thing to do. Collaboratively, as we 
zero in on one or two approaches, the idea is to expand it to a crisp 
and clear PEP. Need to do some more formatting as well.

------------------------------------------------------------------------------------------------------------
PEP: xxxxxxxx
Title: Concurrency for moderately massive (4 to 32 cores) multi-core 
architectures
Version: $Revision$
Last-Modified: $Date$
Author: Krishna Sankar <ksankar (at) doubleclix.net>,
Status: Wandering ! (as in "Not all those who wander are lost ..." 
-J.R.R.Tolkien)
Type: Process
Content-Type: text/x-rst
Created: 15-Sep-2007

Abstract
--------
This proposal aims at leveraging multi-core capability as an 
embedded mechanism in Python. The question is not whether Python is slow 
or fast, but one of performance and control of parallelism/concurrency 
in a moderately massive parallelism world. The aim is 4 to 32 cores. The 
proposal advocates two mechanisms - one for task parallelism and another 
for data-intensive parallelism. Scientific computing and web 2.0 
frameworks are the forefront users for this proposal; other applications 
would benefit as well.

Rationale
---------
Multi-core architectures need no introduction, and their ubiquity is 
evident. It is imperative that Python have one or more standard ways of 
leveraging multi-core architectures. OTOH, traditional thread-based 
concurrency and lock-based exclusion are becoming more and more 
difficult to program correctly.

First of all, the question is not whether py is slow or fast but the 
performance of a system written in py - which means the ability to 
leverage multi-core architectures, as well as control: control in terms 
of things like the ability to pin one process/task to a core, to pin one 
or more homogeneous tasks to specific cores, and to not wait on a global 
lock and similar primitives. (Before anybody jumps to a conclusion, this 
is not about the GIL by any means ;o))

Second, it is clear that we need a good solution (not THE solution) for 
moderately massive parallelism in multi-core architectures (i.e. 8-32 
cores). Share-nothing might not be optimal; we need some form of memory 
sharing, not just copying all data via messages. Maybe functional 
programming based on the blackboard pattern would work - who knows.

I have seen saturated systems with only ~25% CPU utilization (on a 
4-core system!). That is because we didn't leverage multiple cores and 
parallelism. So while py3k will not be slow, the lack of a cohesive 
multi-core strategy will show up in system performance and byte us 
later (pun intended!).

At least in my mind, this is not an exercise about exposing locks, 
mutexes or threads in Python. I do believe that the GIL will be 
refactored to finer granularity in the coming months (similar to the 
global locks in Linux) and most probably we will get microThreads et al. 
As we all know, architecture is constraining as well as liberating. The 
language primitives greatly influence how we think about a problem.

In the discussions, Guido is right in insisting on speed, and Bruce is 
right in asking for language constructs. Without pragmatic speed, folks 
won't use it; the same is the case without the required constructs. Both 
are barriers to adoption. We have an opportunity to offer a solution for 
multi-core architectures - let us seize it; we will rush in where 
angels fear to tread!

Programming Models
------------------
There are at least four possible paradigms:

A. Conventional threading model
B. Functional model, Erlang being the most appropriate
C. Some form of limited shared-memory model (message passing, but 
passing pointers; blackboard model)
D. Others, like Transactional Memory [2]

There is enough literature out there, so I do not plan to explain these 
here. (<KS> Do we need more explanation? </KS>)

Pragmatic proposal
------------------
May I suggest we embed two primitives in Python 3K:
A)    A functional-style share-nothing set of interfaces (and 
implementations thereof) - this provides the task parallelism/concurrency 
capability: "small messages, big computations", as Joe Armstrong calls it [3]
B)    A limited shared-memory-based model for data-intensive parallelism

Most probably this would be part of the stdlib. While Guido is almost 
right in saying that this is a (std)library problem, it is not fully so; 
we would need a few primitives from the underlying PVM substrate. Possibly 
one reason for Guido's position is the lack of clarity as to what needs 
to be changed and why. IMHO, just saying "take the GIL off" does not 
solve the problem either.

The Zen of Python parallelism
-----------------------------
I draw inspiration from the very timely article by James Reinders in DDJ 
[1]. It embodies what we should be doing, viz.:
1. Refactor the problem into parallel tasks. We cannot help it if the 
domain is sequential.
2. Program to abstractions; program chores, not cores. Writing correct 
programs using raw threads et al. is difficult. Let the underlying 
substrate decide how best to optimize.
3. Design for scale.
4. Have an option to turn concurrency off, for debugging.
5. Declarative parallelism-based mechanisms (?)

Related Efforts
---------------
The good news is there are at least 2 or 3 paradigms with 
implementations and rough benchmarks.
Parallel python http://www.artima.com/weblogs/viewpost.jsp?thread=214303
http://cheeseshop.python.org/pypi/parallel
Processing http://cheeseshop.python.org/pypi/processing
http://code.google.com/p/papyros/

Discussions
-----------
There are at least four thread sets (pardon the pun!) I am aware of:
1. The GIL discussions in python-dev and Guido's blog on GIL 
http://www.artima.com/weblogs/viewpost.jsp?thread=214235
2. The py3k topics started by Bruce 
http://www.artima.com/weblogs/viewpost.jsp?thread=214112, response by 
Guido http://www.artima.com/weblogs/viewpost.jsp?thread=214325 and reply 
to reply by Bruce http://www.artima.com/weblogs/viewpost.jsp?thread=214480
3. Python and concurrency 
http://mail.python.org/pipermail/python-ideas/2007-March/000338.html
 

References
----------
[1] http://www.ddj.com/architect/201804248
[2] Transactional memory: 
http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=444
[3] Programming Erlang, by Joe Armstrong



From rhamph at gmail.com  Mon Sep 17 00:51:18 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Sun, 16 Sep 2007 16:51:18 -0600
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <46EDA152.8040601@doubleclix.net>
References: <46ED9934.4070201@doubleclix.net> <46EDA152.8040601@doubleclix.net>
Message-ID: <aac2c7cb0709161551m6749a203n44b27f4ca9834eea@mail.gmail.com>

On 9/16/07, Krishna Sankar <ksankar at doubleclix.net> wrote:
> Folks,
>     For some reason (fat fingers ;o() I missed the introduction to the
> proposal. Here is the full mail (pardon me for the spam):
>
>     As a follow-up to the py3k discussions started by Bruce and Guido, I
> pinged Brett and he suggested I submit an exploratory proposal. Would
> appreciate insights, wisdom, the good, the bad and the ugly.
>
> A)    Does it make sense ?
> B)    Which application sets should we consider in designing the
> interfaces and implementations
> C)    In this proposal, parallelism and concurrency are used in an
> interchangeable fashion. Thoughts ?
> D)    Please suggest pertinent links, discussions and insights.
> E)    I have kept the proposal to a minimum to start the discussions and
> to explore if this is the right thing to do. Collaboratively, as we
> zero-in on one or two approaches, the idea is to expand it to a crisp
> and clear PEP. Need to do some more formatting as well.

I've been exploring this problem for a while so I've got some pretty
strong opinions.  I guess we'll find out if my ideas pass muster. :)


------------------------------------------------------------------------------------------------------------
> PEP: xxxxxxxx
> Title: Concurrency for moderately massive (4 to 32 cores) multi-core
> architectures
> Version: $Revision$
> Last-Modified: $Date$
> Author: Krishna Sankar <ksankar (at) doubleclix.net>,
> Status: Wandering ! (as in "Not all those who wander are lost ..."
> -J.R.R.Tolkien)
> Type: Process
> Content-Type: text/x-rst
> Created: 15-Sep-2007
>
> Abstract
> --------
> This proposal aims at leveraging the multi-core capability as an
> embedded mechanism in python. It is not whether python is slow or fast,
> but of performance and control of parallelism/concurrency in a
> moderately massive parallelism world. The aim is 4 to 32 cores. The
> proposal advocates two mechanisms - one for task parallelism and another
> for data intensive parallelism. Scientific computing and web 2.0
> frameworks are the forefront users for this proposal. Other applications
> would benefit as well.

I'm not sure just what "data intensive" means.  I know some of it is
basically a variant on vectorization, but I think that can be done
much more easily in a library than in the language proper.

You're also missing distributed parallelism.  There's a large domain
in which you want failures to bring down only a single node, you're
willing to sacrifice shared state and consensus, and you're willing to
put in the extra effort to make it work.  That's not ideal for the core
of the language (where threading can do far better), but it's important
to keep in mind how they play off each other, and which use cases can
be better met by either.


>
> Rationale
> ---------
> Multicore architectures need no introductions and their ubiquity is
> evident. It is imperative that Python has one or more standard ways of
> leveraging multi-core architectures. OTOH, traditional thread based
> concurrency and lock based exclusions are becoming more and more
> difficult to program correctly.
>
> First of all, the question is not whether py is slow or fast but
> performance of a system written in py. Which means, ability to leverage
> multi-core architectures as well as control. Control in term of things
> like ability to pin one process/task to a core, ability to pin one or
> more homogeneous tasks to specific cores et al, as well as not wait for
> a global lock and similar primitives. (Before anybody jumps into a
> conclusion, this is not about GIL by any means ;o))

I'm not sure how relevant processor affinity (or prioritization, for
that matter) is.  I suspect only around 5% of users (if that) will
really need it.  It seems that you'd need some deep cooperation with
the OS to be really successful at it, and the support isn't there
today.  Of course support will likely improve as manycore becomes
common.


> Second, it is clear that we need a good solution (not THE solution) for
> moderately massive parallelism in multi-core architectures (i.e. 8-32
> cores). Share nothing might not be optimal; we need some form of memory
> sharing, not just copy all data via messages. May be functional
> programming based on the blackboard pattern would work, who knows.

I think share-nothing is clearly unacceptable.  Things like audio
processing, video processing, gaming, or GUIs need shared state.  If you
don't support it directly, you'll end up with a second-class form of
objects where the true content is shared indirectly and a handle gets
copied repeatedly.  You lose a great deal of dynamism doing that.


> I have seen systems saturated still having only ~25% of CPU utilization
> (in a 4 core system!). It is because we didn't leverage multi-cores and
> parallelism. So while py3k will not be slow, lack of a cohesive
> multi-core strategy will show up in system performance and byte us
> later(pun intended!).

This hints at a major compromise we can make.  So long as the
semantics are unchanged, we can offer a compile-time option to enable
scalable threading, even though it may have a fairly significant
amount of overhead.  For 1 to 4 cores you'll still want high
single-thread performance, but by the time you've got 8 cores it may
be an easy decision to switch to a scalable version instead.
Packagers could make this as easy as installing a different core
package.


> At least, in my mind, this is not an exercise about exposing locks and
> mutexes or threads in Python. I do believe that the GIL will be
> refactored to more granularity in the coming months (similar to the
> Global Locks in Linux) and most probably we will get microThreads et al.
> As we all know, architecture is constraining as well as liberating. The
> language primitives influence greatly how we think about a problem.

I've already got a patch/fork with the GIL refactored/removed, so I
agree that it'll change. ;)  I highly doubt we'll get any real
microthreads though: CPython is built on C, and getting away from the
C stack (and thus C's threads) is impractical.

I agree it's about primitives though.


> In the discussions, Guido is right in insisting on speed, and Bruce is
> right in asking for language constructs. Without pragmatic speed, folks
> won't use it; same is the case without the required constructs. Both are
> barriers to adoption. We have an opportunity to offer a solution for
> multi-core architectures and let us seize it - we will rush in where
> angels fear to tread!
>
> Programming Models
> ------------------
> There are at least 3 possible paradigms
>
> A. conventional threading model
> B. Functional model, Erlang being the most appropriate C. Some form of
> limited shared memory model (message passing but pass pointers,
> blackboard model) D. Others, like Transactional Memory [2]
>
> There is enough literature out there, so do not plan to explain these
> here. (<KS> Do we need more explanation? </KS>)

I'm not sure where my model fits in.  What I do is take all the
existing Python objects and give them a shareable/non-shareable
property.  If an object has some explicit semantics (such as a
thread-safe queue or an immutable int) then it's shareable; otherwise
it's non-shareable.  All the communication mechanisms (queues,
arguments to spawned threads, etc.) check this property, so it becomes
impossible to corrupt memory.
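
A toy sketch of that check (the whitelist and the names is_shareable
and spawn are invented for illustration; the real mechanism would live
inside the interpreter, not in a library):

    import threading
    try:
        import queue            # Python 3
    except ImportError:
        import Queue as queue   # Python 2

    def is_shareable(obj):
        # Immutable values are shareable, containers only if their
        # contents are; Queue stands in for "explicitly thread-safe".
        if obj is None or isinstance(obj, (int, float, complex, str, bytes)):
            return True
        if isinstance(obj, (tuple, frozenset)):
            return all(is_shareable(item) for item in obj)
        return isinstance(obj, queue.Queue)

    def spawn(fn, *args):
        # Refuse to start a thread whose arguments could corrupt memory.
        for arg in args:
            if not is_shareable(arg):
                raise TypeError("argument is not shareable: %r" % (arg,))
        t = threading.Thread(target=fn, args=args)
        t.start()
        return t

    q = queue.Queue()
    spawn(lambda out: out.put(42), q).join()   # fine: Queue is thread-safe
    print(q.get())                             # -> 42
    try:
        spawn(lambda d: d.clear(), {})         # dict is mutable: rejected
    except TypeError as exc:
        print(exc)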


> Pragmatic proposal
> ------------------
> May I suggest we embed two primitives in Python 3K:
> A)    A functional style share-nothing set of interfaces (and
> implementations thereof) - provides  the task parallelism/concurrency
> capability, "small messages, big computations" as Joe Armstrong calls it[3]
> B)    A limited shared memory based model for data intensive parallelism
>
> Most probably this would be part of stdlib. While Guido is almost right
> in saying that this is a (std)library problem, it is not fully so. We
> would need a few primitives from the underlying PVM substrate. Possibly
> one reason for Guido's position is the lack of clarity as to what needs
> to be changed and why. IMHO, just saying take GIL off does not solve the
> problem either.

I agree that it's *mostly* a stdlib problem.  The breadth of the
useful tools will simply be part of the library.  There are special
cases where language modifications are needed though.


> The Zen of Python parallelism
> -----------------------------
> I draw inspiration for the very timely article by James Reinders in DDJ
> [1]. It embodies what we should be doing viz.:
> 1. Refactor the problem into parallel tasks. We cannot help if the
> domain is sequential 2. Program to abstraction & program chores not
> cores. Writing correct program using raw threads et al is difficult. Let
> the underlying substrate decide how best to optimize 3. Design for scale
> 4. Have an option to turn concurrency off, for debugging 5. Declarative
> parallelism based mechanisms (?)

Point 4 is made moot by better debuggers.  I think that's more practical in
the long run.  Really, if you have producer and consumer threads, you
*can't* flick a switch to make them serial.


> Related Efforts
> ---------------
> The good news is there are at least 2 or 3 paradigms with
> implementations and rough benchmarks.
> Parallel python http://www.artima.com/weblogs/viewpost.jsp?thread=214303
> http://cheeseshop.python.org/pypi/parallel
> Processing http://cheeseshop.python.org/pypi/processing
> http://code.google.com/p/papyros/
>
> Discussions
> -----------
> There are at least four thread sets (pardon the pun !) I am aware of:
> 1. The GIL discussions in python-dev and Guido's blog on GIL
> http://www.artima.com/weblogs/viewpost.jsp?thread=214235
> 2. The py3k topics started by Bruce
> http://www.artima.com/weblogs/viewpost.jsp?thread=214112, response by
> Guide http://www.artima.com/weblogs/viewpost.jsp?thread=214325 and reply
> to reply by Bruce http://www.artima.com/weblogs/viewpost.jsp?thread=214480
> 3. Python and concurrency
> http://mail.python.org/pipermail/python-ideas/2007-March/000338.html
>
>
> References
> [1]http://www.ddj.com/architect/201804248
> [2]Transaction
> http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=444
> [3]Programming Erlang by Joe Armstrong

I'd like to add a list of practical requirements a design must meet:
* It must be composable with traditional single-threaded programs or
libraries.  Small changes are acceptable; complete redesigns are not.
* It must be largely compatible with existing CPython code and
extensions.  The threading APIs will likely change, but replacing
Py_INCREF/Py_DECREF with something else is too much.
* It must be useful for a broad set of local problems, without
becoming burdened down with speciality features.  We don't need direct
support for distributed computing.
* It needs to be easy, reliable, and robust.  Uncaught exceptions
should gracefully abort the entire program with a stack trace,
deadlocks should be detected and broken, and corruption should be
impossible.

Open for debate:
* How much compatibility should be retained with existing concurrency
mechanisms?  If we're trying to propose something better then we
obviously want to replace them, not add "yet another library", but
transition is important too.  (I mean this question to broadly apply
to event-driven and threaded libraries alike.)

-- 
Adam Olsen, aka Rhamphoryncus


From ellisonbg.net at gmail.com  Wed Sep 19 03:31:00 2007
From: ellisonbg.net at gmail.com (Brian Granger)
Date: Tue, 18 Sep 2007 21:31:00 -0400
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately massive
	(4 to 32 cores) multi-core architectures
Message-ID: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>

Thinking about how Python can better support parallelism and
concurrency is an important topic.  Here is how I see it:  if we don't
address the issue, the Python interpreter 5 or 10 years from now will
run at roughly the same speed as it does today.  This is because
single CPU cores are not getting much faster (power consumption is too
high).  Instead, most of the performance gains in hardware will be due
to increased hardware parallelism, which means multi/many core CPUs.

What to do about this pending crisis is a complicated issue.

There are (at least) two levels that are important:

1.  Language level features that make it possible to build
higher-level libraries/tools for parallelism.

2.  The high-level libraries/tools that most users and developers
would use to express parallelism.

I think it is absolutely critical that we worry about (1) before
jumping to (2).  So, some thoughts about (1).  Does Python itself need
to be changed to better enable people to write libraries for
expressing parallelism?

My answer to this is no.  The dominant languages for parallel
computing (C/C++/Fortran) don't really have any additional constructs
or features above Python in this respect.  Java has more
sophisticated support for threads.  Erlang has concurrency built into
its core.  But Python is not Erlang or Java.  As Twisted
demonstrates, Python as a language is plenty powerful enough to
express concurrency in an elegant way.  I am not saying that
parallelism and concurrency are easy or wonderful today in Python, just
that the language itself is not the problem.  We don't necessarily
need new language features; we simply need bright people to sit down
and think about the right way to express parallelism in Python and
then write libraries (maybe in the stdlib) that implement those ideas.

But there is a critical problem in CPython's implementation that
prevents people from really breaking new ground in this area with
Python.  It is the GIL, and here is why:

* For the platforms on which Python runs, threads are what the
hardware+OS people have given to us as the most fine grained way of
mapping parallelism onto hardware.  This is true, even if you have
philosophical or existential problems with threads.  With the
limitations of the GIL, we can't take advantage of what the hardware
gives us.

* A process based solution using message passing is simply not
suitable for many parallel algorithms that are communications bound.
The shared state of threads is needed in many cases, not because
sharing state is a "fantastic idea", but rather because it is fast.
This will only become more true as multicore CPUs gain more
sophisticated memory architectures with higher bandwidths.  Also, the
overhead of managing processes is much greater than with threads.
Many excellent fine-grained parallel approaches like Cilk would not be
possible with processes only.

* There are a number of powerful, high-level Python packages that
already exist (these have been named in the various threads) that
allow parallelism to be expressed.  All of these suffer from a GIL
related problem even though they are process based and use message
passing.  Regardless of whether you are using blocking/non-blocking
sockets/IPC, you can't run long running CPU bound code, because all
the network related stuff will stop.  You then think, "OK, I will run
the CPU intensive stuff in a different thread."  If the CPU intensive
code is just regular Python, you are fine, the Python interpreter will
switch between the network thread and the CPU intensive thread every
so often.  But the second you run extension code that doesn't release
the GIL, you are screwed.  The network thread will stall until the
extension code is done.  When it comes to implementing robust process
based parallelism using sockets, the last thing you can afford is to
have your networking black out like this, and in CPython it can't be
avoided.
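
To make the failure mode concrete, here is a minimal sketch of the
pattern; "some_extension" and its crunch() function are hypothetical
stand-ins for any C extension call that holds the GIL:

import threading

def network_loop(conn):
    # Pure-Python I/O: the interpreter can switch threads between
    # bytecodes, so this keeps running alongside Python-level CPU work.
    while True:
        data = conn.recv(4096)
        if not data:
            break

def cpu_work():
    import some_extension  # hypothetical C extension
    # If crunch() does not release the GIL, network_loop above cannot
    # execute a single bytecode until crunch() returns: the "blackout".
    some_extension.crunch()

# Wiring (sketch only):
#   threading.Thread(target=network_loop, args=(conn,)).start()
#   threading.Thread(target=cpu_work).start()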

<disclaimer>
I am not saying that threads are what everyone should be using to
express parallelism.  I am only saying that they are needed to
implement robust higher-level forms of parallelism on multicore
systems, regardless of whether the solution uses processes + threads
or threads alone.
</disclaimer>

Of the dozen or so "parallel Python" packages that currently exist,
they _all_ suffer from this problem (some hide it better than others
through clever tricks).  We can run but we can't hide.

Because of these things, I think the current "Exploratory PEP" is
entirely premature.  Let's figure out exactly what to do with the GIL
and _then_ think about the fun stuff.

Brian


From ksankar at doubleclix.net  Wed Sep 19 04:29:08 2007
From: ksankar at doubleclix.net (Krishna Sankar)
Date: Tue, 18 Sep 2007 19:29:08 -0700
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
 massive (4 to 32 cores) multi-core architectures
In-Reply-To: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
Message-ID: <46F08974.4010101@doubleclix.net>

Brian,
    Good points.

> We don't necessarily
> need new language features, we simply need bright people to sit down
>and think about the right way to express parallelism in Python and
> then write libraries (maybe in the stdlib) that implement those ideas.
<KS>
	Exactly. This PEP is about thinking through how to express parallelism in Python.
	Also, the GIL is a challenge in only one implementation (I know, it is an important implementation!).
	My assumption is that the GIL restriction will be removed one way or another, soon. (same reason; quoting Joe Louis)

	What we need to do is not to line them up (i.e., GIL removal and parallelism) serially, but to work on them simultaneously; that way they will leverage each other. Also, whatever paradigm(s) we zero in on can be implemented in other implementations anyway.

	Moreover, IMHO, we need not force the GIL issue just yet. I firmly believe that it will find a solution in its own time frame ...
</KS>
Cheers
<k/> 





From ellisonbg.net at gmail.com  Wed Sep 19 04:58:22 2007
From: ellisonbg.net at gmail.com (Brian Granger)
Date: Tue, 18 Sep 2007 22:58:22 -0400
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <46F08974.4010101@doubleclix.net>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<46F08974.4010101@doubleclix.net>
Message-ID: <6ce0ac130709181958y6cc28a01yc22bb7bfa451c97e@mail.gmail.com>

> > We don't necessarily
> > need new language features, we simply need bright people to sit down
> >and think about the right way to express parallelism in Python and
> > then write libraries (maybe in the stdlib) that implement those ideas.
> <KS>
>         Exactly. This PEP is about thinking through how to express parallelism in Python.

While I too love to think about parallelism, until the limitations of
the GIL in CPython are fixed, all our grand thoughts will be dead ends
at some level.

>         Also, the GIL is a challenge in only one implementation (I know, it is an important implementation!).
>         My assumption is that the GIL restriction will be removed one way or another, soon. (same reason; quoting Joe Louis)

I am not that optimistic I guess.  Hopeful though.

>         What we need to do is not to line them up (i.e., GIL removal and parallelism) serially, but to work on them simultaneously; that way they will leverage each other. Also, whatever paradigm(s) we zero in on can be implemented in other implementations anyway.

If there were infinitely many people willing to work on this stuff,
then I agree, but I don't see even a dozen people hacking on the GIL.
And in my mind, once the limitations of the GIL are relaxed, the other
parallel stuff won't be very difficult given the work that people have
already done in this area.

>         Moreover, IMHO, we need not force the GIL issue just yet. I firmly believe that it will find a solution in its own time frame ...


From rhamph at gmail.com  Wed Sep 19 05:30:19 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 18 Sep 2007 21:30:19 -0600
Subject: [Python-ideas] Thread exceptions and interruption
Message-ID: <aac2c7cb0709182030n71fadd00lfb25c26491a668f@mail.gmail.com>

One of the core problems with threading is what to do with exceptions
and how to gracefully exit when one goes unhandled.  My approach is to
replace the independently spawned threads with "branches" off of your
main thread's call stack.

The standard example looks like this[1]:

def handle_client(conn, addr):
    with conn:
        ...

def accept_loop(server_conn):
    with branch() as clients:
        with server_conn:
            while True:
                clients.add(handle_client, *server_conn.accept())

The call stack will look something like this:

main - accept_loop - server_conn.accept
          |- handle_client
          \- handle_client

Here I use a with-statement[2] to create a branch point.  The branch
point collects any exceptions from its children and interrupts the
children when the first exception occurs.  Interruption is done
somewhat similarly to posix cancellation; participating functions
react to it.  However, I raise an Interrupted exception, which can
lead to much more graceful cleanup than posix cancellation. ;)

The __exit__ portion of branch's with-statement blocks until all child
threads have exited.  It then reraises the exception, if any, or wraps
them in MultipleError if several occurred.

The branch construct serves only simple needs.  It does not attempt to
limit the number of threads to the number of cores available, nor any
related tricks.  Those can be added as a separate tool (perhaps
wrapping branch.)
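
For concreteness, a rough sketch of the collection half of branch(),
built on ordinary threads (this is only an approximation: the real
construct also needs the interruption machinery described above, which
plain threading cannot express):

import threading

class MultipleError(Exception):
    """Wraps the exceptions of several failed children."""
    def __init__(self, errors):
        Exception.__init__(self, errors)
        self.errors = errors

class branch(object):
    def __init__(self):
        self._threads = []
        self._errors = []
        self._lock = threading.Lock()

    def add(self, func, *args):
        t = threading.Thread(target=self._run, args=(func,) + args)
        self._threads.append(t)
        t.start()

    def _run(self, func, *args):
        try:
            func(*args)
        except Exception as e:
            with self._lock:
                self._errors.append(e)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # Block until every child has exited, then reraise their
        # exception, or wrap several of them in MultipleError.
        for t in self._threads:
            t.join()
        if len(self._errors) == 1:
            raise self._errors[0]
        if self._errors:
            raise MultipleError(self._errors)
        return False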

Thoughts?  Competing ideas?  Disagreement that it's a "core problem" at all? ;)


[1] I've previously (in private mostly) referred to the branch()
function as collate().  I've recently decided to rename it.

[2] Unfortunately, a with-statement lacks all the invariants that
would be desirable for the branch construct.  It also has no direct
way of handling generators-as-context-managers that themselves use
branches.

-- 
Adam Olsen, aka Rhamphoryncus


From aholkner at cs.rmit.edu.au  Wed Sep 19 06:40:09 2007
From: aholkner at cs.rmit.edu.au (Alex Holkner)
Date: Wed, 19 Sep 2007 14:40:09 +1000
Subject: [Python-ideas] Thread exceptions and interruption
In-Reply-To: <aac2c7cb0709182030n71fadd00lfb25c26491a668f@mail.gmail.com>
References: <aac2c7cb0709182030n71fadd00lfb25c26491a668f@mail.gmail.com>
Message-ID: <46F0A829.40908@cs.rmit.edu.au>

Adam Olsen wrote:

> Here I use a with-statement[2] to create a branch point.  The branch
> point collects any exceptions from its children and interrupts the
> children when the first exception occurs.  Interruption is done
> somewhat similarly to posix cancellation; participating functions
> react to it.  However, I raise an Interrupted exception, which can
> lead to much more graceful cleanup than posix cancellation. ;)

It sounds like you're proposing that a thread can be interrupted at any 
time.  The Java developers realised long ago that this is completely 
unworkable and deprecated their implementation:

http://java.sun.com/j2se/1.3/docs/guide/misc/threadPrimitiveDeprecation.html

Please disregard if I misunderstood your approach :-)

Alex.


From rhamph at gmail.com  Wed Sep 19 07:25:09 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 18 Sep 2007 23:25:09 -0600
Subject: [Python-ideas] Thread exceptions and interruption
In-Reply-To: <46F0A829.40908@cs.rmit.edu.au>
References: <aac2c7cb0709182030n71fadd00lfb25c26491a668f@mail.gmail.com>
	<46F0A829.40908@cs.rmit.edu.au>
Message-ID: <aac2c7cb0709182225g4e9d6ae2ucdc911e5022f1386@mail.gmail.com>

On 9/18/07, Alex Holkner <aholkner at cs.rmit.edu.au> wrote:
> Adam Olsen wrote:
>
> > Here I use a with-statement[2] to create a branch point.  The branch
> > point collects any exceptions from its children and interrupts the
> > children when the first exception occurs.  Interruption is done
> > somewhat similarly to posix cancellation; participating functions
> > react to it.  However, I raise an Interrupted exception, which can
> > lead to much more graceful cleanup than posix cancellation. ;)
>
> It sounds like you're proposing that a thread can be interrupted at any
> time.  The Java developers realised long ago that this is completely
> unworkable and deprecated their implementation:
>
> http://java.sun.com/j2se/1.3/docs/guide/misc/threadPrimitiveDeprecation.html
>
> Please disregard if I misunderstood your approach :-)

You misunderstood. :)  The key word was *participating* functions.
Normally this only includes things like file or socket reading.  A
CPU-bound busy loop may never get interrupted.

-- 
Adam Olsen, aka Rhamphoryncus


From guido at python.org  Wed Sep 19 18:24:04 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 19 Sep 2007 09:24:04 -0700
Subject: [Python-ideas] Thread exceptions and interruption
In-Reply-To: <aac2c7cb0709182030n71fadd00lfb25c26491a668f@mail.gmail.com>
References: <aac2c7cb0709182030n71fadd00lfb25c26491a668f@mail.gmail.com>
Message-ID: <ca471dc20709190924l4834052ck2db9ec746e2af14c@mail.gmail.com>

Regarding the issue of exceptions in threads, I indeed see it as a
non-issue. It's easy enough to develop a subclass of threading.Thread
which catches any exceptions raised by run(), and stores the exception
as an instance variable from which it can be retrieved after join()
succeeds.
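
For example, a minimal sketch of such a subclass (the names are
illustrative only):

import threading

class ExcThread(threading.Thread):
    def __init__(self, *args, **kwds):
        threading.Thread.__init__(self, *args, **kwds)
        self.exception = None

    def run(self):
        try:
            threading.Thread.run(self)
        except Exception as e:
            # Store the exception for the thread that calls join().
            self.exception = e

t = ExcThread(target=lambda: 1 / 0)
t.start()
t.join()
if t.exception is not None:
    raise t.exception  # or log it, retry, etc.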

Regarding the proposal of branching the call stack, it reminds me too
much of the problems one has when a fork()'ed child raises an
exception which ends up being handled by an exception handler higher
up in the parent's call stack (which has been faithfully copied into
the child process by fork()). That has proven a major problem, leading
to various warnings to always catch all exceptions and call os._exit()
upon problems. I realize you're not proposing exactly that. I also
admit I don't exactly understand how you plan to deal with the
situation where one thread raises an exception which the spawning
location fails to handle, while another thread is still running (but
may raise another exception later). Is the spawning thread unwound?
Then what's left to catch the second thread's exception? But all in
all it gives me the heebie-jeebies.

Finally, may I suggest that you're perhaps too much in love with the
with-statement?

--Guido



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


From rhamph at gmail.com  Wed Sep 19 18:53:17 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 19 Sep 2007 10:53:17 -0600
Subject: [Python-ideas] Thread exceptions and interruption
In-Reply-To: <ca471dc20709190924l4834052ck2db9ec746e2af14c@mail.gmail.com>
References: <aac2c7cb0709182030n71fadd00lfb25c26491a668f@mail.gmail.com>
	<ca471dc20709190924l4834052ck2db9ec746e2af14c@mail.gmail.com>
Message-ID: <aac2c7cb0709190953r1ea9d7adh3a29e19fbba51fa5@mail.gmail.com>

On 9/19/07, Guido van Rossum <guido at python.org> wrote:
> Regarding the issue of exceptions in threads, I indeed see it as a
> non-issue. It's easy enough to develop a subclass of threading.Thread
> which catches any exceptions raised by run(), and stores the exception
> as an instance variable from which it can be retrieved after join()
> succeeds.
>
> Regarding the proposal of branching the call stack, it reminds me too
> much of the problems one has when a fork()'ed child raises an
> exception which ends up being handled by an exception handler higher
> up in the parent's call stack (which has been faithfully copied into
> the child process by fork()). That has proven a major problem, leading
> to various warnings to always catch all exceptions and call os._exit()
> upon problems. I realize you're not proposing exactly that. I also
> admit I don't exactly understand how you plan to deal with the
> situation where one thread raises an exception which the spawning
> location fails to handle, while another thread is still running (but
> may raise another exception later). Is the spawning thread unwound?
> Then what's left to catch the second thread's exception? But all in
> all it gives me the heebie-jeebies.

I don't see what you're getting at.  No stack copying is done so fork
is irrelevant and the spawning thread *always* blocks until all of
its child threads have exited.

Let's try a simpler example, without the main thread (which isn't
special for exception purposes):
(Make sure to look at it with a monospace font)

           / baz
foo - bar +- baz
           \ baz

bar encapsulates several threads.  It makes no sense to unravel the call
tree while a lower portion of it still exists, so it must wait.  If
there is a failure, bar will politely tell all 3 baz functions to
exit, but they probably won't listen (unless they're calling I/O).  If
necessary bar will wait forever.

foo never sees any of this.  It is completely hidden within bar.
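
In terms of the branch() construct from my earlier message (still
hypothetical), bar might look like this:

def baz():
    pass  # stand-in for real work; may raise

def bar():
    with branch() as children:
        for i in range(3):
            children.add(baz)
    # We only get here once all three baz threads have exited; any
    # child exception is reraised at this point, so foo above never
    # sees a half-unwound call tree.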


> Finally, may I suggest that you're perhaps too much in love with the
> with-statement?

I've a preference for writing as a library, rather than with new syntax. ;)


-- 
Adam Olsen, aka Rhamphoryncus


From guido at python.org  Wed Sep 19 19:16:06 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 19 Sep 2007 10:16:06 -0700
Subject: [Python-ideas] Thread exceptions and interruption
In-Reply-To: <aac2c7cb0709190953r1ea9d7adh3a29e19fbba51fa5@mail.gmail.com>
References: <aac2c7cb0709182030n71fadd00lfb25c26491a668f@mail.gmail.com>
	<ca471dc20709190924l4834052ck2db9ec746e2af14c@mail.gmail.com>
	<aac2c7cb0709190953r1ea9d7adh3a29e19fbba51fa5@mail.gmail.com>
Message-ID: <ca471dc20709191016x6e88ad4bxac9ec203f3c24393@mail.gmail.com>

On 9/19/07, Adam Olsen <rhamph at gmail.com> wrote:
> Let's try a simpler example, without the main thread (which isn't
> special for exception purposes):
> (Make sure to look at it with a monospace font)
>
>            / baz
> foo - bar +- baz
>            \ baz
>
> bar encapsulates several threads.  It makes no sense to unravel the call
> tree while a lower portion of it still exists, so it must wait.  If
> there is a failure, bar will politely tell all 3 baz functions to
> exit, but they probably won't listen (unless they're calling I/O).  If
> necessary bar will wait forever.
>
> foo never sees any of this.  It is completely hidden within bar.

So what happens if the first baz thread raises an exception that bar
isn't handling? I suppose it first waits until all baz threads are
done, but then the question is still open. Does it percolate up to
foo? What if two or more baz threads raise exceptions? How does foo
see these?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


From rhamph at gmail.com  Wed Sep 19 19:37:29 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 19 Sep 2007 11:37:29 -0600
Subject: [Python-ideas] Thread exceptions and interruption
In-Reply-To: <ca471dc20709191016x6e88ad4bxac9ec203f3c24393@mail.gmail.com>
References: <aac2c7cb0709182030n71fadd00lfb25c26491a668f@mail.gmail.com>
	<ca471dc20709190924l4834052ck2db9ec746e2af14c@mail.gmail.com>
	<aac2c7cb0709190953r1ea9d7adh3a29e19fbba51fa5@mail.gmail.com>
	<ca471dc20709191016x6e88ad4bxac9ec203f3c24393@mail.gmail.com>
Message-ID: <aac2c7cb0709191037l5b57cee8g6f9ff28ffa427ee2@mail.gmail.com>

On 9/19/07, Guido van Rossum <guido at python.org> wrote:
> On 9/19/07, Adam Olsen <rhamph at gmail.com> wrote:
> > Let's try a simpler example, without the main thread (which isn't
> > special for exception purposes):
> > (Make sure to look at it with a monospace font)
> >
> >            / baz
> > foo - bar +- baz
> >            \ baz
> >
> > bar encapsulates several threads.  It makes no sense to unravel the call
> > tree while a lower portion of it still exists, so it must wait.  If
> > there is a failure, bar will politely tell all 3 baz functions to
> > exit, but they probably won't listen (unless they're calling I/O).  If
> > necessary bar will wait forever.
> >
> > foo never sees any of this.  It is completely hidden within bar.
>
> So what happens if the first baz thread raises an exception that bar
> isn't handling? I suppose it first waits until all baz threads are
> done, but then the question is still open. Does it percolate up to
> foo? What if two or more baz threads raise exceptions? How does foo
> see these?

bar itself doesn't see it until *after* they've all exited.  The
branch construct holds it until all child threads have exited.  There
is no way to get weird stack unwinding.

If multiple exceptions occur they get encapsulated in a MultipleError exception.

-- 
Adam Olsen, aka Rhamphoryncus


From aahz at pythoncraft.com  Wed Sep 19 21:33:11 2007
From: aahz at pythoncraft.com (Aahz)
Date: Wed, 19 Sep 2007 12:33:11 -0700
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
Message-ID: <20070919193311.GA6720@panix.com>

On Tue, Sep 18, 2007, Brian Granger wrote:
>
> What to do about this pending crisis is a complicated issue.

Oh, please.  Calling this a crisis only causes people to ignore you.
Take a look at this URL that Guido posted to the baypiggies list:

http://marknelson.us/2007/07/30/multicore-panic/
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

The best way to get information on Usenet is not to ask a question, but
to post the wrong information.


From jimjjewett at gmail.com  Wed Sep 19 21:58:50 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 19 Sep 2007 15:58:50 -0400
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <6ce0ac130709181958y6cc28a01yc22bb7bfa451c97e@mail.gmail.com>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<46F08974.4010101@doubleclix.net>
	<6ce0ac130709181958y6cc28a01yc22bb7bfa451c97e@mail.gmail.com>
Message-ID: <fb6fbf560709191258m4bec8f44v561ac3309fc73466@mail.gmail.com>

On 9/18/07, Brian Granger <ellisonbg.net at gmail.com> wrote:

> If there were infinitely many people willing to work on this stuff,
> then I agree, but I don't see even a dozen people hacking on the GIL.

In part because many people don't believe it would be productive.

For threading to be useful in terms of parallel processing, most
memory access has to be read-only.  That isn't true today, largely
because of reference counts.
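
To illustrate in pure Python why "reading" is really writing under the
covers:

import sys

x = object()
# getrefcount reports one extra reference: its own argument.
print(sys.getrefcount(x))

def read_only(obj):
    # Binding the parameter and the alias below each bump ob_refcnt
    # in the object's header, and drop it again on return, so even
    # read-only access mutates the object's memory at the C level.
    alias = obj
    return alias is obj

print(read_only(x))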

There are ways around that, by using indirection, or delayed counts,
or multiple refcount buckets per object, or even just switching to a
tracing GC.

So far, no one has been able to make these changes without seriously
mangling the C API and/or slowing things down a lot.  The current
refcount mechanism is so lightweight that it isn't clear this would
even be possible.  (With 4 or more cores dedicated to just python, it
might be worth it anyhow -- but it isn't yet.)  So if you want the GIL
removed, you need to provide an existence proof that (CPython) memory
management can be handled efficiently without it.

-jJ


From rhamph at gmail.com  Wed Sep 19 22:10:42 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 19 Sep 2007 14:10:42 -0600
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <fb6fbf560709191258m4bec8f44v561ac3309fc73466@mail.gmail.com>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<46F08974.4010101@doubleclix.net>
	<6ce0ac130709181958y6cc28a01yc22bb7bfa451c97e@mail.gmail.com>
	<fb6fbf560709191258m4bec8f44v561ac3309fc73466@mail.gmail.com>
Message-ID: <aac2c7cb0709191310w6b63fd0fn8a2b5e5848de723b@mail.gmail.com>

On 9/19/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 9/18/07, Brian Granger <ellisonbg.net at gmail.com> wrote:
>
> > If there were infinitely many people willing to work on this stuff,
> > then I agree, but I don't see even a dozen people hacking on the GIL.
>
> In part because many people don't believe it would be productive.
>
> For threading to be useful in terms of parallel processing, most
> memory access has to be read-only.  That isn't true today, largely
> because of reference counts.
>
> There are ways around that, by using indirection, or delayed counts,
> or multiple refcount buckets per object, or even just switching to a
> tracing GC.
>
> So far, no one has been able to make these changes without seriously
> mangling the C API and/or slowing things down a lot.  The current
> refcount mechanism is so lightweight that it isn't clear this would
> even be possible.  (With 4 or more cores dedicated to just python, it
> might be worth it anyhow -- but it isn't yet.)  So if you want the GIL
> removed, you need to provide an existence proof that (CPython) memory
> management can be handled efficiently without it.

Is 60-65% of normal CPython "a lot"?

(I really should clean things up and post a patch...)

-- 
Adam Olsen, aka Rhamphoryncus


From rhamph at gmail.com  Wed Sep 19 22:42:06 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 19 Sep 2007 14:42:06 -0600
Subject: [Python-ideas] Thread exceptions and interruption
In-Reply-To: <ca471dc20709190924l4834052ck2db9ec746e2af14c@mail.gmail.com>
References: <aac2c7cb0709182030n71fadd00lfb25c26491a668f@mail.gmail.com>
	<ca471dc20709190924l4834052ck2db9ec746e2af14c@mail.gmail.com>
Message-ID: <aac2c7cb0709191342o3f3928c5s93c4961acd195a54@mail.gmail.com>

On 9/19/07, Guido van Rossum <guido at python.org> wrote:
> Regarding the issue of exceptions in threads, I indeed see it as a
> non-issue. It's easy enough to develop a subclass of threading.Thread
> which catches any exceptions raised by run(), and stores the exception
> as an instance variable from which it can be retrieved after join()
> succeeds.

Perhaps a better question then: do you think that correctly handling
errors is a significant part of what makes threads hard today?

My focus has always been on making "simultaneous activities" easier to
manage.  Removing the GIL is just a free bonus from making independent
tasks really be independent.

-- 
Adam Olsen, aka Rhamphoryncus


From guido at python.org  Wed Sep 19 22:58:20 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 19 Sep 2007 13:58:20 -0700
Subject: [Python-ideas] Thread exceptions and interruption
In-Reply-To: <aac2c7cb0709191342o3f3928c5s93c4961acd195a54@mail.gmail.com>
References: <aac2c7cb0709182030n71fadd00lfb25c26491a668f@mail.gmail.com>
	<ca471dc20709190924l4834052ck2db9ec746e2af14c@mail.gmail.com>
	<aac2c7cb0709191342o3f3928c5s93c4961acd195a54@mail.gmail.com>
Message-ID: <ca471dc20709191358k6d78d1c6mf62beec99d776b6c@mail.gmail.com>

On 9/19/07, Adam Olsen <rhamph at gmail.com> wrote:
> On 9/19/07, Guido van Rossum <guido at python.org> wrote:
> > Regarding the issue of exceptions in threads, I indeed see it as a
> > non-issue. It's easy enough to develop a subclass of threading.Thread
> > which catches any exceptions raised by run(), and stores the exception
> > as an instance variable from which it can be retrieved after join()
> > succeeds.
>
> Perhaps a better question then: do you think that correctly handling
> errors is a significant part of what makes threads hard today?

If you're talking about unhandled exceptions, no, that's absolutely a
non-issue. The real issues are race conditions, deadlocks, livelocks
etc.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


From rhamph at gmail.com  Wed Sep 19 23:38:38 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 19 Sep 2007 15:38:38 -0600
Subject: [Python-ideas] Thread exceptions and interruption
In-Reply-To: <ca471dc20709191358k6d78d1c6mf62beec99d776b6c@mail.gmail.com>
References: <aac2c7cb0709182030n71fadd00lfb25c26491a668f@mail.gmail.com>
	<ca471dc20709190924l4834052ck2db9ec746e2af14c@mail.gmail.com>
	<aac2c7cb0709191342o3f3928c5s93c4961acd195a54@mail.gmail.com>
	<ca471dc20709191358k6d78d1c6mf62beec99d776b6c@mail.gmail.com>
Message-ID: <aac2c7cb0709191438h1b1ebfdap9f42c0b93535e0e2@mail.gmail.com>

On 9/19/07, Guido van Rossum <guido at python.org> wrote:
> On 9/19/07, Adam Olsen <rhamph at gmail.com> wrote:
> > On 9/19/07, Guido van Rossum <guido at python.org> wrote:
> > > Regarding the issue of exceptions in threads, I indeed see it as a
> > > non-issue. It's easy enough to develop a subclass of threading.Thread
> > > which catches any exceptions raised by run(), and stores the exception
> > > as an instance variable from which it can be retrieved after join()
> > > succeeds.
> >
> > Perhaps a better question then: do you think that correctly handling
> > errors is a significant part of what makes threads hard today?
>
> If you're talking about unhandled exceptions, no, that's absolutely a
> non-issue. The real issues are race conditions, deadlocks, livelocks
> etc.

I guess the bottom line here is that, since none of the proposed
solutions magically eliminate race conditions, deadlocks, livelocks,
etc, we'll need to try them in the field for quite some time before
it's clear whether the ways they make things better have any significant
effect in reducing the core problems.

In other words, I (and the other pundits) should implement our ideas
in a forked python, and not propose merging back until we've got a
large user base with a proven track record.  Even if that's not as
much fun. ;)

-- 
Adam Olsen, aka Rhamphoryncus


From guido at python.org  Wed Sep 19 23:59:18 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 19 Sep 2007 14:59:18 -0700
Subject: [Python-ideas] Thread exceptions and interruption
In-Reply-To: <aac2c7cb0709191438h1b1ebfdap9f42c0b93535e0e2@mail.gmail.com>
References: <aac2c7cb0709182030n71fadd00lfb25c26491a668f@mail.gmail.com>
	<ca471dc20709190924l4834052ck2db9ec746e2af14c@mail.gmail.com>
	<aac2c7cb0709191342o3f3928c5s93c4961acd195a54@mail.gmail.com>
	<ca471dc20709191358k6d78d1c6mf62beec99d776b6c@mail.gmail.com>
	<aac2c7cb0709191438h1b1ebfdap9f42c0b93535e0e2@mail.gmail.com>
Message-ID: <ca471dc20709191459r538fc921t103856a7cf197c38@mail.gmail.com>

On 9/19/07, Adam Olsen <rhamph at gmail.com> wrote:
> > If you're talking about unhandled exceptions, no, that's absolutely a
> > non-issue. The real issues are race conditions, deadlocks, livelocks
> > etc.
>
> I guess the bottom line here is that, since none of the proposed
> solutions magically eliminate race conditions, deadlocks, livelocks,
> etc, we'll need to try them in the field for quite some time before
> it's clear if the ways they do make things better have any significant
> effects in reducing the core problems.
>
> In other words, I (and the other pundits) should implement our ideas
> in a forked python, and not propose merging back until we've got a
> large user base with a proven track record.  Even if that's not as
> much fun. ;)

Agreed. Though race conditions become less of a problem if you don't
have fine-grained memory sharing (where you always hope you can get
away without a lock -- just Google for "double-checked locking" :-).
And deadlocks can be fought quite effectively by a runtime layer that
detects them, plus strategies for forcing lock acquisition order.
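
For instance, a minimal sketch of forcing an acquisition order (here
by id(), which is arbitrary but globally consistent within a process):

import threading

def acquire_all(*locks):
    # Always take locks in one canonical order, so two threads can
    # never hold the same pair in opposite orders -- the classic
    # recipe for deadlock.
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered

def release_all(ordered):
    for lock in reversed(ordered):
        lock.release()

a = threading.Lock()
b = threading.Lock()
held = acquire_all(a, b)
try:
    pass  # critical section touching both shared resources
finally:
    release_all(held)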

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


From jimjjewett at gmail.com  Thu Sep 20 00:12:48 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 19 Sep 2007 18:12:48 -0400
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <aac2c7cb0709191310w6b63fd0fn8a2b5e5848de723b@mail.gmail.com>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<46F08974.4010101@doubleclix.net>
	<6ce0ac130709181958y6cc28a01yc22bb7bfa451c97e@mail.gmail.com>
	<fb6fbf560709191258m4bec8f44v561ac3309fc73466@mail.gmail.com>
	<aac2c7cb0709191310w6b63fd0fn8a2b5e5848de723b@mail.gmail.com>
Message-ID: <fb6fbf560709191512g2c2e56e4g14848e1775f28151@mail.gmail.com>

On 9/19/07, Adam Olsen <rhamph at gmail.com> wrote:
> On 9/19/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> > On 9/18/07, Brian Granger <ellisonbg.net at gmail.com> wrote:

> > So far, no one has been able to make these changes without seriously
> > mangling the C API and/or slowing things down a lot.  The current
> > refcount mechanism is so lightweight that it isn't clear this would
> > even be possible.

> Is 60-65% of normal CPython "a lot"?

Yes, but I think it is still better than the last serious attempt, so
it would be worth posting patches anyhow.

-jJ


From rhamph at gmail.com  Thu Sep 20 00:27:04 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 19 Sep 2007 16:27:04 -0600
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <fb6fbf560709191512g2c2e56e4g14848e1775f28151@mail.gmail.com>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<46F08974.4010101@doubleclix.net>
	<6ce0ac130709181958y6cc28a01yc22bb7bfa451c97e@mail.gmail.com>
	<fb6fbf560709191258m4bec8f44v561ac3309fc73466@mail.gmail.com>
	<aac2c7cb0709191310w6b63fd0fn8a2b5e5848de723b@mail.gmail.com>
	<fb6fbf560709191512g2c2e56e4g14848e1775f28151@mail.gmail.com>
Message-ID: <aac2c7cb0709191527u643de9f6wfa04c7201726b472@mail.gmail.com>

On 9/19/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 9/19/07, Adam Olsen <rhamph at gmail.com> wrote:
> > On 9/19/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> > > On 9/18/07, Brian Granger <ellisonbg.net at gmail.com> wrote:
>
> > > So far, no one has been able to make these changes without seriously
> > > mangling the C API and/or slowing things down a lot.  The current
> > > refcount mechanism is so lightweight that it isn't clear this would
> > > even be possible.
>
> > Is 60-65% of normal CPython "a lot"?
>
> Yes, but I think it is still better than the last serious attempt, so
> it would be worth posting patches anyhow.

It's not even comparable to the last serious attempt.  Even mere
atomic refcounting has *negative* scalability at two threads when
running pystones.  My approach has 95-100% scalability.

-- 
Adam Olsen, aka Rhamphoryncus


From ellisonbg.net at gmail.com  Thu Sep 20 03:56:22 2007
From: ellisonbg.net at gmail.com (Brian Granger)
Date: Wed, 19 Sep 2007 21:56:22 -0400
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <20070919193311.GA6720@panix.com>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<20070919193311.GA6720@panix.com>
Message-ID: <6ce0ac130709191856v68c4f711s453197e0d37ff949@mail.gmail.com>

> > What to do about this pending crisis is a complicated issue.
>
> Oh, please.  Calling this a crisis only causes people to ignore you.

I apologize if this statement is a little exaggerated.  But, I do
think this is a really critical problem that is going to affect
certain groups of Python users and developers in adverse ways.
Perhaps I have not made a very strong case that it is a true "crisis"
though.

> Take a look at this URL that Guido posted to the baypiggies list:
>
> http://marknelson.us/2007/07/30/multicore-panic/

I actually agree with many of the comments made by the author.  The
quotes from John Dvorak and Wired are over the top.  The author makes
good points about operating systems' abilities to handle threading on
modest numbers of cores.  It does work, even today, and will continue
to work (for a while) as the number of cores increases.  But the main
point of the author is that operating systems DO have threading
support that works reasonably well.  Python is another story.  The
author even makes comments that provide a strong critique of Python's
(and Ruby's) threading capabilities:

<quote>
Modern programs tend to be moderately multithreaded, with individual
threads dedicated to the GUI, to user I/O, to socket I/O, and often to
computation. Multicore CPUs take advantage of this quite well. And we
don't need any new technology to make sure multi-threaded programs are
well-behaved - these techniques are pretty well understood, and in use
in most software you use today. Modern languages like Java support
threads and various concurrency issues right out of the box. C++
requires non-standard libraries, but all modern C++ environments worth
their salt deal with multithreading in a fairly sane way.
</quote>

Modern programs tend to be moderately multithreaded... except for
Python.  Modern languages like Java and C++ support threads and
concurrency, even if those capabilities aren't built in at a low level
(C++).  I don't think the same thing can be said about Python.  The
GIL in CPython does in fact prevent threading from being a general
solution for CPU bound parallelism.
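
A quick way to see this on any multi-core machine (a rough sketch; the
threaded version is no faster than the sequential one under CPython,
because the GIL serializes bytecode execution):

import time
import threading

def count(n):
    # Pure-Python, CPU-bound loop.
    while n:
        n -= 1

N = 10 ** 7

start = time.time()
count(N)
count(N)
print("sequential:  %.2fs" % (time.time() - start))

start = time.time()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print("two threads: %.2fs" % (time.time() - start))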

The author is wrong in one important respect:

<quote>
In this future view, by 2010 we should have the first eight-core
systems. In 2014, we're up to 32 cores. By 2017, we've reached an
incredible 128 core CPU on a desktop machine.
</quote>

I don't know where the author got this information, but it is way off.
Here are some currently available examples:

* Appro now offers a workstation with up to 4 quad core Opterons:

http://www.appro.com/product/workstationxtreme_opteron.asp

That is a 16 core system.  It is 2007, not 2012.

* Tilera offers a 64 core CPU that runs SMP Linux.

* SiCortex offers a low power 648 processor Linux system that is
organized into 6 core SMPs, each of which runs Linux.

This week I am at a Department of Defense conference on multicore computing:

http://www.ll.mit.edu/HPEC/2007/index.html

There is broad agreement from all corners that multi/manycore CPUs
will require new ways of expressing parallelism.  But in all of the
discussions about new approaches, people do assume that threads are a
low-level building block that can and should be used for building the
higher-level stuff.

Intel's Threaded Building Blocks is a perfect example.  While it is
implemented using threads, it provides much higher level abstractions
for developers to use in building applications that will scale well on
multicore systems.  I would _love_ to have such constructs in Python,
but that is simply not possible.  And process based solutions don't
provide a solution for the many algorithms that require fine grained
partitioning and fast data movement.

For those of us who do use Python for high performance computing,
these issues are critical.  In fact, anytime I deal with fine grained
parallel algorithms, I use C/C++.  I do end up wrapping my low level
C/C++ threaded code into Python, but this doesn't always map onto the
problem well.

The other group of people for whom this is a big issue are general
Python users who don't think they need parallelism.  Eventually even
these people will become frustrated that their Python code runs at the
same speed on a 1 core system as on a 128 core system.  To me this is a
significant problem.

Brian



From stephen at xemacs.org  Thu Sep 20 22:43:32 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 21 Sep 2007 05:43:32 +0900
Subject: [Python-ideas] Exploration PEP : Concurrency for
	moderately	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <6ce0ac130709191856v68c4f711s453197e0d37ff949@mail.gmail.com>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<20070919193311.GA6720@panix.com>
	<6ce0ac130709191856v68c4f711s453197e0d37ff949@mail.gmail.com>
Message-ID: <87r6kt2hl7.fsf@uwakimon.sk.tsukuba.ac.jp>

Brian Granger writes:

 > I apologize if this statement is a little exaggerated.  But, I do
 > think this is a really critical problem that is going to affect
 > certain groups of Python users and developers in adverse ways.
 > Perhaps I have not made a very strong case that it is a true "crisis"
 > though.

No, you're missing the point.  I don't see anybody denying that you
understand your own needs.  *You* may face a (true) crisis.  *The
Python community* does not perceive your crisis as its own.

Personally, I don't see why it should.  And I think you'd be much more
successful at selling this with a two-pronged approach of evangelizing
just how utterly cool it would be to have a totally-threading GIL-less
Python on the one hand, and recruiting some gung-ho grad students with
Google SoC projects (or *gasp* some of your DoE grant or VC money) on
the other.

Note that nobody has said anything to discourage this as a research
project.  Nothing like it's impossible, stupid, or YAGNI.  But Guido,
and other senior developers, are saying they're not going to devote
their resources to it as things currently stand (and one of those
resources is the attention of the folks who review PEPs).


From ksankar at doubleclix.net  Fri Sep 21 01:34:54 2007
From: ksankar at doubleclix.net (Krishna Sankar)
Date: Thu, 20 Sep 2007 16:34:54 -0700
Subject: [Python-ideas] Exploration PEP : Concurrency
 for	moderately	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <87r6kt2hl7.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>	<20070919193311.GA6720@panix.com>	<6ce0ac130709191856v68c4f711s453197e0d37ff949@mail.gmail.com>
	<87r6kt2hl7.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <46F3039E.5000101@doubleclix.net>

Stephen,

> project.  Nothing like it's impossible, stupid, or YAGNI.  But Guido,
> and other senior developers, are saying they're not going to devote
> their resources to it as things currently stand (and one of those
> resources is the attention of the folks who review PEPs).

<KS>
    I am not sure that is true. I think if we have a well thought out 
PEP that addresses parallelism, it would be looked into by the folks. It 
is just that there are too many different ways of doing it and, of 
course, the GIL doesn't help either. There is also a school of thought 
that we should keep the GIL and, as a result, will find another, better 
way of leveraging multi-core architectures.
</KS>

CHeers
<k/>   
   



From stephen at xemacs.org  Fri Sep 21 04:00:57 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 21 Sep 2007 11:00:57 +0900
Subject: [Python-ideas] Exploration PEP : Concurrency
 for	moderately	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <46F3039E.5000101@doubleclix.net>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<20070919193311.GA6720@panix.com>
	<6ce0ac130709191856v68c4f711s453197e0d37ff949@mail.gmail.com>
	<87r6kt2hl7.fsf@uwakimon.sk.tsukuba.ac.jp>
	<46F3039E.5000101@doubleclix.net>
Message-ID: <87k5qk22w6.fsf@uwakimon.sk.tsukuba.ac.jp>

Krishna Sankar writes:

 > > project.  Nothing like it's impossible, stupid, or YAGNI.  But Guido,
 > > and other senior developers, are saying they're not going to devote
 > > their resources to it as things currently stand (and one of those
 > > resources is the attention of the folks who review PEPs).
 > 
 > <KS>
 >     I am not sure that is true. I think if we have a well thought out 
 > PEP that addresses parallelism,

... and provides a plausible proof-of-concept implementation [Guido
said that to you explicitly] ...

 > it would be looked into by the folks.

True.  But AIUI without the implementation, it won't become a PEP.
And at present you don't have a plausible implementation with the GIL.

Now, it looks like Adam has a plausible implementation of removing the
GIL.  If that holds up under at least some common use-cases, I think
you'll see enthusiasm from some of the top developers, and acceptance
from most.  But I really think that has to come first.



From ellisonbg.net at gmail.com  Fri Sep 21 04:29:03 2007
From: ellisonbg.net at gmail.com (Brian Granger)
Date: Thu, 20 Sep 2007 22:29:03 -0400
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <87r6kt2hl7.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<20070919193311.GA6720@panix.com>
	<6ce0ac130709191856v68c4f711s453197e0d37ff949@mail.gmail.com>
	<87r6kt2hl7.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <6ce0ac130709201929n226ea925ye9f0afd98bafb0f@mail.gmail.com>

>  > I apologize if this statement is a little exaggerated.  But, I do
>  > think this is a really critical problem that is going to affect
>  > certain groups of Python users and developers in adverse ways.
>  > Perhaps I have not made a very strong case that it is a true "crisis"
>  > though.
>
> No, you're missing the point.  I don't see anybody denying that you
> understand your own needs.  *You* may face a (true) crisis.  *The
> Python community* does not perceive your crisis as its own.

I do agree that there is a diversity of needs in the greater Python
community.  But the current discussion is oriented towards a subset of
Python users who *do* care about parallelism.  For this subset of
people, I do feel the issues are important.  I don't expect people who
don't need high performance and thus parallelism to feel the same way
that I do.

> Personally, I don't see why it should.  And I think you'd be much more
> successful at selling this with a two-pronged approach of evangelizing
> just how utterly cool it would be to have a totally-threading GIL-less
> Python on the one hand,

Now I am really regretting using the word "crisis," as your statement
implies that I am so negative about all this that I have lost sight of
the positive side of this discussion.  I do think it would be fantastic to
have a more threading capable Python.

> and recruiting some gung-ho grad students with
> Google SoC projects (or *gasp* some of your DoE grant or VC money) on
> the other.

*gasp*, I wasn't aware that I had grad students, DOE grants or VC
money.  A lot of things must have changed while I have been out of
town this week :)  While the company at which I work does have DOE
grants, work on the GIL is _far_ outside their scope of work.

One of the difficulties with the actual work of removing the GIL is
that it is difficult, grungy work that not many people are interested
in funding.  Most of the funding sources that are throwing money at
parallel computing and multicore are focused on languages other than
Python.  But, I could imagine that someone like IBM that seems to have
an interest in both Python and multicore CPUs would be interested in
sponsoring such an effort.  You would think that Google would also have
an interest in this.  To me it seems that the situation with the GIL will
remain the same until 1) someone with lots of free time and desire
steps up to the plate or 2) someone ponies up the $ to pay someone to
work on it.  Currently, I am not in the first of these situations.

> Note that nobody has said anything to discourage this as a research
> project.  Nothing like it's impossible, stupid, or YAGNI.  But Guido,
> and other senior developers, are saying they're not going to devote
> their resources to it as things currently stand (and one of those
> resources the attention of the folks who review PEPs).

That has been made clear.

Brian


From guido at python.org  Fri Sep 21 05:09:56 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 20 Sep 2007 20:09:56 -0700
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <6ce0ac130709201929n226ea925ye9f0afd98bafb0f@mail.gmail.com>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<20070919193311.GA6720@panix.com>
	<6ce0ac130709191856v68c4f711s453197e0d37ff949@mail.gmail.com>
	<87r6kt2hl7.fsf@uwakimon.sk.tsukuba.ac.jp>
	<6ce0ac130709201929n226ea925ye9f0afd98bafb0f@mail.gmail.com>
Message-ID: <ca471dc20709202009g269294cdo62193985f445ae4e@mail.gmail.com>

Can you all stop the meta-discussion and start discussing ideas for
parallel APIs please?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


From ksankar at doubleclix.net  Fri Sep 21 05:48:20 2007
From: ksankar at doubleclix.net (Krishna Sankar)
Date: Thu, 20 Sep 2007 20:48:20 -0700
Subject: [Python-ideas] Exploration PEP : Concurrency
 for	moderately	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <87k5qk22w6.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>	<20070919193311.GA6720@panix.com>	<6ce0ac130709191856v68c4f711s453197e0d37ff949@mail.gmail.com>	<87r6kt2hl7.fsf@uwakimon.sk.tsukuba.ac.jp>	<46F3039E.5000101@doubleclix.net>
	<87k5qk22w6.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <46F33F04.9030003@doubleclix.net>

Stephen,
    Now I see where you are coming from, and you are right. Plausible is 
the operative word. It is not that Guido and the senior developers will 
not look at it at all, but that they will not look at it in the absence 
of a plausible implementation. This was evident in the blog discussions 
with Bruce as well. I was encouraged by Guido's reply. He is doing the 
right thing.

    Anyway, the point is, now we have a good understanding of the 
dynamics. There is a small community who would like to see this happen, 
and it is up to us to show it can be done. A few key things need to happen:

    a)   A good set of benchmarks on multi-core systems, extending 
excellent work (for example 
http://blogs.warwick.ac.uk/dwatkins/entry/benchmarking_parallel_python_1_2/).
         I have an 8 core machine (2 x 4 cores) and plan to run this 
benchmark as well as create others, like the Red-Black Tree (Guido had 
suggested that; I was thinking of it even before).
    b.1) A plausible POC with the GIL - pp, the actor pattern/agent 
paradigm, et al.
    b.2) A plausible POC, as and when the GIL is removed.
    c)   Figure out what support is needed from the PVM et al (for b.1 
and/or b.2).
    d)   PEP and onwards ...

    Most probably I am stating the obvious. My take is that lots of the 
work is already there. We need to converge and do the rest to work 
towards a PEP. I know that Guido and the senior developers will involve 
themselves at the appropriate time.

Cheers
<k/>
Stephen J. Turnbull wrote:
> Krishna Sankar writes:
>
>  > > project.  Nothing like it's impossible, stupid, or YAGNI.  But Guido,
>  > > and other senior developers, are saying they're not going to devote
>  > > their resources to it as things currently stand (and one of those
>  > > resources the attention of the folks who review PEPs).
>  > 
>  > <KS>
>  >     I am not sure that is true. I think if we have a well thought out 
>  > PEP that addresses parallelism,
>
> ... and provides a plausible proof-of-concept implementation [Guido
> said that to you explicitly] ...
>
>  > it would be looked into by the folks.
>
> True.  But AIUI without the implementation, it won't become a PEP.
> And at present you don't have a plausible implementation with the GIL.
>
> Now, it looks like Adam has a plausible implementation of removing the
> GIL.  If that holds up under at least some common use-cases, I think
> you'll see enthusiasm from some of the top developers, and acceptance
> from most.  But I really think that has to come first.
>
>
>



From mattknox_ca at hotmail.com  Fri Sep 21 05:01:12 2007
From: mattknox_ca at hotmail.com (Matt Knox)
Date: Fri, 21 Sep 2007 03:01:12 +0000 (UTC)
Subject: [Python-ideas]
	Exploration PEP : Concurrency for moderately massive (4 to 32
	cores) multi-core architectures
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<46F08974.4010101@doubleclix.net>
	<6ce0ac130709181958y6cc28a01yc22bb7bfa451c97e@mail.gmail.com>
	<fb6fbf560709191258m4bec8f44v561ac3309fc73466@mail.gmail.com>
	<aac2c7cb0709191310w6b63fd0fn8a2b5e5848de723b@mail.gmail.com>
Message-ID: <loom.20070921T045104-372@post.gmane.org>

> Is 60-65% of normal CPython "a lot"?
> 
> (I really should clean things up and post a patch...)
> 

Is your implementation based on Python 3000/3.0 or Python 2.x? Regardless, 
I'd love to see it. How would you describe the changes to the C API? Radical 
departure? Somewhat compatible? Mostly compatible? No changes?

- Matt



From rhamph at gmail.com  Fri Sep 21 07:33:02 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 20 Sep 2007 23:33:02 -0600
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
	massive (4 to 32 cores) multi-core architectures
In-Reply-To: <loom.20070921T045104-372@post.gmane.org>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<46F08974.4010101@doubleclix.net>
	<6ce0ac130709181958y6cc28a01yc22bb7bfa451c97e@mail.gmail.com>
	<fb6fbf560709191258m4bec8f44v561ac3309fc73466@mail.gmail.com>
	<aac2c7cb0709191310w6b63fd0fn8a2b5e5848de723b@mail.gmail.com>
	<loom.20070921T045104-372@post.gmane.org>
Message-ID: <aac2c7cb0709202233l608921a0g16573d2612f8d353@mail.gmail.com>

On 9/20/07, Matt Knox <mattknox_ca at hotmail.com> wrote:
> > Is 60-65% of normal CPython "a lot"?
> >
> > (I really should clean things up and post a patch...)
> >
>
> Is your implementation based on Python 3000/3.0 or Python 2.x? Regardless,
> I'd love to see it. How would you describe the changes to the C API? Radical
> departure? Somewhat compatible? Mostly compatible? No changes?

It's based off py3k.  I will post eventually (heh).  The C API is
mostly compatible (source-wise only), notably excepting the various
threading things.

-- 
Adam Olsen, aka Rhamphoryncus


From tjreedy at udel.edu  Fri Sep 21 23:39:54 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 21 Sep 2007 17:39:54 -0400
Subject: [Python-ideas] Exploration PEP : Concurrency for
	moderatelymassive (4 to 32 cores) multi-core architectures
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com><20070919193311.GA6720@panix.com><6ce0ac130709191856v68c4f711s453197e0d37ff949@mail.gmail.com><87r6kt2hl7.fsf@uwakimon.sk.tsukuba.ac.jp>
	<6ce0ac130709201929n226ea925ye9f0afd98bafb0f@mail.gmail.com>
Message-ID: <fd1dn2$tb0$1@sea.gmane.org>


"Brian Granger" <ellisonbg.net at gmail.com>

Brief responses to this and previous posts and threads:

You withdraw 'crisis'.  Fine. Let us move on.

Hobbling Python on the billion (1000 million) or so current and future 
single-core CPUs is not acceptable.  Let us move on.

Multiple cores can be used with separate processes.  I expect this is where 
Google is, but that is another discussion, except "Do not hold your breath" 
waiting for Google to publicly support multicore threading.

Improving multi-thread use of multiple cores -- without hobbling single 
core use -- would be good.  Obviously.  No need to argue the point.  Let us 
move on.

What is needed is persuasion of practicality by the means Guido has 
requested: concrete proposals and code.

Terry Jan Reedy





From greg.ewing at canterbury.ac.nz  Sat Sep 22 02:48:50 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 22 Sep 2007 12:48:50 +1200
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
 massive (4 to 32 cores) multi-core architectures
In-Reply-To: <6ce0ac130709201929n226ea925ye9f0afd98bafb0f@mail.gmail.com>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<20070919193311.GA6720@panix.com>
	<6ce0ac130709191856v68c4f711s453197e0d37ff949@mail.gmail.com>
	<87r6kt2hl7.fsf@uwakimon.sk.tsukuba.ac.jp>
	<6ce0ac130709201929n226ea925ye9f0afd98bafb0f@mail.gmail.com>
Message-ID: <46F46672.70105@canterbury.ac.nz>

Brian Granger wrote:
> One of the difficulties with the actual work of removing the GIL is
> that it is difficult, grungy work that not many people are interested
> in funding.

It's not just a matter of hard work -- so far, nobody
knows *how* to remove the GIL without a big loss in
efficiency. Somebody is going to have to have a flash
of insight, and money can't buy that.

--
Greg


From stephen at xemacs.org  Sat Sep 22 09:49:54 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 22 Sep 2007 16:49:54 +0900
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
 massive (4 to 32 cores) multi-core architectures
In-Reply-To: <46F46672.70105@canterbury.ac.nz>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>
	<20070919193311.GA6720@panix.com>
	<6ce0ac130709191856v68c4f711s453197e0d37ff949@mail.gmail.com>
	<87r6kt2hl7.fsf@uwakimon.sk.tsukuba.ac.jp>
	<6ce0ac130709201929n226ea925ye9f0afd98bafb0f@mail.gmail.com>
	<46F46672.70105@canterbury.ac.nz>
Message-ID: <876423yw9p.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg Ewing writes:

 > Brian Granger wrote:
 > > One of the difficulties with the actual work of removing the GIL is
 > > that it is difficult, grungy work that not many people are interested
 > > in funding.
 > 
 > It's not just a matter of hard work -- so far, nobody
 > knows *how* to remove the GIL without a big loss in
 > efficiency. Somebody is going to have to have a flash
 > of insight, and money can't buy that.

They said that about the four-color theorem, too.<wink>


From ksankar at doubleclix.net  Wed Sep 26 06:10:59 2007
From: ksankar at doubleclix.net (Krishna Sankar)
Date: Tue, 25 Sep 2007 21:10:59 -0700
Subject: [Python-ideas] Python and Transactional memory
In-Reply-To: <876423yw9p.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <6ce0ac130709181831q415a1e0em87a680b68bd5cd9b@mail.gmail.com>	<20070919193311.GA6720@panix.com>	<6ce0ac130709191856v68c4f711s453197e0d37ff949@mail.gmail.com>	<87r6kt2hl7.fsf@uwakimon.sk.tsukuba.ac.jp>	<6ce0ac130709201929n226ea925ye9f0afd98bafb0f@mail.gmail.com>	<46F46672.70105@canterbury.ac.nz>
	<876423yw9p.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <46F9DBD3.6020009@doubleclix.net>

I would like pointers to any ongoing work on transactional memory 
constructs (like atomic blocks et al.) in Python. I also have a few 
questions for anyone who is working on this topic.
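
(For readers who haven't met these constructs: the usual proposal is an 
atomic block along the lines below. This sketch merely fakes the 
interface with one global lock -- a real transactional-memory 
implementation would track reads and writes and retry conflicting 
blocks -- and the atomic() name is illustrative, not an existing API.)

    from __future__ import with_statement   # needed on Python 2.5
    import threading
    from contextlib import contextmanager

    _lock = threading.Lock()

    @contextmanager
    def atomic():
        # Stand-in only: serialize the block with a global lock.
        _lock.acquire()
        try:
            yield
        finally:
            _lock.release()

    counter = 0

    def increment():
        global counter
        with atomic():   # all updates inside the block appear atomic
            counter += 1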

Cheers
<k/>


From grosser.meister.morti at gmx.net  Wed Sep 26 23:08:29 2007
From: grosser.meister.morti at gmx.net (Mathias Panzenböck)
Date: Wed, 26 Sep 2007 23:08:29 +0200
Subject: [Python-ideas] is in operator
Message-ID: <46FACA4D.8060800@gmx.net>

Sometimes I want to compare a "pointer" to more than one other. The "in" operator
would be handy, but it uses the "==" operator instead of the "is" operator. So an "is
in" operator would be nice. Though I don't know how easy it is for a newbie to see
what does what.

# This:
if x is in (a, b, c):
	...

# would be equivalent to this:
if x is a or x is b or x is c:
	...

# And of course there should be a "is not in" operator, too:
if x is not in (a, b, c):
	...

# this would be equivalent to this:
if x is not a and x is not b and x is not c:
	...


Hmmm, maybe a way to apply some kind of comparison between a value and several other
values would be better. But that already exists, so screw this msg:

if any(x is y for y in (a, b, c)):
	...

if all(x is not y for y in (a, b, c)):
	...


From terry at jon.es  Wed Sep 26 23:42:50 2007
From: terry at jon.es (Terry Jones)
Date: Wed, 26 Sep 2007 23:42:50 +0200
Subject: [Python-ideas] Calling a function of a list without accumulating
	results
Message-ID: <18170.53850.491682.600306@terry.local>

What's the most compact way to repeatedly call a function on a list without
accumulating the results?

While I can accumulate results via

    a = [f(x) for x in mylist]

or with a generator, there doesn't seem to be a way to do this without
accumulating the results. I guess I need to either use the above and ignore
the result, or use

    for x in mylist:
        f(x)

I run into this need quite frequently. If I write

    [f(x) for x in mylist]

with no assignment, will Python notice that I don't want the accumulated
results and silently toss them for me?

A possible syntax change would be to allow the unadorned

    f(x) for x in mylist

And raise an error if someone tries to assign to this.

Terry


From rhamph at gmail.com  Wed Sep 26 23:57:37 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 26 Sep 2007 15:57:37 -0600
Subject: [Python-ideas] Calling a function of a list without
	accumulating results
In-Reply-To: <18170.53850.491682.600306@terry.local>
References: <18170.53850.491682.600306@terry.local>
Message-ID: <aac2c7cb0709261457k5bf5d400y98655ea198a80144@mail.gmail.com>

On 9/26/07, Terry Jones <terry at jon.es> wrote:
> What's the most compact way to repeatedly call a function on a list without
> accumulating the results?
>
> While I can accumulate results via
>
>     a = [f(x) for x in mylist]
>
> or with a generator, there doesn't seem to be a way to do this without
> accumulating the results. I guess I need to either use the above and ignore
> the result, or use
>
>     for x in mylist:
>         f(x)

Just use this.  Simple, readable.  No need to get fancy.


-- 
Adam Olsen, aka Rhamphoryncus


From brett at python.org  Thu Sep 27 01:02:03 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 26 Sep 2007 16:02:03 -0700
Subject: [Python-ideas] Calling a function of a list without
	accumulating results
In-Reply-To: <18170.53850.491682.600306@terry.local>
References: <18170.53850.491682.600306@terry.local>
Message-ID: <bbaeab100709261602l1ecfe02bh31d5f3ae13fbe60@mail.gmail.com>

On 9/26/07, Terry Jones <terry at jon.es> wrote:
> What's the most compact way to repeatedly call a function on a list without
> accumulating the results?
>
> While I can accumulate results via
>
>     a = [f(x) for x in mylist]
>
> or with a generator, there doesn't seem to be a way to do this without
> accumulating the results. I guess I need to either use the above and ignore
> the result, or use
>
>     for x in mylist:
>         f(x)
>
> I run into this need quite frequently. If I write
>
>     [f(x) for x in mylist]
>
> with no assignment, will Python notice that I don't want the accumulated
> results and silently toss them for me?
>

Only after the list is completely constructed.  List comprehensions
are literally 'for' loops with an append call to a method, so without
extending the peepholer to notice this case and strip out the list
creation and appending, it is not optimized.
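
(Roughly, the comprehension is executed as if you had written the
following; the list gets built whether or not you use it:)

    result = []
    for x in mylist:
        result.append(f(x))   # every value is accumulated regardless
    # 'result' is then simply discarded if the expression is unused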

> A possible syntax change would be to allow the unadorned
>
>     f(x) for x in mylist
>
> And raise an error if someone tries to assign to this.

Go with the 'for' loop as Adam suggested.  I just don't see this as
needing syntax support.

-Brett


From terry at jon.es  Thu Sep 27 01:20:06 2007
From: terry at jon.es (Terry Jones)
Date: Thu, 27 Sep 2007 01:20:06 +0200
Subject: [Python-ideas] Calling a function of a list without
	accumulating results
In-Reply-To: Your message at 16:02:03 on Wednesday, 26 September 2007
References: <18170.53850.491682.600306@terry.local>
	<bbaeab100709261602l1ecfe02bh31d5f3ae13fbe60@mail.gmail.com>
Message-ID: <18170.59686.345068.336674@terry.local>

Hi Brett & Adam

Thanks for the replies.

| Only after the list is completely constructed.  List comprehensions
| are literally 'for' loops with an append call to a method so without
| extending the peepholer to notice this case and strip out the list
| creation and appending it is not optimized.
| 
| > A possible syntax change would be to allow the unadorned
| >
| >     f(x) for x in mylist
| >
| > And raise an error if someone tries to assign to this.
| 
| Go with the 'for' loop as Adam suggested.  I just don't see this as
| needing syntax support.

I think there are two arguments in its favor.

The first is the same as one of the arguments for providing list
comprehensions and generator expressions - because it makes common
multi-line boilerplate much more concise.

There's a certain syntax that's allowed in [] and () to make list
comprehensions and generator expressions. I'm suggesting allowing exactly
the same thing, but with no explicit grouping wrapped around it.

The trivial case I posted isn't much of a win over the simple 2-line
alternative, but it's easy to go further:

    f(x, y) for x in myXlist for y in myYlist

instead of

    for x in myXlist:
        for y in myYlist:
            f(x, y)

and of course there are many more examples.

The second argument is one of consistency.  If list comprehensions are
regarded as more pythonic and the Right Way to code in Python, I'd make the
same argument for when you don't happen to want to keep the accumulated
results.  Why force programmers to use two coding styles in order to get
essentially the same thing done?

I think these are decent arguments. It's simply the full succinctness and
convenience of list comprehensions, without needing to accumulate results.

Thanks again for the replies. Changing the peepholer to notice when there's
no assignment to a list expression would also be nice. I'd look at it if I
had time..... :-)

Terry


From george.sakkis at gmail.com  Thu Sep 27 02:07:58 2007
From: george.sakkis at gmail.com (George Sakkis)
Date: Wed, 26 Sep 2007 20:07:58 -0400
Subject: [Python-ideas] is in operator
In-Reply-To: <46FACA4D.8060800@gmx.net>
References: <46FACA4D.8060800@gmx.net>
Message-ID: <91ad5bf80709261707u129a118w3768061b0338baba@mail.gmail.com>

On 9/26/07, Mathias Panzenböck <grosser.meister.morti at gmx.net> wrote:
>
> Sometimes I want to compare a "pointer" to more than one other. The "in"
> operator
> would be handy, but it uses the "==" operator instead of the "is"
> operator. So an "is
> in" operator would be nice. Though I don't know how easy it is for a
> newbie to see
> what does what.
>
> # This:
> if x is in (a, b, c):
>         ...
>
> # would be equivalent to this:
> if x is a or x is b or x is c:
>         ...
>
> # And of course there should be a "is not in" operator, too:
> if x is not in (a, b, c):
>         ...
>
> # this would be equivalent to this:
> if x is not a and x is not b and x is not c:
>         ...
>
>
> Hmmm, maybe a way to apply some kind of comparison between a value and
> more other
> values would be better. But that already exists, so screw this msg:
>
> if any(x is y for y in (a, b, c)):
>         ...
>
> if all(x is not y for y in (a, b, c)):



Or in a more obfuscated way:

import operator as op
from itertools import imap
from functools import partial

if any(imap(partial(op.is_,x), (a, b, c))):
        ...

if all(imap(partial(op.is_not,x), (a, b, c))):
     ...


George

From brett at python.org  Thu Sep 27 02:43:49 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 26 Sep 2007 17:43:49 -0700
Subject: [Python-ideas] Calling a function of a list without
	accumulating results
In-Reply-To: <18170.59686.345068.336674@terry.local>
References: <18170.53850.491682.600306@terry.local>
	<bbaeab100709261602l1ecfe02bh31d5f3ae13fbe60@mail.gmail.com>
	<18170.59686.345068.336674@terry.local>
Message-ID: <bbaeab100709261743y4175c5cekbec05f1af6597423@mail.gmail.com>

On 9/26/07, Terry Jones <terry at jon.es> wrote:
> Hi Brett & Adam
>
> Thanks for the replies.
>
> | Only after the list is completely constructed.  List comprehensions
> | are literally 'for' loops with an append call to a method so without
> | extending the peepholer to notice this case and strip out the list
> | creation and appending it is not optimized.
> |
> | > A possible syntax change would be to allow the unadorned
> | >
> | >     f(x) for x in mylist
> | >
> | > And raise an error if someone tries to assign to this.
> |
> | Go with the 'for' loop as Adam suggested.  I just don't see this as
> | needing syntax support.
>
> I think there are two arguments in its favor.
>
> The first is the same as one of the arguments for providing list
> comprehensions and generator expressions - because it makes common
> multi-line boilerplate much more concise.
>

OK, the question is how common is this.  I personally don't come
across this idiom often enough to feel the need to avoid creating a
listcomp, genexp, or a 'for' loop.

> There's a certain syntax that's allowed in [] and () to make list
> comprehensions and generator expressions. I'm suggesting allowing exactly
> the same thing, but with no explicit grouping wrapped around it.
>

But are you sure Python's grammar could support it?  Parentheses are
needed for genexps in certain situations for disambiguation because of
Python's LL(1) grammar.

> The trivial case I posted isn't much of a win over the simple 2-line
> | > alternative, but it's easy to go further:
>
>     f(x, y) for x in myXlist for y in myYlist
>
> instead of
>
>     for x in myXlist:
>         for y in myYlist:
>             f(x, y)
>
> and of course there are many more examples.
>

Right, but the second one is so much easier to read and comprehend.

> The second argument is one of consistency.  If list comprehensions are
> regarded as more pythonic and the Right Way to code in Python, I'd make the
> same argument for when you don't happen to want to keep the accumulated
> results.  Why force programmers to use two coding styles in order to get
> essentially the same thing done?
>

I think "force" is rather strong wording for "choice".  The point is
the 'for' loop is the standard solution and there just happens to be a
shorthand for a common case.  You shouldn't view listcomps as being on
the same ground as a 'for' loop.

> I think these are decent arguments. It's simply the full succinctness and
> convenience of list comprehensions, without needing to accumulate results.
>

They are decent, but not enough to warrant adding special support in
my opinion.  Heck, I would vote to ditch listcomps for
``list(genexp)`` had genexps come first and have the options trimmed
down even more.

And if you are doing this to just toss out stuff you can do something like::

  for _ in (f(x) for x in anylist): pass

No accumulating list, one line, and you still get your genexp syntax.

Basically, unless you can go through the stdlib and find all the
instances of the pattern you want, to prove it is common enough to
warrant support, none of the core developers will probably go for
this.

-Brett


From rhamph at gmail.com  Thu Sep 27 03:25:56 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Wed, 26 Sep 2007 19:25:56 -0600
Subject: [Python-ideas] is in operator
In-Reply-To: <46FACA4D.8060800@gmx.net>
References: <46FACA4D.8060800@gmx.net>
Message-ID: <aac2c7cb0709261825o3077619bu5ef7df5ea88a68d5@mail.gmail.com>

On 9/26/07, Mathias Panzenböck <grosser.meister.morti at gmx.net> wrote:
> Sometimes I want to compare a "pointer" to more than one other. The "in" operator
> would be handy, but it uses the "==" operator instead of the "is" operator. So an "is
> in" operator would be nice. Though I don't know how easy it is for a newbie to see
> what does what.

There are many different ways you might want to do a comparison.  That's
why sorted() has a cmp=func argument.  A new API won't work though, as
dicts or sets need to know the hash in advance, and lists are O(n)
anyway (so there's little appropriate use).

To solve your problem you should be using a decorate/undecorate
pattern, possibly encapsulated into a custom container type.  There
doesn't appear to be any in the python cookbook (so it may be a very
rare need), but assuming you did use a container type your code might
be rewritten as such:

if x in idset([a, b, c]):

But decorating is almost as simple:

if id(x) in [id(a), id(b), id(c)]:

(Caveat: id(obj) assumes you have another reference to the obj, to
prevent the identity from being reused.)
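
(A minimal sketch of such an idset container, assuming all it needs to
support is membership testing, might look like this:)

    class idset(object):
        # Set-like container whose membership test uses object identity.
        def __init__(self, items):
            # Keep the objects alive so their id()s cannot be reused.
            self._items = list(items)
            self._ids = set(id(obj) for obj in self._items)

        def __contains__(self, obj):
            return id(obj) in self._ids

(With that, "x in idset([a, b, c])" gives the same answer as the chained
identity tests in the quoted message below.)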


> # This:
> if x is in (a, b, c):
>         ...
>
> # would be equivalent to this:
> if x is a or x is b or x is c:
>         ...
>
> # And of course there should be a "is not in" operator, too:
> if x is not in (a, b, c):
>         ...
>
> # this would be equivalent to this:
> if x is not a and x is not b and x is not c:
>         ...
>
>
> Hmmm, maybe a way to apply some kind of comparison between a value and more other
> values would be better. But that already exists, so screw this msg:
>
> if any(x is y for y in (a, b, c)):
>         ...
>
> if all(x is not y for y in (a, b, c)):
>         ...
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>


-- 
Adam Olsen, aka Rhamphoryncus


From terry at jon.es  Thu Sep 27 04:01:54 2007
From: terry at jon.es (Terry Jones)
Date: Thu, 27 Sep 2007 04:01:54 +0200
Subject: [Python-ideas] Calling a function of a list without
	accumulating results
In-Reply-To: Your message at 17:43:49 on Wednesday, 26 September 2007
References: <18170.53850.491682.600306@terry.local>
	<bbaeab100709261602l1ecfe02bh31d5f3ae13fbe60@mail.gmail.com>
	<18170.59686.345068.336674@terry.local>
	<bbaeab100709261743y4175c5cekbec05f1af6597423@mail.gmail.com>
Message-ID: <18171.3858.931686.291011@terry.local>

Hi Brett

| > The first is the same as one of the arguments for providing list
| > comprehensions and generator expressions - because it makes common
| > multi-line boilerplate much more concise.
| 
| OK, the question is how common is this.

I don't know. I use it maybe once every few weeks. There's a function to do
this in Common Lisp (mapc), not that that means anything much.

| But are you sure Python's grammar could support it?  Parentheses are
| needed for genexps in certain situations for disambiguation because of
| Python's LL(1) grammar.

I don't know the answer to this either. I imagine it's a matter of tacking
an optional "for ..." clause onto the end of an expression. The "for ..."
part can certainly be handled (is being handled already), so I think it
might not be too hard - supposing there is already a non-terminal for the
"for ..." clause.

| > The trivial case I posted isn't much of a win over the simple 2-line
| > alternative, but it's easy to go further:
| >
| >     f(x, y) for x in myXlist for y in myYlist
| >
| > instead of
| >
| >     for x in myXlist:
| >         for y in myYlist:
| >             f(x, y)
| >
| > and of course there are many more examples.
| 
| Right, but the second one is so much easier to read and comprehend.

I tend to agree, but the language supports the more concise form for both
list comprehension and genexps, so there must be a fair number of people
who thought it was a win to allow the compact form.

| I think "force" is rather strong wording for "choice".

OK, how about "lack of choice"?  :-)

Seriously (to take an example from the Python pocket ref page 24), you do
have the choice to write

    a = [x for x in range(5) if x % 2 == 0]

instead of

    a = []
    for x in range(5):
        if x % 2 == 0:
            a.append(x)

but you don't have a (simple) choice if you don't want to accumulate
results.  I'm merely saying that I think it would be cleaner and more
consistent to allow

    print(x) for x in range(5) if x % 2 == 0

instead of having the non-choice but to write something like

    for x in range(5):
        if x % 2 == 0:
            print x

Yes, I (and thank you for it) could now use your suggested for _ in ...:
pass trick, but that's not really the whole point, to me. If the language
can be made simpler and more consistent, I think that's generally a good
thing.

I know, I don't know anything about the implementation. But this is an
ideas list.

| Heck, I would vote to ditch listcomps for ``list(genexp)`` had genexps
| come first and have the options trimmed down even more.

Me too. But even if that eventuated, I'd _still_ propose allowing the
unadorned genexp, for the case where you don't want the results.

| Basically, unless you can go through the stdlib and find all the
| instances of the pattern you want to prove it is common enough to
| warrant support it none of the core developers will probably go for
| this.

Understood.

Thanks,
Terry


From greg.ewing at canterbury.ac.nz  Thu Sep 27 04:09:16 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 27 Sep 2007 14:09:16 +1200
Subject: [Python-ideas] Calling a function of a list without
 accumulating results
In-Reply-To: <18170.53850.491682.600306@terry.local>
References: <18170.53850.491682.600306@terry.local>
Message-ID: <46FB10CC.6010205@canterbury.ac.nz>

Terry Jones wrote:
> What's the most compact way to repeatedly call a function on a list without
> accumulating the results?

>     for x in mylist:
>         f(x)

That's it.

> If I write
> 
>     [f(x) for x in mylist]
> 
> with no assignment, will Python notice that I don't want the accumulated
> results and silently toss them for me?

No. And there would be no advantage in using LC syntax
even if it did. The generated bytecode would be essentially
identical.
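
(One way to check that claim, using the dis module:)

    import dis

    def with_listcomp(mylist, f):
        [f(x) for x in mylist]

    def with_loop(mylist, f):
        for x in mylist:
            f(x)

    dis.dis(with_listcomp)   # the same loop, plus list setup and appends
    dis.dis(with_loop)       # just the loop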

If you really must have it all on one line, you can write

   for x in mylist: f(x)

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+


From greg.ewing at canterbury.ac.nz  Thu Sep 27 04:46:36 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 27 Sep 2007 14:46:36 +1200
Subject: [Python-ideas] Calling a function of a list
 without	accumulating results
In-Reply-To: <18170.59686.345068.336674@terry.local>
References: <18170.53850.491682.600306@terry.local>
	<bbaeab100709261602l1ecfe02bh31d5f3ae13fbe60@mail.gmail.com>
	<18170.59686.345068.336674@terry.local>
Message-ID: <46FB198C.4070705@canterbury.ac.nz>

Terry Jones wrote:
> If list comprehensions are
> regarded as more pythonic and the Right Way to code in Python, I'd make the
> same argument for when you don't happen to want to keep the accumulated
> results.  Why force programmers to use two coding styles in order to get
> essentially the same thing done?


There isn't anything "more Pythonic" about the LC
syntax in itself. It's just a more compact alternative
for when you're constructing a list. It's not *un*-
Pythonic to *not* use it, even when you do want a
list. Nobody would fault you for not using one
when you could have.

The way things are, there is only one coding style
for when you don't want the results. You're suggesting
the addition of another one. That *would* be un-Pythonic.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+


From greg.ewing at canterbury.ac.nz  Thu Sep 27 05:02:00 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 27 Sep 2007 15:02:00 +1200
Subject: [Python-ideas] Calling a function of a list
 without	accumulating results
In-Reply-To: <18171.3858.931686.291011@terry.local>
References: <18170.53850.491682.600306@terry.local>
	<bbaeab100709261602l1ecfe02bh31d5f3ae13fbe60@mail.gmail.com>
	<18170.59686.345068.336674@terry.local>
	<bbaeab100709261743y4175c5cekbec05f1af6597423@mail.gmail.com>
	<18171.3858.931686.291011@terry.local>
Message-ID: <46FB1D28.5090002@canterbury.ac.nz>

Terry Jones wrote:
> the language supports the more concise form for both
> list comprehension and genexps, so there must be a fair number of people
> who thought it was a win to allow the compact form.

The compact form *is* considerably more compact when
you're constructing a list, as it saves the initialisation
and an append call. It also permits an optimisation by
extracting a bound method for the append.
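
(In hand-written form, that optimisation is the familiar idiom:)

    result = []
    append = result.append   # look up the bound method just once
    for x in mylist:
        append(f(x))         # avoids re-fetching result.append each pass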

When not constructing a list, however, there's no
significant difference in source length or runtime
efficiency. So the "more compact" form wouldn't be
any more compact, just different. It would be a
spurious Other Way To Do It.

-- 
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | Carpe post meridiem!          	  |
Christchurch, New Zealand	   | (I'm not a morning person.)          |
greg.ewing at canterbury.ac.nz	   +--------------------------------------+


From brett at python.org  Thu Sep 27 05:42:06 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 26 Sep 2007 20:42:06 -0700
Subject: [Python-ideas] Calling a function of a list without
	accumulating results
In-Reply-To: <18171.3858.931686.291011@terry.local>
References: <18170.53850.491682.600306@terry.local>
	<bbaeab100709261602l1ecfe02bh31d5f3ae13fbe60@mail.gmail.com>
	<18170.59686.345068.336674@terry.local>
	<bbaeab100709261743y4175c5cekbec05f1af6597423@mail.gmail.com>
	<18171.3858.931686.291011@terry.local>
Message-ID: <bbaeab100709262042x5293194ftf23f2697b4ac6294@mail.gmail.com>

On 9/26/07, Terry Jones <terry at jon.es> wrote:
> Hi Brett
>
> | > The first is the same as one of the arguments for providing list
> | > comprehensions and generator expressions - because it makes common
> | > multi-line boilerplate much more concise.
> |
> | OK, the question is how common is this.
>
> I don't know. I use it maybe once every few weeks. There's a function to do
> this in Common Lisp (mapc), not that that means anything much.
>

You could try to get something into the stdlib that consumes a
generator and just tosses out the results.  That has a much lower
barrier of entry than new syntax.

> | But are you sure Python's grammar could support it?  Parentheses are
> | needed for genexps in certain situations for disambiguation because of
> | Python's LL(1) grammar.
>
> I don't know the answer to this either. I imagine it's a matter of tacking
> an optional "for ..." clause onto the end of an expression. The "for ..."
> part can certainly be handled (is being handled already), so I think it
> might not be too hard - supposing there is already a non-terminal for the
> "for ..." clause.
>
> | > The trivial case I posted isn't much of a win over the simple 2-line
> | > alternative, but it's easy to go further:
> | >
> | >     f(x, y) for x in myXlist for y in myYlist
> | >
> | > instead of
> | >
> | >     for x in myXlist:
> | >         for y in myYlist:
> | >             f(x, y)
> | >
> | > and of course there are many more examples.
> |
> | Right, but the second one is so much easier to read and comprehend.
>
> I tend to agree, but the language supports the more concise form for both
> list comprehension and genexps, so there must be a fair number of people
> who thought it was a win to allow the compact form.
>

Yes, but it was specifically for the list-building idiom.

> | I think "force" is rather strong wording for "choice".
>
> OK, how about "lack of choice"?  :-)
>
> Seriously (to take an example from the Python pocket ref page 24), you do
> have the choice to write
>
>     a = [x for x in range(5) if x % 2 == 0]
>
> instead of
>
>     a = []
>     for x in range(5):
>         if x % 2 == 0:
>             a.append(x)
>
> but you don't have a (simple) choice if you don't want to accumulate
> results.  I'm merely saying that I think it would be cleaner and more
> consistent to allow
>
>     print(x) for x in range(5) if x % 2 == 0
>
> instead of having the non-choice but to write something like
>
>     for x in range(5):
>         if x % 2 == 0:
>             print x
>
> Yes, I (and thank you for it) could now use your suggested for _ in ...:
> pass trick, but that's not really the whole point, to me. If the language
> can be made simpler and more consistent, I think that's generally a good
> thing.
>

But it could be said it is not simpler, as there is now a new type of
statement for what previously was always an expression (this would have
to be a statement, as the whole point of this idea is that there is no
return value).

> I know, I don't know anything about the implementation. But this is an
> ideas list.
>

Right, which is why this conversation has gone on without someone
flat-out saying it wasn't going to happen, like on python-dev.  =)  But
at some point the idea either needs to seem reasonable enough to try
to move forward, or to just be let go.  And at this point I have said
what it will take to move it to the next level, if you care enough; I
doubt any of the core developers will go for this enough to carry it
on their own.

> | Heck, I would vote to ditch listcomps for ``list(genexp)`` had genexps
> | come first and have the options trimmed down even more.
>
> Me too. But even if that eventuated, I'd _still_ propose allowing the
> unadorned genexp, for the case where you don't want the results.
>
> | Basically, unless you can go through the stdlib and find all the
> | instances of the pattern you want to prove it is common enough to
> | warrant support it none of the core developers will probably go for
> | this.
>
> Understood.
>
> Thanks,

Welcome.  Thanks for bringing the idea up!

-Brett


From rrr at ronadam.com  Thu Sep 27 06:54:21 2007
From: rrr at ronadam.com (Ron Adam)
Date: Wed, 26 Sep 2007 23:54:21 -0500
Subject: [Python-ideas] Calling a function of a list
 without	accumulating results
In-Reply-To: <18171.3858.931686.291011@terry.local>
References: <18170.53850.491682.600306@terry.local>	<bbaeab100709261602l1ecfe02bh31d5f3ae13fbe60@mail.gmail.com>	<18170.59686.345068.336674@terry.local>	<bbaeab100709261743y4175c5cekbec05f1af6597423@mail.gmail.com>
	<18171.3858.931686.291011@terry.local>
Message-ID: <46FB377D.50903@ronadam.com>



Terry Jones wrote:

> OK, how about "lack of choice"?  :-)

There's always a choice... Not always a good one though.   ;-)


> but you don't have a (simple) choice if you don't want to accumulate
> results.  I'm merely saying that I think it would be cleaner and more
> consistent to allow
> 
>     print(x) for x in range(5) if x % 2 == 0
> 
> instead of having the non-choice but to write something like
> 
>     for x in range(5):
>         if x % 2 == 0:
>             print x
> 


 >>> a = list(range(10))
 >>> def pr(obj):
...     print obj
...
 >>> b = [1 for x in a if pr(x)]
0
1
2
3
4
5
6
7
8
9
 >>> b
[]


Or to be more specific to the above example...

 >>> b = [1 for x in range(5) if x % 2 == 0 and pr(x)]
0
2
4
 >>> b
[]


All of which are more complex than a simple for loop.


Cheers,
    Ron



From stephen at xemacs.org  Thu Sep 27 07:15:23 2007
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 27 Sep 2007 14:15:23 +0900
Subject: [Python-ideas] Calling a function of a list
	without	accumulating results
In-Reply-To: <18170.59686.345068.336674@terry.local>
References: <18170.53850.491682.600306@terry.local>
	<bbaeab100709261602l1ecfe02bh31d5f3ae13fbe60@mail.gmail.com>
	<18170.59686.345068.336674@terry.local>
Message-ID: <87abr8lmdw.fsf@uwakimon.sk.tsukuba.ac.jp>

Terry Jones writes:
 > The trivial case I posted isn't much of a win over the simple 2-line
 > alternative, but it's easy to go further:
 > 
 >     f(x, y) for x in myXlist for y in myYlist

Excuse me?

 > instead of
 > 
 >     for x in myXlist:
 >         for y in myYlist:
 >             f(x, y)

Oh, is that what you meant?!<wink>

I think the second version is much more readable, and only a few
characters longer when typing.

 > The second argument is one of consistency.  If list comprehensions are
 > regarded as more pythonic and the Right Way to code in Python, I'd make the
 > same argument for when you don't happen to want to keep the accumulated
 > results.  Why force programmers to use two coding styles in order to get
 > essentially the same thing done?

Because it is essentially not the same thing.  Comprehension syntax is
justified precisely when you want to generate a list value for immediate
use, and all the other ways to generate that value force you to hide
what's being done in an assignment deep inside a thicket of syntax.
List comprehensions are Pythonic because they "look like" lists.
IMHO, anyway.

OTOH, in Python, control syntax always starts with a keyword.  A naked
comprehension just doesn't look like a control statement to me, it
still looks like an expression.  I don't know if that's un-Pythonic,
but I do like the multiline version better.

 > I think these are decent arguments. It's simply the full succinctness and
 > convenience of list comprehensions, without needing to accumulate results.

But succinctness and convenience aren't arguments for doing something
in Python as I understand it.  Lack of succinctness and convenience may
postpone acceptance of a PEP, or even kill it, of course.  But they've
never been sufficient for acceptance of a PEP that I've seen.


From grosser.meister.morti at gmx.net  Thu Sep 27 14:18:24 2007
From: grosser.meister.morti at gmx.net (Mathias Panzenböck)
Date: Thu, 27 Sep 2007 14:18:24 +0200
Subject: [Python-ideas] is in operator
In-Reply-To: <91ad5bf80709261707u129a118w3768061b0338baba@mail.gmail.com>
References: <46FACA4D.8060800@gmx.net>
	<91ad5bf80709261707u129a118w3768061b0338baba@mail.gmail.com>
Message-ID: <46FB9F90.2030607@gmx.net>

George Sakkis wrote:
>
> Or in a more obfuscated way:
>
> import operator as op
> from itertools import imap
> from functools import partial
>
> if any(imap(partial(op.is_,x), (a, b, c))):
>         ...
>
> if all(imap(partial(op.is_not,x), (a, b, c))):
>      ...
>
>
> George
>

Or in Haskell (assuming Haskell had "is" and "is not"):

if any (x is) [a, b, c] then ... else ...

if all (x is not) [a, b, c] then ... else ...

I'm not sure if "any" and "all" are the ones with 2 parameters (function and list) or
if that would be "or" and "and".


	-panzi


From arno at marooned.org.uk  Thu Sep 27 19:02:28 2007
From: arno at marooned.org.uk (Arnaud Delobelle)
Date: Thu, 27 Sep 2007 18:02:28 +0100 (BST)
Subject: [Python-ideas] Calling a function of a list without
 accumulating results
In-Reply-To: <18170.53850.491682.600306@terry.local>
References: <18170.53850.491682.600306@terry.local>
Message-ID: <63519.82.46.172.40.1190912548.squirrel@marooned.org.uk>


On Wed, September 26, 2007 10:42 pm, Terry Jones wrote:
> What's the most compact way to repeatedly call a function on a list
> without
> accumulating the results?
>
> While I can accumulate results via
>
>     a = [f(x) for x in mylist]
>
> or with a generator, there doesn't seem to be a way to do this without
> accumulating the results. I guess I need to either use the above and
> ignore
> the result, or use
>
>     for x in mylist:
>         f(x)
>
> I run into this need quite frequently. If I write
>
>     [f(x) for x in mylist]
>
> with no assignment, will Python notice that I don't want the accumulated
> results and silently toss them for me?
>
> A possible syntax change would be to allow the unadorned
>
>     f(x) for x in mylist
>
> And raise an error if someone tries to assign to this.
>

If you want to do it like this, why not do it explicitly:

def exhaust(iterable):
   for i in iterable: pass

Then you can write:

exhaust(f(x) for x in mylist)

Done!
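
(For the nested case from earlier in the thread, that becomes:

    exhaust(f(x, y) for x in myXlist for y in myYlist)

giving the full genexp syntax without any language change.)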


-- 
Arnaud




From fdrake at acm.org  Thu Sep 27 20:05:48 2007
From: fdrake at acm.org (Fred Drake)
Date: Thu, 27 Sep 2007 14:05:48 -0400
Subject: [Python-ideas] Calling a function of a list without
	accumulating results
In-Reply-To: <63519.82.46.172.40.1190912548.squirrel@marooned.org.uk>
References: <18170.53850.491682.600306@terry.local>
	<63519.82.46.172.40.1190912548.squirrel@marooned.org.uk>
Message-ID: <279AD750-5307-4CDB-8AE8-E5252F7707E9@acm.org>

On Sep 27, 2007, at 1:02 PM, Arnaud Delobelle wrote:
> def exhaust(iterable):
>    for i in iterable: pass
>
> Then you can write:
>
> exhaust(f(x) for x in mylist)

Ooooh... I like this!  Anyone who needs such a construct can just  
write their own exhaust() function, too, since I see no reason to  
pollute the distribution with this.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>





From lucio.torre at gmail.com  Thu Sep 27 20:06:12 2007
From: lucio.torre at gmail.com (Lucio Torre)
Date: Thu, 27 Sep 2007 15:06:12 -0300
Subject: [Python-ideas] Calling a function of a list without
	accumulating results
In-Reply-To: <63519.82.46.172.40.1190912548.squirrel@marooned.org.uk>
References: <18170.53850.491682.600306@terry.local>
	<63519.82.46.172.40.1190912548.squirrel@marooned.org.uk>
Message-ID: <999187ed0709271106vb3f879eo6b5ac9943d82cbe6@mail.gmail.com>

On 9/27/07, Arnaud Delobelle <arno at marooned.org.uk> wrote:
>
> >
> >     for x in mylist:
> >         f(x)
> >
> >
> >     [f(x) for x in mylist]
> >
>

Am I missing something, or is the one-liner syntax great for this?

for x in mylist: f(x)

lucio.

From cmaurer at slickedit.com  Thu Sep 27 23:34:11 2007
From: cmaurer at slickedit.com (Clark Maurer)
Date: Thu, 27 Sep 2007 17:34:11 -0400
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately massive
	(4 to 32 cores) multi-core architectures
Message-ID: <ECCC6E9907B4CD4A83260A191A91F20E014C8C1E@wampa.office.slickedit.com>

Hello,

 

I've been following this discussion.  My thoughts mostly reiterate what
has already been said.  There's no way to get rid of the GIL without
significantly affecting single-threaded performance.  IMO getting rid of
the GIL would require writing a mark-and-sweep algorithm.  To improve
performance you can do incremental (threaded) marking and detect page
faults so that modified pages can be rescanned for references.  The Boehm
garbage collector does this (I think), but Python would need something
much more custom.  This type of garbage collector is VERY hard to write.
Worse yet, the current implementation of Python would need a lot of
rewriting.

 

FYI: I tried using the Boehm collector in SlickEdit and it leaked memory
like crazy.  I never figured out why but I suspect it had to do with it
treating everything in memory as a potential pointer.

 

Ruby's mark-and-sweep garbage collector illustrates the loss in
single-threaded performance, and since it does its own thread
scheduling, the thread performance is bad too.

 

As Python stands right now, its performance is excellent for single
threading, the implementation is simple, it works well for the typical
Python user, and using processes at least gives a workaround.  I like
to be a perfectionist as much as the next guy, but the payback doesn't
warrant the level of effort.  Where's the easy button when you need
one? :-)

 

I thought you Python enthusiasts (especially Guido) might enjoy the
article I just posted on the SlickEdit blog.  I'm the CTO and founder of
SlickEdit.  I hate saying that because I'm a very humble guy, but I
thought you would want to know.  The article is called "Comparing Python
to Perl and Ruby"; go to http://blog.slickedit.com/.  I limited the
article to a simple grammar comparison because I wanted to keep the
article short.  Hope you enjoy it.

 

Guido, I have another article written which talks about Python as well
but I have not yet posted it.  If you give me an email address, I will
send it to you to look over before I post it.  Don't give me your email
address here.  Instead write to support at slickedit.com and let them know
that I requested your email address.

 

Cheers

Clark

 


From terry at jon.es  Thu Sep 27 23:48:47 2007
From: terry at jon.es (Terry Jones)
Date: Thu, 27 Sep 2007 23:48:47 +0200
Subject: [Python-ideas] Calling a function of a list
 without	accumulating results
In-Reply-To: Your message at 14:46:36 on Thursday, 27 September 2007
References: <18170.53850.491682.600306@terry.local>
	<bbaeab100709261602l1ecfe02bh31d5f3ae13fbe60@mail.gmail.com>
	<18170.59686.345068.336674@terry.local>
	<46FB198C.4070705@canterbury.ac.nz>
Message-ID: <18172.9535.939185.173926@terry.local>

Hi Greg

| The way things are, there is only one coding style for when you don't want
| the results. You're suggesting the addition of another one. That *would* be
| un-Pythonic.

But the same remark could be made about using a list and writing explicit
loops to accumulate results, and the later addition of list comprehensions.
Wasn't that un-Pythonic for the same reason?

Terry


From terry at jon.es  Thu Sep 27 23:56:38 2007
From: terry at jon.es (Terry Jones)
Date: Thu, 27 Sep 2007 23:56:38 +0200
Subject: [Python-ideas] Calling a function of a list
	without	accumulating results
In-Reply-To: Your message at 18:02:28 on Thursday, 27 September 2007
References: <18170.53850.491682.600306@terry.local>
	<63519.82.46.172.40.1190912548.squirrel@marooned.org.uk>
Message-ID: <18172.10006.586289.448207@terry.local>

Hi Arnaud

| If you want to do it like this, why not do it explicitly:
| 
| def exhaust(iterable):
|    for i in iterable: pass
| 
| Then you can write:
| 
| exhaust(f(x) for x in mylist)

Thanks - that's nice. It also gives me the generality I wanted, which was
the ability to use the full LC/genexp "for..." syntax, which I should have
emphasized more, including in the subject of the thread.

Terry


From tjreedy at udel.edu  Fri Sep 28 01:12:46 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 27 Sep 2007 19:12:46 -0400
Subject: [Python-ideas] Exploration PEP : Concurrency for moderately
	massive(4 to 32 cores) multi-core architectures
References: <ECCC6E9907B4CD4A83260A191A91F20E014C8C1E@wampa.office.slickedit.com>
Message-ID: <fdhdd5$1hv$1@sea.gmane.org>


"Clark Maurer" <cmaurer at slickedit.com> wrote in 
message 
news:ECCC6E9907B4CD4A83260A191A91F20E014C8C1E at wampa.office.slickedit.com...
Guido, I have another article written which talks about Python as well
but I have not yet posted it.  If you give me an email address, I will
send it to you to look over before I post it.  Don't give me your email
address here.  Instead write to 
support at slickedit.com and let them know
that I requested your email address.
==============
Cloak and dagger stuff is not necessary.  Guido, like most of us, uses a 
valid email in Python discussion groups (guido @ python.org recently).





From george.sakkis at gmail.com  Fri Sep 28 03:47:02 2007
From: george.sakkis at gmail.com (George Sakkis)
Date: Thu, 27 Sep 2007 21:47:02 -0400
Subject: [Python-ideas] Removing the del statement
Message-ID: <91ad5bf80709271847g726efcf7l919fddcce6332c53@mail.gmail.com>

I guess this has very few to zero chances of being considered, even for
Python 3, but this being python-ideas I guess it's ok to bring it up. IMO
the del statement is one of the relatively few constructs that stick out
like a sore thumb. For one thing, it is overloaded to mean three different
things:
1) del x: Remove x from the current namespace
2) del x[i]: Equivalent to x.__delitem__(i)
3) del x.a: Equivalent to x.__delattr__('a') and delattr(x,'a')
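
(For concreteness, here are the last two forms next to their method
equivalents:)

    d = {'a': 1}
    del d['a']          # same effect as d.__delitem__('a')

    class C(object):
        pass
    c = C()
    c.attr = 1
    del c.attr          # same effect as delattr(c, 'attr')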

Here I am mostly arguing for removing the last two; the first could also be
removed if/when Python gets block namespaces, but it is orthogonal to the
others. I don't see the point of complicating the lexer and the grammar with
an extra keyword and statement for something that is typically handled by a
method (my preference), or at least a generic (builtin) function like len().
The last case is especially superfluous given that there are both a special
method and a generic builtin (delattr) that do the same thing. Neither item
nor attribute deletion is so pervasive as to be granted special treatment
at the language level.

I wonder if this was considered and rejected in the Py3K discussions; PEP
3099 doesn't mention anything about it.

George

From adam at atlas.st  Fri Sep 28 03:58:11 2007
From: adam at atlas.st (Adam Atlas)
Date: Thu, 27 Sep 2007 21:58:11 -0400
Subject: [Python-ideas] Removing the del statement
In-Reply-To: <91ad5bf80709271847g726efcf7l919fddcce6332c53@mail.gmail.com>
References: <91ad5bf80709271847g726efcf7l919fddcce6332c53@mail.gmail.com>
Message-ID: <6F2A191C-E194-4F01-8150-496BC3973424@atlas.st>


On 27 Sep 2007, at 21:47, George Sakkis wrote:

> I guess this has very few to zero chances of being considered, even  
> for Python 3, but this being python-ideas I guess it's ok to bring  
> it up. IMO the del statement is one of the relatively few  
> constructs that stick out like a sore thumb. For one thing, it is  
> overloaded to mean three different things:
> 1) del x: Remove x from the current namespace
> 2) del x[i]: Equivalent to x.__delitem__(i)
> 3) del x.a: Equivalent to x.__delattr__('a') and delattr(x,'a')


I guess this has very few to zero chances of being considered, even  
for Python 3, but this being python-ideas I guess it's ok to bring it  
up. IMO the = statement is one of the relatively few constructs that  
stick out like a sore thumb. For one thing, it is overloaded to mean  
three different things:
1) x = : Assign x in the current namespace
2) x[i] = : Equivalent to x.__setitem__(i)
3) x.a = : Equivalent to x.__setattr__('a') and setattr(x,'a')


(Sorry for the slight sarcasm, but I hope you see my point. I don't  
see why the deletion statement should go while the perfectly  
complementary and nearly-identically-"overloaded" assignment  
statement should stay.)


From gsakkis at rutgers.edu  Fri Sep 28 04:13:11 2007
From: gsakkis at rutgers.edu (George Sakkis)
Date: Thu, 27 Sep 2007 22:13:11 -0400
Subject: [Python-ideas] Removing the del statement
In-Reply-To: <6F2A191C-E194-4F01-8150-496BC3973424@atlas.st>
References: <91ad5bf80709271847g726efcf7l919fddcce6332c53@mail.gmail.com>
	<6F2A191C-E194-4F01-8150-496BC3973424@atlas.st>
Message-ID: <91ad5bf80709271913i2b96e166yded2ed18ad4cb0c7@mail.gmail.com>

On 9/27/07, Adam Atlas <adam at atlas.st> wrote:
>
>
> On 27 Sep 2007, at 21:47, George Sakkis wrote:
>
> > I guess this has very few to zero chances of being considered, even
> > for Python 3, but this being python-ideas I guess it's ok to bring
> > it up. IMO the del statement is one of the relatively few
> > constructs that stick out like a sore thumb. For one thing, it is
> > overloaded to mean three different things:
> > 1) del x: Remove x from the current namespace
> > 2) del x[i]: Equivalent to x.__delitem__(i)
> > 3) del x.a: Equivalent to x.__delattr__('a') and delattr(x,'a')
>
>
> I guess this has very few to zero chances of being considered, even
> for Python 3, but this being python-ideas I guess it's ok to bring it
> up. IMO the = statement is one of the relatively few constructs that
> stick out like a sore thumb. For one thing, it is overloaded to mean
> three different things:
> 1) x = : Assign x in the current namespace
> 2) x[i] = : Equivalent to x.__setitem__(i)
> 3) x.a = : Equivalent to x.__setattr__('a') and setattr(x,'a')
>
>
> (Sorry for the slight sarcasm, but I hope you see my point. I don't
> see why the deletion statement should go while the perfectly
> complementary and nearly-identically-"overloaded" assignment
> statement should stay.)


Apples to oranges. I thought it would be obvious and that's why I didn't
mention it, but getitem/setitem and friends use almost universally known
punctuation; OTOH only Python, AFAIK, uses a keyword for a relatively
infrequent operation.

George

From bjourne at gmail.com  Fri Sep 28 12:13:10 2007
From: bjourne at gmail.com (BJörn Lindqvist)
Date: Fri, 28 Sep 2007 12:13:10 +0200
Subject: [Python-ideas] Removing the del statement
In-Reply-To: <91ad5bf80709271847g726efcf7l919fddcce6332c53@mail.gmail.com>
References: <91ad5bf80709271847g726efcf7l919fddcce6332c53@mail.gmail.com>
Message-ID: <740c3aec0709280313o7830cb5uaa389e653eb2c334@mail.gmail.com>

On 9/28/07, George Sakkis <george.sakkis at gmail.com> wrote:
> I wonder if this was considered and rejected in the Py3K discussions; PEP
> 3099 doesn't mention anything about it.

Yes. del (and especially __del___) has been discussed on and off on
the python-3000 list.

http://mail.python.org/pipermail/python-3000/2006-September/003855.html
http://mail.python.org/pipermail/python-3000/2007-May/007129.html
http://mail.python.org/pipermail/python-3000/2007-May/007683.html

I have used "del x" a few times to shorten the list of exported names
in modules which helps epydoc. Never found any use for del x[i] or del
x.a though.

-- 
mvh Björn


From guido at python.org  Fri Sep 28 16:42:34 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 28 Sep 2007 07:42:34 -0700
Subject: [Python-ideas] Removing the del statement
In-Reply-To: <740c3aec0709280313o7830cb5uaa389e653eb2c334@mail.gmail.com>
References: <91ad5bf80709271847g726efcf7l919fddcce6332c53@mail.gmail.com>
	<740c3aec0709280313o7830cb5uaa389e653eb2c334@mail.gmail.com>
Message-ID: <ca471dc20709280742n3fa8707asb682ac7168216533@mail.gmail.com>

On 9/28/07, BJörn Lindqvist <bjourne at gmail.com> wrote:
> I have used "del x" a few times to shorten the list of exported names
> in modules which helps epydoc. Never found any use for del x[i] or del
> x.a though.

You never use dictionaries?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


From clarkksv at yahoo.com  Sat Sep 29 13:05:40 2007
From: clarkksv at yahoo.com (Joseph Maurer)
Date: Sat, 29 Sep 2007 04:05:40 -0700 (PDT)
Subject: [Python-ideas] Enhance reload
Message-ID: <844399.5421.qm@web58901.mail.re1.yahoo.com>

I'm still new to the technical details of Python, so correct me if I misunderstand the current capabilities.

I'd like to see the reload feature of Python enhanced so it can replace the methods for existing class instances, references to methods, and references to functions.

Here's the scenario. Let's say you want to use Python as a macro language. Currently, you can bind a Python function to a key or menu (better to do it by name and not by reference). That's what most apps need.  However, an advanced app like SlickEdit would have class instances for modeless dialogs (including tool windows) and other data structures. There are also callbacks, which would preferably be references to functions or methods. With the current implementation you would have to close and reopen dialogs. In other cases, you would need to exit SlickEdit and restart. While there will always be cases where this is necessary, I can tell you from experience that this is a great feature to have, since Slick-C does this.

I suspect that there are other scenarios that users would like this capability for.

Java and C# support something like this to a limited extent when you are debugging.

This capability could be a reload option. There could be cases where one might want the existing instances to use the old implementation. You wouldn't need to make this an option for me. There will always be cases where you have to restart because you made too many changes.


       


From steven.bethard at gmail.com  Sat Sep 29 15:16:49 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Sat, 29 Sep 2007 07:16:49 -0600
Subject: [Python-ideas] Enhance reload
In-Reply-To: <844399.5421.qm@web58901.mail.re1.yahoo.com>
References: <844399.5421.qm@web58901.mail.re1.yahoo.com>
Message-ID: <d11dcfba0709290616r4e7f5eb8o4243c0ab9144e720@mail.gmail.com>

On 9/29/07, Joseph Maurer <clarkksv at yahoo.com> wrote:
> I'd like to see the reload feature of Python enhanced so it
> can replace the methods for existing class instances,
> references to methods, and references to functions.

I'd be surprised if there's anyone out there who really doesn't want a
better reload(). The real question is who's going to figure out how to
implement it. ;-)

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy


From clarkksv at yahoo.com  Sat Sep 29 16:14:54 2007
From: clarkksv at yahoo.com (Joseph Maurer)
Date: Sat, 29 Sep 2007 07:14:54 -0700 (PDT)
Subject: [Python-ideas]  Enhance reload
Message-ID: <644262.85437.qm@web58907.mail.re1.yahoo.com>

I'm glad to hear it isn't a matter of whether it was useful or not.

The way I implemented this feature in Slick-C is with indirection. In Python terms, this means that a separate data structure that isn't reference counted holds the method/function object data.  The method/function object is changed to just contain a pointer to it. The data structure which holds all method/function data should probably be a non-reference counted dictionary.  When a function is deleted, its name remains in the dictionary but the entry needs to be changed to indicate that it is "null/invalid".  When a deleted function is called, an exception should be raised. Adding a function/method means replacing the data in the dictionary. This type of implementation is simple. There's an insignificant amount of overhead on a function/method call (i.e. instead of "func->data" you have "func=*pfunc; if ( func->isInvalid() ) throw exception; else func->data").

Technically this algorithm leaks memory since deleted functions/methods are never removed.  My response: who cares?  When the interpreter's cleanup-everything function is called, you simply deallocate everything in the hash table.
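
In pure Python the indirection might be sketched like this (hypothetical
names; a real version would live in the interpreter's function-call path):

    _registry = {}  # qualified name -> current function; survives redefinition

    class DeletedFunctionError(Exception):
        pass

    def replaceable(func):
        """Route calls through the registry so that redefining a function
        takes effect even for references captured before the reload."""
        name = '%s.%s' % (func.__module__, func.__name__)
        _registry[name] = func
        def proxy(*args, **kwargs):
            current = _registry.get(name)
            if current is None:  # the function was deleted
                raise DeletedFunctionError(name)
            return current(*args, **kwargs)
        proxy.__name__ = func.__name__
        return proxy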

Does anyone know what level of effort would be needed for something like this?
Is my proposed implementation a good one for Python?


       


From gsakkis at rutgers.edu  Sat Sep 29 17:01:56 2007
From: gsakkis at rutgers.edu (George Sakkis)
Date: Sat, 29 Sep 2007 11:01:56 -0400
Subject: [Python-ideas] Enhance reload
In-Reply-To: <644262.85437.qm@web58907.mail.re1.yahoo.com>
References: <644262.85437.qm@web58907.mail.re1.yahoo.com>
Message-ID: <91ad5bf80709290801n60907f3ci1cf49039e8c69b59@mail.gmail.com>

On 9/29/07, Joseph Maurer <clarkksv at yahoo.com> wrote:

> I'm glad to hear it isn't a matter of whether it was useful or not.
>
> The way I implemented this feature in Slick-C is with indirection. In
> Python terms, this means that a separate data structure that isn't reference
> counted holds the method/function object data.  The method/function object
> is changed to just contain a pointer to it. The data structure which holds
> all method/function data should probably be a non-reference counted
> dictionary.  When a function is deleted, its name remains in the dictionary
> but the entry needs to be changed to indicate that it is
> "null/invalid".  When a deleted function is called, an exception should be
> raised. Adding a function/method means replacing the data in the dictionary.
> This type of implementation is simple. There's an insignificant amount of
> overhead on a function/method call (i.e. instead of "func->data" you
> have  "func=*pfunc;if ( func->isInvalid() ) throw exception; else
> func->data" ).
>
> Technically this algorithm leaks memory since deleted functions/methods
> are never removed.  My response: who cares?  When the interpreter's
> cleanup-everything function is called, you simply deallocate everything
> in the hash table.
>
> Does anyone know what level of effort would be needed for something like
> this?
> Is my proposed implementation a good one for Python?


You may want to take a look at a relevant Cookbook recipe:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/160164

George

From tjreedy at udel.edu  Sat Sep 29 21:16:05 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 29 Sep 2007 15:16:05 -0400
Subject: [Python-ideas] Enhance reload
References: <844399.5421.qm@web58901.mail.re1.yahoo.com>
Message-ID: <fdm89m$32m$1@sea.gmane.org>


"Joseph Maurer" <clarkksv at yahoo.com> wrote in 
message news:844399.5421.qm at web58901.mail.re1.yahoo.com...
| I'd like to see the reload feature of Python enhanced so it can replace 
the methods for existing class instances, references to methods, and 
references to functions.

I think we could get farther by restricting our concern to replacing 
class attributes, so that existing class instances would use their new 
definitions.

As I understand it, the problem is this.  After somemod is imported, a 
second 'import somemod' simply binds 'somemod' to the existing module 
object, while 'reload(somemod)' replaces the module object with a new 
object with all new contents, and references to objects within the old 
module object remain as they are.

So I propose this.  'Reclass somemod' (by whatever syntax) would execute 
the corresponding code in a new namespace (dict).  But instead of making 
that dict the __dict__ attribute of a new module, reclass would match class 
names with the existing __dict__, and replace the class.__dict__ 
attributes, so that subsequent access to class attributes, including 
particularly methods, would get the new versions.  In other words, use the 
existing indirection involved in attribute access.  New classes could 
simply be added.  Deleted classes could be disabled, but this really 
requires a restart after editing files that reference such classes, so 
deleting classes should not be done for the restricted reload uses this 
idea is aimed at.
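
A rough sketch of the idea in Python itself (a hypothetical helper; 
old-style classes, read-only attributes and the like are glossed over):

    import sys

    def reclass(modname):
        """Re-execute a module's source and patch its existing classes in
        place, so existing instances see the new method definitions."""
        mod = sys.modules[modname]
        source = open(mod.__file__.rstrip('co')).read()  # .pyc/.pyo -> .py
        fresh = {}
        exec source in fresh
        for name, new_obj in fresh.items():
            if name.startswith('__'):
                continue  # skip __builtins__ and friends
            old_obj = getattr(mod, name, None)
            if isinstance(new_obj, type) and isinstance(old_obj, type):
                # keep the old class object; copy its new attributes over
                for attr, value in new_obj.__dict__.items():
                    if attr not in ('__dict__', '__weakref__', '__doc__'):
                        setattr(old_obj, attr, value)
            else:
                setattr(mod, name, new_obj)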

It would probably be possible to modify function objects (replace 
func_code, etc), but this is more difficult.  It is simpler, at least for a 
beginning, to require that functions be put within a class when reclassing 
is anticipated.
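
For what it's worth, func_code is already assignable today, which is the 
kernel of that approach:

    >>> def f(): return 1
    ...
    >>> def g(): return 2
    ...
    >>> f.func_code = g.func_code    # spelled __code__ in Py3k
    >>> f()
    2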

Terry Jan Reedy





From brett at python.org  Sat Sep 29 21:27:18 2007
From: brett at python.org (Brett Cannon)
Date: Sat, 29 Sep 2007 12:27:18 -0700
Subject: [Python-ideas] Enhance reload
In-Reply-To: <fdm89m$32m$1@sea.gmane.org>
References: <844399.5421.qm@web58901.mail.re1.yahoo.com>
	<fdm89m$32m$1@sea.gmane.org>
Message-ID: <bbaeab100709291227q19b10058u8c71406ac96dff69@mail.gmail.com>

On 9/29/07, Terry Reedy <tjreedy at udel.edu> wrote:
>
> "Joseph Maurer" <clarkksv at yahoo.com> wrote in
> message news:844399.5421.qm at web58901.mail.re1.yahoo.com...
> | I'd like to see the reload feature of Python enhanced so it can replace
> the methods for existing class instances, references to methods, and
> references to functions.
>
> I think we could get farther by restricting our concern to replacing
> class attributes, so that existing class instances would use their new
> definitions.
>
> As I understand it, the problem is this.  After somemod is imported, a
> second 'import somemod' simply binds 'somemod' to the existing module
> object, while 'reload(somemod)' replaces the module object with a new
> object with all new contents, and references to objects within the old
> module object remain as they are.
>

Actually, the way reload works is that it takes the module object from
sys.modules, and then re-initializes mod.__dict__ (this is why import
loaders must use any pre-existing module's __dict__ when loading).  So
the module object itself is not replaced; it's just that its __dict__
is mutated in-place.
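
You can see this at the interactive prompt (using a hypothetical module
named mymod):

    >>> import mymod
    >>> before = id(mymod)
    >>> reload(mymod)          # re-executes mymod's code
    <module 'mymod' from 'mymod.py'>
    >>> id(mymod) == before    # same module object, refreshed __dict__
    True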

-Brett


From greg.ewing at canterbury.ac.nz  Sun Sep 30 01:25:21 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 30 Sep 2007 11:25:21 +1200
Subject: [Python-ideas] Enhance reload
In-Reply-To: <644262.85437.qm@web58907.mail.re1.yahoo.com>
References: <644262.85437.qm@web58907.mail.re1.yahoo.com>
Message-ID: <46FEDEE1.1020209@canterbury.ac.nz>

Joseph Maurer wrote:
> 
> The way I implemented this feature in Slick-C is with indirection... 
>
> Is my proposed implementation a good one for Python?

It's nowhere near detailed enough to be able to tell. When
Steven said "figure out how to implement it", he meant
working out the details, not just coming up with a
high-level idea.

What you suggest sounds like it ought to be possible,
at first sight, since Python function objects are already
containers with a reference to another object that
holds the function's code.

The problem will be figuring out *when* you're redefining
a function, because the process of loading a module is a
very dynamic one in Python. Defining functions and classes
is done by executing code, not by statically analysing
declarations as a C compiler does.
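
For instance, which of these two functions exists is only decided when
the module's code actually runs:

    import random

    if random.random() < 0.5:
        def greet():
            return 'heads'
    else:
        def greet():
            return 'tails'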

--
Greg



From clarkksv at yahoo.com  Sun Sep 30 15:22:36 2007
From: clarkksv at yahoo.com (Joseph Maurer)
Date: Sun, 30 Sep 2007 06:22:36 -0700 (PDT)
Subject: [Python-ideas]  Enhance reload
Message-ID: <401200.41417.qm@web58907.mail.re1.yahoo.com>

Greg Ewing wrote:
> What you suggest sounds like it ought to be possible,
> at first sight, since Python function objects are already
> containers with a reference to another object that
> holds the function's code.

> The problem will be figuring out *when* you're redefining
> a function, because the process of loading a module is a
> very dynamic one in Python. Defining functions and classes
> is done by executing code, not by statically analysing
> declarations as a C compiler does.

My implementation is definitely a high-level sketch.

Greg,

These are exactly the kinds of issues I'm looking for. If there are more, let's see them.

Would temporarily marking the module with "replace" work? I would think that when a function is defined, it has access to the module (because it is adding to the module's dictionary), so it could check for the "replace" attribute.  I'm assuming a certain sequence of execution here, since the "replace" attribute would have to be removed after the function/method code was executed/loaded.  If anyone knows that this isn't the case, please shoot this down.

Another post I read proposed a Reclass feature that only worked for classes.  Given the macro language scenario, you definitely need functions too.





From clarkksv at yahoo.com  Sun Sep 30 16:57:09 2007
From: clarkksv at yahoo.com (Joseph Maurer)
Date: Sun, 30 Sep 2007 07:57:09 -0700 (PDT)
Subject: [Python-ideas]   Enhance reload
Message-ID: <925784.74407.qm@web58910.mail.re1.yahoo.com>

OK, my idea of a temporary "replace" attribute wrapped around a reload-like function is not a good one for Python, given that things can be added to modules dynamically at any time.

---------------------------------------
Here is my original high level design:

The way I implemented this feature in Slick-C is with indirection. In Python terms, this means that a separate data structure that isn't reference counted holds the method/function object data.  The method/function object is changed to just contain a pointer to it. The data structure which holds all method/function data should probably be a non-reference counted dictionary.  When a function is deleted, its name remains in the dictionary but the entry needs to be changed to indicate that it is "null/invalid".  When a deleted function is called, an exception should be raised. Adding a function/method means replacing the data in the dictionary. This type of implementation is simple. There's an insignificant amount of overhead on a function/method call (i.e. instead of "func->data" you have "func=*pfunc; if ( func->isInvalid() ) throw exception; else func->data").

Technically this algorithm leaks memory since deleted functions/methods are never removed.  My response: who cares?  When the interpreter's cleanup-everything function is called, you simply deallocate everything in the dictionary.
---------------------------------

Instead of a temporary "replace" attribute wrapped into a reload-like call, how about giving modules a user-settable "replace" attribute?  At any time, the user can set/reset this attribute.  This would specify how the user wants functions/methods processed: always added or always replaced. The "replace" attribute would likely need to be passed through to function objects, class objects, and method objects. For the macro language scenarios, I would just mark every module that got loaded with this attribute.

The proposed implementation I have given is intended to be very "file" oriented (which maps to a Python module).

Would this work in the current code base? 

I'm assuming the following:

When a function is added/executed, the module structure is accessible.
When a class is added (i.e. class myclass), the module structure is accessible.
When a method is added/executed, at least the class structure is accessible?

I hope you see where I'm going here.  The executed "class myclass" code which defines a new class can copy the module's "replace" attribute.  The executed "def myfunction" code which defines a new method can copy the class's "replace" attribute.

The function/method object structure could remain the same except for the addition of a new function/method pointer member.  

The additional code for a function call would look like this:

   // Did this function get defined in "replace" mode?
   if ( func->doReplace() ) {
       // For this one, use the indirect pointer and not the other member data.
       func = func->pfunc;
       if ( !func->isValid() ) {
           // Raise a Python exception (not a C++ one) and bail out, e.g.
           // PyErr_SetString(PyExc_ReferenceError, "function was deleted");
           return NULL;
       }
   }
   // Now do what we used to do

Given the OO nature of Python, a separate function/method type for a replaceable function/method could be defined, but I suspect it isn't worth the effort.  The above pseudo code is very efficient: "doReplace" would probably be just a boolean/int member, and the "isValid" call would be cheap as well.

One thing my proposed implementation does not cover is adding new data members to a class.  I think it is acceptable for this not to be handled.

Please shoot this high-level implementation down if it won't work in the current code base.

Also, what does everyone think about the idea of some sort of "replace" attribute for the module?  How should it get set?  "import module; module.replace=1". I'm probably showing a little lack of knowledge here. Teach me and I'll get it.


       


From jimjjewett at gmail.com  Sun Sep 30 18:46:38 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 30 Sep 2007 12:46:38 -0400
Subject: [Python-ideas] Enhance reload
In-Reply-To: <844399.5421.qm@web58901.mail.re1.yahoo.com>
References: <844399.5421.qm@web58901.mail.re1.yahoo.com>
Message-ID: <fb6fbf560709300946m2bc3de8ctdbaff21c3b21ef4a@mail.gmail.com>

On 9/29/07, Joseph Maurer <clarkksv at yahoo.com> wrote:
> I'd like to see the reload feature of Python enhanced so
> it can replace the methods for existing class instances,
> references to methods, and references to functions.

Guido did some work on this (as xreload) for Py3K, but gave up for the moment.

In the meantime, you can sometimes get a bit closer with indirection.
Instead of

    from othermod import anobject  # won't notice if anobject is replaced later

use

    import othermod
    ...
    othermod.anobject    # always looks up the current anobject
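
For example (again with a hypothetical othermod):

    import othermod
    captured = othermod.anobject     # frozen at the current definition
    reload(othermod)                 # rebinds othermod.anobject
    captured is othermod.anobject    # -> False; the captured reference is stale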

That said, subclasses and existing instances generally don't use
indirection, because it is slower; replacing bound methods will almost
certainly require at least the moral equivalent of reloading the dialog
after reloading the module.

-jJ