Mailman 3 September 2007 - Python-ideas

is in operator
by Mathias Panzenböck Sept. 27, 2007

Sometimes I want to compare a "pointer" to more than one other. The "in" operator would be handy, but it uses the "==" operator instead of the "is" operator. So an "is in" operator would be nice, though I don't know how easy it would be for a newbie to see what does what.

    # This:
    if x is in (a, b, c):
        ...

    # would be equivalent to this:
    if x is a or x is b or x is c:
        ...

    # And of course there should be an "is not in" operator, too:
    if x is not in (a, b, c):
        ...

    # this would be equivalent to this: …
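Since "is in" is not valid Python syntax, the comparison the proposal describes can only be approximated with an ordinary function. The following is a minimal sketch under that assumption; the helper name is_in is made up for illustration and does not come from the thread.

```python
def is_in(x, candidates):
    """Return True if x is the very same object as one of the candidates.

    Hypothetical helper: spells out `x is a or x is b or x is c`.
    """
    return any(x is c for c in candidates)


a = [1, 2]
b = [1, 2]  # equal to a, but a distinct object

print(a in (b,))           # the built-in "in" accepts equal objects
print(is_in(a, (b,)))      # identity only: a and b are different objects
print(is_in(a, (b, a)))    # a itself is among the candidates
print(not is_in(a, (b,)))  # what "is not in" would mean
```

The built-in "in" on tuples and lists accepts an item that is either identical or equal, which is why a separate identity-only test is needed for the behaviour asked for here.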

Re: [Python-ideas] Calling a function of a list without accumulating results
by Terry Jones Sept. 27, 2007

Hi Brett & Adam

Thanks for the replies.

| Only after the list is completely constructed. List comprehensions
| are literally 'for' loops with an append call to a method so without
| extending the peepholer to notice this case and strip out the list
| creation and appending it is not optimized.
|
| > A possible syntax change would be to allow the unadorned
| >
| >     f(x) for x in mylist
| >
| > And raise an error if someone tries to assign to this.
|
| Go with the 'for' …

Re: [Python-ideas] Calling a function of a list without accumulating results
by Terry Jones Sept. 27, 2007

Hi Brett

| > The first is the same as one of the arguments for providing list
| > comprehensions and generator expressions - because it makes common
| > multi-line boilerplate much more concise.
|
| OK, the question is how common is this.

I don't know. I use it maybe once every few weeks. There's a function to do this in Common Lisp (mapc), not that that means anything much.

| But are you sure Python's grammar could support it? Parentheses are
| needed for genexps in certain situations for disambiguation because of
| Python's LL(1) grammar.

I don't know the answer to this either. I imagine it's a matter of tacking an optional "for ..." clause onto the end of an expression. The "for ..." part can certainly be handled (is being handled already), so I think it might not be too hard - supposing there is already a non-terminal for the "for ..." clause.

| > The trivial case I posted isn't much of a win over the simple 2-line
| > alternative, but it's easy to go to further:
| >
| >     f(x, y) for x in myXlist for y in myYlist
| >
| > instead of
| >
| >     for x in myXlist:
| >         for y in myYlist:
| >             f(x, y)
| >
| > and of course there are many more examples.
|
| Right, but the second one is so much easier to read and comprehend.

I tend to agree, but the language supports the more concise form for both list comprehension and genexps, so there must be a fair number of people who thought it was a win to allow the compact form.

| I think "force" is rather strong wording for "choice".

OK, how about "lack of choice"? :-)

Seriously (to take an example from the Python pocket ref page 24), you do have the choice to write

    a = [x for x in range(5) if x % 2 == 0]

instead of

    a = []
    for x in range(5):
        if x % 2 == 0:
            a.append(x)

but you don't have a (simple) choice if you don't want to accumulate results. I'm merely saying that I think it would be cleaner and more consistent to allow

    print(x) for x in range(5) if x % 2 == 0

instead of having the non-choice but to write something like

    for x in range(5):
        if x % 2 == 0:
            print x

Yes, I (and thank you for it) could now use your suggested

    for _ in ...: pass

trick, but that's not really the whole point, to me. If the language can be made simpler and more consistent, I think that's generally a good thing. I know, I don't know anything about the implementation. But this is an ideas list.

| Heck, I would vote to ditch listcomps for ``list(genexp)`` had genexps
| come first and have the options trimmed down even more.

Me too. But even if that eventuated, I'd _still_ propose allowing the unadorned genexp, for the case where you don't want the results.

| Basically, unless you can go through the stdlib and find all the
| instances of the pattern you want to prove it is common enough to
| warrant support it none of the core developers will probably go for
| this.

Understood.

Thanks,
Terry
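For comparison, here is a small sketch of ways to call a function over a sequence purely for its side effects, without building a result list, using constructs that exist today. It is written in Python 3 syntax. The function f and the list seen are stand-ins invented for the example, and the second variant is only a guess at what the "for _ in ...: pass" trick mentioned above looked like.

```python
from collections import deque

seen = []


def f(x):
    """Stand-in for a function that is called only for its side effect."""
    seen.append(x)


mylist = [0, 1, 2, 3, 4]

# 1. The plain loop: nothing is accumulated.
for x in mylist:
    if x % 2 == 0:
        f(x)

# 2. Drive a generator expression and throw each result away.
for _ in (f(x) for x in mylist if x % 2 == 0):
    pass

# 3. A deque with maxlen=0 consumes an iterator without keeping any items.
deque((f(x) for x in mylist if x % 2 == 0), maxlen=0)

print(seen)
```

All three variants call f on the even items only, and none of them builds a list of return values the way a list comprehension would.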

Exploration PEP : Concurrency for moderately massive (4 to 32 cores) multi-core architectures
by Brian Granger Sept. 26, 2007

Thinking about how Python can better support parallelism and concurrency is an important topic. Here is how I see it: if we don't address the issue, the Python interpreter 5 or 10 years from now will run at roughly the same speed as it does today. This is because single CPU cores are not getting much faster (power consumption is too high). Instead, most of the performance gains in hardware will be due to increased hardware parallelism, which means multi/many core CPUs.

What to do about this pending crisis is a complicated issue. There are (at least) two levels that are important:

1. Language level features that make it possible to build higher-level libraries/tools for parallelism.

2. The high-level libraries/tools that most users and developers would use to express parallelism.

I think it is absolutely critical that we worry about (1) before jumping to (2). So, some thoughts about (1). Does Python itself need to be changed to better enable people to write libraries for expressing parallelism?

My answer to this is no. The dominant languages for parallel computing (C/C++/Fortran) don't really have any additional constructs or features above Python in this respect. Java has a more sophisticated support for threads. Erlang has concurrency built into its core. But, Python is not Erlang or Java. As Twisted demonstrates, Python as a language is plenty powerful enough to express concurrency in an elegant way. I am not saying that parallelism and concurrency are easy or wonderful today in Python, just that the language itself is not the problem. We don't necessarily need new language features, we simply need bright people to sit down and think about the right way to express parallelism in Python and then write libraries (maybe in the stdlib) that implement those ideas.

But, there is a critical problem in CPython's implementation that prevents people from really breaking new ground in this area with Python. It is the GIL and here is why:

* For the platforms on which Python runs, threads are what the hardware+OS people have given to us as the most fine grained way of mapping parallelism onto hardware. This is true, even if you have philosophical or existential problems with threads. With the limitations of the GIL, we can't take advantage of what hardware gives to us.

* A process based solution using message passing is simply not suitable for many parallel algorithms that are communications bound. The shared state of threads is needed in many cases, not because sharing state is a "fantastic idea", but rather because it is fast. This will only become more true as multicore CPUs gain more sophisticated memory architectures with higher bandwidths. Also, the overhead of managing processes is much greater than with threads. Many excellent fine grained parallel approaches like Cilk would not be possible with processes only.

* There are a number of powerful, high-level Python packages that already exist (these have been named in the various threads) that allow parallelism to be expressed. All of these suffer from a GIL related problem even though they are process based and use message passing. Regardless of whether you are using blocking/non-blocking sockets/IPC, you can't run long running CPU bound code, because all the network related stuff will stop. You then think, "OK, I will run the CPU intensive stuff in a different thread." If the CPU intensive code is just regular Python, you are fine, the Python interpreter will switch between the network thread and the CPU intensive thread every so often. But the second you run extension code that doesn't release the GIL, you are screwed. The network thread will die until the extension code is done. When it comes to implementing robust process based parallelism using sockets, the last thing you can afford is to have your networking black out like this, and in CPython it can't be avoided.

<disclaimer>
I am not saying that threads are what everyone should be using to express parallelism. I am only saying that they are needed to implement robust higher-level forms of parallelism on multicore systems, regardless of whether the solution is using process+threads or threads alone.
</disclaimer>

Of the dozen or so "parallel Python" packages that currently exist, they _all_ suffer from this problem (some hide it better than others though using clever tricks). We can run but we can't hide.

Because of these things, I think the current "Exploratory PEP" is entirely premature. Let's figure out exactly what to do with the GIL and _then_ think about the fun stuff.

Brian
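The benign case described above, where CPU-bound code written in plain Python still lets another thread run "every so often", can be sketched as follows. This is only an illustration under stated assumptions: the heartbeat thread stands in for the network thread, the names heartbeat, busy_python and run_demo are invented, and the problematic case (extension code that holds the GIL) cannot be reproduced in pure Python, so it is not shown.

```python
import threading
import time


def heartbeat(stop, ticks):
    """Stand-in for a network thread: record a tick every 10 ms until stopped."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        time.sleep(0.01)


def busy_python(seconds):
    """CPU-bound work made of ordinary Python bytecode."""
    deadline = time.monotonic() + seconds
    total = 0
    while time.monotonic() < deadline:
        total += 1
    return total


def run_demo(seconds=0.3):
    """Run the busy loop in the main thread while the heartbeat thread ticks."""
    stop = threading.Event()
    ticks = []
    worker = threading.Thread(target=heartbeat, args=(stop, ticks))
    worker.start()
    iterations = busy_python(seconds)  # the interpreter switches threads periodically
    stop.set()
    worker.join()
    return len(ticks), iterations


if __name__ == "__main__":
    n_ticks, iterations = run_demo()
    print(n_ticks, "heartbeats during", iterations, "loop iterations")
```

The heartbeat count stays above zero because the interpreter periodically hands the GIL to the waiting thread. A long-running C extension call that never releases the GIL would give the heartbeat thread no such opportunity, which is the blackout described in the message.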

Thread exceptions and interruption
by Adam Olsen Sept. 19, 2007

One of the core problems with threading is what to do with exceptions and how to gracefully exit when one goes unhandled. My approach is to replace the independently spawned threads with "branches" off of your main thread's call stack.

The standard example looks like this[1]:

    def handle_client(conn, addr):
        with conn:
            ...

    def accept_loop(server_conn):
        with branch() as clients:
            with server_conn:
                while True:
                    clients.add(handle_client, *…

Exploration PEP : Concurrency for moderately massive (4 to 32 cores) multi-core architectures
by Krishna Sankar Sept. 16, 2007

PEP: xxxxxxxx
Title: Concurrency for moderately massive (4 to 32 cores) multi-core architectures
Version: $Revision$
Last-Modified: $Date$
Author: Krishna Sankar <ksankar (at) doubleclix.net>,
Status: Wandering ! (as in "Not all those who wander are lost ..." -J.R.R.Tolkien)
Type: Process
Content-Type: text/x-rst
Created: 15-Sep-2007

Abstract
--------

This proposal aims at leveraging the multi-core capability as an embedded mechanism in python. It is not whether python is slow or fast, but of performance and control of parallelism/concurrency in a moderately massive parallelism world. The aim is 4 to 32 cores. The proposal advocates two mechanisms - one for task parallelism and another for data intensive parallelism. Scientific computing and web 2.0 frameworks are the forefront users for this proposal. Other applications would benefit as well.

Rationale
---------

Multicore architectures need no introductions and their ubiquity is evident. It is imperative that Python has one or more standard ways of leveraging multi-core architectures. OTOH, traditional thread based concurrency and lock based exclusions are becoming more and more difficult to program correctly.

First of all, the question is not whether py is slow or fast but performance of a system written in py. Which means, ability to leverage multi-core architectures as well as control. Control in terms of things like ability to pin one process/task to a core, ability to pin one or more homogeneous tasks to specific cores et al, as well as not wait for a global lock and similar primitives. (Before anybody jumps to a conclusion, this is not about GIL by any means ;o))

Second, it is clear that we need a good solution (not THE solution) for moderately massive parallelism in multi-core architectures (i.e. 8-32 cores). Share nothing might not be optimal; we need some form of memory sharing, not just copy all data via messages. Maybe functional programming based on the blackboard pattern would work, who knows. I have seen systems saturated still having only ~25% of CPU utilization (in a 4 core system!). It is because we didn't leverage multi-cores and parallelism. So while py3k will not be slow, lack of a cohesive multi-core strategy will show up in system performance and byte us later (pun intended!).

At least, in my mind, this is not an exercise about exposing locks and mutexes or threads in Python. I do believe that the GIL will be refactored to more granularity in the coming months (similar to the Global Locks in Linux) and most probably we will get microThreads et al. As we all know, architecture is constraining as well as liberating. The language primitives influence greatly how we think about a problem.

In the discussions, Guido is right in insisting on speed, and Bruce is right in asking for language constructs. Without pragmatic speed, folks won't use it; same is the case without the required constructs. Both are barriers to adoption. We have an opportunity to offer a solution for multi-core architectures and let us seize it - we will rush in where angels fear to tread!

Programming Models
------------------

There are at least 3 possible paradigms

A. conventional threading model
B. Functional model, Erlang being the most appropriate
C. Some form of limited shared memory model (message passing but pass pointers, blackboard model)
D. Others, like Transactional Memory [2]

There is enough literature out there, so I do not plan to explain these here. (<KS> Do we need more explanation? </KS>)

Pragmatic proposal
------------------

May I suggest we embed two primitives in Python 3K:

A) A functional style share-nothing set of interfaces (and implementations thereof) - provides the task parallelism/concurrency capability, "small messages, big computations" as Joe Armstrong calls it[3]
B) A limited shared memory based model for data intensive parallelism

Most probably this would be part of stdlib. While Guido is almost right in saying that this is a (std)library problem, it is not fully so. We would need a few primitives from the underlying PVM substrate. Possibly one reason for Guido's position is the lack of clarity as to what needs to be changed and why. IMHO, just saying take GIL off does not solve the problem either.

The Zen of Python parallelism
-----------------------------

I draw inspiration from the very timely article by James Reinders in DDJ [1]. It embodies what we should be doing viz.:

1. Refactor the problem into parallel tasks. We cannot help if the domain is sequential
2. Program to abstraction & program chores not cores. Writing correct programs using raw threads et al is difficult. Let the underlying substrate decide how best to optimize
3. Design for scale
4. Have an option to turn concurrency off, for debugging
5. Declarative parallelism based mechanisms (?)

Related Efforts
---------------

The good news is there are at least 2 or 3 paradigms with implementations and rough benchmarks. Hopefully we can leverage the implementations and mature them to stdlib (with required primitives in pvm)

Parallel python
http://www.artima.com/weblogs/viewpost.jsp?thread=214303
http://cheeseshop.python.org/pypi/parallel
Processing
http://cheeseshop.python.org/pypi/processing
http://code.google.com/p/papyros/

Discussions
-----------

There are at least four thread sets (pardon the pun !) I am aware of:

1. The GIL discussions in python-dev and Guido's blog on GIL http://www.artima.com/weblogs/viewpost.jsp?thread=214235
2. The py3k topics started by Bruce http://www.artima.com/weblogs/viewpost.jsp?thread=214112, response by Guido http://www.artima.com/weblogs/viewpost.jsp?thread=214325 and reply to reply by Bruce http://www.artima.com/weblogs/viewpost.jsp?thread=214480
3. Python and concurrency http://mail.python.org/pipermail/python-ideas/2007-March/000338.html

References
----------

[1] http://www.ddj.com/architect/201804248
[2] Transaction http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=444
[3] Programming Erlang by Joe Armstrong

loop, breakif, skip
by Jim Hill Sept. 9, 2007

These 4 proposals are somewhat inter-dependent, so I include them in a single message. They are simple ideas, childish almost. Think of non-programmers writing simple scripts, and children learning basic coding in grade school. Hope I'm not wasting your time with nonsense. (I'm not an advanced programmer, so can't be sure.)

------------
Proposal 1

Abolish 'continue' in loops, and use 'skip' instead.

In normal English 'continue' means 'carry on at the next line'. In most programming languages 'continue' means 'jump back UP the page to the start of this loop'. This is OK for programmers accustomed to C, but I find it very counter-intuitive, even though I know it means 'continue with the next iteration'. On the other hand, 'skip', meaning 'skip the rest of this iteration', feels more intuitive to me. 'continue' is too long, 'skip' is short.

One new keyword. Breaks existing code.

------------
Proposal 2

An alternative to PEP-315, and a simpler way to write while loops of various flavours.

part A

'loop:' is exactly equivalent to 'while True:'

part B

(1) '*breakif <condition>' is exactly equivalent to 'if <condition>: break'
(2) '*skipif <condition>' is exactly equivalent to 'if <condition>: skip' (assuming 'skip' replaces 'continue')

* is to make the word easier to find by human eye. Would some other character do it better?

2a and 2b together allow while loops to optionally look something like this:

    loop:
        <statements>
        *breakif <condition>
        <statements>
        *skipif <condition>
        <statements>

*breakif and *skipif can be used in for loops too, of course.

3 new keywords. Existing code would not be affected, unless it was already using loop, *breakif or *skipif as names.

------------
Proposal 3

Mainly for young students learning to program.

the keyword 'loop' can be placed in front of the keyword 'while'
the keyword 'loop' can be placed in front of the keyword 'for'
without changing the meaning of 'while' or 'for'.

Looks like this

    loop while <condition>:
        <statements>

    loop for <iteration expression>:
        <statements>

Allows beginner students the satisfaction of thinking that every kind of loop begins with the word 'loop', which also makes learning a little easier. (Later they will learn that 'loop' can be left out.)

Existing code would not be affected.

------------
Proposal 4

If 'continue' is not used in loops, it can have a more meaningful role in switch/case blocks.

'continue' in a Python switch block would have a meaning opposite to that of 'break' in a C switch block, allowing you to do 'fall-through'. Here 'continue' would have its intuitive meaning of 'carry on at the next line'.

    switch <expression>:
        case <values>:
            <statements>
            [continue]
        case <values>:
            <statements>
            [continue]
        case <values>:
            <statements>

Existing code would not be affected, as switch/case is not implemented yet.

------------
Jim Hill
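To make the equivalences in Proposal 2 concrete, here is a small sketch in today's Python, with the proposed spelling shown in comments next to each line it would replace. The function collect_odds_below and its loop body are invented for the example; only the comment forms come from the proposal.

```python
def collect_odds_below(limit):
    """Collect the odd numbers smaller than limit using a 'loop forever' pattern."""
    out = []
    n = 0
    while True:           # proposed: loop:
        n += 1
        if n >= limit:    # proposed: *breakif n >= limit
            break
        if n % 2 == 0:    # proposed: *skipif n % 2 == 0
            continue
        out.append(n)
    return out


print(collect_odds_below(8))
```

Under the proposal the two-line if/break and if/continue pairs would each collapse to one line, and the behaviour would be unchanged.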

FInd first tuple argument for str.find and str.index
by Ron Adam Sept. 6, 2007

Could we add the ability of str.index and str.find to accept a tuple as the first argument and return the index of the first item found in it?

This is similar to how str.startswith and str.endswith already work.

| startswith(...)
|     S.startswith(prefix[, start[, end]]) -> bool
|
|     Return True if S starts with the specified prefix, False otherwise.
|     With optional start, test S beginning at that position.
|     With optional end, stop comparing S at that …
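One possible reading of the request can be sketched as a plain helper function. This is an illustration only: the name find_first is hypothetical, and taking "first item found" to mean the lowest matching index in the string is an assumption, since the message does not spell that out. The tuple form of str.startswith is shown alongside because it already exists.

```python
def find_first(s, subs, start=0, end=None):
    """Return the lowest index in s[start:end] where any string in subs occurs.

    Hypothetical helper mirroring str.find: returns -1 if none is found.
    """
    if end is None:
        end = len(s)
    hits = [i for i in (s.find(sub, start, end) for sub in subs) if i != -1]
    return min(hits) if hits else -1


text = "key = value; other"

print(text.startswith(("key", "name")))  # existing behaviour: tuple of prefixes
print(find_first(text, ("=", ";")))      # position of whichever separator comes first
print(find_first(text, ("#", "!")))      # -1 when none of the strings occurs
```

This version makes one pass over the string per search term, so it is only a statement of the intended result, not an efficient implementation.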

Re: [Python-ideas] FInd first tuple argument for str.find and str.index
by Terry Jones Sept. 5, 2007

Hi Ron

>>>>> "Ron" == Ron Adam <rrr(a)ronadam.com> writes:

Ron> I was thinking of something a bit more light weight.

Ah, now we got to what you actually want to do :-)

Ron> For more complex stuff I think the 're' module already does pretty
Ron> much what you are describing. It may even already take advantage of
Ron> the algorithms you referred to. If not, that would be an important
Ron> improvement to the re module. :-)

Yes, that would make a good SoC …

Re: [Python-ideas] FInd first tuple argument for str.find and str.index
by Terry Jones Sept. 5, 2007

Sept. 5, 2007

>>>>> "Mathias" == Mathias Panzenböck <grosser.meister.morti(a)gmx.net> writes: Mathias> I would expect such a method to return the index where one of the Mathias> given strings was found. Or maybe a tuple: (start, end) or a Mathias> tuple: (start, searchstring). It could do something like that if you passed an argument telling it to quit on the first match. But that makes the return type depend on the passed arg, which I guess is not good. We'd already be doing … [View More]
