[Python-checkins] r63921 - peps/trunk/pep-0371.txt
david.goodger
python-checkins at python.org
Tue Jun 3 16:19:58 CEST 2008
Author: david.goodger
Date: Tue Jun 3 16:19:58 2008
New Revision: 63921
Log:
re-wrapped text to 70 columns
Modified:
peps/trunk/pep-0371.txt
Modified: peps/trunk/pep-0371.txt
==============================================================================
--- peps/trunk/pep-0371.txt (original)
+++ peps/trunk/pep-0371.txt Tue Jun 3 16:19:58 2008
@@ -14,50 +14,54 @@
Abstract
- This PEP proposes the inclusion of the pyProcessing [1] package into the
- Python standard library, renamed to "multiprocessing".
+ This PEP proposes the inclusion of the pyProcessing [1] package
+ into the Python standard library, renamed to "multiprocessing".
- The processing package mimics the standard library threading module and API
- to provide a process-based approach to "threaded programming" allowing
- end-users to dispatch multiple tasks that effectively side-step the global
- interpreter lock.
-
- The package also provides server and client functionality (processing.Manager)
- to provide remote sharing and management of objects and tasks so that
- applications may not only leverage multiple cores on the local machine,
- but also distribute objects and tasks across a cluster of networked machines.
-
- While the distributed capabilities of the package are beneficial, the primary
- focus of this PEP is the core threading-like API and capabilities of the
- package.
+ The processing package mimics the standard library threading
+ module and API to provide a process-based approach to "threaded
+ programming", allowing end-users to dispatch multiple tasks that
+ effectively side-step the global interpreter lock.
+
+ The package also provides server and client functionality
+ (processing.Manager) for remote sharing and management of
+ objects and tasks, so that applications may not only leverage
+ multiple cores on the local machine but also distribute objects
+ and tasks across a cluster of networked machines.
+
+ While the distributed capabilities of the package are beneficial,
+ the primary focus of this PEP is the core threading-like API and
+ capabilities of the package.
Rationale
- The current CPython interpreter implements the Global Interpreter Lock (GIL)
- and barring work in Python 3000 or other versions currently planned [2], the
- GIL will remain as-is within the CPython interpreter for the foreseeable
- future. While the GIL itself enables clean and easy to maintain C code for
- the interpreter and extensions base, it is frequently an issue for those
- Python programmers who are leveraging multi-core machines.
-
- The GIL itself prevents more than a single thread from running within the
- interpreter at any given point in time, effectively removing Python's
- ability to take advantage of multi-processor systems. While I/O bound
- applications do not suffer the same slow-down when using threading, they do
- suffer some performance cost due to the GIL.
-
- The pyProcessing package offers a method to side-step the GIL allowing
- applications within CPython to take advantage of multi-core architectures
- without asking users to completely change their programming paradigm (i.e.:
- dropping threaded programming for another "concurrent" approach - Twisted,
- etc).
-
- The Processing package offers CPython users a known API (that of the
- threading module), with known semantics and easy-scalability. In the
- future, the package might not be as relevant should the CPython interpreter
- enable "true" threading, however for some applications, forking an OS
- process may sometimes be more desirable than using lightweight threads,
- especially on those platforms where process creation is fast/optimized.
+ The current CPython interpreter implements the Global Interpreter
+ Lock (GIL) and, barring work in Python 3000 or other versions
+ currently planned [2], the GIL will remain as-is within the
+ CPython interpreter for the foreseeable future. While the GIL
+ itself enables clean and easy-to-maintain C code for the
+ interpreter and extensions, it is frequently an issue for Python
+ programmers who are leveraging multi-core machines.
+
+ The GIL itself prevents more than a single thread from running
+ within the interpreter at any given point in time, effectively
+ removing Python's ability to take advantage of multi-processor
+ systems. While I/O bound applications do not suffer the same
+ slow-down when using threading, they do suffer some performance
+ cost due to the GIL.
+
+ The pyProcessing package offers a method to side-step the GIL,
+ allowing applications within CPython to take advantage of
+ multi-core architectures without asking users to completely change
+ their programming paradigm (i.e., dropping threaded programming
+ for another "concurrent" approach, such as Twisted).
+
+ The Processing package offers CPython users a known API (that of
+ the threading module), with known semantics and easy scalability.
+ In the future, the package might not be as relevant should the
+ CPython interpreter enable "true" threading; however, for some
+ applications, forking an OS process may be more desirable than
+ using lightweight threads, especially on those platforms where
+ process creation is fast and optimized.
For example, a simple threaded application:
@@ -70,52 +74,56 @@
t.start()
t.join()
- The pyprocessing package mirrors the API so well, that with a simple change
- of the import to:
+ The pyprocessing package mirrors the API so well that, with a
+ simple change of the import to:
from processing import Process as worker
- The code now executes through the processing.Process class. This type of
- compatibility means that, with a minor (in most cases) change in code,
- users' applications will be able to leverage all cores and processors on a
- given machine for parallel execution. In many cases the pyprocessing package
- is even faster than the normal threading approach for I/O bound programs.
- This of course, takes into account that the pyprocessing package is in
- optimized C code, while the threading module is not.
+ The code now executes through the processing.Process class. This
+ type of compatibility means that, with a minor (in most cases)
+ change in code, users' applications will be able to leverage all
+ cores and processors on a given machine for parallel execution.
+ In many cases the pyprocessing package is even faster than the
+ normal threading approach for I/O bound programs. This, of
+ course, reflects the fact that the pyprocessing package is
+ optimized C code, while the threading module is not.
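
    For illustration, a minimal, self-contained version of the
    pattern above; the function name and workload are invented for
    this sketch.  Swapping the single import line moves the same
    code from threads to processes:

        from threading import Thread as worker
        # from processing import Process as worker  # the only change

        def count_up(n):
            # stand-in workload; any CPU- or I/O-bound function
            # fits here
            total = 0
            for i in range(n):
                total += i

        workers = [worker(target=count_up, args=(100000,))
                   for _ in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
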
The "Distributed" Problem
- In the discussion on Python-Dev about the inclusion of this package [3] there
- was confusion about the intentions this PEP with an attempt to solve the
- "Distributed" problem - frequently comparing the functionality of this
- package with other solutions like MPI-based communication [4], CORBA, or
- other distributed object approaches [5].
-
- The "distributed" problem is large and varied. Each programmer working
- within this domain has either very strong opinions about their favorite
- module/method or a highly customized problem for which no existing solution
- works.
+ In the discussion on Python-Dev about the inclusion of this
+ package [3] there was confusion about the intentions of this PEP,
+ with some conflating it with an attempt to solve the "Distributed"
+ problem, frequently comparing the functionality of this package
+ with other solutions like MPI-based communication [4], CORBA, or
+ other distributed object approaches [5].
+
+ The "distributed" problem is large and varied. Each programmer
+ working within this domain has either very strong opinions about
+ their favorite module/method or a highly customized problem for
+ which no existing solution works.
The acceptance of this package neither precludes nor discourages
- programmers working on the "distributed" problem not examine other solutions
- for their problem domain. The intent of including this package is to provide
- entry-level capabilities for local concurrency and the basic support to
- spread that concurrency across a network of machines - although the two are
- not tightly coupled, the pyprocessing package could in fact, be used in
+ programmers working on the "distributed" problem from examining
+ other solutions for their problem domain.  The intent of
+ including this package is to provide entry-level capabilities for
+ local concurrency and the basic support to spread that
+ concurrency across a network of machines; although the two are
+ not tightly coupled, the pyprocessing package could, in fact, be used in
conjunction with any of the other solutions including MPI/etc.
- If necessary - it is possible to completely decouple the local concurrency
- abilities of the package from the network-capable/shared aspects of the
- package. Without serious concerns or cause however, the author of this PEP
- does not recommend that approach.
+ If necessary, it is possible to completely decouple the local
+ concurrency abilities of the package from the
+ network-capable/shared aspects of the package.  Without serious
+ concerns or cause, however, the author of this PEP does not
+ recommend that approach.
Performance Comparison
- As we all know - there are "lies, damned lies, and benchmarks". These speed
- comparisons, while aimed at showcasing the performance of the pyprocessing
- package, are by no means comprehensive or applicable to all possible use
- cases or environments. Especially for those platforms with sluggish process
- forking timing.
+ As we all know, there are "lies, damned lies, and benchmarks".
+ These speed comparisons, while aimed at showcasing the performance
+ of the pyprocessing package, are by no means comprehensive or
+ applicable to all possible use cases or environments, especially
+ on those platforms where process forking is slow.
All benchmarks were run using the following:
* 4 Core Intel Xeon CPU @ 3.00GHz
@@ -127,16 +135,17 @@
http://jessenoller.com/code/bench-src.tgz
The basic method of execution for these benchmarks is in the
- run_benchmarks.py script, which is simply a wrapper to execute a target
- function through a single threaded (linear), multi-threaded (via threading),
- and multi-process (via pyprocessing) function for a static number of
- iterations with increasing numbers of execution loops and/or threads.
+ run_benchmarks.py script, which is simply a wrapper that executes
+ a target function in single-threaded (linear), multi-threaded
+ (via threading), and multi-process (via pyprocessing) form for a
+ static number of iterations, with increasing numbers of execution
+ loops and/or threads.
- The run_benchmarks.py script executes each function 100 times, picking the
- best run of that 100 iterations via the timeit module.
+ The run_benchmarks.py script executes each function 100 times,
+ picking the best of those 100 runs via the timeit module.
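
    A rough sketch of that measurement approach (not the actual
    run_benchmarks.py source, which is in the tarball above); the
    helper names are invented for this illustration:

        import timeit

        def empty_func():
            pass

        def non_threaded(iterations):
            # linear baseline: call the target once per iteration
            for _ in range(iterations):
                empty_func()

        # best-of-100 timing, mirroring how the harness uses timeit
        timer = timeit.Timer("non_threaded(8)",
                             "from __main__ import non_threaded")
        best = min(timer.repeat(repeat=100, number=1))
        print("non_threaded (8 iters) %f seconds" % best)
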
- First, to identify the overhead of the spawning of the workers, we execute
- an function which is simply a pass statement (empty):
+ First, to identify the overhead of spawning the workers, we
+ execute a function which is simply a pass statement (empty):
cmd: python run_benchmarks.py empty_func.py
Importing empty_func
@@ -157,11 +166,12 @@
threaded (8 threads) 0.007990 seconds
processes (8 procs) 0.005512 seconds
- As you can see, process forking via the pyprocessing package is faster than
- the speed of building and then executing the threaded version of the code.
+ As you can see, process forking via the pyprocessing package is
+ faster than building and then executing the threaded version of
+ the code.
- The second test calculates 50000 Fibonacci numbers inside of each thread
- (isolated and shared nothing):
+ The second test calculates 50000 Fibonacci numbers inside each
+ thread (isolated, sharing nothing):
cmd: python run_benchmarks.py fibonacci.py
Importing fibonacci
@@ -182,8 +192,8 @@
threaded (8 threads) 1.596824 seconds
processes (8 procs) 0.417899 seconds
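
    The benchmark's exact code is in the tarball linked above; a
    plausible minimal sketch of such a shared-nothing worker:

        def fib_worker(count):
            # iteratively compute `count` Fibonacci numbers; each
            # thread/process keeps its own state, sharing nothing
            a, b = 0, 1
            for _ in range(count):
                a, b = b, a + b

        # each worker runs fib_worker(50000) in isolation
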
- The third test calculates the sum of all primes below 100000, again sharing
- nothing.
+ The third test calculates the sum of all primes below 100000,
+ again sharing nothing.
cmd: run_benchmarks.py crunch_primes.py
Importing crunch_primes
@@ -204,17 +214,18 @@
threaded (8 threads) 5.109192 seconds
processes (8 procs) 1.077939 seconds
-
- The reason why tests two and three focused on pure numeric crunching is to
- showcase how the current threading implementation does hinder non-I/O
- applications. Obviously, these tests could be improved to use a queue for
- coordination of results and chunks of work but that is not required to show
- the performance of the package and core Processing module.
-
- The next test is an I/O bound test. This is normally where we see a steep
- improvement in the threading module approach versus a single-threaded
- approach. In this case, each worker is opening a descriptor to lorem.txt,
- randomly seeking within it and writing lines to /dev/null:
+ The reason tests two and three focus on pure numeric crunching is
+ to showcase how the current threading implementation hinders
+ non-I/O applications.  Obviously, these tests could be improved
+ to use a queue for coordinating results and chunks of work, but
+ that is not required to show the performance of the package and
+ the core Processing module.
+
+ The next test is an I/O bound test.  This is normally where we
+ see a steep improvement in the threading module approach versus a
+ single-threaded approach.  In this case, each worker opens a
+ descriptor to lorem.txt, randomly seeks within it, and writes
+ lines to /dev/null:
cmd: python run_benchmarks.py file_io.py
Importing file_io
@@ -235,14 +246,14 @@
threaded (8 threads) 2.437204 seconds
processes (8 procs) 0.203438 seconds
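
    Again, the actual benchmark source is in the tarball; the worker
    described above might look roughly like this (the line count is
    invented for the sketch):

        import os
        import random

        def file_io_worker(lines=1000):
            # each worker opens its own descriptor, seeks randomly,
            # and copies lines to /dev/null (shared-nothing I/O load)
            size = os.path.getsize('lorem.txt')
            infile = open('lorem.txt', 'r')
            devnull = open('/dev/null', 'w')
            for _ in range(lines):
                infile.seek(random.randint(0, size - 1))
                devnull.write(infile.readline())
            infile.close()
            devnull.close()
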
- As you can see, pyprocessing is still faster on this I/O operation than
- using multiple threads. And using multiple threads is slower than the
- single threaded execution itself.
-
- Finally, we will run a socket-based test to show network I/O performance.
- This function grabs a URL from a server on the LAN that is a simple error
- page from tomcat. It gets the page 100 times. The network is silent, and a
- 10G connection:
+ As you can see, pyprocessing is still faster on this I/O operation
+ than using multiple threads; using multiple threads is in fact
+ slower than single-threaded execution itself.
+
+ Finally, we will run a socket-based test to show network I/O
+ performance.  This function fetches a URL (a simple error page
+ served by Tomcat) from a server on the LAN, 100 times.  The
+ network is otherwise silent, and the connection is 10G:
cmd: python run_benchmarks.py url_get.py
Importing url_get
@@ -263,16 +274,19 @@
threaded (8 threads) 0.659298 seconds
processes (8 procs) 0.298625 seconds
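
    A hedged sketch of such a worker, using urllib (Python 2.x) and
    a placeholder URL in place of the LAN server used in the
    benchmark:

        import urllib

        def url_worker(url='http://192.168.1.5:8080/missing',
                       count=100):
            # fetch the (error) page `count` times; network-bound load
            for _ in range(count):
                conn = urllib.urlopen(url)
                conn.read()
                conn.close()
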
- We finally see threaded performance surpass that of single-threaded
- execution, but the pyprocessing package is still faster when increasing the
- number of workers. If you stay with one or two threads/workers, then the
- timing between threads and pyprocessing is fairly close.
-
- One item of note however, is that there is an implicit overhead within the
- pyprocessing package's Queue implementation due to the object serialization.
+ We finally see threaded performance surpass that of
+ single-threaded execution, but the pyprocessing package is still
+ faster when increasing the number of workers. If you stay with
+ one or two threads/workers, then the timing between threads and
+ pyprocessing is fairly close.
+
+ One item of note, however, is that there is implicit overhead
+ within the pyprocessing package's Queue implementation due to
+ object serialization.
- Alec Thomas provided a short example based on the run_benchmarks.py script
- to demonstrate this overhead versus the default Queue implementation:
+ Alec Thomas provided a short example based on the
+ run_benchmarks.py script to demonstrate this overhead versus the
+ default Queue implementation:
cmd: run_bench_queue.py
non_threaded (1 iters) 0.010546 seconds
@@ -291,21 +305,23 @@
threaded (8 threads) 0.184254 seconds
processes (8 procs) 0.302999 seconds
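
    The overhead comes from pickling: every object placed on a
    processing.Queue must be serialized to cross the process
    boundary, while Queue.Queue passes references within a single
    process.  A minimal sketch of the comparison (Python 2.x module
    names):

        import Queue        # thread-safe, in-process: references
        import processing   # process-safe: pickles each item

        def pump(q, items=10000):
            # identical workload for both queue types
            for i in range(items):
                q.put(i)
            for _ in range(items):
                q.get()

        pump(Queue.Queue())        # no serialization cost
        pump(processing.Queue())   # pays pickle/unpickle per item
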
- Additional benchmarks can be found in the pyprocessing package's source
- distribution's examples/ directory. The examples will be included in the
- package's documentation.
+ Additional benchmarks can be found in the pyprocessing package's
+ source distribution's examples/ directory. The examples will be
+ included in the package's documentation.
Maintenance
- Richard M. Oudkerk - the author of the pyprocessing package has agreed to
- maintain the package within Python SVN. Jesse Noller has volunteered to
- also help maintain/document and test the package.
+ Richard M. Oudkerk, the author of the pyprocessing package, has
+ agreed to maintain the package within Python SVN.  Jesse Noller
+ has volunteered to help maintain, document, and test the
+ package.
API Naming
- The API of the pyprocessing package is designed to closely mimic that of
- the threading and Queue modules. It has been proposed that instead of
- adding the package as-is, we rename it to be PEP 8 compliant instead.
+ The API of the pyprocessing package is designed to closely mimic
+ that of the threading and Queue modules.  It has been proposed
+ that, instead of adding the package as-is, we rename it to be
+ PEP 8 compliant.
Since the aim of the package is to be a drop-in replacement for
the threading module, the authors feel that the current API should
be used.
@@ -314,43 +330,50 @@
Timing/Schedule
- Some concerns have been raised about the timing/lateness of this PEP
- for the 2.6 and 3.0 releases this year, however it is felt by both
- the authors and others that the functionality this package offers
- surpasses the risk of inclusion.
-
- However, taking into account the desire not to destabilize Python-core, some
- refactoring of pyprocessing's code "into" Python-core can be withheld until
- the next 2.x/3.x releases. This means that the actual risk to Python-core
- is minimal, and largely constrained to the actual package itself.
+ Some concerns have been raised about the timing/lateness of this
+ PEP for the 2.6 and 3.0 releases this year; however, both the
+ authors and others feel that the functionality this package
+ offers outweighs the risk of inclusion.
+
+ However, taking into account the desire not to destabilize
+ Python-core, some refactoring of pyprocessing's code "into"
+ Python-core can be withheld until the next 2.x/3.x releases.  This
+ means that the actual risk to Python-core is minimal and largely
+ constrained to the package itself.
Open Issues
- * All existing tests for the package should be converted to UnitTest format.
+ * All existing tests for the package should be converted to
+ unittest format.
* Existing documentation has to be moved to ReST formatting.
* Verify code coverage percentage of existing test suite.
- * Identify any requirements to achieve a 1.0 milestone if required.
- * Verify current source tree conforms to standard library practices.
- * Rename top-level package from "pyprocessing" to "multiprocessing".
- * Confirm no "default" remote connection capabilities, if needed enable the
- remote security mechanisms by default for those classes which offer remote
- capabilities.
- * Some of the API (Queue methods qsize(), task_done() and join()) either
- need to be added, or the reason for their exclusion needs to be identified
- and documented clearly.
- * Add in "multiprocessing.setExecutable()" method to override the default
- behavior of the package to spawn processes using the current executable
- name rather than the Python interpreter. Note that Mark Hammond has
- suggested a factory-style interface for this[7].
- * Also note that the default behavior of process spawning does not make
- it compatible with use within IDLE as-is, this will be examined as
- a bug-fix or "setExecutable" enhancement.
+ * Identify any requirements to achieve a 1.0 milestone if
+ required.
+ * Verify current source tree conforms to standard library
+ practices.
+ * Rename top-level package from "pyprocessing" to
+ "multiprocessing".
+ * Confirm no "default" remote connection capabilities; if needed,
+ enable the remote security mechanisms by default for those
+ classes which offer remote capabilities.
+ * Some of the API (Queue methods qsize(), task_done() and join())
+ either need to be added, or the reason for their exclusion needs
+ to be identified and documented clearly.
+ * Add a "multiprocessing.setExecutable()" method to override the
+ package's default behavior of spawning processes using the
+ current executable name rather than the Python interpreter.
+ Note that Mark Hammond has suggested a factory-style interface
+ for this [7]; a usage sketch follows this list.
+ * Also note that the default behavior of process spawning does
+ not make it compatible with use within IDLE as-is; this will
+ be examined as a bug-fix or "setExecutable" enhancement.
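
    If adopted, usage of the proposed method might look like the
    sketch below; the method does not exist yet, and the interpreter
    path is illustrative only:

        import multiprocessing

        def hello():
            print "hello from a child process"

        # proposed API (not yet implemented): name the interpreter
        # binary to spawn instead of the current executable, e.g.
        # when running under IDLE or an embedding application
        multiprocessing.setExecutable('/usr/bin/python2.6')

        p = multiprocessing.Process(target=hello)
        p.start()
        p.join()
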
Closed Issues
- * Reliance on ctypes: The pyprocessing package's reliance on ctypes prevents
- the package from functioning on platforms where ctypes is not supported.
- This is not a restriction of this package, but rather of ctypes.
+ * Reliance on ctypes: The pyprocessing package's reliance on
+ ctypes prevents the package from functioning on platforms where
+ ctypes is not supported. This is not a restriction of this
+ package, but rather of ctypes.
References
@@ -369,8 +392,9 @@
http://wiki.python.org/moin/ParallelProcessing
[6] The original run_benchmark.py code was published in Python
- Magazine in December 2008: "Python Threads and the Global Interpreter
- Lock" by Jesse Noller. It has been modified for this PEP.
+ Magazine in December 2007: "Python Threads and the Global
+ Interpreter Lock" by Jesse Noller.  It has been modified for
+ this PEP.
[7] http://groups.google.com/group/python-dev2/msg/54cf06d15cbcbc34