Hello everyone!
We have been encountering several deadlocks in a threaded Python
application which calls subprocess.Popen (i.e. fork()) in some of its
threads.
This has occurred on Python 2.4.1 on a 2.4.27 Linux kernel.
Preliminary analysis of the hang shows that the child process blocks
upon entering the execvp function, in which the import_lock is acquired
due to the following line:
def _execvpe(file, args, env=None):
    from errno import ENOENT, ENOTDIR
    ...
It is known that when forking from a pthreaded application, any attempt in
the child to acquire a lock that another thread held at the moment fork()
was called will deadlock: only the forking thread survives in the child, so
the lock can never be released.
Because of this, we were wondering whether it would be better to move the
import line above out of _execvpe, so that the forked child never attempts
to acquire the import lock.
Another workaround could be to re-assign a fresh lock to import_lock after
the fork (as is already done for the global interpreter lock), either in
PyOS_AfterFork or via a pthread_atfork handler.
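For illustration, a minimal sketch of the first workaround (hypothetical --
the real change would go in Lib/os.py):

    from errno import ENOENT, ENOTDIR   # hoisted to module level, imported once

    def _execvpe(file, args, env=None):
        # body otherwise unchanged; with no import statement here, the forked
        # child never needs to acquire the interpreter's import lock
        pass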
We'd appreciate any opinions you might have on the subject.
Thanks in advance,
Yair and Rotem
On Wed, 10 Nov 2004, John P Speno wrote:
Hi, sorry for the delayed response.
> While using subprocess (aka popen5), I came across one potential gotcha. I've had
> exceptions ending like this:
>
> File "test.py", line 5, in test
> cmd = popen5.Popen(args, stdout=PIPE)
> File "popen5.py", line 577, in __init__
> data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
> OSError: [Errno 4] Interrupted system call
>
> (on Solaris 9)
>
> Would it make sense for subprocess to use a more robust read() function
> which can handle these cases, i.e. when the parent's read on the pipe
> to the child's stderr is interrupted by a system call, and returns EINTR?
> I imagine it could catch EINTR and EAGAIN and retry the failed read().
I assume you are using signals in your application? The os.read above is
not the only system call that can fail with EINTR. subprocess.py is full
of other system calls that can fail, and I suspect that many other Python
modules are as well.
I've made a patch (attached) to subprocess.py (and test_subprocess.py)
that should guard against EINTR, but I haven't committed it yet. It's
quite large.
Are Python modules supposed to handle EINTR? Why not let the C code handle
this? Or, perhaps the signal module should provide a sigaction function,
so that users can use SA_RESTART.
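For what it's worth, the retry pattern can also be factored into a single
helper; a minimal sketch (the attached patch uses per-call wrappers instead):

    import errno, os

    def retry_on_eintr(func, *args):
        """Call func(*args), retrying as long as it fails with EINTR."""
        while True:
            try:
                return func(*args)
            except (OSError, IOError), e:
                if e.errno != errno.EINTR:
                    raise

    # e.g.: data = retry_on_eintr(os.read, errpipe_read, 1048576)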
Index: subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/subprocess.py,v
retrieving revision 1.8
diff -u -r1.8 subprocess.py
--- subprocess.py 7 Nov 2004 14:30:34 -0000 1.8
+++ subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -888,6 +888,50 @@
pass
+ def _read_no_intr(self, fd, buffersize):
+ """Like os.read, but retries on EINTR"""
+ while True:
+ try:
+ return os.read(fd, buffersize)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
+
+ def _read_all(self, fd, buffersize):
+ """Like os.read, but retries on EINTR, and reads until EOF"""
+ all = ""
+ while True:
+ data = self._read_no_intr(fd, buffersize)
+ all += data
+ if data == "":
+ return all
+
+
+ def _write_no_intr(self, fd, s):
+ """Like os.write, but retries on EINTR"""
+ while True:
+ try:
+ return os.write(fd, s)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
+ def _waitpid_no_intr(self, pid, options):
+ """Like os.waitpid, but retries on EINTR"""
+ while True:
+ try:
+ return os.waitpid(pid, options)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
def _execute_child(self, args, executable, preexec_fn, close_fds,
cwd, env, universal_newlines,
startupinfo, creationflags, shell,
@@ -963,7 +1007,7 @@
exc_value,
tb)
exc_value.child_traceback = ''.join(exc_lines)
- os.write(errpipe_write, pickle.dumps(exc_value))
+ self._write_no_intr(errpipe_write, pickle.dumps(exc_value))
# This exitcode won't be reported to applications, so it
# really doesn't matter what we return.
@@ -979,7 +1023,7 @@
os.close(errwrite)
# Wait for exec to fail or succeed; possibly raising exception
- data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
+ data = self._read_all(errpipe_read, 1048576) # Exceptions limited to 1 MB
os.close(errpipe_read)
if data != "":
child_exception = pickle.loads(data)
@@ -1003,7 +1047,7 @@
attribute."""
if self.returncode == None:
try:
- pid, sts = os.waitpid(self.pid, os.WNOHANG)
+ pid, sts = self._waitpid_no_intr(self.pid, os.WNOHANG)
if pid == self.pid:
self._handle_exitstatus(sts)
except os.error:
@@ -1015,7 +1059,7 @@
"""Wait for child process to terminate. Returns returncode
attribute."""
if self.returncode == None:
- pid, sts = os.waitpid(self.pid, 0)
+ pid, sts = self._waitpid_no_intr(self.pid, 0)
self._handle_exitstatus(sts)
return self.returncode
@@ -1049,27 +1093,33 @@
stderr = []
while read_set or write_set:
- rlist, wlist, xlist = select.select(read_set, write_set, [])
+ try:
+ rlist, wlist, xlist = select.select(read_set, write_set, [])
+ except select.error, e:
+ if e[0] == errno.EINTR:
+ continue
+ else:
+ raise
if self.stdin in wlist:
# When select has indicated that the file is writable,
# we can write up to PIPE_BUF bytes without risk
# blocking. POSIX defines PIPE_BUF >= 512
- bytes_written = os.write(self.stdin.fileno(), input[:512])
+ bytes_written = self._write_no_intr(self.stdin.fileno(), input[:512])
input = input[bytes_written:]
if not input:
self.stdin.close()
write_set.remove(self.stdin)
if self.stdout in rlist:
- data = os.read(self.stdout.fileno(), 1024)
+ data = self._read_no_intr(self.stdout.fileno(), 1024)
if data == "":
self.stdout.close()
read_set.remove(self.stdout)
stdout.append(data)
if self.stderr in rlist:
- data = os.read(self.stderr.fileno(), 1024)
+ data = self._read_no_intr(self.stderr.fileno(), 1024)
if data == "":
self.stderr.close()
read_set.remove(self.stderr)
Index: test/test_subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/test/test_subprocess.py,v
retrieving revision 1.14
diff -u -r1.14 test_subprocess.py
--- test/test_subprocess.py 12 Nov 2004 15:51:48 -0000 1.14
+++ test/test_subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -7,6 +7,7 @@
import tempfile
import time
import re
+import errno
mswindows = (sys.platform == "win32")
@@ -35,6 +36,16 @@
fname = tempfile.mktemp()
return os.open(fname, os.O_RDWR|os.O_CREAT), fname
+ def read_no_intr(self, obj):
+ while True:
+ try:
+ return obj.read()
+ except IOError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
#
# Generic tests
#
@@ -123,7 +134,7 @@
p = subprocess.Popen([sys.executable, "-c",
'import sys; sys.stdout.write("orange")'],
stdout=subprocess.PIPE)
- self.assertEqual(p.stdout.read(), "orange")
+ self.assertEqual(self.read_no_intr(p.stdout), "orange")
def test_stdout_filedes(self):
# stdout is set to open file descriptor
@@ -151,7 +162,7 @@
p = subprocess.Popen([sys.executable, "-c",
'import sys; sys.stderr.write("strawberry")'],
stderr=subprocess.PIPE)
- self.assertEqual(remove_stderr_debug_decorations(p.stderr.read()),
+ self.assertEqual(remove_stderr_debug_decorations(self.read_no_intr(p.stderr)),
"strawberry")
def test_stderr_filedes(self):
@@ -186,7 +197,7 @@
'sys.stderr.write("orange")'],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT)
- output = p.stdout.read()
+ output = self.read_no_intr(p.stdout)
stripped = remove_stderr_debug_decorations(output)
self.assertEqual(stripped, "appleorange")
@@ -220,7 +231,7 @@
stdout=subprocess.PIPE,
cwd=tmpdir)
normcase = os.path.normcase
- self.assertEqual(normcase(p.stdout.read()), normcase(tmpdir))
+ self.assertEqual(normcase(self.read_no_intr(p.stdout)), normcase(tmpdir))
def test_env(self):
newenv = os.environ.copy()
@@ -230,7 +241,7 @@
'sys.stdout.write(os.getenv("FRUIT"))'],
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read(), "orange")
+ self.assertEqual(self.read_no_intr(p.stdout), "orange")
def test_communicate(self):
p = subprocess.Popen([sys.executable, "-c",
@@ -305,7 +316,8 @@
'sys.stdout.write("\\nline6");'],
stdout=subprocess.PIPE,
universal_newlines=1)
- stdout = p.stdout.read()
+
+ stdout = self.read_no_intr(p.stdout)
if hasattr(open, 'newlines'):
# Interpreter with universal newline support
self.assertEqual(stdout,
@@ -343,7 +355,7 @@
def test_no_leaking(self):
# Make sure we leak no resources
- max_handles = 1026 # too much for most UNIX systems
+ max_handles = 10 # too much for most UNIX systems
if mswindows:
max_handles = 65 # a full test is too slow on Windows
for i in range(max_handles):
@@ -424,7 +436,7 @@
'sys.stdout.write(os.getenv("FRUIT"))'],
stdout=subprocess.PIPE,
preexec_fn=lambda: os.putenv("FRUIT", "apple"))
- self.assertEqual(p.stdout.read(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout), "apple")
def test_args_string(self):
# args is a string
@@ -457,7 +469,7 @@
p = subprocess.Popen(["echo $FRUIT"], shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read().strip(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")
def test_shell_string(self):
# Run command through the shell (string)
@@ -466,7 +478,7 @@
p = subprocess.Popen("echo $FRUIT", shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read().strip(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")
def test_call_string(self):
# call() function with string argument on UNIX
@@ -525,7 +537,7 @@
p = subprocess.Popen(["set"], shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+ self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)
def test_shell_string(self):
# Run command through the shell (string)
@@ -534,7 +546,7 @@
p = subprocess.Popen("set", shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+ self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)
def test_call_string(self):
# call() function with string argument on Windows
/Peter Åstrand <astrand(a)lysator.liu.se>
This may seem like it's coming out of left field for a minute, but
bear with me.
There is no doubt that Ruby's success is a concern for anyone who
sees it as diminishing Python's status. One of the reasons for
Ruby's success is certainly the notion (originally advocated by Bruce
Tate, if I'm not mistaken) that it is the "next Java" -- the language
and environment that mainstream Java developers are, or will, look to
as a natural next step.
One thing that would help Python in this "debate" (or, perhaps simply
put it in the running, at least as a "next Java" candidate) would be
if Python had an easier migration path for Java developers that
currently rely upon various third-party libraries. The wealth of
third-party libraries available for Java has always been one of its
great strengths. Ergo, if Python had an easy-to-use, recommended way
to use those libraries within the Python environment, that would be a
significant advantage to present to Java developers and those who
would choose Ruby over Java. Platform compatibility is always a huge
motivator for those looking to migrate or upgrade.
In that vein, I would point to JPype (http://jpype.sourceforge.net).
JPype is a module that gives "python programs full access to java
class libraries". My suggestion would be to either:
(a) include JPype in the standard library, or barring that,
(b) make a very strong push to support JPype
(a) might be difficult or cumbersome technically, as JPype does need
to build against Java headers, which may or may not be possible given
the way that Python is distributed, etc.
However, (b) is very feasible. I can't really say what "supporting
JPype" means exactly -- maybe GvR and/or other heavyweights in the
Python community make public statements regarding its existence and
functionality, maybe JPype gets a strong mention or placement on
python.org....all those details are obviously not up to me, and I
don't know the workings of the "official" Python organizations enough
to make serious suggestions.
Regardless of the form of support, I think raising people's awareness
of JPype and what it adds to the Python environment would be a Good
Thing (tm).
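To give a flavor of what that migration path looks like, here is a rough
sketch of calling into a Java class library from Python via JPype (based on
the JPype 0.x API -- startJVM, JPackage -- with the JVM path and options
obviously varying by installation):

    from jpype import startJVM, shutdownJVM, getDefaultJVMPath, JPackage

    startJVM(getDefaultJVMPath(), "-ea")          # boot an in-process JVM
    java = JPackage("java")                       # root of the java.* namespace
    java.lang.System.out.println("Hello from the JVM")
    shutdownJVM()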
For our part, we've used JPype to make PDFTextStream (our previously
Java-only PDF text extraction library) available and supported for
Python. You can read some about it here:
http://snowtide.com/PDFTextStream.Python
And I've blogged about how PDFTextStream.Python came about, and how
we worked with Steve Ménard, the maintainer of JPype, to make it all
happen (watch out for this URL wrapping):
http://blog.snowtide.com/2006/08/21/working-together-pythonjava-open-
sourcecommercial
Cheers,
Chas Emerick
Founder, Snowtide Informatics Systems
Enterprise-class PDF content extraction
cemerick(a)snowtide.com
http://snowtide.com | +1 413.519.6365
Phillip.eby wrote:
> Author: phillip.eby
> Date: Tue Apr 18 02:59:55 2006
> New Revision: 45510
>
> Modified:
> python/trunk/Lib/pkgutil.py
> python/trunk/Lib/pydoc.py
> Log:
> Second phase of refactoring for runpy, pkgutil, pydoc, and setuptools
> to share common PEP 302 support code, as described here:
>
> http://mail.python.org/pipermail/python-dev/2006-April/063724.html
Shouldn't this new module be named "pkglib" to be in line with
the naming scheme used for all the other utility modules, e.g. httplib,
imaplib, poplib, etc. ?
> pydoc now supports PEP 302 importers, by way of utility functions in
> pkgutil, such as 'walk_packages()'. It will properly document
> modules that are in zip files, and is backward compatible to Python
> 2.3 (setuptools installs for Python <2.5 will bundle it so pydoc
> doesn't break when used with eggs.)
Are you saying that the installation of setuptools in Python 2.3
and 2.4 will then overwrite the standard pydoc included with
those versions ?
I think that's the wrong way to go if not made an explicit
option in the installation process or a separate installation
altogether.
I'm bothered by the fact that installing setuptools actually changes
the standard Python installation by either overriding stdlib modules
or monkey-patching them at setuptools import time.
> What has not changed is that pydoc command line options do not support
> zip paths or other importer paths, and the webserver index does not
> support sys.meta_path. Those are probably okay as limitations.
>
> Tasks remaining: write docs and Misc/NEWS for pkgutil/pydoc changes,
> and update setuptools to use pkgutil wherever possible, then add it
> to the stdlib.
Add setuptools to the stdlib ? I'm still missing the PEP for this
along with the needed discussion touching among other things,
the change of the distutils standard "python setup.py install"
to install an egg instead of a site package.
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Apr 18 2006)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
I'm interested in how builtins could be more efficient. I've read over
some of the PEPs having to do with making global variables more
efficient (search for "global"):
http://www.python.org/doc/essays/pepparade.html
But I think the problem can be simplified by focusing strictly on
builtins.
One of my assumptions is that only a small fraction of modules override
the default builtins with something like:
import mybuiltins
__builtins__ = mybuiltins
As you probably know, each access of a builtin currently requires two hash
table lookups: the name is first looked up (and not found) in the module's
globals dict, and is then found in the builtins dict.
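In pseudo-code, what LOAD_GLOBAL does for a builtin name like "len" is
roughly this (a simplified sketch, ignoring error handling and the actual
C implementation):

    def load_global(name, module_globals, builtins_dict):
        try:
            return module_globals[name]     # first hash lookup: usually misses
        except KeyError:
            return builtins_dict[name]      # second hash lookup: finds the builtin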
Why not have a means of referencing the default builtins with some sort
of index the way the LOAD_FAST op code currently works? In other words,
by default each module gets the default set of builtins indexed (where
the index indexes into an array) in a certain order. The version stored
in the pyc file would be bumped each time the set of default builtins
is changed.
I don't have very strong feelings whether things like True = (1 == 1)
would be a syntax error, but assigning to a builtin could just do the
equivalent of STORE_FAST. I also don't have very strong feelings about
whether the array of default builtins would be shared between modules.
To simulate the current behavior, where assigning to a builtin actually
alters that module's globals dict, a separate array of builtins could be
used for each module.
As to assigning to __builtins__ (like I mentioned at the beginning of
this post) perhaps it could assign to the builtin array for those items
that have a name that matches a default builtin (such as "True" or
"len"). Those items that don't match a default builtin would just
create global variables.
Perhaps what I'm suggesting isn't feasible for reasons that have already
been discussed. But it seems like it should be possible to make "while
True" as efficient as "while 1".
--
-----------------------------------------------------------------------
| Steven Elliott | selliott4(a)austin.rr.com |
-----------------------------------------------------------------------
Should GeneratorExit inherit from Exception or BaseException?
Currently, a generator that catches Exception and continues on to yield
another value can't be closed properly (you get a runtime error pointing out
that the generator ignored GeneratorExit).
The only decent reference I could find to it in the old PEP 348/352
discussions is Guido writing [1]:
> when GeneratorExit or StopIteration
> reach the outer level of an app, it's a bug like all the others that
> bare 'except:' WANTS to catch.
(at that point in the conversation, I believe bare except was considered the
equivalent of "except Exception:")
While I agree with what Guido says about GeneratorExit being a bug if it
reaches the outer level of an app, it seems like a bit of a trap that a
correctly written generator can't write "except Exception:" without preceding
it with an "except GeneratorExit:" that reraises the exception. Isn't that
exactly the idiom we're trying to get rid of for SystemExit and KeyboardInterrupt?
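A minimal illustration of the trap (Python 2.5 semantics, where GeneratorExit
inherits from Exception):

    def gen():
        while True:
            try:
                yield 1
            except Exception:
                continue          # swallows GeneratorExit as well

    g = gen()
    g.next()
    try:
        g.close()
    except RuntimeError, e:
        print e                   # "generator ignored GeneratorExit"

    # The idiom a correct generator currently needs:
    def well_behaved():
        while True:
            try:
                yield 1
            except GeneratorExit:
                raise             # let close() work
            except Exception:
                continue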
Regards,
Nick.
[1] http://mail.python.org/pipermail/python-dev/2005-August/055173.html
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org
I was looking for a good pairing_heap implementation and came across
one that had apparently been checked in a couple years ago (!). Here
is the full link:
http://svn.python.org/view/sandbox/trunk/collections/pairing_heap.py?rev=40…
I was just wondering about the status of this implementation. The api
looks pretty good to me -- it's great that the author decided to have
the insert method return a node reference which can then be passed to
delete and adjust_key. It's a bit of a pain to implement that
functionality, but it's extremely useful for a number of applications.
If that project is still alive, I have a couple api suggestions:
* Add a method which nondestructively yields the top K elements of the
heap (see the sketch after this list). This would work by popping the top K
elements of the heap into a list, then reinserting those elements in reverse
order. By reinserting the sorted elements in reverse order, the top of the
heap is essentially a sorted linked list, so if the exact operation is
repeated again, the removals take constant time rather than amortized
logarithmic.
* So, for example: if we have a min heap, the topK method would pop
K elements from the heap, say they are {1, 3, 5, 7}, then do
insert(7), followed by insert(5), ... insert(1).
* Even better might be if this operation avoided having to allocate
new heap nodes, and just reused the old ones.
* I'm not sure if adjust_key should throw an exception if the key
adjustment is in the wrong direction. Perhaps it should just fall back
on deleting and reinserting that node?
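For illustration, here is the top-K idea sketched against the stdlib heapq
module (only to show the interface; the constant-time-on-repeat argument
above is specific to pairing heaps, and the sandbox pairing_heap API may
differ):

    import heapq

    def top_k_nondestructive(heap, k):
        """Return the k smallest items without permanently removing them."""
        popped = [heapq.heappop(heap) for _ in xrange(min(k, len(heap)))]
        for item in reversed(popped):   # reinsert smallest last, so it ends on top
            heapq.heappush(heap, item)
        return popped

    h = [7, 3, 1, 5]
    heapq.heapify(h)
    print top_k_nondestructive(h, 3)    # [1, 3, 5]; h still holds all four items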
Paul
I've never liked the "".join([]) idiom for string concatenation; in my
opinion it violates the principles "Beautiful is better than ugly." and
"There should be one-- and preferably only one --obvious way to do it.".
(And perhaps several others.) To that end I've submitted patch #1569040
to SourceForge:
http://sourceforge.net/tracker/index.php?func=detail&aid=1569040&group_id=5…
This patch speeds up using + for string concatenation. It's been in
discussion on c.l.p for about a week, here:
http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc…
I'm not a Python guru, and my initial benchmark had many mistakes. With
help from the community correct benchmarks emerged: + for string
concatenation is now roughly as fast as the usual "".join() idiom when
appending. (It appears to be *much* faster for prepending.) The
patched Python passes all the tests in regrtest.py for which I have
source; I didn't install external packages such as bsddb and sqlite3.
My approach was to add a "string concatenation" object; I have since
learned this is also called a "rope". Internally, a
PyStringConcatationObject is exactly like a PyStringObject but with a
few extra members taking an additional thirty-six bytes of storage.
When you add two PyStringObjects together, string_concat() returns a
PyStringConcatationObject which contains references to the two strings.
Concatenating any mixture of PyStringObjects and
PyStringConcatationObjects works similarly, though there are some
internal optimizations.
These changes are almost entirely contained within
Objects/stringobject.c and Include/stringobject.h. There is one major
externally-visible change in this patch: PyStringObject.ob_sval is no
longer a char[1] array, but a char *. Happily, this only requires a
recompile, because the CPython source is *marvelously* consistent about
using the macro PyString_AS_STRING(). (One hopes extension authors are
as consistent.) I only had to touch two other files (Python/ceval.c and
Objects/codeobject.c) and those were one-line changes. There is one
remaining place that still needs fixing: the self-described "hack" in
Mac/Modules/MacOS.c. Fixing that is beyond my pay grade.
I changed the representation of ob_sval for two reasons: first, because it is
initially NULL for a string concatenation object, and second, because it
may point to separately-allocated memory. That's where the speedup came
from--it doesn't render the string until someone asks for the string's
value. It is telling to see my new implementation of
PyString_AS_STRING, as follows (casts and extra parentheses removed for
legibility):
#define PyString_AS_STRING(x) \
    ( x->ob_sval ? x->ob_sval : PyString_AsString(x) )
This adds a layer of indirection for the string and a branch, adding a
tiny (but measurable) slowdown to the general case. Again, because the
changes to PyStringObject are hidden by this macro, external users of
these objects don't notice the difference.
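For readers who don't want to dig through the C, here is a Python-level
sketch of the lazy-concatenation ("rope") behaviour described above
(illustrative only; the real patch lives in Objects/stringobject.c):

    class LazyConcat(object):
        def __init__(self, left, right):
            self.left, self.right = left, right
            self.rendered = None      # plays the role of the initially-NULL ob_sval

        def __str__(self):
            # Flatten only when somebody asks for the value, mirroring
            # PyString_AS_STRING falling back to PyString_AsString().
            if self.rendered is None:
                self.rendered = str(self.left) + str(self.right)
            return self.rendered

    s = LazyConcat(LazyConcat("spam", "eggs"), "ham")
    print str(s)     # rendered only here: "spameggsham"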
The patch is posted, and I have donned the thickest skin I have handy.
I look forward to your feedback.
Cheers,
/larry/
PEP: <unassigned>
Title: Adding data-type objects to the standard library
Version: $Revision: $
Last-Modified: $Date: $
Author: Travis Oliphant <oliphant(a)ee.byu.edu>
Status: Draft
Type: Standards Track
Created: 05-Sep-2006
Python-Version: 2.6
Abstract
This PEP proposes adapting the data-type objects from NumPy for
inclusion in standard Python, to provide a consistent and standard
way to discuss the format of binary data.
Rationale
There are many situations crossing multiple areas where an
interpretation is needed of binary data in terms of fundamental
data-types such as integers, floating-point, and complex
floating-point values. Having a common object that carries
information about binary data would be beneficial to many
people. The creation of data-type objects in NumPy to carry the
load of describing what each element of the array contains
represents an evolution of a solution that began with the
PyArray_Descr structure in Python's own array object. These
data-type objects can represent arbitrary byte data. Currently
such information is usually constructed using strings and
character codes which is unwieldy when a data-type consists of
nested structures.
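For comparison, this is roughly what describing binary data with "strings
and character codes" looks like today using the struct module; nested
structures have no natural spelling in this notation:

    import struct

    # one record: a 4-byte signed int followed by an 8-byte double, native order
    packed = struct.pack('=id', 7, 3.14)
    print struct.unpack('=id', packed)    # (7, 3.14)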
Proposal
Add a PyDatatypeObject in Python (adapted from NumPy's dtype
object which evolved from the PyArray_Descr structure in Python's
array module) that holds information about a data-type. This object
will allow packages to exchange information about binary data in
a uniform way (see the extended buffer protocol PEP for an application
to exchanging information about array data).
Specification
The datatype is an object that specifies how a certain block of
memory should be interpreted as a basic data-type. In addition to
being able to describe basic data-types, the data-type object can
describe a data-type that is itself an array of other data-types
as well as a data-type that contains arbitrary "fields" (structure
members) which are located at specific offsets. In its most basic
form, however, a data-type is of a particular kind (bit, bool,
int, uint, float, complex, object, string, unicode, void) and size.
Datatype objects can be created using either a type-object, a
string, a tuple, a list, or a dictionary according to the following
constructors:
Type-object:
For a select set of type-objects, a data-type object describing that
basic type can be constructed:
Examples:
>>> datatype(float)
datatype('float64')
>>> datatype(int)
datatype('int32') # on 32-bit platform (64 if c-long is 64-bits)
Tuple-object
A tuple of length 2 can be used to specify a data-type that is
an array of another kind of basic data-type (this array always
describes a C-contiguous array).
Examples:
>>> datatype((int, 5))
datatype(('int32', (5,)))
# describes a 5*4=20-byte block of memory laid out as
# a[0], a[1], a[2], a[3], a[4]
>>> datatype((float, (3,2)))
datatype(('float64', (3,2)))
# describes a 3*2*8=48 byte block of memory that should be
# interpreted as 6 doubles laid out as arr[0,0], arr[0,1],
# arr[1,0], arr[1,1], arr[2,0], arr[2,1]
String-object:
The basic format is '%s%s%s%d' % (endian, shape, kind, itemsize)
kind : one of the basic array kinds given below.
itemsize : the number of bytes (or bits for 't' kind) for
this data-type.
endian : either '', '=' (native), '|' (doesn't matter),
'>' (big-endian) or '<' (little-endian).
shape : either '', or a shape-tuple describing a data-type that
is an array of the given shape.
A string can also be a comma-separated sequence of basic
formats. The result will be a data-type with default field
names: 'f0', 'f1', ..., 'fn'.
Examples:
>>> datatype('u4')
datatype('uint32')
>>> datatype('f4')
datatype('float32')
>>> datatype('(3,2)f4')
datatype(('float32', (3,2)))
>>> datatype('(5,)i4, (3,2)f4, S5')
datatype([('f0', '<i4', (5,)), ('f1', '<f4', (3, 2)), ('f2', '|S5')])
List-object:
A list should be a list of tuples where each tuple describes a
field. Each tuple should contain (name, datatype{, shape}) or
((meta-info, name), datatype{, shape}) in order to specify the
data-type.
This list must fully specify the data-type (no memory holes). If
you would like to return a data-type with memory holes where the
compiler would place them, then pass the keyword align=1 to the
constructor. This will result in un-named fields of Void kind of
the correct size interspersed where needed.
Examples:
datatype([( ([1,2],'coords'), 'f4', (3,6)), ('address', 'S30')])
A data-type that could represent the structure
float coords[3*6] /* Has [1,2] associated with this field */
char address[30]
datatype([( 'simple', 'i4'), ('nested', [('name', 'S30'),
('addr', 'S45'),
('amount', 'i4')])])
Can represent the memory layout of
struct {
    int simple;
    struct nested {
        char name[30];
        char addr[45];
        int amount;
    } nested;
};
There is no formal limit to the nesting that is possible.
datatype('i2, i4, i1, f8', align=1)
datatype([('f0', '<i2'), ('', '|V2'), ('f1', '<i4'),
('f2', '|i1'), ('', '|V3'), ('f3', '<f8')])
# Notice the padding bytes placed in the structure to make sure
# f1 ('<i4') and f3 ('<f8') are aligned correctly on a 32-bit system.
Dictionary-object:
Sometimes, you are only concerned about a few fields in a larger
memory structure. The dictionary object allows specification of
a data-type with fields using a dictionary with names as keys and
tuples as values. The value tuples are
(data-type, offset{, meta-info}). The offset is the offset in
bytes (or bits when data-type is 't') from the beginning of the
structure to the field data-type.
Example:
datatype({'f3' : ('f8', 12), 'f2': ('i1', 8)})
datatype([('', '|V8'), ('f2', '|i1'), ('', '|V3'), ('f3', '<f8')])
Attributes
byteorder -- returns the byte-order of this data-type
isnative -- returns True if this data-type is in correct byte-order
for the platform.
descr -- returns a description of this data-type as a list of
tuples (name or (name, meta), datatype{, shape})
itemsize -- returns the total size of the data-type.
kind -- returns the basic "kind" of the data-type. The basic kinds
are:
't' - bit,
'b' - bool,
'i' - signed integer,
'u' - unsigned integer,
'f' - floating point,
'c' - complex floating point,
'S' - string (fixed-length sequence of char),
'U' - fixed length sequence of UCS4,
'O' - pointer to PyObject,
'V' - Void (anything else).
names -- returns a list of names (keys to the fields dictionary) in
offset-order.
fields -- returns a read-only dictionary indicating the fields or
None if this data-type has no fields. The dictionary
is keyed by the field name and each entry contains
a tuple of (data-type, offset{, meta-object}). The
offset indicates the byte-offset (or bit-offset for 't')
from the beginning of the data-type to the data-type
indicated.
hasobject -- returns True if this data-type is an "object" data-type
or has "object" fields.
name -- returns a 'name'-bitwidth description of data-type.
base -- returns self unless this data-type is an array of some
other data-type and then it returns that basic
data-type.
shape -- returns the shape of this data-type (for data-types
that are arrays of other data-types) or () if there
is no array.
str -- returns the type-string of this data-type which is the
basic kind followed by the number of bytes (or bits
for 't')
alignment -- returns alignment needed for this data-type on platform
as determined by the compiler.
Methods
newbyteorder ({endian})
create a new data-type with byte-order changed in any and all
fields (including deeply nested ones), to {endian}. If endian is
not given, then swap all byte-orders.
__len__(self)
equivalent to len(self.fields)
__getitem__(self, name)
get the field named [name]. Equivalent to self.fields[name].
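Illustrative pseudo-usage of the proposed object, with names taken from the
attribute and method lists above (datatype does not exist in the stdlib
today, so this is only a sketch of the intended API):

    dt = datatype([('simple', 'i4'), ('pos', 'f8', (3,))])
    print dt.itemsize          # bytes per record: 4 + 3*8 = 28
    print dt.names             # field names in offset order: ['simple', 'pos']
    print dt.fields['pos']     # (data-type, offset) tuple for the 'pos' field
    print len(dt)              # number of fields, i.e. len(dt.fields)
    print dt['simple']         # shorthand for dt.fields['simple']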
C-functions : These are function pointers attached in a C-structure
connected with the data-type object that perform specific
functions.
setitem (PyObject *datatype, void *data, PyObject *obj)
set a Python object into memory of this data-type
at the given memory location.
getitem (PyObject *datatype, void *data)
get a Python object from memory of this data-type.
Implementation
A reference implementation (with more features than are proposed
here) is available in NumPy and will be adapted if this PEP is
accepted.
Questions:
There should probably be a limited C-API so that data-type objects
can be returned and sent through the extended buffer protocol (see
extended buffer protocol PEP).
Should bit-fields be handled by re-interpreting the offsets as
bit-values, use some other mechanism for handling the offset, or
should they be unsupported?
NumPy supports "string" and "unicode" data-types. The unicode
data-type in NumPy always means UCS4 (but it is translated
back and forth to Python unicode scalars as needed for narrow
builds). With Python 3.0 looming, we should probably support
different encodings as data-types and drop the string type for a
bytes type. Some help in understanding what to do here is
appreciated.
Copyright
This PEP is placed in the public domain
Attached is my PEP for extending the buffer protocol to allow array data
to be shared.
PEP: <unassigned>
Title: Extending the buffer protocol to include the array interface
Version: $Revision: $
Last-Modified: $Date: $
Author: Travis Oliphant <oliphant(a)ee.byu.edu>
Status: Draft
Type: Standards Track
Created: 28-Aug-2006
Python-Version: 2.6
Abstract
This PEP proposes extending the tp_as_buffer structure to include
function pointers that incorporate information about the intended
shape and data-format of the provided buffer. In essence this will
place something akin to the array interface directly into Python.
Rationale
Several extensions to Python utilize the buffer protocol to share
the location of a data-buffer that is really an N-dimensional
array. However, there is no standard way to exchange the
additional N-dimensional array information so that the data-buffer
is interpreted correctly. The NumPy project introduced an array
interface (http://numpy.scipy.org/array_interface.shtml) through a
set of attributes on the object itself. While this approach
works, it requires attribute lookups which can be expensive when
sharing many small arrays.
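For reference, the attribute-based interface mentioned above looks like this
from Python (sketch; requires NumPy):

    import numpy as np

    a = np.zeros((3, 2), dtype='<f8')
    info = a.__array_interface__
    print info['shape']      # (3, 2)
    print info['strides']    # None for C-contiguous data
    print info['typestr']    # '<f8'
    print info['data']       # (address-as-int, read-only flag)
    # Each item above costs a Python-level lookup; the proposed bf_getarrayinfo
    # slot would expose the same information through a single C call.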
One of the key reasons that users often request to place something
like NumPy into the standard library is so that it can be used as
standard for other packages that deal with arrays. This PEP
provides a mechanism for extending the buffer protocol (which
already allows data sharing) to add the additional information
needed to understand the data. This should be of benefit to all
third-party modules that want to share memory through the buffer
protocol such as GUI toolkits, PIL, PyGame, CVXOPT, PyVoxel,
PyMedia, audio libraries, video libraries etc.
Proposal
Add a bf_getarrayinfo function pointer to the buffer protocol to
allow objects to share additional information about the returned
memory pointer. Add the TP_HAS_EXT_BUFFER flag to types that
define the extended buffer protocol.
Specification:
static int
bf_getarrayinfo (PyObject *obj, Py_intptr_t **shape,
                 Py_intptr_t **strides, PyObject **dataformat)
Inputs:
obj -- The Python object being questioned.
Outputs:
[function result] -- the number of dimensions (n)
*shape -- A C-array of 'n' integers indicating the
shape of the array. Can be NULL if n==0.
*strides -- A C-array of 'n' integers indicating
the number of bytes to jump to get to the next
element in each dimension. Can be NULL if the
array is C-contiguous (or n==0).
*dataformat -- A Python object describing the data-format
each element of the array should be
interpreted as.
Discussion Questions:
1) How is data-format information supposed to be shared? A companion
proposal suggests returning a data-format object which carries the
information about the buffer area.
2) Should the single function pointer call be extended into
multiple calls, or should its arguments be compressed into a structure
that is filled in?
3) Should a C-API function (or functions) be created which wraps calls to this
function pointer, much as is done now with the buffer protocol? What should
the interface of this function (or these functions) be?
4) Should a mask (for missing values) be shared as well?
Reference Implementation
Supplied when the PEP is accepted.
Copyright
This document is placed in the public domain.