[Python-checkins] r86995 - in python/branches/pep-0384: Doc/library/difflib.rst Doc/library/io.rst Lib/argparse.py Lib/difflib.py Lib/test/test_io.py Lib/test/test_unittest.py Misc/NEWS Modules/_io/bufferedio.c

Fri Dec 3 20:54:31 CET 2010

Author: martin.v.loewis
Date: Fri Dec  3 20:54:30 2010
New Revision: 86995

Log:
Merged revisions 86981,86983-86986,86993 via svnmerge from 
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r86981 | antoine.pitrou | 2010-12-03 19:41:39 +0100 (Fri, 03 Dec 2010) | 5 lines
  
  Issue #10478: Reentrant calls inside buffered IO objects (for example by
  way of a signal handler) now raise a RuntimeError instead of freezing the
  current process.
........
  r86983 | terry.reedy | 2010-12-03 19:57:42 +0100 (Fri, 03 Dec 2010) | 1 line
........
  r86984 | antoine.pitrou | 2010-12-03 20:14:17 +0100 (Fri, 03 Dec 2010) | 3 lines
  
  Add an "advanced topics" section to the io doc.
........
  r86985 | eric.araujo | 2010-12-03 20:19:17 +0100 (Fri, 03 Dec 2010) | 5 lines
  
  Fix incorrect use of gettext in argparse (#10497).
  
  Steven, the maintainer of argparse, agreed to have this committed
  without tests for now, since the fix is obvious.  See the bug log.
........
  r86986 | michael.foord | 2010-12-03 20:20:44 +0100 (Fri, 03 Dec 2010) | 1 line
  
  Fix so that test.test_unittest can be executed by unittest and not just regrtest
........
  r86993 | eric.araujo | 2010-12-03 20:41:00 +0100 (Fri, 03 Dec 2010) | 7 lines
  
  Allow translators to reorder placeholders in localizable messages from
  argparse (#10528).
  
  There is no unit test; I checked with xgettext that no more warnings
  were emitted.  Steven approved the change.
........


Modified:
   python/branches/pep-0384/   (props changed)
   python/branches/pep-0384/Doc/library/difflib.rst
   python/branches/pep-0384/Doc/library/io.rst
   python/branches/pep-0384/Lib/argparse.py
   python/branches/pep-0384/Lib/difflib.py
   python/branches/pep-0384/Lib/test/test_io.py
   python/branches/pep-0384/Lib/test/test_unittest.py
   python/branches/pep-0384/Misc/NEWS
   python/branches/pep-0384/Modules/_io/bufferedio.c

Modified: python/branches/pep-0384/Doc/library/difflib.rst
==============================================================================

--- python/branches/pep-0384/Doc/library/difflib.rst	(original)
+++ python/branches/pep-0384/Doc/library/difflib.rst	Fri Dec  3 20:54:30 2010
@@ -358,6 +358,16 @@
    .. versionadded:: 3.2
       The *autojunk* parameter.
 
+   SequenceMatcher objects get three data attributes: *bjunk* is the
+   set of elements of *b* for which *isjunk* is ``True``; *bpopular* is the set
+   of non-junk elements considered popular by the heuristic (if it is not
+   disabled); *b2j* is a dict mapping the remaining elements of *b* to a list
+   of positions where they occur. All three are reset whenever *b* is reset
+   with :meth:`set_seqs` or :meth:`set_seq2`.
+
+   .. versionadded:: 3.2
+      The *bjunk* and *bpopular* attributes.
+
    :class:`SequenceMatcher` objects have the following methods:
 
 
@@ -538,7 +548,7 @@
 SequenceMatcher Examples
 ------------------------
 
-This example compares two strings, considering blanks to be "junk:"
+This example compares two strings, considering blanks to be "junk":
 
    >>> s = SequenceMatcher(lambda x: x == " ",
    ...                     "private Thread currentThread;",
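
Reviewer note on the new difflib attributes: the sketch below shows how *bjunk*, *bpopular* and *b2j* could be inspected. The sample strings and variable names are made up for illustration; only `SequenceMatcher`, its `isjunk` argument, `set_seq2` and the three attributes come from the change above.

```python
from difflib import SequenceMatcher

# Blanks are declared junk through the isjunk callable.
s = SequenceMatcher(lambda x: x == " ", "private Thread currentThread;", "ab cd")
short_junk = s.bjunk          # elements of b for which isjunk is true
short_popular = s.bpopular    # empty here: b has fewer than 200 elements
short_b2j = dict(s.b2j)       # remaining elements of b -> positions in b

# A long b (301 elements) lets the autojunk heuristic take effect.
t = SequenceMatcher(None, "", "ab" * 150 + "c")
long_popular = t.bpopular     # elements the heuristic considers popular
long_b2j = dict(t.b2j)        # popular elements are purged from b2j

# Replacing b rebuilds all three attributes.
s.set_seq2("x y")
reset_b2j = dict(s.b2j)

print(short_junk, short_popular, short_b2j)
print(long_popular, long_b2j)
print(s.bjunk, reset_b2j)
```

Junk and popular elements never appear as keys of *b2j*, which is why they cannot start a match.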

Modified: python/branches/pep-0384/Doc/library/io.rst
==============================================================================
--- python/branches/pep-0384/Doc/library/io.rst	(original)
+++ python/branches/pep-0384/Doc/library/io.rst	Fri Dec  3 20:54:30 2010
@@ -54,12 +54,6 @@
 The text stream API is described in detail in the documentation for the
 :class:`TextIOBase`.
 
-.. note::
-
-   Text I/O over a binary storage (such as a file) is significantly slower than
-   binary I/O over the same storage.  This can become noticeable if you handle
-   huge amounts of text data (for example very large log files).
-
 
 Binary I/O
 ^^^^^^^^^^
@@ -506,8 +500,8 @@
 Buffered Streams
 ^^^^^^^^^^^^^^^^
 
-In many situations, buffered I/O streams will provide higher performance
-(bandwidth and latency) than raw I/O streams.  Their API is also more usable.
+Buffered I/O streams provide a higher-level interface to an I/O device
+than raw I/O does.
 
 .. class:: BytesIO([initial_bytes])
 
@@ -784,14 +778,72 @@
       # .getvalue() will now raise an exception.
       output.close()
 
-   .. note::
-
-      :class:`StringIO` uses a native text storage and doesn't suffer from the
-      performance issues of other text streams, such as those based on
-      :class:`TextIOWrapper`.
 
 .. class:: IncrementalNewlineDecoder
 
    A helper codec that decodes newlines for universal newlines mode.  It
    inherits :class:`codecs.IncrementalDecoder`.
 
+
+Advanced topics
+---------------
+
+Here we will discuss several advanced topics pertaining to the concrete
+I/O implementations described above.
+
+Performance
+^^^^^^^^^^^
+
+Binary I/O
+""""""""""
+
+By reading and writing only large chunks of data even when the user asks
+for a single byte, buffered I/O is designed to hide any inefficiency in
+calling and executing the operating system's unbuffered I/O routines.  The
+gain depends on the OS and the kind of I/O which is performed (for
+example, on some contemporary OSes such as Linux, unbuffered disk I/O can
+be as fast as buffered I/O).  The bottom line, however, is that buffered
+I/O offers predictable performance regardless of the platform and the
+backing device.  Therefore, it is almost always preferable to use buffered
+I/O rather than unbuffered I/O.
+
+Text I/O
+""""""""
+
+Text I/O over a binary storage (such as a file) is significantly slower than
+binary I/O over the same storage, because it requires conversions between
+unicode and binary data using a character codec.  This can become noticeable
+if you handle huge amounts of text data (for example very large log files).
+
+:class:`StringIO`, however, is a native in-memory unicode container and will
+exhibit similar speed to :class:`BytesIO`.
+
+Multi-threading
+^^^^^^^^^^^^^^^
+
+:class:`FileIO` objects are thread-safe to the extent that the operating
+system calls (such as ``read(2)`` under Unix) they are wrapping are thread-safe
+too.
+
+Binary buffered objects (instances of :class:`BufferedReader`,
+:class:`BufferedWriter`, :class:`BufferedRandom` and :class:`BufferedRWPair`)
+protect their internal structures using a lock; it is therefore safe to call
+them from multiple threads at once.
+
+:class:`TextIOWrapper` objects are not thread-safe.
+
+Reentrancy
+^^^^^^^^^^
+
+Binary buffered objects (instances of :class:`BufferedReader`,
+:class:`BufferedWriter`, :class:`BufferedRandom` and :class:`BufferedRWPair`)
+are not reentrant.  While reentrant calls will not happen in normal situations,
+they can arise from doing I/O in a :mod:`signal` handler.  If a thread tries
+to re-enter a buffered object which it is already accessing, a
+:exc:`RuntimeError` is raised.
+
+The above implicitly extends to text files, since the :func:`open()`
+function will wrap a buffered object inside a :class:`TextIOWrapper`.  This
+includes standard streams and therefore affects the built-in function
+:func:`print()` as well.
+
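
Reviewer note on the "Multi-threading" paragraph added above: a minimal sketch, assuming CPython's C `io` module, of several threads sharing one `BufferedWriter`. The record size, thread count and the in-memory `BytesIO` backing store are arbitrary choices made for this illustration.

```python
import io
import threading

RECORD = 8          # bytes per write() call (illustrative)
PER_THREAD = 1000   # write() calls issued by each thread (illustrative)

backing = io.BytesIO()
writer = io.BufferedWriter(backing, buffer_size=64)


def worker(tag: bytes) -> None:
    # Each call hands one complete record to the shared buffered object.
    record = tag * RECORD
    for _ in range(PER_THREAD):
        writer.write(record)


threads = [threading.Thread(target=worker, args=(tag,))
           for tag in (b"a", b"b", b"c", b"d")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

writer.flush()
data = backing.getvalue()

# Because the buffered object serializes callers with its lock, every record
# should come out whole, whatever the interleaving between threads.
records = [data[i:i + RECORD] for i in range(0, len(data), RECORD)]
intact = all(len(set(record)) == 1 for record in records)
print(len(data), intact)
```

The order of records across threads is not defined; only the integrity of each individual `write()` is.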

Modified: python/branches/pep-0384/Lib/argparse.py
==============================================================================
--- python/branches/pep-0384/Lib/argparse.py	(original)
+++ python/branches/pep-0384/Lib/argparse.py	Fri Dec  3 20:54:30 2010
@@ -1079,8 +1079,9 @@
         try:
             parser = self._name_parser_map[parser_name]
         except KeyError:
-            tup = parser_name, ', '.join(self._name_parser_map)
-            msg = _('unknown parser %r (choices: %s)' % tup)
+            args = {'parser_name': parser_name,
+                    'choices': ', '.join(self._name_parser_map)}
+            msg = _('unknown parser %(parser_name)r (choices: %(choices)s)') % args
             raise ArgumentError(self, msg)
 
         # parse all the remaining options into the namespace
@@ -1121,7 +1122,7 @@
             elif 'w' in self._mode:
                 return _sys.stdout
             else:
-                msg = _('argument "-" with mode %r' % self._mode)
+                msg = _('argument "-" with mode %r') % self._mode
                 raise ValueError(msg)
 
         # all other arguments are used as file names
@@ -1380,10 +1381,11 @@
         for option_string in args:
             # error on strings that don't start with an appropriate prefix
             if not option_string[0] in self.prefix_chars:
-                msg = _('invalid option string %r: '
-                        'must start with a character %r')
-                tup = option_string, self.prefix_chars
-                raise ValueError(msg % tup)
+                args = {'option': option_string,
+                        'prefix_chars': self.prefix_chars}
+                msg = _('invalid option string %(option)r: '
+                        'must start with a character %(prefix_chars)r')
+                raise ValueError(msg % args)
 
             # strings starting with two prefix characters are long options
             option_strings.append(option_string)
@@ -2049,8 +2051,9 @@
         if len(option_tuples) > 1:
             options = ', '.join([option_string
                 for action, option_string, explicit_arg in option_tuples])
-            tup = arg_string, options
-            self.error(_('ambiguous option: %s could match %s') % tup)
+            args = {'option': arg_string, 'matches': options}
+            msg = _('ambiguous option: %(option)s could match %(matches)s')
+            self.error(msg % args)
 
         # if exactly one action matched, this segmentation is good,
         # so return the parsed action
@@ -2229,8 +2232,9 @@
         # TypeErrors or ValueErrors also indicate errors
         except (TypeError, ValueError):
             name = getattr(action.type, '__name__', repr(action.type))
-            msg = _('invalid %s value: %r')
-            raise ArgumentError(action, msg % (name, arg_string))
+            args = {'type': name, 'value': arg_string}
+            msg = _('invalid %(type)s value: %(value)r')
+            raise ArgumentError(action, msg % args)
 
         # return the converted value
         return result
@@ -2238,9 +2242,10 @@
     def _check_value(self, action, value):
         # converted value must be one of the choices (if specified)
         if action.choices is not None and value not in action.choices:
-            tup = value, ', '.join(map(repr, action.choices))
-            msg = _('invalid choice: %r (choose from %s)') % tup
-            raise ArgumentError(action, msg)
+            args = {'value': value,
+                    'choices': ', '.join(map(repr, action.choices))}
+            msg = _('invalid choice: %(value)r (choose from %(choices)s)')
+            raise ArgumentError(action, msg % args)
 
     # =======================
     # Help-formatting methods
@@ -2332,4 +2337,5 @@
         should either exit or raise an exception.
         """
         self.print_usage(_sys.stderr)
-        self.exit(2, _('%s: error: %s\n') % (self.prog, message))
+        args = {'prog': self.prog, 'message': message}
+        self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
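
Reviewer note on the two argparse fixes (#10497 and #10528): the sketch below uses a hypothetical dict-based catalog in place of a real gettext translation, and the French strings are invented for the example. It illustrates why the template must be looked up before formatting, and how named placeholders let a translator reorder values.

```python
# Hypothetical catalog standing in for a compiled gettext translation.
CATALOG = {
    'argument "-" with mode %r': 'argument "-" avec le mode %r',
    'invalid %(type)s value: %(value)r':
        'valeur %(value)r invalide pour le type %(type)s',
}


def _(message):
    # Mimics gettext lookup: unknown messages are returned unchanged.
    return CATALOG.get(message, message)


mode = 'x'

# Old code: the string is formatted first, so the lookup key never matches.
wrong = _('argument "-" with mode %r' % mode)

# Fixed code: look up the template, then format the translated result.
right = _('argument "-" with mode %r') % mode

# Named placeholders: the translation may put the values in another order.
args = {'type': 'int', 'value': 'abc'}
reordered = _('invalid %(type)s value: %(value)r') % args

print(wrong)
print(right)
print(reordered)
```

With positional `%s`/`%r` placeholders the last translation would be impossible, since the values are consumed strictly in the order of the original English message.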

Modified: python/branches/pep-0384/Lib/difflib.py
==============================================================================
--- python/branches/pep-0384/Lib/difflib.py	(original)
+++ python/branches/pep-0384/Lib/difflib.py	Fri Dec  3 20:54:30 2010
@@ -213,6 +213,10 @@
         #      (at least 200 elements) and x accounts for more than 1 + 1% of
         #      its elements (when autojunk is enabled).
         #      DOES NOT WORK for x in a!
+        # bjunk
+        #      the items in b for which isjunk is True.
+        # bpopular
+        #      nonjunk items in b treated as junk by the heuristic (if used).
 
         self.isjunk = isjunk
         self.a = self.b = None
@@ -321,7 +325,7 @@
             indices.append(i)
 
         # Purge junk elements
-        junk = set()
+        self.bjunk = junk = set()
         isjunk = self.isjunk
         if isjunk:
             for elt in list(b2j.keys()):  # using list() since b2j is modified
@@ -330,7 +334,7 @@
                     del b2j[elt]
 
         # Purge popular elements that are not junk
-        popular = set()
+        self.bpopular = popular = set()
         n = len(b)
         if self.autojunk and n >= 200:
             ntest = n // 100 + 1

Modified: python/branches/pep-0384/Lib/test/test_io.py
==============================================================================
--- python/branches/pep-0384/Lib/test/test_io.py	(original)
+++ python/branches/pep-0384/Lib/test/test_io.py	Fri Dec  3 20:54:30 2010
@@ -2653,12 +2653,50 @@
     def test_interrupted_write_text(self):
         self.check_interrupted_write("xy", b"xy", mode="w", encoding="ascii")
 
+    def check_reentrant_write(self, data, **fdopen_kwargs):
+        def on_alarm(*args):
+            # Will be called reentrantly from the same thread
+            wio.write(data)
+            1/0
+        signal.signal(signal.SIGALRM, on_alarm)
+        r, w = os.pipe()
+        wio = self.io.open(w, **fdopen_kwargs)
+        try:
+            signal.alarm(1)
+            # Either the reentrant call to wio.write() fails with RuntimeError,
+            # or the signal handler raises ZeroDivisionError.
+            with self.assertRaises((ZeroDivisionError, RuntimeError)) as cm:
+                while 1:
+                    for i in range(100):
+                        wio.write(data)
+                        wio.flush()
+                    # Make sure the buffer doesn't fill up and block further writes
+                    os.read(r, len(data) * 100)
+            exc = cm.exception
+            if isinstance(exc, RuntimeError):
+                self.assertTrue(str(exc).startswith("reentrant call"), str(exc))
+        finally:
+            wio.close()
+            os.close(r)
+
+    def test_reentrant_write_buffered(self):
+        self.check_reentrant_write(b"xy", mode="wb")
+
+    def test_reentrant_write_text(self):
+        self.check_reentrant_write("xy", mode="w", encoding="ascii")
+
+
 class CSignalsTest(SignalsTest):
     io = io
 
 class PySignalsTest(SignalsTest):
     io = pyio
 
+    # Handling reentrancy issues would slow down _pyio even more, so the
+    # tests are disabled.
+    test_reentrant_write_buffered = None
+    test_reentrant_write_text = None
+
 
 def test_main():
     tests = (CIOTest, PyIOTest,

Modified: python/branches/pep-0384/Lib/test/test_unittest.py
==============================================================================
--- python/branches/pep-0384/Lib/test/test_unittest.py	(original)
+++ python/branches/pep-0384/Lib/test/test_unittest.py	Fri Dec  3 20:54:30 2010
@@ -4,8 +4,13 @@
 
 
 def test_main():
+    # used by regrtest
     support.run_unittest(unittest.test.suite())
     support.reap_children()
 
+def load_tests(*_):
+    # used by unittest
+    return unittest.test.suite()
+
 if __name__ == "__main__":
     test_main()
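
Reviewer note on the `load_tests` hook added above: a minimal sketch of how unittest's loader consults a module-level `load_tests` function. The module name `demo_tests` and the two test case classes are hypothetical and exist only for this illustration.

```python
import types
import unittest


class Fast(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)


class Slow(unittest.TestCase):
    def test_slow(self):
        self.assertTrue(True)


def load_tests(loader, standard_tests, pattern):
    # Called by unittest instead of using the default discovery result;
    # here we deliberately return only the Fast tests.
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(Fast))
    return suite


# Build a throwaway module object so the example stays in one file.
demo = types.ModuleType("demo_tests")
demo.Fast = Fast
demo.Slow = Slow
demo.load_tests = load_tests

suite = unittest.TestLoader().loadTestsFromModule(demo)
selected = suite.countTestCases()
print(selected)
```

The change above relies on the same hook: regrtest keeps calling `test_main()`, while unittest picks up whatever suite `load_tests` returns.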

Modified: python/branches/pep-0384/Misc/NEWS
==============================================================================
--- python/branches/pep-0384/Misc/NEWS	(original)
+++ python/branches/pep-0384/Misc/NEWS	Fri Dec  3 20:54:30 2010
@@ -35,6 +35,15 @@
 Library
 -------
 
+- Issue #10528: Allow translators to reorder placeholders in localizable
+  messages from argparse.
+
+- Issue #10497: Fix incorrect use of gettext in argparse.
+
+- Issue #10478: Reentrant calls inside buffered IO objects (for example by
+  way of a signal handler) now raise a RuntimeError instead of freezing the
+  current process.
+
 - logging: Added getLogRecordFactory/setLogRecordFactory with docs and tests.
 
 - Issue #10549: Fix pydoc traceback when text-documenting certain classes.

Modified: python/branches/pep-0384/Modules/_io/bufferedio.c
==============================================================================
--- python/branches/pep-0384/Modules/_io/bufferedio.c	(original)
+++ python/branches/pep-0384/Modules/_io/bufferedio.c	Fri Dec  3 20:54:30 2010
@@ -225,6 +225,7 @@
 
 #ifdef WITH_THREAD
     PyThread_type_lock lock;
+    volatile long owner;
 #endif
 
     Py_ssize_t buffer_size;
@@ -260,17 +261,34 @@
 /* These macros protect the buffered object against concurrent operations. */
 
 #ifdef WITH_THREAD
-#define ENTER_BUFFERED(self) \
-    if (!PyThread_acquire_lock(self->lock, 0)) { \
-        Py_BEGIN_ALLOW_THREADS \
-        PyThread_acquire_lock(self->lock, 1); \
-        Py_END_ALLOW_THREADS \
+
+static int
+_enter_buffered_busy(buffered *self)
+{
+    if (self->owner == PyThread_get_thread_ident()) {
+        PyErr_Format(PyExc_RuntimeError,
+                     "reentrant call inside %R", self);
+        return 0;
     }
+    Py_BEGIN_ALLOW_THREADS
+    PyThread_acquire_lock(self->lock, 1);
+    Py_END_ALLOW_THREADS
+    return 1;
+}
+
+#define ENTER_BUFFERED(self) \
+    ( (PyThread_acquire_lock(self->lock, 0) ? \
+       1 : _enter_buffered_busy(self)) \
+     && (self->owner = PyThread_get_thread_ident(), 1) )
 
 #define LEAVE_BUFFERED(self) \
-    PyThread_release_lock(self->lock);
+    do { \
+        self->owner = 0; \
+        PyThread_release_lock(self->lock); \
+    } while(0);
+
 #else
-#define ENTER_BUFFERED(self)
+#define ENTER_BUFFERED(self) 1
 #define LEAVE_BUFFERED(self)
 #endif
 
@@ -444,7 +462,8 @@
     int r;
 
     CHECK_INITIALIZED(self)
-    ENTER_BUFFERED(self)
+    if (!ENTER_BUFFERED(self))
+        return NULL;
 
     r = buffered_closed(self);
     if (r < 0)
@@ -465,7 +484,8 @@
     /* flush() will most probably re-take the lock, so drop it first */
     LEAVE_BUFFERED(self)
     res = PyObject_CallMethodObjArgs((PyObject *)self, _PyIO_str_flush, NULL);
-    ENTER_BUFFERED(self)
+    if (!ENTER_BUFFERED(self))
+        return NULL;
     if (res == NULL) {
         goto end;
     }
@@ -679,6 +699,7 @@
         PyErr_SetString(PyExc_RuntimeError, "can't allocate read lock");
         return -1;
     }
+    self->owner = 0;
 #endif
     /* Find out whether buffer_size is a power of 2 */
     /* XXX is this optimization useful? */
@@ -705,7 +726,8 @@
     CHECK_INITIALIZED(self)
     CHECK_CLOSED(self, "flush of closed file")
 
-    ENTER_BUFFERED(self)
+    if (!ENTER_BUFFERED(self))
+        return NULL;
     res = _bufferedwriter_flush_unlocked(self, 0);
     if (res != NULL && self->readable) {
         /* Rewind the raw stream so that its position corresponds to
@@ -732,7 +754,8 @@
         return NULL;
     }
 
-    ENTER_BUFFERED(self)
+    if (!ENTER_BUFFERED(self))
+        return NULL;
 
     if (self->writable) {
         res = _bufferedwriter_flush_unlocked(self, 1);
@@ -767,7 +790,8 @@
 
     if (n == -1) {
         /* The number of bytes is unspecified, read until the end of stream */
-        ENTER_BUFFERED(self)
+        if (!ENTER_BUFFERED(self))
+            return NULL;
         res = _bufferedreader_read_all(self);
         LEAVE_BUFFERED(self)
     }
@@ -775,7 +799,8 @@
         res = _bufferedreader_read_fast(self, n);
         if (res == Py_None) {
             Py_DECREF(res);
-            ENTER_BUFFERED(self)
+            if (!ENTER_BUFFERED(self))
+                return NULL;
             res = _bufferedreader_read_generic(self, n);
             LEAVE_BUFFERED(self)
         }
@@ -803,7 +828,8 @@
     if (n == 0)
         return PyBytes_FromStringAndSize(NULL, 0);
 
-    ENTER_BUFFERED(self)
+    if (!ENTER_BUFFERED(self))
+        return NULL;
     
     if (self->writable) {
         res = _bufferedwriter_flush_unlocked(self, 1);
@@ -859,7 +885,8 @@
     
     /* TODO: use raw.readinto() instead! */
     if (self->writable) {
-        ENTER_BUFFERED(self)
+        if (!ENTER_BUFFERED(self))
+            return NULL;
         res = _bufferedwriter_flush_unlocked(self, 0);
         LEAVE_BUFFERED(self)
         if (res == NULL)
@@ -903,7 +930,8 @@
         goto end_unlocked;
     }
 
-    ENTER_BUFFERED(self)
+    if (!ENTER_BUFFERED(self))
+        goto end_unlocked;
 
     /* Now we try to get some more from the raw stream */
     if (self->writable) {
@@ -1053,7 +1081,8 @@
         }
     }
 
-    ENTER_BUFFERED(self)
+    if (!ENTER_BUFFERED(self))
+        return NULL;
 
     /* Fallback: invoke raw seek() method and clear buffer */
     if (self->writable) {
@@ -1091,7 +1120,8 @@
         return NULL;
     }
 
-    ENTER_BUFFERED(self)
+    if (!ENTER_BUFFERED(self))
+        return NULL;
 
     if (self->writable) {
         res = _bufferedwriter_flush_unlocked(self, 0);
@@ -1748,7 +1778,10 @@
         return NULL;
     }
 
-    ENTER_BUFFERED(self)
+    if (!ENTER_BUFFERED(self)) {
+        PyBuffer_Release(&buf);
+        return NULL;
+    }
 
     /* Fast path: the data to write can be fully buffered. */
     if (!VALID_READ_BUFFER(self) && !VALID_WRITE_BUFFER(self)) {
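
Reviewer note on the `owner` check in `_enter_buffered_busy`: the new behaviour can be observed from Python without a signal handler. This is a sketch assuming CPython's C `io` module (the diff above leaves `_pyio` unprotected). `CallbackRaw` is a hypothetical raw stream written for this illustration; it calls back into its own `BufferedWriter` while that writer still holds its lock.

```python
import io


class CallbackRaw(io.RawIOBase):
    """Hypothetical raw stream whose write() re-enters its own wrapper."""

    def __init__(self):
        super().__init__()
        self.wrapper = None   # set to the BufferedWriter once it exists
        self.error = None     # exception raised by the nested call, if any

    def writable(self):
        return True

    def write(self, b):
        # This runs while the BufferedWriter holds its lock, so the nested
        # flush() is a reentrant call made by the owning thread.
        try:
            self.wrapper.flush()
        except RuntimeError as exc:
            self.error = exc
        return len(b)


raw = CallbackRaw()
buffered = io.BufferedWriter(raw, buffer_size=16)
raw.wrapper = buffered

buffered.write(b"xy")   # small enough to stay in the buffer
buffered.flush()        # hands the data to raw.write(), which re-enters

message = str(raw.error)
print(message)
buffered.close()
```

Before this change the nested call would have blocked forever on the non-recursive lock; with the owner check it fails fast with a `RuntimeError` whose message starts with "reentrant call".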