[Python-checkins] cpython: Issue #12319: Always send file request bodies using chunked encoding

martin.panter python-checkins at python.org
Fri Aug 26 22:42:38 EDT 2016


https://hg.python.org/cpython/rev/216f5451c35a
changeset:   102923:216f5451c35a
user:        Martin Panter <vadmium+py at gmail.com>
date:        Sat Aug 27 01:39:26 2016 +0000
summary:
  Issue #12319: Always send file request bodies using chunked encoding

The previous attempt to determine the file’s Content-Length gave a false
positive for pipes on Windows.

Also, drop the special case for sending zero-length iterable bodies.

files:
  Doc/library/http.client.rst    |  31 +++++------
  Doc/library/urllib.request.rst |  15 ++---
  Doc/whatsnew/3.6.rst           |  11 +++-
  Lib/http/client.py             |  31 +++---------
  Lib/test/test_httplib.py       |  27 +++++++---
  Lib/test/test_urllib2.py       |  56 +++++++++++++--------
  Misc/NEWS                      |   7 +-
  7 files changed, 96 insertions(+), 82 deletions(-)


diff --git a/Doc/library/http.client.rst b/Doc/library/http.client.rst
--- a/Doc/library/http.client.rst
+++ b/Doc/library/http.client.rst
@@ -240,17 +240,17 @@
    The *headers* argument should be a mapping of extra HTTP headers to send
    with the request.
 
-   If *headers* contains neither Content-Length nor Transfer-Encoding, a
-   Content-Length header will be added automatically if possible.  If
+   If *headers* contains neither Content-Length nor Transfer-Encoding,
+   but there is a request body, one of those
+   header fields will be added automatically.  If
    *body* is ``None``, the Content-Length header is set to ``0`` for
    methods that expect a body (``PUT``, ``POST``, and ``PATCH``).  If
-   *body* is a string or bytes-like object, the Content-Length header is
-   set to its length.  If *body* is a binary :term:`file object`
-   supporting :meth:`~io.IOBase.seek`, this will be used to determine
-   its size.  Otherwise, the Content-Length header is not added
-   automatically.  In cases where determining the Content-Length up
-   front is not possible, the body will be chunk-encoded and the
-   Transfer-Encoding header will automatically be set.
+   *body* is a string or a bytes-like object that is not also a
+   :term:`file <file object>`, the Content-Length header is
+   set to its length.  Any other type of *body* (files
+   and iterables in general) will be chunk-encoded, and the
+   Transfer-Encoding header will automatically be set instead of
+   Content-Length.
 
    The *encode_chunked* argument is only relevant if Transfer-Encoding is
    specified in *headers*.  If *encode_chunked* is ``False``, the
@@ -260,19 +260,18 @@
    .. note::
       Chunked transfer encoding has been added to the HTTP protocol
       version 1.1.  Unless the HTTP server is known to handle HTTP 1.1,
-      the caller must either specify the Content-Length or must use a
-      body representation whose length can be determined automatically.
+      the caller must either specify the Content-Length, or must pass a
+      :class:`str` or bytes-like object that is not also a file as the
+      body representation.
 
    .. versionadded:: 3.2
       *body* can now be an iterable.
 
    .. versionchanged:: 3.6
       If neither Content-Length nor Transfer-Encoding are set in
-      *headers* and Content-Length cannot be determined, *body* will now
-      be automatically chunk-encoded.  The *encode_chunked* argument
-      was added.
-      The Content-Length for binary file objects is determined with seek.
-      No attempt is made to determine the Content-Length for text file
+      *headers*, file and iterable *body* objects are now chunk-encoded.
+      The *encode_chunked* argument was added.
+      No attempt is made to determine the Content-Length for file
       objects.
 
 .. method:: HTTPConnection.getresponse()
diff --git a/Doc/library/urllib.request.rst b/Doc/library/urllib.request.rst
--- a/Doc/library/urllib.request.rst
+++ b/Doc/library/urllib.request.rst
@@ -187,12 +187,11 @@
    server, or ``None`` if no such data is needed.  Currently HTTP
    requests are the only ones that use *data*.  The supported object
    types include bytes, file-like objects, and iterables.  If no
-   ``Content-Length`` header has been provided, :class:`HTTPHandler` will
-   try to determine the length of *data* and set this header accordingly.
-   If this fails, ``Transfer-Encoding: chunked`` as specified in
-   :rfc:`7230`, Section 3.3.1 will be used to send the data.  See
-   :meth:`http.client.HTTPConnection.request` for details on the
-   supported object types and on how the content length is determined.
+   ``Content-Length`` nor ``Transfer-Encoding`` header field
+   has been provided, :class:`HTTPHandler` will set these headers according
+   to the type of *data*.  ``Content-Length`` will be used to send
+   bytes objects, while ``Transfer-Encoding: chunked`` as specified in
+   :rfc:`7230`, Section 3.3.1 will be used to send files and other iterables.
 
    For an HTTP POST request method, *data* should be a buffer in the
    standard :mimetype:`application/x-www-form-urlencoded` format.  The
@@ -256,8 +255,8 @@
 
    .. versionchanged:: 3.6
       Do not raise an error if the ``Content-Length`` has not been
-      provided and could not be determined.  Fall back to use chunked
-      transfer encoding instead.
+      provided and *data* is neither ``None`` nor a bytes object.
+      Fall back to use chunked transfer encoding instead.
 
 .. class:: OpenerDirector()
 
diff --git a/Doc/whatsnew/3.6.rst b/Doc/whatsnew/3.6.rst
--- a/Doc/whatsnew/3.6.rst
+++ b/Doc/whatsnew/3.6.rst
@@ -579,8 +579,8 @@
 urllib.request
 --------------
 
-If a HTTP request has a non-empty body but no Content-Length header
-and the content length cannot be determined up front, rather than
+If a HTTP request has a file or iterable body (other than a
+bytes object) but no Content-Length header, rather than
 throwing an error, :class:`~urllib.request.AbstractHTTPHandler` now
 falls back to use chunked transfer encoding.
 (Contributed by Demian Brecht and Rolf Krahl in :issue:`12319`.)
@@ -935,6 +935,13 @@
   This behavior has also been backported to earlier Python versions
   by Setuptools 26.0.0.
 
+* In the :mod:`urllib.request` module and the
+  :meth:`http.client.HTTPConnection.request` method, if no Content-Length
+  header field has been specified and the request body is a file object,
+  it is now sent with HTTP 1.1 chunked encoding. If a file object has to
+  be sent to a HTTP 1.0 server, the Content-Length value now has to be
+  specified by the caller. See :issue:`12319`.
+
 Changes in the C API
 --------------------
 
diff --git a/Lib/http/client.py b/Lib/http/client.py
--- a/Lib/http/client.py
+++ b/Lib/http/client.py
@@ -805,35 +805,21 @@
     def _get_content_length(body, method):
         """Get the content-length based on the body.
 
-        If the body is "empty", we set Content-Length: 0 for methods
-        that expect a body (RFC 7230, Section 3.3.2). If the body is
-        set for other methods, we set the header provided we can
-        figure out what the length is.
+        If the body is None, we set Content-Length: 0 for methods that expect
+        a body (RFC 7230, Section 3.3.2). We also set the Content-Length for
+        any method if the body is a str or bytes-like object and not a file.
         """
-        if not body:
+        if body is None:
             # do an explicit check for not None here to distinguish
             # between unset and set but empty
-            if method.upper() in _METHODS_EXPECTING_BODY or body is not None:
+            if method.upper() in _METHODS_EXPECTING_BODY:
                 return 0
             else:
                 return None
 
         if hasattr(body, 'read'):
             # file-like object.
-            if HTTPConnection._is_textIO(body):
-                # text streams are unpredictable because it depends on
-                # character encoding and line ending translation.
-                return None
-            else:
-                # Is it seekable?
-                try:
-                    curpos = body.tell()
-                    sz = body.seek(0, io.SEEK_END)
-                except (TypeError, AttributeError, OSError):
-                    return None
-                else:
-                    body.seek(curpos)
-                    return sz - curpos
+            return None
 
         try:
             # does it implement the buffer protocol (bytes, bytearray, array)?
@@ -1266,8 +1252,7 @@
         # the caller passes encode_chunked=True or the following
         # conditions hold:
         # 1. content-length has not been explicitly set
-        # 2. the length of the body cannot be determined
-        #    (e.g. it is a generator or unseekable file)
+        # 2. the body is a file or iterable, but not a str or bytes-like
         # 3. Transfer-Encoding has NOT been explicitly set by the caller
 
         if 'content-length' not in header_names:
@@ -1280,7 +1265,7 @@
                 encode_chunked = False
                 content_length = self._get_content_length(body, method)
                 if content_length is None:
-                    if body:
+                    if body is not None:
                         if self.debuglevel > 0:
                             print('Unable to determine size of %r' % body)
                         encode_chunked = True
diff --git a/Lib/test/test_httplib.py b/Lib/test/test_httplib.py
--- a/Lib/test/test_httplib.py
+++ b/Lib/test/test_httplib.py
@@ -381,6 +381,16 @@
             # same request
             self.assertNotIn('content-length', [k.lower() for k in headers])
 
+    def test_empty_body(self):
+        # Zero-length iterable should be treated like any other iterable
+        conn = client.HTTPConnection('example.com')
+        conn.sock = FakeSocket(b'')
+        conn.request('POST', '/', ())
+        _, headers, body = self._parse_request(conn.sock.data)
+        self.assertEqual(headers['Transfer-Encoding'], 'chunked')
+        self.assertNotIn('content-length', [k.lower() for k in headers])
+        self.assertEqual(body, b"0\r\n\r\n")
+
     def _make_body(self, empty_lines=False):
         lines = self.expected_body.split(b' ')
         for idx, line in enumerate(lines):
@@ -652,7 +662,9 @@
 
     def test_send_file(self):
         expected = (b'GET /foo HTTP/1.1\r\nHost: example.com\r\n'
-                    b'Accept-Encoding: identity\r\nContent-Length:')
+                    b'Accept-Encoding: identity\r\n'
+                    b'Transfer-Encoding: chunked\r\n'
+                    b'\r\n')
 
         with open(__file__, 'rb') as body:
             conn = client.HTTPConnection('example.com')
@@ -1717,7 +1729,7 @@
         self.assertEqual("5", message.get("content-length"))
         self.assertEqual(b'body\xc1', f.read())
 
-    def test_file_body(self):
+    def test_text_file_body(self):
         self.addCleanup(support.unlink, support.TESTFN)
         with open(support.TESTFN, "w") as f:
             f.write("body")
@@ -1726,10 +1738,8 @@
             message, f = self.get_headers_and_fp()
             self.assertEqual("text/plain", message.get_content_type())
             self.assertIsNone(message.get_charset())
-            # Note that the length of text files is unpredictable
-            # because it depends on character encoding and line ending
-            # translation.  No content-length will be set, the body
-            # will be sent using chunked transfer encoding.
+            # No content-length will be determined for files; the body
+            # will be sent using chunked transfer encoding instead.
             self.assertIsNone(message.get("content-length"))
             self.assertEqual("chunked", message.get("transfer-encoding"))
             self.assertEqual(b'4\r\nbody\r\n0\r\n\r\n', f.read())
@@ -1743,8 +1753,9 @@
             message, f = self.get_headers_and_fp()
             self.assertEqual("text/plain", message.get_content_type())
             self.assertIsNone(message.get_charset())
-            self.assertEqual("5", message.get("content-length"))
-            self.assertEqual(b'body\xc1', f.read())
+            self.assertEqual("chunked", message.get("Transfer-Encoding"))
+            self.assertNotIn("Content-Length", message)
+            self.assertEqual(b'5\r\nbody\xc1\r\n0\r\n\r\n', f.read())
 
 
 class HTTPResponseTest(TestCase):
diff --git a/Lib/test/test_urllib2.py b/Lib/test/test_urllib2.py
--- a/Lib/test/test_urllib2.py
+++ b/Lib/test/test_urllib2.py
@@ -913,40 +913,50 @@
             self.assertEqual(req.unredirected_hdrs["Spam"], "foo")
 
     def test_http_body_file(self):
-        # A regular file - Content Length is calculated unless already set.
+        # A regular file - chunked encoding is used unless Content Length is
+        # already set.
 
         h = urllib.request.AbstractHTTPHandler()
         o = h.parent = MockOpener()
 
         file_obj = tempfile.NamedTemporaryFile(mode='w+b', delete=False)
         file_path = file_obj.name
-        file_obj.write(b"Something\nSomething\nSomething\n")
         file_obj.close()
+        self.addCleanup(os.unlink, file_path)
 
-        for headers in {}, {"Content-Length": 30}:
-            with open(file_path, "rb") as f:
-                req = Request("http://example.com/", f, headers)
-                newreq = h.do_request_(req)
-                self.assertEqual(int(newreq.get_header('Content-length')), 30)
+        with open(file_path, "rb") as f:
+            req = Request("http://example.com/", f, {})
+            newreq = h.do_request_(req)
+            te = newreq.get_header('Transfer-encoding')
+            self.assertEqual(te, "chunked")
+            self.assertFalse(newreq.has_header('Content-length'))
 
-        os.unlink(file_path)
+        with open(file_path, "rb") as f:
+            req = Request("http://example.com/", f, {"Content-Length": 30})
+            newreq = h.do_request_(req)
+            self.assertEqual(int(newreq.get_header('Content-length')), 30)
+            self.assertFalse(newreq.has_header("Transfer-encoding"))
 
     def test_http_body_fileobj(self):
-        # A file object - Content Length is calculated unless already set.
+        # A file object - chunked encoding is used
+        # unless Content Length is already set.
         # (Note that there are some subtle differences to a regular
         # file, that is why we are testing both cases.)
 
         h = urllib.request.AbstractHTTPHandler()
         o = h.parent = MockOpener()
+        file_obj = io.BytesIO()
 
-        file_obj = io.BytesIO()
-        file_obj.write(b"Something\nSomething\nSomething\n")
+        req = Request("http://example.com/", file_obj, {})
+        newreq = h.do_request_(req)
+        self.assertEqual(newreq.get_header('Transfer-encoding'), 'chunked')
+        self.assertFalse(newreq.has_header('Content-length'))
 
-        for headers in {}, {"Content-Length": 30}:
-            file_obj.seek(0)
-            req = Request("http://example.com/", file_obj, headers)
-            newreq = h.do_request_(req)
-            self.assertEqual(int(newreq.get_header('Content-length')), 30)
+        headers = {"Content-Length": 30}
+        req = Request("http://example.com/", file_obj, headers)
+        newreq = h.do_request_(req)
+        self.assertEqual(int(newreq.get_header('Content-length')), 30)
+        self.assertFalse(newreq.has_header("Transfer-encoding"))
 
         file_obj.close()
 
@@ -959,9 +969,7 @@
         h = urllib.request.AbstractHTTPHandler()
         o = h.parent = MockOpener()
 
-        cmd = [sys.executable, "-c",
-               r"import sys; "
-               r"sys.stdout.buffer.write(b'Something\nSomething\nSomething\n')"]
+        cmd = [sys.executable, "-c", r"pass"]
         for headers in {}, {"Content-Length": 30}:
             with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
                 req = Request("http://example.com/", proc.stdout, headers)
@@ -983,8 +991,6 @@
 
         def iterable_body():
             yield b"one"
-            yield b"two"
-            yield b"three"
 
         for headers in {}, {"Content-Length": 11}:
             req = Request("http://example.com/", iterable_body(), headers)
@@ -996,6 +1002,14 @@
             else:
                 self.assertEqual(int(newreq.get_header('Content-length')), 11)
 
+    def test_http_body_empty_seq(self):
+        # Zero-length iterable body should be treated like any other iterable
+        h = urllib.request.AbstractHTTPHandler()
+        h.parent = MockOpener()
+        req = h.do_request_(Request("http://example.com/", ()))
+        self.assertEqual(req.get_header("Transfer-encoding"), "chunked")
+        self.assertFalse(req.has_header("Content-length"))
+
     def test_http_body_array(self):
         # array.array Iterable - Content Length is calculated
 
diff --git a/Misc/NEWS b/Misc/NEWS
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -52,10 +52,9 @@
 - Issue #12319: Chunked transfer encoding support added to
   http.client.HTTPConnection requests.  The
   urllib.request.AbstractHTTPHandler class does not enforce a Content-Length
-  header any more.  If a HTTP request has a non-empty body, but no
-  Content-Length header, and the content length cannot be determined
-  up front, rather than throwing an error, the library now falls back
-  to use chunked transfer encoding.
+  header any more.  If a HTTP request has a file or iterable body, but no
+  Content-Length header, the library now falls back to use chunked transfer-
+  encoding.
 
 - A new version of typing.py from https://github.com/python/typing:
   - Collection (only for 3.6) (Issue #27598)

-- 
Repository URL: https://hg.python.org/cpython


More information about the Python-checkins mailing list