[Python-checkins] r86748 - in python/branches/py3k-urllib/Lib: http/client.py urllib/request.py

senthil.kumaran python-checkins at python.org
Thu Nov 25 09:18:20 CET 2010


Author: senthil.kumaran
Date: Thu Nov 25 09:18:20 2010
New Revision: 86748

Log:
Experimental - Transparent gzip Encoding in urllib2. There should be a good way to deal with Content-Length.



Modified:
   python/branches/py3k-urllib/Lib/http/client.py
   python/branches/py3k-urllib/Lib/urllib/request.py

Modified: python/branches/py3k-urllib/Lib/http/client.py
==============================================================================
--- python/branches/py3k-urllib/Lib/http/client.py	(original)
+++ python/branches/py3k-urllib/Lib/http/client.py	Thu Nov 25 09:18:20 2010
@@ -71,6 +71,7 @@
 import io
 import os
 import socket
+import gzip
 from urllib.parse import urlsplit
 import warnings
 
@@ -491,6 +492,11 @@
             self.close()
             return b""
 
+        if self.getheader('Content-Encoding') == 'gzip':
+            self.fp = gzip.GzipFile(fileobj=self.fp, mode='rb')
+            self.length = None
+            amt=None
+
         if self.chunked:
             return self._read_chunked(amt)
 

Modified: python/branches/py3k-urllib/Lib/urllib/request.py
==============================================================================
--- python/branches/py3k-urllib/Lib/urllib/request.py	(original)
+++ python/branches/py3k-urllib/Lib/urllib/request.py	Thu Nov 25 09:18:20 2010
@@ -1074,6 +1074,10 @@
         headers = dict(req.unredirected_hdrs)
         headers.update(dict((k, v) for k, v in req.headers.items()
                             if k not in headers))
+        hasAcceptEncoding = ("Accept-Encoding" in headers or
+                "Accept-encoding" in headers)
+        if not hasAcceptEncoding:
+            headers["Accept-Encoding"] = "gzip"
 
         # TODO(jhylton): Should this be redesigned to handle
         # persistent connections?
@@ -1104,6 +1108,7 @@
             raise URLError(err)
 
         r.url = req.full_url
+
         # This line replaces the .msg attribute of the HTTPResponse
         # with .headers, because urllib clients expect the response to
         # have the reason in .msg.  It would be good to mark this
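
Note on the change above: the log message leaves the Content-Length handling open. One way to think about it is that Content-Length counts the compressed bytes on the wire, so it can bound the raw read but says nothing about the size of the decoded body. The following is a minimal sketch under that assumption, not the committed implementation; the helper names with_default_accept_encoding and read_decoded are hypothetical and exist in neither http.client nor urllib.request. It also checks for an existing Accept-Encoding header case-insensitively, which the two-spelling test in the diff only approximates.

```python
import gzip
import io


def with_default_accept_encoding(headers):
    """Return a copy of headers that requests gzip unless the caller
    already supplied an Accept-Encoding header (in any letter case).

    Hypothetical helper for illustration only.
    """
    result = dict(headers)
    if not any(name.lower() == "accept-encoding" for name in result):
        result["Accept-Encoding"] = "gzip"
    return result


def read_decoded(fp, content_encoding=None, content_length=None):
    """Read one whole response body from fp and un-gzip it if needed.

    Hypothetical helper for illustration only. content_length is the
    number of bytes on the wire (the compressed size), so it limits the
    raw read and is not compared against the decoded result.
    """
    if content_length is None:
        raw = fp.read()
    else:
        raw = fp.read(content_length)
    if (content_encoding or "").strip().lower() == "gzip":
        # Wrap the already-read bytes exactly once, then decode them.
        return gzip.GzipFile(fileobj=io.BytesIO(raw), mode="rb").read()
    return raw


if __name__ == "__main__":
    body = b"hello gzip " * 50
    wire = gzip.compress(body)
    # Bytes after the declared length belong to whatever follows on
    # the connection and must be left unread.
    stream = io.BytesIO(wire + b"NEXT")
    decoded = read_decoded(stream, "gzip", len(wire))
    print(len(wire), len(decoded), stream.read())
```

Reading the raw bytes first and decoding afterwards keeps the length bookkeeping on the socket side separate from decompression, at the cost of holding the whole compressed body in memory; a streaming variant would need to track the two byte counts separately.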

