[Python-checkins] cpython (3.2): #11584: make Header and make_header handle binary unknown-8bit input

r.david.murray python-checkins at python.org
Sat Jun 18 19:02:59 CEST 2011


http://hg.python.org/cpython/rev/3875ccea6367
changeset:   70862:3875ccea6367
branch:      3.2
parent:      70860:d62e5682a8ac
user:        R David Murray <rdmurray at bitdance.com>
date:        Sat Jun 18 12:57:28 2011 -0400
summary:
  #11584: make Header and make_header handle binary unknown-8bit input

Analogous to the decode_header fix, this fix makes Header.append and
make_header correctly handle the unknown-8bit charset introduced by email 5.1
when their input is a binary string.  Before this fix, the
make_header(decode_header(x)) == x invariant was broken in the face of the
unknown-8bit charset.
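
The round trip described above can be sketched as follows.  This is an
illustrative example modelled on the tests added in this changeset, not part
of the patch itself; the variable names (raw, h, h2) are chosen here for
illustration, and it assumes an interpreter that includes this fix.

```python
from email import charset
from email.header import Header, decode_header, make_header

# Bytes containing a non-ASCII octet (0x96) whose real charset is unknown.
raw = b'Ynwp4dUEbay Auction Semiar- No Charge \x96 Earn Big'

# Header accepts the binary input when told the charset is unknown-8bit.
h = Header(raw, charset=charset.UNKNOWN8BIT)

# The string form replaces the undecodable octet with U+FFFD.
text = str(h)

# decode_header returns the original bytes, tagged as unknown-8bit.
decoded = decode_header(h)

# Rebuilding a Header from the decoded chunks preserves the bytes.
h2 = make_header(decoded)
round_tripped = decode_header(h2)
```

The bytes survive because they are decoded with the surrogateescape error
handler, as the header.py hunk in this changeset shows, so the undecodable
octet can be recovered when the chunks are decoded again.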

files:
  Lib/email/header.py          |   5 ++++-
  Lib/email/test/test_email.py |  15 +++++++++++++++
  Misc/NEWS                    |   3 ++-
  3 files changed, 21 insertions(+), 2 deletions(-)


diff --git a/Lib/email/header.py b/Lib/email/header.py
--- a/Lib/email/header.py
+++ b/Lib/email/header.py
@@ -275,7 +275,10 @@
             charset = Charset(charset)
         if not isinstance(s, str):
             input_charset = charset.input_codec or 'us-ascii'
-            s = s.decode(input_charset, errors)
+            if input_charset == _charset.UNKNOWN8BIT:
+                s = s.decode('us-ascii', 'surrogateescape')
+            else:
+                s = s.decode(input_charset, errors)
         # Ensure that the bytes we're storing can be decoded to the output
         # character set, otherwise an early error is thrown.
         output_charset = charset.output_codec or 'us-ascii'
diff --git a/Lib/email/test/test_email.py b/Lib/email/test/test_email.py
--- a/Lib/email/test/test_email.py
+++ b/Lib/email/test/test_email.py
@@ -4182,6 +4182,21 @@
                         'Ynwp4dUEbay Auction Semiar- No Charge \uFFFD Earn Big')
         self.assertEqual(email.header.decode_header(h), [(x, 'unknown-8bit')])
 
+    def test_header_handles_binary_unknown8bit(self):
+        x = b'Ynwp4dUEbay Auction Semiar- No Charge \x96 Earn Big'
+        h = Header(x, charset=email.charset.UNKNOWN8BIT)
+        self.assertEqual(str(h),
+                        'Ynwp4dUEbay Auction Semiar- No Charge \uFFFD Earn Big')
+        self.assertEqual(email.header.decode_header(h), [(x, 'unknown-8bit')])
+
+    def test_make_header_handles_binary_unknown8bit(self):
+        x = b'Ynwp4dUEbay Auction Semiar- No Charge \x96 Earn Big'
+        h = Header(x, charset=email.charset.UNKNOWN8BIT)
+        h2 = email.header.make_header(email.header.decode_header(h))
+        self.assertEqual(str(h2),
+                        'Ynwp4dUEbay Auction Semiar- No Charge \uFFFD Earn Big')
+        self.assertEqual(email.header.decode_header(h2), [(x, 'unknown-8bit')])
+
     def test_modify_returned_list_does_not_change_header(self):
         h = Header('test')
         chunks = email.header.decode_header(h)
diff --git a/Misc/NEWS b/Misc/NEWS
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -26,7 +26,8 @@
 -------
 
 - Issue #11584: email.header.decode_header no longer fails if the header
-  passed to it is a Header object.
+  passed to it is a Header object, and Header/make_header no longer fail
+  if given binary unknown-8bit input.
 
 - Issue #11700: mailbox proxy object close methods can now be called multiple
   times without error.

-- 
Repository URL: http://hg.python.org/cpython

