[New-bugs-announce] [issue19003] email.generator.BytesGenerator corrupts data by changing line endings

Alexander Kruppa report at bugs.python.org
Wed Sep 11 09:47:20 CEST 2013


New submission from Alexander Kruppa:

This is a follow-up to #16564. In that issue, BytesGenerator was changed to accept a bytes payload; however, processing binary data that way leads to data corruption.

Repost of the update I posted in #16564:

***********************************************************
~/build/Python-3.3.2$ ./python --version
Python 3.3.2

When modifying the test case in Lib/test/test_email/test_email.py like this:

--- Lib/test/test_email/test_email.py	2013-05-15 18:32:55.000000000 +0200
+++ Lib/test/test_email/test_email_mine.py	2013-09-10 14:22:08.160089440 +0200
@@ -1461,17 +1461,17 @@
         # Issue 16564: This does not produce an RFC valid message, since to be
         # valid it should have a CTE of binary.  But the below works in
         # Python2, and is documented as working this way.
-        bytesdata = b'\xfa\xfb\xfc\xfd\xfe\xff'
+        bytesdata = b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
         msg = MIMEApplication(bytesdata, _encoder=encoders.encode_noop)
         # Treated as a string, this will be invalid code points.
-        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
         self.assertEqual(msg.get_payload(decode=True), bytesdata)
         s = BytesIO()
         g = BytesGenerator(s)
         g.flatten(msg)
         wireform = s.getvalue()
         msg2 = email.message_from_bytes(wireform)
-        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
         self.assertEqual(msg2.get_payload(decode=True), bytesdata)

then running:

./python ./Tools/scripts/run_tests.py test_email

results in:

======================================================================
FAIL: test_binary_body_with_encode_noop (test_email_mine.TestMIMEApplication)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/localdisk/kruppaal/build/Python-3.3.2/Lib/test/test_email/test_email_mine.py", line 1475, in test_binary_body_with_encode_noop
    self.assertEqual(msg2.get_payload(decode=True), bytesdata)
AssertionError: b'\x0b\n\xfa\xfb\xfc\xfd\xfe\xff' != b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'

The '\x0b' byte is incorrectly translated to '\x0b\n', i.e., a newline character is inserted after it.

Encoding the bytes object:
bytes(range(256))

results in the following output data (MIME header stripped):

0000000: 0001 0203 0405 0607 0809 0a0b 0a0c 0a0a  ................
0000010: 0e0f 1011 1213 1415 1617 1819 1a1b 1c0a  ................
0000020: 1d0a 1e0a 1f20 2122 2324 2526 2728 292a  ..... !"#$%&'()*
0000030: 2b2c 2d2e 2f30 3132 3334 3536 3738 393a  +,-./0123456789:
0000040: 3b3c 3d3e 3f40 4142 4344 4546 4748 494a  ;<=>?@ABCDEFGHIJ
0000050: 4b4c 4d4e 4f50 5152 5354 5556 5758 595a  KLMNOPQRSTUVWXYZ
0000060: 5b5c 5d5e 5f60 6162 6364 6566 6768 696a  [\]^_`abcdefghij
0000070: 6b6c 6d6e 6f70 7172 7374 7576 7778 797a  klmnopqrstuvwxyz
0000080: 7b7c 7d7e 7f80 8182 8384 8586 8788 898a  {|}~............
0000090: 8b8c 8d8e 8f90 9192 9394 9596 9798 999a  ................
00000a0: 9b9c 9d9e 9fa0 a1a2 a3a4 a5a6 a7a8 a9aa  ................
00000b0: abac adae afb0 b1b2 b3b4 b5b6 b7b8 b9ba  ................
00000c0: bbbc bdbe bfc0 c1c2 c3c4 c5c6 c7c8 c9ca  ................
00000d0: cbcc cdce cfd0 d1d2 d3d4 d5d6 d7d8 d9da  ................
00000e0: dbdc ddde dfe0 e1e2 e3e4 e5e6 e7e8 e9ea  ................
00000f0: ebec edee eff0 f1f2 f3f4 f5f6 f7f8 f9fa  ................
0000100: fbfc fdfe ff                             .....

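That is, a '\n' is inserted after '\x0b', '\x1c', '\x1d', and '\x1e',
and '\x0d' is replaced by '\n\n'. The sequence '0c 0a 0a 0e' in the dump
can equally be read as a '\n' inserted after '\x0c' and '\x0d' replaced
by a single '\n'; either way the output is 5 bytes longer than the input.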

***********************************************************
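
For convenience, the modified test case from the repost can also be run as a standalone script. This is only a minimal sketch: the helper name roundtrip() is made up for illustration, and the outcome depends on the Python version in use (the corruption described above was observed on 3.3.2 and may not appear elsewhere).

```python
import email
from email import encoders
from email.generator import BytesGenerator
from email.mime.application import MIMEApplication
from io import BytesIO


def roundtrip(payload: bytes) -> bytes:
    """Flatten a noop-encoded binary payload and parse it back.

    Hypothetical helper, mirroring the steps of the modified test case.
    """
    msg = MIMEApplication(payload, _encoder=encoders.encode_noop)
    buf = BytesIO()
    BytesGenerator(buf).flatten(msg)
    parsed = email.message_from_bytes(buf.getvalue())
    return parsed.get_payload(decode=True)


if __name__ == "__main__":
    original = b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
    recovered = roundtrip(original)
    print("original: ", original)
    print("recovered:", recovered)
    # False means the payload was altered on the way through the generator.
    print("intact:   ", recovered == original)
```
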

I suspect this is due to the use of self._write_lines(msg._payload) in BytesGenerator._handle_text(), since _write_lines() mangles line endings.
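
To illustrate the suspicion above, here is a rough sketch of how such a line-ending rewrite could produce the observed output. It is not the stdlib source: write_lines_sketch() and boundary_bytes() are hypothetical names, and the sketch assumes the bytes payload is held as a str decoded with surrogateescape, split with str.splitlines(), and re-joined with '\n'. Under those assumptions it reproduces both the failing assertion and the start of the hex dump.

```python
def write_lines_sketch(payload: bytes, nl: str = "\n") -> bytes:
    """Approximate the suspected line-ending rewrite (not the stdlib code)."""
    # Assumption: the bytes payload is carried as str via surrogateescape.
    text = payload.decode("ascii", "surrogateescape")
    # str.splitlines also splits on \x0b, \x0c, \x1c, \x1d and \x1e.
    lines = text.splitlines(True)
    if not lines:
        return b""
    # Every line but the last: strip only '\r'/'\n', then append the newline.
    out = [line.rstrip("\r\n") + nl for line in lines[:-1]]
    last = lines[-1]
    stripped = last.rstrip("\r\n")
    out.append(stripped)
    # The last line gets a newline only if it originally ended with '\r'/'\n'.
    if len(stripped) != len(last):
        out.append(nl)
    return "".join(out).encode("ascii", "surrogateescape")


def boundary_bytes() -> list:
    """Byte values that str.splitlines treats as a line boundary once decoded."""
    found = []
    for i in range(256):
        ch = bytes([i]).decode("ascii", "surrogateescape")
        if len(("a" + ch + "b").splitlines()) == 2:
            found.append(i)
    return found


if __name__ == "__main__":
    print(write_lines_sketch(b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'))
    print(len(write_lines_sketch(bytes(range(256)))))
    print([hex(b) for b in boundary_bytes()])
```

Because only '\r' and '\n' are stripped before the newline is written, the other boundary characters stay in the output and gain a trailing '\n', while a lone '\r' is turned into '\n'.
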

----------
components: email
messages: 197476
nosy: Alexander.Kruppa, barry, r.david.murray
priority: normal
severity: normal
status: open
title: email.generator.BytesGenerator corrupts data by changing line endings
type: behavior
versions: Python 3.2, Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19003>
_______________________________________

