[New-bugs-announce] [issue34222] Email message serialization enters an infinite loop when folding non-ASCII headers with long words
Grigory Statsenko
report at bugs.python.org
Wed Jul 25 08:56:44 EDT 2018
New submission from Grigory Statsenko <grisha100 at gmail.com>:
(Discovered together with https://bugs.python.org/msg322348)
Email message serialization (in function _fold_as_ew) enters an infinite loop when folding non-ASCII headers whose words (after encoding) are longer than the given maxlen.
Besides looping forever, it keeps appending to the `lines` list, so memory usage grows without bound.
The code keeps appending encoded empty strings to the list like this:
lines: [
'Subject: =?utf-8?q??=',
' =?utf-8?q??=',
' =?utf-8?q??=',
' =?utf-8?q??=',
' =?utf-8?q??=',
' =?utf-8?q??=',
' '
]
(and it keeps on growing)
Here is my code that can reproduce this issue (as a unittest):
import email.generator
import email.policy
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from unittest import TestCase


def create_message(subject, sender, recipients, body):
    msg = MIMEMultipart()
    msg.set_charset('utf-8')
    msg.policy = email.policy.SMTP
    msg.attach(MIMEText(body, 'html'))
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = ';'.join(recipients)
    return msg


class TestEmailMessage(TestCase):
    def _make_message(self, subject):
        return create_message(
            subject=subject, sender='me@site.com',
            recipients=['me@site.com'], body='Some text',
        )

    def test_ascii_message_with_len_limit(self):
        # very long subject consisting of a single word
        subject = 'Q' * 100
        msg = self._make_message(subject)
        self.assertTrue(msg.as_string(maxheaderlen=76))

    def test_non_ascii_message_with_len_limit(self):
        # very long subject consisting of a single word
        subject = 'Ц' * 100
        msg = self._make_message(subject)
        self.assertTrue(msg.as_string(maxheaderlen=76))
The ASCII test passes, but the non-ASCII one never finishes.
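Because the non-ASCII case hangs rather than fails, it can help to run it in a child process with a timeout. The following is only a sketch: `serialization_finishes` and `REPRO` are hypothetical names, and the script is a condensed variant of the test above (a plain MIMEText message instead of MIMEMultipart) that I assume, but have not verified, goes through the same folding code.

```python
import subprocess
import sys

# Condensed variant of the non-ASCII test case (an assumption: it is expected
# to reach the same header-folding code as the MIMEMultipart version).
REPRO = r'''
import email.policy
from email.mime.text import MIMEText

msg = MIMEText('Some text', 'html')
msg.policy = email.policy.SMTP
msg['Subject'] = '\u0426' * 100   # Cyrillic letter, one very long "word"
msg.as_string(maxheaderlen=76)
'''


def serialization_finishes(timeout=5.0):
    """Return True if the child exits in time, False if it had to be killed."""
    try:
        # check=True surfaces unrelated failures (e.g. an import error)
        # instead of reporting them as a successful run.
        subprocess.run([sys.executable, '-c', REPRO],
                       timeout=timeout, check=True)
    except subprocess.TimeoutExpired:
        return False
    return True
```

On an affected interpreter the child would be expected to time out; on one where the folding terminates it should return True.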
From what I can tell, the problem is in line 2728 of email/_header_value_parser.py:
first_part = first_part[:-excess]
Here `excess` is calculated from the encoded string
(which is several times longer than the original one),
but the slice truncates the original (non-encoded) string.
The problem arises when `excess` is greater than the length of `first_part`: the slice then yields an empty string.
So the loop attempts to encode the exact same part of the header and fails in every iteration,
each time appending an empty string to the list, encoded as ' =?utf-8?q??='.
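The arithmetic can be illustrated in isolation. This is a simplified sketch, not the library code: `encode_q` is a hypothetical stand-in for the real encoded-word encoder, and `remaining_space` is a rough assumption (maxlen minus the 'Subject: ' prefix) rather than what _fold_as_ew actually computes.

```python
def encode_q(text, charset='utf-8'):
    """Hypothetical stand-in: q-encode every byte as =XX inside an encoded word."""
    payload = ''.join('={:02X}'.format(byte) for byte in text.encode(charset))
    return '=?{}?q?{}?='.format(charset, payload)


first_part = '\u0426' * 100                    # 100 Cyrillic letters, no spaces
maxlen = 76
remaining_space = maxlen - len('Subject: ')    # assumed budget for the first line

encoded = encode_q(first_part)
# excess is measured on the encoded text...
excess = len(encoded) - remaining_space
# ...but applied to the much shorter unencoded text.
truncated = first_part[:-excess]

print(len(first_part), len(encoded), excess)
print(repr(truncated))
print(encode_q(truncated))
```

Each two-byte character becomes six characters once encoded, so `excess` ends up larger than `len(first_part)` and the slice leaves nothing to encode. The header text left to fold is unchanged, which is why the next iteration behaves identically.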
What this amounts to is that it is now practically impossible to send emails with non-ASCII subjects without either disregarding the RFC recommendations and requirements for line length or risking hangs and unbounded memory growth.
Just like in https://bugs.python.org/msg322348, this behavior is new in Python 3.6. It is also broken in 3.7 and 3.8.
----------
components: email
messages: 322351
nosy: altvod, barry, r.david.murray
priority: normal
severity: normal
status: open
title: Email message serialization enters an infinite loop when folding non-ASCII headers with long words
versions: Python 3.6, Python 3.7, Python 3.8
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue34222>
_______________________________________