Public bug reported:
When a message contains an invalid Unicode sequence in its header, qrunner crashes outright:
May 17 15:32:20 2015 (981) Uncaught runner exception: 'utf8' codec can't decode byte 0xe9 in position 18: invalid continuation byte
May 17 15:32:20 2015 (981) Traceback (most recent call last):
  File "/var/lib/mailman/Mailman/Queue/Runner.py", line 119, in _oneloop
    self._onefile(msg, msgdata)
  File "/var/lib/mailman/Mailman/Queue/Runner.py", line 190, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/var/lib/mailman/Mailman/Queue/IncomingRunner.py", line 130, in _dispose
    more = self._dopipeline(mlist, msg, msgdata, pipeline)
  File "/var/lib/mailman/Mailman/Queue/IncomingRunner.py", line 153, in _dopipeline
    sys.modules[modname].process(mlist, msg, msgdata)
  File "/var/lib/mailman/Mailman/Handlers/CookHeaders.py", line 239, in process
    i18ndesc = uheader(mlist, mlist.description, 'List-Id', maxlinelen=998)
  File "/var/lib/mailman/Mailman/Handlers/CookHeaders.py", line 65, in uheader
    return Header(s, charset, maxlinelen, header_name, continuation_ws)
  File "/usr/lib/python2.7/email/header.py", line 183, in __init__
    self.append(s, charset, errors)
  File "/usr/lib/python2.7/email/header.py", line 267, in append
    ustr = unicode(s, incodec, errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 18: invalid continuation byte
May 17 15:32:20 2015 (981) SHUNTING: 1431869540.389822+156779307d54473d0eb732994bb67eee95733285
A solution for this specific case is to have Mailman/Handlers/CookHeaders.py pass the errors='replace' parameter.
I would say that this is actually a bug in Python's email package, since I don't think it makes sense to default errors to "strict" rather than something like "replace" when the intention is to parse input as free-form, underspecified and user-controlled as email. Nonetheless, Mailman already sets errors='replace' in some places, so it might as well add it here.
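To illustrate the difference being proposed, here is a minimal Python 3 sketch (Mailman 2.1 runs on Python 2.7, whose email.header.Header decodes byte strings the same way, as the traceback shows). The byte string is the list description reported later in this thread; the helper name build_header is hypothetical. With the default errors='strict' the decode fails, while errors='replace' substitutes U+FFFD for the bad byte.

```python
from email.header import Header

# Raw description as stored in the list's config: ISO-8859-1 bytes,
# while the charset used for the list's language is UTF-8.
raw_description = b'Lijst voor Caljent\xe9-leden'

def build_header(value, charset, errors):
    """Hypothetical helper: build a List-Id Header, decoding `value` with `errors`."""
    return Header(value, charset, maxlinelen=998, header_name='List-Id',
                  errors=errors)

# Default policy: decoding the bytes as UTF-8 raises, as in the traceback.
strict_failed = False
try:
    build_header(raw_description, 'utf-8', 'strict')
except UnicodeDecodeError:
    strict_failed = True

# errors='replace': the bad byte becomes U+FFFD and no exception is raised.
replaced = build_header(raw_description, 'utf-8', 'replace')

print(strict_failed)                                        # True
print(str(replaced) == 'Lijst voor Caljent\ufffd-leden')    # True
```

The trade-off is that 'replace' silently puts a replacement character into the outgoing header instead of surfacing the bad configuration.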
** Affects: mailman Importance: Undecided Status: New
Actually, the traceback shows that the exception occurs while CookHeaders is creating the List-Id: header to be added to the message.
It tries to create a header of the form:
List-Id: list description <list.example.com>
And the exception occurs when trying to RFC 2047-encode the list's description in the charset of the list's preferred language. This exception should be occurring on every list post. Is that the case?
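A rough Python 3 sketch of the header being built: the description is RFC 2047-encoded in the list's charset and the list ID follows in angle brackets. The helper make_list_id and the list ID caljente.example.com are hypothetical; Mailman's CookHeaders.py does this through its own uheader() helper and surrounding code, which this does not reproduce.

```python
from email.header import Header

def make_list_id(description, charset, list_id):
    """Hypothetical helper: return 'encoded-description <list_id>'.

    `description` is text (str); Header().encode() turns it into an
    RFC 2047 encoded word using `charset`.
    """
    encoded_desc = Header(description, charset, maxlinelen=998,
                          header_name='List-Id').encode()
    return '%s <%s>' % (encoded_desc, list_id)

value = make_list_id('Lijst voor Caljent\u00e9-leden', 'iso-8859-1',
                     'caljente.example.com')
print(value)
# =?iso-8859-1?q?Lijst_voor_Caljent=E9-leden?= <caljente.example.com>
```

The encoding step only works if the description can actually be represented in the list's charset, which is exactly what fails in the traceback above.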
Also, what is the list's preferred_language, and what is the raw value of the list's description attribute? Obtain this info with something like:
$ bin/withlist list1
Loading list list1 (unlocked)
The variable `m' is the list1 MailList instance
'My List one'
(of course the list name and responses will be different in your case.)
** Changed in: mailman Importance: Undecided => Medium
** Changed in: mailman Status: New => Incomplete
** Changed in: mailman Milestone: None => 2.1.21
** Changed in: mailman Assignee: (unassigned) => Mark Sapiro (msapiro)
I received this response:
root@barbershop:~# /usr/lib/mailman/bin/withlist caljente
Loading list caljente (unlocked)
The variable `m' is the caljente MailList instance
'Lijst voor Caljent\xe9-leden'
Not sure what encoding that is. I've changed it to "Caljente" for now, which should be a reasonable workaround.
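On the question of which encoding that is, a quick Python 3 check (not part of the original exchange) shows that byte 0xE9 is é in ISO-8859-1, and that the same bytes are invalid UTF-8 at exactly the position reported in the traceback (position 18).

```python
raw = b'Lijst voor Caljent\xe9-leden'

# Decoding as UTF-8 fails: 0xE9 starts a 3-byte sequence, but the next
# byte (0x2D, '-') is not a continuation byte.
utf8_error = None
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    utf8_error = exc
print(utf8_error.start, utf8_error.reason)  # 18 invalid continuation byte

# Decoding as ISO-8859-1 (Latin-1) succeeds and yields the expected Dutch text.
print(raw.decode('iso-8859-1'))  # Lijst voor Caljenté-leden
```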
It appears the underlying issue is that someone has changed Mailman's character set for 'nl' (Dutch) from iso-8859-1 to utf-8. Whoever did this may have done the appropriate things, such as recoding the message catalog and templates to utf-8, but in any case, the strings in this list's attributes weren't recoded. This is one of the major problems that make it difficult to change Mailman's encoding for a language. See the definitions of the recode(), doitem() and convert() functions in Mailman/versions.py in Mailman 2.1.19 or later.
So basically, this issue appears to be a 'shot oneself in the foot' thing and probably could be fixed by setting the list's description to 'Lijst voor Caljent\xc3\xa9-leden', although I would be concerned that there are other iso-8859-1 strings in list attributes.
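Since other attributes may hold ISO-8859-1 bytes too, recoding only the description may not be enough. The Python 3 sketch below walks a dict of attribute values (a stand-in for a list's configuration; it does not load a real Mailman list, and the attribute values are hypothetical) and recodes byte strings from ISO-8859-1 to UTF-8, including those nested in lists, tuples and dicts. It is not Mailman's own recode() code from versions.py, nor the recode_list script; the "skip if already valid UTF-8" check is an assumed heuristic to avoid double-encoding.

```python
def recode_value(value, from_charset='iso-8859-1', to_charset='utf-8'):
    """Recursively recode byte strings in `value` from from_charset to to_charset.

    Byte strings that already decode as to_charset are left alone (a heuristic
    to avoid double-encoding; pure ASCII is unchanged either way).
    """
    if isinstance(value, bytes):
        try:
            value.decode(to_charset)
            return value
        except UnicodeDecodeError:
            return value.decode(from_charset).encode(to_charset)
    if isinstance(value, list):
        return [recode_value(v, from_charset, to_charset) for v in value]
    if isinstance(value, tuple):
        return tuple(recode_value(v, from_charset, to_charset) for v in value)
    if isinstance(value, dict):
        return {recode_value(k, from_charset, to_charset):
                recode_value(v, from_charset, to_charset)
                for k, v in value.items()}
    return value  # ints, None, etc. are returned unchanged

# Hypothetical attribute values, for illustration only.
attrs = {
    'description': b'Lijst voor Caljent\xe9-leden',
    'some_list_attr': [b'Welkom bij Caljent\xe9', b'plain ascii'],
    'max_message_size': 40,
}
fixed = recode_value(attrs)
print(fixed['description'] == b'Lijst voor Caljent\xc3\xa9-leden')  # True
```

Running it a second time leaves the data unchanged, because every recoded value is now valid UTF-8.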
Anyway, I see this as an issue worth fixing. The fix I would propose is, in Mailman/Handlers/CookHeaders.py, to replace the line at the end of the definition of uheader, which is currently

    return Header(s, charset, maxlinelen, header_name, continuation_ws)

with

    try:
        return Header(s, charset, maxlinelen, header_name, continuation_ws)
    except UnicodeError:
        syslog('error', 'list: %s: can\'t decode "%s" as %s',
               mlist.internal_name(), s, charset)
        return Header('', charset, maxlinelen, header_name, continuation_ws)
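A self-contained Python 3 approximation of the proposed fix, so the behaviour can be tried outside Mailman. The standard logging module stands in for Mailman's syslog, and a plain list-name string stands in for mlist.internal_name(); both are assumptions, and safe_uheader is a hypothetical name. On a decode failure it logs the problem and returns an empty header instead of letting the exception propagate and get the message shunted.

```python
import logging
from email.header import Header

# Stand-in for Mailman's syslog('error', ...).
log = logging.getLogger('mailman.error')

def safe_uheader(list_name, s, charset, header_name=None, maxlinelen=None,
                 continuation_ws=' '):
    """Build a Header from `s`; fall back to an empty Header if `s`
    cannot be decoded in `charset` (hypothetical, modelled on uheader)."""
    try:
        return Header(s, charset, maxlinelen, header_name, continuation_ws)
    except UnicodeError:  # UnicodeDecodeError is a subclass of UnicodeError
        log.error('list: %s: can\'t decode "%s" as %s', list_name, s, charset)
        return Header('', charset, maxlinelen, header_name, continuation_ws)

# Mis-encoded description: logged, and an empty header is returned.
h = safe_uheader('caljente', b'Lijst voor Caljent\xe9-leden', 'utf-8',
                 header_name='List-Id', maxlinelen=998)
print(repr(str(h)))  # ''

# A correctly encoded description passes through unchanged.
print(str(safe_uheader('caljente', 'Caljente', 'utf-8')))  # Caljente
```

Compared with errors='replace', this drops the description from List-Id entirely but leaves a log entry pointing at the misconfigured list.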
** Changed in: mailman Importance: Medium => Low
** Changed in: mailman Status: Incomplete => In Progress
** Branch linked: lp:mailman/2.1
** Changed in: mailman Status: In Progress => Fix Committed
Thanks for the fix! Although arguably a misconfiguration, it's good that it doesn't crash the qrunner.
Actually, IncomingRunner doesn't really "crash"; it encounters an unanticipated exception, which causes it to log the exception and shunt the message. And yes, the underlying issue is definitely a "misconfiguration", but catching the exception and handling it more gracefully, without shunting the message, wasn't hard, so I thought it worthwhile.
** Changed in: mailman Status: Fix Committed => Fix Released
** Changed in: mailman Milestone: 2.1.21 => 2.1.21rc1
For more information on the causes of this issue and the fallout from what turns out to be Debian's change of the character set for several languages, see the thread "Encoding problem with 2.15 to 2.18 upgrade with Finnish" beginning at https://mail.python.org/pipermail/mailman-users/2015-December/080221.html and continuing at https://mail.python.org/pipermail/mailman-users/2016-January/080275.html. That thread mentions a script at https://www.msapiro.net/scripts/recode_list (mirrored at http://fog.ccsf.edu/~msapiro/scripts/recode_list) that can programmatically recode the strings in a list's configuration to "fix" this issue.
I just ran into this problem following an upgrade from 12.04 LTS to 16.04 LTS. recode_list fixed the problem (thank you, Mark!) but this seems like something Ubuntu should detect and offer to correct during the upgrade.
** Also affects: mailman (Ubuntu) Importance: Undecided Status: New
Setting the task to Wishlist, since it is essentially a configuration change that breaks it (as Mark said, 'shot oneself in the foot').
I'm personally not so keen on on-upgrade detection and warnings, since (in general) they have a history of too many false positives, leading people to break their configuration for no reason. But then, as I read Mark, this is due to Debian intentionally changing some encodings, so maybe it should be done...
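For what such a check might look like, here is a Python 3 sketch under stated assumptions: the list's attributes are available as a dict, and the charset is the one now used for the list's language. find_undecodable and the attribute values are hypothetical. It reports byte-string attributes that do not decode in that charset. One caveat bears on the false-positive concern: every byte sequence is valid ISO-8859-1, so this can only catch the direction seen in this report (Latin-1 data under a UTF-8 language), not the reverse.

```python
def find_undecodable(attrs, charset):
    """Return the sorted names of attributes whose byte-string values
    (directly, or nested in lists/tuples) cannot be decoded as `charset`."""
    def bad(value):
        if isinstance(value, bytes):
            try:
                value.decode(charset)
            except UnicodeDecodeError:
                return True
            return False
        if isinstance(value, (list, tuple)):
            return any(bad(v) for v in value)
        return False

    return sorted(name for name, value in attrs.items() if bad(value))

# Hypothetical attributes of a list whose language charset was switched to UTF-8.
attrs = {
    'description': b'Lijst voor Caljent\xe9-leden',
    'info': b'Welkom!',
    'some_list_attr': [b'Tot ziens', b'Caf\xe9'],
}
print(find_undecodable(attrs, 'utf-8'))       # ['description', 'some_list_attr']
print(find_undecodable(attrs, 'iso-8859-1'))  # []
```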
** Changed in: mailman (Ubuntu) Status: New => Confirmed
** Changed in: mailman (Ubuntu) Importance: Undecided => Wishlist