UnicodeDecodeError during Archive Obscuring
I've seen an increase in the use of Unicode-encoded email addresses, which fail HyperArch email obscuring and thus cause the message to be shunted and not archived.
Specifically, the problem lies in the encoding (strangely, the error says "Decode") of the text ' at ' (as a substitute for "@") for Russian users of Gmail.
Here is the error in full:
Nov 22 04:42:31 2007 (755) Uncaught runner exception: 'ascii' codec can't decode byte 0xd0 in position 1: ordinal not in range(128)
Nov 22 04:42:31 2007 (755) Traceback (most recent call last):
  File "/usr/local/mailman/Mailman/Queue/Runner.py", line 112, in _oneloop
    self._onefile(msg, msgdata)
  File "/usr/local/mailman/Mailman/Queue/Runner.py", line 170, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/usr/local/mailman/Mailman/Queue/ArchRunner.py", line 89, in _dispose
    mlist.ArchiveMail(msg)
  File "/usr/local/mailman/Mailman/Archiver/Archiver.py", line 250, in ArchiveMail
    h.processUnixMailbox(f)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 580, in processUnixMailbox
    self.add_article(a)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 621, in add_article
    filename))
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 1275, in write_article
    f.write(article.as_text())
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 579, in as_text
    atmark = unicode(_(' at '), Utils.GetCharSet(self._lang))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 1: ordinal not in range(128)
Nov 22 04:42:31 2007 (755) SHUNTING: 1195717174.0985031 +6fdaf61658ca76ec5281e614be7b4b59d1e01bf6
This is a Mailman 2.1.9 system.
I did search the archives quite extensively, but I didn't find any cases where Mailman was having trouble encoding the hard coded ' at ' into unicode.
Here is a sample (modified from original to protect the innocent) of the message "From:" line: "=?ISO-8859-5?B?pyvY7zB08L6a0C==?=" <abc.xyz@gmail.com>
Any ideas on what to try/test/do?
Thanks!
-Jim P.
Jim Popovitch wrote:
I did search the archives quite extensively, but I didn't find any cases where Mailman was having trouble encoding the hard coded ' at ' into unicode.
It is not ' at ' itself but its translation which caused this error. It's strange, though: the language set immediately before should work for its unicode conversion.
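To make the point about the translation concrete, here is a minimal sketch of why the error says "decode": unicode(s, charset) in Python 2 turns a byte string into text, which is a decode step. The sketch is written for Python 3 (Mailman 2.1 itself runs on Python 2), and the byte string is a hypothetical stand-in for a translated ' at '; the real catalog entry and its charset are not shown in this thread.

```python
# Hypothetical stand-in for a translated ' at ': a space, four non-ASCII
# bytes (UTF-8 for two Cyrillic letters), and a space. This is an
# assumption for illustration, not the actual Mailman catalog string.
translated = b' \xd0\xbd\xd0\xb0 '


def to_text(raw, charset):
    """Decode bytes to text, roughly what unicode(raw, charset) does in Python 2."""
    return raw.decode(charset)


# Decoding with a charset that cannot represent the bytes fails.
try:
    to_text(translated, 'ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding with the charset the bytes were actually written in succeeds.
decoded = to_text(translated, 'utf-8')

print(message)
print(repr(decoded))
```

The failing byte is at position 1 because position 0 is the leading space, which matches the shape of the error reported above.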
Any ideas on what to try/test/do?
How about this patch (sorry for the word wrap) as a workaround?

=== modified file 'Mailman/Archiver/HyperArch.py'
--- Mailman/Archiver/HyperArch.py 2007-11-21 05:21:24 +0000
+++ Mailman/Archiver/HyperArch.py 2007-11-25 03:59:07 +0000
@@ -412,7 +412,8 @@
             otrans = i18n.get_translation()
             try:
                 i18n.set_language(self._lang)
-                atmark = unicode(_(' at '), Utils.GetCharSet(self._lang))
+                atmark = unicode(_(' at '), Utils.GetCharSet(self._lang),
+                                 'replace')
                 subject = re.sub(r'([-+,.\w]+)@([-+.\w]+)',
                                  '\g<1>' + atmark + '\g<2>', subject)
             finally:
@@ -574,7 +575,7 @@
         if mm_cfg.ARCHIVER_OBSCURES_EMAILADDRS:
             otrans = i18n.get_translation()
             try:
-                atmark = unicode(_(' at '), cset)
+                atmark = unicode(_(' at '), cset, 'replace')
                 i18n.set_language(self._lang)
                 body = re.sub(r'([-+,.\w]+)@([-+.\w]+)',
                               '\g<1>' + atmark + '\g<2>', body)

--
Tokio Kikuchi, tkikuchi@is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/
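As a rough illustration of what the 'replace' argument in the patch buys, the sketch below (Python 3, not the Mailman code itself) decodes with the 'replace' error handler and then applies the same address pattern. The helper names decode_atmark and obscure are made up for this example, and the non-ASCII bytes are a hypothetical translated separator.

```python
import re

# Address pattern copied from the patch above.
ADDRESS = re.compile(r'([-+,.\w]+)@([-+.\w]+)')


def decode_atmark(raw, charset):
    """Decode the separator; undecodable bytes become U+FFFD instead of raising."""
    return raw.decode(charset, 'replace')


def obscure(text, atmark):
    """Replace user@host with user<atmark>host."""
    # A function replacement keeps atmark literal, so no backslash
    # sequences inside it are interpreted by re.
    return ADDRESS.sub(lambda m: m.group(1) + atmark + m.group(2), text)


plain = decode_atmark(b' at ', 'ascii')
mangled = decode_atmark(b' \xd0\xbd\xd0\xb0 ', 'ascii')  # hypothetical translation

print(obscure('abc.xyz@gmail.com', plain))
print(repr(obscure('abc.xyz@gmail.com', mangled)))
```

The trade-off is that the message no longer gets shunted, but a separator that does not fit the charset shows up in the archive as replacement characters rather than readable text.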
Tokio Kikuchi wrote:
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 579, in as_text
    atmark = unicode(_(' at '), Utils.GetCharSet(self._lang))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 1: ordinal not in range(128)
<snip>
It is not ' at ' itself but its translation which caused this error. It's strange, though: the language set immediately before should work for its unicode conversion.
It's even stranger than that. The codec is 'ascii'. The only language with a charset of 'ascii' is 'en', and if the language is 'en', where does the '\xd0' come from?
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Mark Sapiro writes:
It's even stranger than that. The codec is 'ascii'. The only language with a charset of 'ascii' is 'en', and if the language is 'en', where does the '\xd0' come from?
Almost certainly, the language is not 'en'; the language is 'unknown'. The last time there was a spate of these problems, I took a quick look at the code. It appears to me that the email module takes the MIME spec seriously and applies the defaults to that case, i.e., language = 'en' and charset = 'us-ascii'. IOW, it tests that headers are ASCII by decoding them as ASCII. Boom! Since there's no try specific to that attempt, you end up with the default catchall try.
I didn't have enough time to really understand what was going on, so this is really a wild-ass guess. Hope it helps, anyway.
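The general mechanism behind that guess can be sketched in a few lines of Python 3. This is only an illustration of "decode as us-ascii and see what happens", not the actual code path in Mailman or in the email package; the name in the header is hypothetical and the encoded word is built here rather than taken from the sample From: line.

```python
import base64
from email.header import decode_header

# Hypothetical Cyrillic display name, encoded as an RFC 2047 word
# with the ISO-8859-5 charset.
name = '\u0418\u0432\u0430\u043d'
payload = base64.b64encode(name.encode('iso-8859-5')).decode('ascii')
encoded_word = '=?ISO-8859-5?B?%s?=' % payload

# decode_header returns (bytes, charset) pairs for encoded words.
(raw, charset), = decode_header(encoded_word)

# Using the declared charset recovers the name.
declared = raw.decode(charset)

# Falling back to the MIME default charset fails on the same bytes.
try:
    raw.decode('us-ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(charset, declared == name, ascii_ok)
```

If the charset information is lost or ignored somewhere along the way, the us-ascii default is what ends up being applied to non-ASCII bytes, which would produce exactly this kind of UnicodeDecodeError.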
Happy Thanksgiving (and roudou-kansha-hi)!
participants (4): Jim Popovitch, Mark Sapiro, Stephen J. Turnbull, Tokio Kikuchi