[issue18044] Email headers do not properly decode to unicode.

New submission from Tim Rawlinson: In Python 3.3 decoding of headers to unicode is supposed to be automatic but fails in several cases, including one shown as successful in the documentation:
Although the following works:
Though this does not:
And just to prove some things are working as they should: >>> msg = message_from_string("Subject: =?gb2312?b?1eLKx9bQzsSy4srUo6E=?=\n\n", policy=default) >>> msg['Subject'] '这是中文测试!' ---------- assignee: docs@python components: Documentation, email messages: 189862 nosy: Tim.Rawlinson, barry, docs@python, r.david.murray priority: normal severity: normal status: open title: Email headers do not properly decode to unicode. type: behavior versions: Python 3.3 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18044> _______________________________________

Changes by Ezio Melotti <ezio.melotti@gmail.com>: ---------- nosy: +ezio.melotti _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18044> _______________________________________

Roundup Robot added the comment: New changeset 4acb822f4c43 by R David Murray in branch '3.3': #18044: Fix parsing of encoded words of the form =?utf8?q?=XX...?= http://hg.python.org/cpython/rev/4acb822f4c43 New changeset 32c6cfffbddd by R David Murray in branch 'default': Merge: #18044: Fix parsing of encoded words of the form =?utf8?q?=XX...?= http://hg.python.org/cpython/rev/32c6cfffbddd ---------- nosy: +python-dev _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18044> _______________________________________

R. David Murray added the comment: This is actually two separate bugs, both a bit embarrassing. The first one (that I just fixed) is that when parsing an encoded word I was only checking for decimal digits after an '=' (instead of the correct hex digits) when trying to do robust detection of encoded word limits. The second (the address one) is due to the fact that somewhere between stopping my full time work on the email project and actually committing the code, I lost track of the fact that I never implemented encoded word parsing for anything other than unstructured headers. The infrastructure is there, I just need to write tests and hook it up. I'm going to open a separate issue for that. ---------- resolution: -> fixed stage: -> committed/rejected status: open -> closed versions: +Python 3.4 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18044> _______________________________________

R. David Murray added the comment: The issue for the second bug is issue 18431. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18044> _______________________________________

Changes by Ezio Melotti <ezio.melotti@gmail.com>: ---------- nosy: +ezio.melotti _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18044> _______________________________________

Roundup Robot added the comment: New changeset 4acb822f4c43 by R David Murray in branch '3.3': #18044: Fix parsing of encoded words of the form =?utf8?q?=XX...?= http://hg.python.org/cpython/rev/4acb822f4c43 New changeset 32c6cfffbddd by R David Murray in branch 'default': Merge: #18044: Fix parsing of encoded words of the form =?utf8?q?=XX...?= http://hg.python.org/cpython/rev/32c6cfffbddd ---------- nosy: +python-dev _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18044> _______________________________________

R. David Murray added the comment: This is actually two separate bugs, both a bit embarrassing. The first one (that I just fixed) is that when parsing an encoded word I was only checking for decimal digits after an '=' (instead of the correct hex digits) when trying to do robust detection of encoded word limits. The second (the address one) is due to the fact that somewhere between stopping my full time work on the email project and actually committing the code, I lost track of the fact that I never implemented encoded word parsing for anything other than unstructured headers. The infrastructure is there, I just need to write tests and hook it up. I'm going to open a separate issue for that. ---------- resolution: -> fixed stage: -> committed/rejected status: open -> closed versions: +Python 3.4 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18044> _______________________________________

R. David Murray added the comment: The issue for the second bug is issue 18431. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue18044> _______________________________________
participants (4)
-
Ezio Melotti
-
R. David Murray
-
Roundup Robot
-
Tim Rawlinson