[New-bugs-announce] [issue4958] email/header.py ecre regular expression issue
Jan Malakhovski
report at bugs.python.org
Fri Jan 16 00:33:09 CET 2009
New submission from Jan Malakhovski <jan.malachowski at gmail.com>:
Hello.
I have a dedicated mail server at home,
and it holds about 1 GB of mail.
Most of the mail is in non-UTF-8 code pages, so today
I wrote a little script that should recode
all messages to UTF-8. But I found that
email.header.decode_header parses some headers incorrectly.
For example, the header
Content-Type: application/x-msword; name="2008
=?windows-1251?B?wu7v8O7x+w==?= 2 =?windows-1251?B?4+7kIDgwONUwMC5kb2M=?="
is parsed as
[('application/x-msword; name="2008', None),
('\xc2\xee\xef\xf0\xee\xf1\xfb', 'windows-1251'),
('2 =?windows-1251?B?4+7kIDgwONUwMC5kb2M=?="', None)]
which is obviously wrong: the second encoded word is left undecoded.
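A minimal sketch of the mismatch, using only the re module. The pattern is a reconstruction of ecre based on the patch quoted in this report; the lines not visible in that diff are an assumption, and the names BODY, LOOKAHEAD, strict, relaxed and value are made up for illustration. The folded header is joined with a single space here.

```python
import re

# Reconstruction of the encoded-word pattern. The leading lines are an
# assumption, since the diff only shows the tail of the expression.
BODY = r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either "q" or "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
'''
LOOKAHEAD = r'''
  (?=[ \t]|$)           # whitespace or the end of the string
'''
FLAGS = re.VERBOSE | re.IGNORECASE | re.MULTILINE

strict = re.compile(BODY + LOOKAHEAD, FLAGS)   # with the lookahead
relaxed = re.compile(BODY, FLAGS)              # lookahead removed

# Header value from the report, unfolded onto one line.
value = ('application/x-msword; name="2008 '
         '=?windows-1251?B?wu7v8O7x+w==?= 2 '
         '=?windows-1251?B?4+7kIDgwONUwMC5kb2M=?="')

strict_words = [m.group(0) for m in strict.finditer(value)]
relaxed_words = [m.group(0) for m in relaxed.finditer(value)]

print(strict_words)    # only the first encoded word
print(relaxed_words)   # both encoded words
```

The second encoded word is followed by a double quote rather than whitespace or the end of the string, so the lookahead rejects it.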
Now I'm experimenting with the email/header.py file from
the Python 2.5 Debian package
(it is the same in version 2.6.1, except that every <> was changed to !=).
If it is patched with
==================BEGIN CUT==================
--- oldheader.py 2009-01-16 01:47:32.553130030 +0300
+++ header.py 2009-01-16 01:47:16.783119846 +0300
@@ -39,7 +39,6 @@
\? # literal ?
(?P<encoded>.*?) # non-greedy up to the next ?= is the encoded string
\?= # literal ?=
- (?=[ \t]|$) # whitespace or the end of the string
''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)
# Field name regexp, including trailing colon, but not separating whitespace,
==================END CUT==================
it works fine.
So I wonder if this
(?=[ \t]|$) # whitespace or the end of the string
is really needed. After all, if only whitespace
follows an encoded word, it is just
appended to the list by
parts = ecre.split(line)
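
The point about split can be sketched as follows. This is an illustration under assumptions, not the library code: WORD is a compact reconstruction of the pattern, and the two sample strings are hypothetical.

```python
import re

# Compact reconstruction of the encoded-word pattern (an assumption).
WORD = r'=\?(?P<charset>[^?]*?)\?(?P<encoding>[qb])\?(?P<encoded>.*?)\?='
FLAGS = re.IGNORECASE | re.MULTILINE

strict = re.compile(WORD + r'(?=[ \t]|$)', FLAGS)  # with the lookahead
relaxed = re.compile(WORD, FLAGS)                  # lookahead removed

# Hypothetical inputs: one encoded word followed by whitespace,
# one followed directly by a double quote.
trailing_space = 'foo =?utf-8?q?bar?=  '
quoted = 'name="=?utf-8?q?bar?="'

# The lookahead consumes nothing, so trailing whitespace ends up as its
# own list element with either pattern.
print(strict.split(trailing_space))
print(relaxed.split(trailing_space))

# The patterns differ only when something other than whitespace follows.
print(strict.split(quoted))
print(relaxed.split(quoted))
```

Because split returns the captured groups between the unmatched pieces, the text after an encoded word is kept either way; the lookahead only decides whether the word is recognized at all.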
--
Also, there is a related mailing list thread:
http://mail.python.org/pipermail/python-dev/2009-January/085088.html
----------
components: Library (Lib)
messages: 79927
nosy: oxij
severity: normal
status: open
title: email/header.py ecre regular expression issue
type: behavior
versions: Python 2.5, Python 2.6
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4958>
_______________________________________