[Python-bugs-list] [ python-Bugs-594893 ] printing email object deletes whitespace
noreply@sourceforge.net
Tue, 10 Sep 2002 10:29:04 -0700
Bugs item #594893, was opened at 2002-08-14 00:59
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=594893&group_id=5470
Category: Python Library
>Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: printing email object deletes whitespace
Initial Comment:
In certain situations when printing email Message objects (I think),
whitespace in headers disappears. The attached zip file
demonstrates this problem. In email.orig, there is a line break
followed by a TAB in the X-Vm-v5-Data header at the end of the
first continuation line. In email.new, which was generated by
printing an email.Message object, the line break and TAB are gone,
but no SPACE was inserted in their place.
This example is from a larger program which reads in a Unix
mailbox like so:
    msgdict = {}
    i = 0
    for msg in mailbox.PortableUnixMailbox(f,
                                           email.Parser.Parser().parse):
        subj = msg["subject"]
        item = msgdict.get(subj) or []
        item.append((i, msg))
        msgdict[subj] = item
        i += 1
runs through msgdict and deletes a bunch of messages matching
various criteria, then prints out those which remain retaining the
relative order they had in the original mailbox:
    msglist = []
    for val in msgdict.values():
        msglist.extend(val)
    msglist.sort()
    for i, msg in msglist:
        print msg
email.orig was plucked from the input mailbox and email.new from
the output mailbox.
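
For reference, RFC 2822 (section 2.2.3) defines unfolding as removing only the line break that precedes whitespace, so the TAB itself should survive a round trip. A minimal sketch of that rule follows; the `unfold` helper and the `X-Example` header are made up for illustration and are not part of the email package:

```python
import re


def unfold(header_text):
    """Drop each line break that is immediately followed by a space or
    TAB, keeping the whitespace character itself."""
    return re.sub(r"\r?\n(?=[ \t])", "", header_text)


# Made-up header with a line break followed by a TAB, as in the report.
folded = 'X-Example: (first "a b")\n\t(second "c d")'
unfolded = unfold(folded)
```

With this rule the two parenthesised groups stay separated by the TAB, which is the behaviour the report expected from printing the message.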
----------------------------------------------------------------------
>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-09-10 13:29
Message:
Logged In: YES
user_id=12800
Skip, you've got two difficult examples here. RFC 2822
recommends splitting lines at "the highest syntactic level"
possible, but that differs depending on the semantics of the
header. By default, Header._split_ascii() splits first on
semicolons (for multiple parameter headers) and then on
whitespace. Your two examples exploit weaknesses in this
algorithm.
In the first case, X-VM... has the syntax of a lisp
expression. A coarser way to look at the contents would be
to try to keep "-delimited strings without line breaks. The
email package doesn't know anything about either of these
syntactic levels.
In the second case, you actually have X-Face data which
contains a semi-colon, so the split mentioned above does the
wrong thing in this case.
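
To make the failure mode concrete, here is a rough sketch of a two-level splitter of the kind described above: semicolons first, then whitespace. It is only an illustration, not the actual Header._split_ascii() code; the function name, the `maxlen` default, and the sample values are all assumptions.

```python
def split_semicolons_then_whitespace(value, maxlen=40):
    """Return the lines a header value would be folded into.

    Hypothetical splitter: break after semicolons first, and fall back
    to whitespace only for pieces that are still too long.
    """
    if len(value) <= maxlen:
        return [value]
    pieces = value.split(";")
    # Re-attach the semicolon to every piece except the last one.
    chunks = [p + ";" for p in pieces[:-1]] + [pieces[-1]]
    lines = []
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue
        if len(chunk) <= maxlen:
            lines.append(chunk)
            continue
        # Second level: greedily pack whitespace-separated words.
        current = ""
        for word in chunk.split():
            if current and len(current) + 1 + len(word) > maxlen:
                lines.append(current)
                current = word
            else:
                current = word if not current else current + " " + word
        if current:
            lines.append(current)
    return lines


# A parameter-style value splits sensibly at its semicolons...
params = 'text/plain; charset="us-ascii"; format=flowed; name="notes.txt"'
param_lines = split_semicolons_then_whitespace(params)

# ...but an opaque token that merely contains a semicolon is cut in two.
opaque = "A" * 25 + ";" + "B" * 25
opaque_lines = split_semicolons_then_whitespace(opaque)
```

Joining `opaque_lines` with a line break and leading whitespace puts whitespace in the middle of data that never had any, which matches the X-Face symptom reported below.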
I'm not sure what the best answer is. We can't hardcode too
much syntactic information into the Header class. Do we
need some kind of registration/callback mechanism so that
applications can create their own tokenization routines for
providing non-breaking tokens to the _split_ascii() method?
Yeesh.
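
As a rough sketch of what such a registration mechanism might look like (every name here is hypothetical, and nothing like it exists in the email package as far as this report shows), an application could register a per-header tokenizer that returns the tokens the splitter must not break:

```python
import re

# Hypothetical registry mapping lower-cased header names to tokenizers.
_tokenizers = {}


def register_tokenizer(header_name, func):
    """Register func(value) -> list of unbreakable tokens for a header."""
    _tokenizers[header_name.lower()] = func


def default_tokenizer(value):
    """Fallback: every whitespace-separated word is a token."""
    return value.split()


def quoted_string_tokenizer(value):
    """Keep each "-delimited string together as a single token."""
    return re.findall(r'"[^"]*"|\S+', value)


def tokens_for(header_name, value):
    """Look up the tokenizer for a header, falling back to the default."""
    tokenizer = _tokenizers.get(header_name.lower(), default_tokenizer)
    return tokenizer(value)


# Made-up header name, standing in for a lisp-like header such as X-VM...
register_tokenizer("X-Example-Data", quoted_string_tokenizer)
registered = tokens_for("x-example-data", '(a "b c" d)')
fallback = tokens_for("X-Other", '(a "b c" d)')
```

Here the registered tokenizer keeps `"b c"` intact, while the fallback splits it at the inner space. A folding routine could then break lines only between tokens.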
I'm up for suggestions. I can add a hack so that at least
the X-VM header doesn't *lose* information when printed, but
it's just a hack, so I'm not sure what the best solution is.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2002-08-29 10:45
Message:
Logged In: YES
user_id=44345
Hmmm... Sometimes seems to *add* whitespace as well. Here's an
example using the X-Face: header:
Before:
X-Face:
$LeJ8}Gzj%b'dmF:@bMiTrpT|UL=3O!CG~3;}dS[43`qefo('''9?B=2a0u*B4u+a)$"DYl
S
After:
X-Face: $LeJ8}Gzj%b'dmF:@bMiTrpT|UL=3O!CG~3;
}dS[43`qefo('''9?B=2a0u*B4u+a)$"DYlS
----------------------------------------------------------------------