[Email-SIG] email.header.decode_header eats my spaces
Stephen J. Turnbull
stephen at xemacs.org
Wed Mar 28 17:25:18 CEST 2007
Tokio Kikuchi writes:
> Barry Warsaw wrote:
>
> > On Mar 27, 2007, at 3:06 AM, Tokio Kikuchi wrote:
> >
> >> In my opinion (may not be true to RFC2822 in detail), ascii strings in
> >> header object should be strip()ped and separated by FWS (including
> >> '\r\n ' or '\r\n\t').
> >
> > I actually think we should be doing the opposite, namely preserving any
> > FWS in the existing text and /not/ substituting continuation_ws for it
> > when we re-break the headers. This is the only way to maintain
> > idempotency short of saving the original header intact (but then memory
> > usage doubles).
Idempotency is a test, not a requirement. The requirement is "first,
do no harm". Ie, if you process the header, the result should be as
much "like" the original as possible. This is not actually
implementable (different people will have different opinions about
what that means, except only *really different* people will have the
opinion that idempotency is undesirable<wink>), but the email package
should make it possible for people to get pretty close without
rewriting the package.
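To make "idempotency is a test" concrete, here is a minimal sketch of such a test. It assumes Python 3's email.header, which may behave differently from the version discussed in this thread, and the helper name roundtrips is made up for illustration.

```python
from email.header import decode_header, make_header


def roundtrips(raw_value: str) -> bool:
    """Return True if decoding and re-encoding reproduces raw_value exactly.

    Hypothetical helper for illustration; not part of the email package.
    """
    rebuilt = make_header(decode_header(raw_value)).encode()
    return rebuilt == raw_value


# A short ASCII value has nothing to re-break, so it survives unchanged.
plain_ok = roundtrips("Hello world")

# Two adjacent encoded-words: the decoder drops the space between them,
# so the re-encoded value cannot match the original.
encoded_ok = roundtrips("=?iso-8859-1?q?Hello?= =?iso-8859-1?q?World?=")

print(plain_ok, encoded_ok)
```

A failing round trip is not automatically a bug (the second case is RFC 2047 behaving as specified), which is why it works as a test rather than as a requirement.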
> > continuation_ws should be used only when we're forced
> > to break at a non-existing FWS location, e.g. if we've split a non-ascii
> > header or at a non-whitespace header-specific syntactic break. In the
> > case of RFC 2047 headers, the FWS gets consumed anyway so it isn't
> > idempotentially (?!) significant.
Only in RFC 2047 conformant MUAs. IMHO, RFC 2047 conformance is a
requirement, but it's not sufficient. There are too many MUAs out
there that do not correctly handle headers folded between encoded words
(eg, Kyle Jones's VM). I don't know if you *should* care, but I think
that RFC 2047 is (unfortunately) insufficient grounds for refusing to
care at this stage.
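The difference can be sketched as follows, assuming Python 3's email.header. conformant_display follows RFC 2047 and drops the whitespace between adjacent encoded-words. naive_display is a hypothetical stand-in for a reader that decodes each encoded-word on its own; it is not a model of any particular MUA.

```python
import re
from email.header import decode_header, make_header

# Rough pattern for a single RFC 2047 encoded-word (illustration only).
ENCODED_WORD = re.compile(r"=\?[^?]*\?[qQbB]\?.*?\?=")


def conformant_display(raw: str) -> str:
    """Decode as RFC 2047 asks: whitespace between encoded-words is ignored."""
    return str(make_header(decode_header(raw)))


def naive_display(raw: str) -> str:
    """Hypothetical non-conformant reader: unfold the header, then decode each
    encoded-word separately, leaving the folding whitespace in place."""
    unfolded = raw.replace("\r\n", "")
    return ENCODED_WORD.sub(lambda m: conformant_display(m.group(0)), unfolded)


# One word split across two encoded-words by a fold.
raw = "=?iso-8859-1?q?Caf=E9?=\r\n =?iso-8859-1?q?teria?="

print(repr(conformant_display(raw)))  # the two halves are joined
print(repr(naive_display(raw)))       # a stray space appears mid-word
```

Where the fold lands therefore changes what users of such readers see, even though the header is valid.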
AFAICS the implication is that you need to make a judicious choice of
the default for continuation_ws.
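For reference, a minimal sketch of setting continuation_ws explicitly, assuming Python 3's email.header.Header (whose folding rules may differ from the version under discussion). The subject text is made up; the point is only that the parameter is chosen by the caller and that a conformant decoder recovers the same text either way.

```python
from email.header import Header, decode_header, make_header

# Illustrative non-ASCII subject, long enough that it must be split.
text = "Grüße aus München: " + "eine sehr lange Betreffzeile " * 3
text = text.strip()

# continuation_ws is chosen explicitly here instead of relying on the default.
header = Header(text, charset="utf-8", maxlinelen=40, continuation_ws="\t")
folded = header.encode()            # lines are joined with "\n" by default
physical_lines = folded.split("\n")

for line in physical_lines:
    print(repr(line))

# A conformant decoder ignores the whitespace between encoded-words,
# so the original text comes back whatever continuation_ws was used.
restored = str(make_header(decode_header(folded)))
```

Readers that do not implement RFC 2047 fully are the ones for which the choice of continuation_ws is visible, so the default matters mostly for them.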
> Well, this will surely break my contribution to Mailman 2.2's
> CookHeaders.py, which unifies the code for subject prefix munging for
> both ASCII and RFC 2047 headers. :-(
I don't see why it should, although there might be technical reasons
why it would. What I want, and what I think Barry is proposing, is
simply that the email package never does anything to disturb FWS by
default.
If you munge a header (even as trivially as removing a "Re:" prefix),
you must accept responsibility for formatting the result. At that
point, I see no reason why the email package shouldn't help you
"reflow" a header if that's desirable in your application---but the
application should have to request that explicitly. It shouldn't be
implicit in the setting of continuation_ws.
> Maybe we should add an option for email.header.Header(), like
> idempotent=True/False. ;-)
I think it would be better to add an option, or even a hook function,
for formatting. For example, I often use a docstring-like convention
for long subject headers, where the gist is in the first line, and the
rest is formatted nicely (ie, indented to align with the initial
character of the first line of the subject). It would be nice if that
kind of thing could be done with an application-supplied function (of
course email could provide a number of common ones itself).
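A rough sketch of what such an application-supplied formatter might look like. The function name, its signature, and the convention that the gist is separated from the rest by a newline are all assumptions made for illustration; the email package does not define this hook.

```python
import textwrap


def docstring_style_fold(name: str, value: str, width: int = 78) -> str:
    """Hypothetical formatting hook: keep the gist on the first line and
    indent the remainder to align with the first character of the value.

    The gist is taken to be everything before the first newline in value.
    It is not re-wrapped, so an over-long gist stays over-long.
    """
    prefix = f"{name}: "
    indent = " " * len(prefix)
    gist, _, rest = value.partition("\n")
    lines = [prefix + gist.strip()]
    if rest.strip():
        body = " ".join(rest.split())  # collapse existing line breaks
        lines.extend(
            textwrap.wrap(
                body,
                width=width,
                initial_indent=indent,
                subsequent_indent=indent,
            )
        )
    # Every continuation line starts with whitespace, so this is valid folding.
    return "\r\n".join(lines)


folded = docstring_style_fold(
    "Subject",
    "Fix header folding\n"
    "Preserve existing whitespace when re-breaking long headers "
    "so output stays close to the input.",
    width=60,
)
print(folded)
```

An application would call something like this only when it has decided to reformat a header it munged, which keeps reflowing an explicit request.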