[Email-SIG] fixing the current email module
v+python at g.nevcal.com
Fri Oct 9 22:43:59 CEST 2009
On approximately 10/9/2009 1:38 AM, came the following characters from
the keyboard of Tokio Kikuchi:
> Glenn Linderman wrote:
>> On approximately 10/8/2009 8:47 PM, came the following characters from
>> the keyboard of Tokio Kikuchi:
>>>>> Actually, as long as the prepended text is ASCII, all that work can be
>>>>> done on the encoded value. When it is not ASCII, it may still be
>>>>> separated and recognizable. Still that logic is more complex than
>>>>> decoding, handling as Unicode, and encoding.... when it works. Just
>>>>> pointing out that there is more than one way to do things...
>>> Oh, really?
>>> Base64 is 3 to 4 octets encoding and there is no way to prepend padding.
>> In header values, encoding is done using encoded-words. A header value
>> consists of a sequence of ASCII words and encoded-words. While an
>> encoded-word that uses base64 encoding cannot easily be adjusted to
>> prepend data into that encoded-word, additional ASCII words or
>> encoded-words can be prepended in front of the other words within the
>> header value.
>> So, yes, really!
> Following two lines have equivalent header contents:
> Re: [mmjp-users 123] =?iso-2022-jp?b?GyRCRnxLXDhsGyhC?=
> Re: =?iso-2022-jp?b?W21tanAtdXNlcnMgMTIzXSAbJEJGfEtcOGwbKEI=?=
> I'd like to see how you can extract ascii part without touching rest of
> the encoded word in the second example.
I can't, and I didn't say I could.
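For reference, the equivalence of the two example subjects can be checked with a short sketch using the standard library's email.header.decode_header. The helper name to_unicode is only illustrative, and the sketch assumes every chunk names a charset that Python knows.

```python
from email.header import decode_header

def to_unicode(raw_value):
    """Decode every chunk of a raw header value into one Unicode string."""
    parts = []
    for chunk, charset in decode_header(raw_value):
        # decode_header yields bytes for headers containing encoded-words,
        # and may yield str for headers that contain none.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)

first = "Re: [mmjp-users 123] =?iso-2022-jp?b?GyRCRnxLXDhsGyhC?="
second = "Re: =?iso-2022-jp?b?W21tanAtdXNlcnMgMTIzXSAbJEJGfEtcOGwbKEI=?="

# Compare with whitespace removed, so the check does not depend on how
# spacing around encoded-words is reported.
same = "".join(to_unicode(first).split()) == "".join(to_unicode(second).split())
```

In the second form the list prefix only becomes visible after decoding, which is why it cannot be edited in place.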
> What we do in Mailman is treat both equally: we delete
> [mmjp-users 123] from the subject and prefix it again with [mmjp-users 124]
> (with a new sequential number). Some MUAs encode subjects like the second
> example, and this is beyond our control. Therefore, we are forced to
> decode the whole header content.
Yes, if the MUA has created the second encoding, decoding is required in
order to replace the header prefix.
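A minimal sketch of that decode, modify, re-encode approach follows. It is not Mailman's actual code: the names PREFIX_RE and renumber_subject are hypothetical, the sketch assumes every chunk can be decoded, and it re-encodes as UTF-8 rather than the original charset.

```python
import re
from email.header import Header, decode_header

# Hypothetical pattern for this list's prefix, e.g. "[mmjp-users 123] ".
PREFIX_RE = re.compile(r"\[mmjp-users \d+\]\s*")

def renumber_subject(raw_value, new_number):
    """Decode the whole subject, swap the list prefix, and re-encode."""
    text = "".join(
        chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
        for chunk, charset in decode_header(raw_value)
    )
    # Remove the old prefix wherever it appears, then put the new one first.
    text = PREFIX_RE.sub("", text, count=1)
    new_text = "[mmjp-users %d] %s" % (new_number, text)
    # Assumption of this sketch: always re-encode as UTF-8.
    return Header(new_text, "utf-8").encode()
```

Because the whole value is decoded first, both example encodings go through the same code path.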
If the MUA has created the first encoding, then decoding would not be
required in order to replace the header prefix, but the logic to detect
which case applies and handle each separately results in more complexity
in the application.
What I said was that prefixing a header value with additional text
doesn't require decoding, and that is true.
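As a small illustration of pure prefixing, here is a sketch that works only on the raw header value. The function name prefix_raw is hypothetical; the sketch assumes the prefix is plain ASCII, and decode_header is used only to confirm that the existing encoded-word was left untouched.

```python
from email.header import decode_header

def prefix_raw(raw_value, ascii_prefix):
    """Prepend an ASCII word sequence without decoding the existing value."""
    return ascii_prefix + " " + raw_value

raw = "=?iso-2022-jp?b?GyRCRnxLXDhsGyhC?="
new_value = prefix_raw(raw, "[mmjp-users 124]")

# The original encoded-word is carried through byte for byte.
last_chunk = decode_header(new_value)[-1]
```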
What you are saying is that you want to do more than prefix a header
value with additional text.
What you are saying is that you would rather choose to keep the
application logic simple, by assuming or requiring that the existing
header value is able to be decoded. If that is sufficient for your
application, it is a reasonable choice. What do you do with messages
for which the header you wish to modify cannot be decoded? Some options:
1) bounce the message
2) discard the message
3) determine if the header value is partially able to be decoded, and if
the part that can be decoded contains the data you wish to modify,
modify it, and simply preserve and pass through the parts that could not
be decoded
4) if the header value cannot be decoded at all, or the parts that can
be decoded do not contain the data you wish to modify, then you could
possibly choose to simply prefix information into the header in that
case, again preserving and passing through the parts that could not be
decoded (or, in this case, the whole value).
Perhaps you can think of other alternatives besides these; feel free to
suggest them.
Naturally, doing options 3 or 4 above requires more complex logic for
the application than options 1 or 2. The requirements of your
application should determine the types of choices you make.
For example, if a new or non-standard charset appears, an application
that requires the ability to decode the header, but hasn't been updated
to understand the charset, will fail to decode it. Yet, if it has logic
like 3 or 4, it may be more successful, and would be a more robust
application.
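A rough sketch of such fallback logic is below. The names PREFIX_RE, try_decode, and replace_prefix are hypothetical, the UTF-8 re-encoding is an assumption, and the fallback only removes an old prefix that is visible as plain ASCII in the raw value; it is one possible shape for options 3 and 4, not a complete implementation.

```python
import re
from email.header import Header, decode_header

# Hypothetical pattern for this list's prefix, e.g. "[mmjp-users 123] ".
PREFIX_RE = re.compile(r"\[mmjp-users \d+\]\s*")

def try_decode(raw_value):
    """Return the decoded text, or None if any chunk cannot be decoded."""
    parts = []
    for chunk, charset in decode_header(raw_value):
        if isinstance(chunk, bytes):
            try:
                chunk = chunk.decode(charset or "ascii")
            except (LookupError, UnicodeDecodeError):
                return None  # unknown charset or invalid bytes
        parts.append(chunk)
    return "".join(parts)

def replace_prefix(raw_value, new_prefix):
    """Replace the list prefix, falling back to raw handling if needed."""
    text = try_decode(raw_value)
    if text is not None:
        # Simple case: handle as Unicode, then re-encode (UTF-8 assumed).
        new_text = new_prefix + " " + PREFIX_RE.sub("", text, count=1)
        return Header(new_text, "utf-8").encode()
    # Fallback: work on the raw value. An old prefix written as plain
    # ASCII is removed; encoded-words are passed through unchanged.
    return new_prefix + " " + PREFIX_RE.sub("", raw_value, count=1)
```

The pattern contains a space, which cannot occur inside an encoded-word, so the raw substitution only ever touches the unencoded part of the value.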
Glenn -- http://nevcal.com/
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking