[Mailman-Users] A scrubber issue

Todd Zullinger tmz at pobox.com
Sun Dec 10 03:15:31 CET 2006


Mark Sapiro wrote:
> Todd Zullinger wrote:
[...]
>>Poking in the email package (on python 2.4.4) shows:
>>
>>    def get_content_charset(self, failobj=None):
>>        """Return the charset parameter of the Content-Type header.
>>
>>        The returned string is always coerced to lower case.  If there is no
>>        Content-Type header, or if that header has no charset parameter,
>>        failobj is returned.
>>        """
>>
>>This seems to violate section 5.2 of RFC 2045 which says parts
>>lacking a Content-type header should be assumed to be text/plain
>>with a charset of us-ascii.  The get_content_type method in
>>email.Message does mention RFC 2045 and uses text/plain if the
>>content-type is invalid.
> 
> It does seem inconsistent, but I don't think we can call it a
> violation of the RFC yet; it depends on what the caller does with
> it.

You're right that "violate" is probably the wrong word.  The method
doesn't make the assumption the RFC calls for, but it does let the
caller supply that default through failobj.  I imagine the code is
written the way it is because of the inconsistencies and flagrant
violations one sees in real-world MIME parts.  If only it were simple
enough to just write to the standard and have things just work. :)
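
A minimal sketch of the behaviour described above.  It assumes a
current Python 3 email package rather than the 2.4.4 one quoted; the
two methods have the same names there.  The sample messages are made
up for illustration.

```python
import email

# A message with no Content-Type header at all.
bare = email.message_from_string("Subject: test\n\nplain body\n")

# get_content_type() applies the RFC 2045 default type ...
ctype = bare.get_content_type()

# ... but get_content_charset() just hands back failobj (None by default).
charset_default = bare.get_content_charset()

# The caller can ask for the RFC 2045 default charset explicitly.
charset_fallback = bare.get_content_charset("us-ascii")

# When a charset parameter is present, failobj is ignored and the
# value comes back lower-cased.
labeled = email.message_from_string(
    'Content-Type: text/plain; charset="ISO-8859-1"\n\nbody\n'
)
charset_labeled = labeled.get_content_charset("us-ascii")

print(ctype, charset_default, charset_fallback, charset_labeled)
```

So the RFC default is available, but only if the caller asks for it.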

>>Would it be appropriate to set failobj="us-ascii" when calling this
>>method in Scrubber.py?
> 
> It might be, but I'd like to hear from Tokio first.
>
> Clearly this was considered at one point, since a specific case and
> message exist for it where it would have been simpler to just assume
> it is us-ascii. Thus, I think there must be messages in the wild
> with parts with unspecified character sets that aren't us-ascii.
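
To illustrate the concern about unlabeled parts that aren't really
us-ascii, here is a hedged sketch (Python 3 email package assumed).
decode_part is a hypothetical helper, not anything in Scrubber.py, and
the sample message is invented.

```python
import email

# An unlabeled part whose body contains a non-ASCII byte (0xE9).
raw = b"Subject: unlabeled 8-bit part\n\ncaf\xe9\n"
msg = email.message_from_bytes(raw)


def decode_part(part, fallback="us-ascii"):
    """Hypothetical helper: decode a part's payload to text.

    Uses the declared charset if there is one, else the fallback.
    If strict decoding fails, or the charset name is unknown, decode
    with the fallback and replace undecodable bytes with U+FFFD
    instead of raising.
    """
    charset = part.get_content_charset(fallback)
    payload = part.get_payload(decode=True)  # raw bytes of the body
    try:
        return payload.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return payload.decode(fallback, "replace")


text = decode_part(msg)
```

Simply assuming us-ascii with strict decoding would raise on a message
like this one, which may be why the existing code treats the
unspecified case separately.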

That reminds me: I looked at bug 1099138 to see if there was a test
case message I could use to make sure any changes I made didn't
cause a regression.  Would it be good to keep more test case messages
around that could be checked whenever something like this comes up?
That way, when something is changed to fix one issue, it can be
checked against the earlier messages to make sure it doesn't break
them.  I was envisioning something like a directory per bug id
containing the messages that triggered it.  There are some in the
tests/ dir, but not many, it seems.

Or is that a lot more work than it's worth?
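
Roughly what I have in mind, as a sketch only.  The layout (one
subdirectory per bug id, each holding raw messages), the function
names and the check callback are all hypothetical; nothing like this
exists in the tests/ dir today.  Written against Python 3's email
package.

```python
import email
import os


def iter_regression_messages(root):
    """Yield (bug_id, filename, message) for each file in root/<bug_id>/."""
    for bug_id in sorted(os.listdir(root)):
        bug_dir = os.path.join(root, bug_id)
        if not os.path.isdir(bug_dir):
            continue
        for name in sorted(os.listdir(bug_dir)):
            path = os.path.join(bug_dir, name)
            with open(path, "rb") as fp:
                yield bug_id, name, email.message_from_binary_file(fp)


def run_regressions(root, check):
    """Run check(msg) on every stored message and collect the failures.

    check is any callable that raises when a message is mishandled.
    Returns a list of (bug_id, filename, error) tuples; an empty list
    means no stored message broke.
    """
    failures = []
    for bug_id, name, msg in iter_regression_messages(root):
        try:
            check(msg)
        except Exception as exc:
            failures.append((bug_id, name, repr(exc)))
    return failures
```

A change to the scrubber could then be run against every stored
message by passing the code under test as check.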

-- 
Todd        OpenPGP -> KeyID: 0xBEAF0CE3 | URL: www.pobox.com/~tmz/pgp
======================================================================
The law will never make men free; it is men who have got to make the
law free.
    -- Henry David Thoreau
