[Mailman-Users] A scrubber issue
tkikuchi at is.kochi-u.ac.jp
Sun Dec 10 14:30:18 CET 2006
Todd Zullinger wrote:
> I wrote:
>> Tokio Kikuchi wrote:
>>> But as to the problem of the default charset being 'us-ascii': if we
>>> merge the parts into one, text in some languages (like Japanese)
>>> becomes irreversibly unreadable. It is safe to keep it in a separate file
>>> if you can't archive the whole message in multipart like in
>> Okay, that's understandable.
> Just another thought (because I realize now that I don't understand
> this as well as I thought at first :)...
> Are you saying there are messages which would lack a charset in a
> content-type header and include Japanese text? I wouldn't think they
> would be valid if they didn't. But I may not understand the types of
> message structures you mean.
> If the email parser were to assume that a part lacking a content-type
> header is text/plain and us-ascii, would this break valid messages or
> only invalid ones (not that invalid ones could necessarily be ignored,
> particularly if they were a significant portion of the messages seen
> in reality :)?
An RFC 822 email message without a charset parameter may be assumed to be
us-ascii. But for a text attachment, that assumption may or may not hold.
For example, a Mailman patch file that touches the i18n messages directory
will have mixed charsets, such as iso-8859-1 for the fr directory and
euc-jp for ja. If you assume the charset is us-ascii and build the archive
with ? in place of unprintable characters, the patch file can no longer be
used. You are always safe if you save the text file as is in a separate
attachment directory.
In short, a text attachment is not an email message.
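
To make the loss concrete, here is a minimal Python sketch. It is not the
scrubber's actual code; the sample text and the variable names are
assumptions chosen only for illustration. It compares the two paths for a
euc-jp attachment: forcing it through us-ascii with replacement, and keeping
the raw bytes untouched.

```python
# Hypothetical sample: Japanese text encoded as euc-jp, standing in for
# a patch hunk against the ja message catalog.
original = "日本語のメッセージ".encode("euc-jp")

# Lossy path: assume us-ascii and replace every undecodable byte.
# Decoding turns each non-ASCII byte into U+FFFD; encoding back to
# us-ascii turns each U+FFFD into "?".
as_text = original.decode("us-ascii", errors="replace")
lossy = as_text.encode("us-ascii", errors="replace")

# Safe path: store the attachment bytes as is, with no charset guess.
saved = bytes(original)

print(lossy)              # only "?" bytes remain
print(saved == original)  # the saved copy is byte-identical
```

Once the bytes have been replaced, nothing in the archived text says what
they were, so a patch stored that way cannot be applied. The saved copy can
still be decoded later by anyone who knows the charset.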
> I'd be grateful if you could enlighten me on this.
Tokio Kikuchi, tkikuchi at is.kochi-u.ac.jp