Ham, mailing lists, and oddball character sets
Hi.
I work a bit in OSS, contribute to MIMEDefang and SpamAssassin, and am myself on about 70 mailing lists.
I notice that some of these lists that are open to outside mailings are magnets for spam.
As a consequence, the end-recipients of these lists can't "trust" mail coming from these lists, and if we filter it (and reject it), we run the risk of being auto-unsubscribed for too many delivery failures...
It's a thorny issue.
In a perfect world, all MUAs would comply with the following recommendation:
RFC 2046, last paragraph of section 4.1.2:
In general, composition software should always use the "lowest common denominator" character set possible. For example, if a body contains only US-ASCII characters, it SHOULD be marked as being in the US-ASCII character set, not ISO-8859-1, which, like all the ISO-8859 family of character sets, is a superset of US-ASCII. More generally, if a widely-used character set is a subset of another character set, and a body contains only characters in the widely-used subset, it should be labelled as being in that subset. This will increase the chances that the recipient will be able to view the resulting entity correctly.
And thereby, it would be trivial to bounce a message sent to an English-language-only mailing list whose declared charset wasn't US-ASCII or Latin-1 (ISO-8859-1).
But alas they don't.
So end users' mail systems end up having to do this, which creates all sorts of backscatter to the mailing list, etc.
What if mailing list exploders did the following?
When you receive a message that has text/plain parts that aren't labelled "charset=us-ascii" or "charset=iso-8859-1", attempt to transcode the parts into one of these (in that order, until success).
If the transcoding fails, reject the message.
Otherwise, substitute the rewritten parts in the forwarded message.
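For concreteness, here is a rough sketch of the three steps above using only Python's standard `email` package. It is not taken from any existing list manager: the names `TARGET_CHARSETS`, `UntranscodableError`, `lowest_charset`, and `transcode_text_parts` are made up for illustration, and "reject" is modelled simply as raising an exception that the list software would turn into a bounce.

```python
from email.message import EmailMessage

# Hypothetical names throughout; tried in this order, as proposed above.
TARGET_CHARSETS = ("us-ascii", "iso-8859-1")


class UntranscodableError(Exception):
    """Raised when a text/plain part fits none of the target charsets."""


def lowest_charset(text):
    """Return the first target charset that can represent text, or None."""
    for charset in TARGET_CHARSETS:
        try:
            text.encode(charset)
            return charset
        except UnicodeEncodeError:
            continue
    return None


def transcode_text_parts(msg):
    """Rewrite text/plain parts of msg in place; raise if one cannot be transcoded."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        declared = (part.get_content_charset() or "us-ascii").lower()
        if declared in TARGET_CHARSETS:
            continue  # already labelled as one of the accepted charsets
        try:
            # Undecodable bytes become U+FFFD, which then fails the check below.
            text = part.get_content()
        except LookupError as exc:  # the declared charset is unknown to Python
            raise UntranscodableError(declared) from exc
        charset = lowest_charset(text)
        if charset is None:
            raise UntranscodableError(declared)
        # Assumption: replacing the part's content like this is acceptable.
        # It resets the part's Content-* headers, so parameters such as
        # format=flowed are lost in this sketch.
        part.set_content(text, charset=charset)
    return msg


# Usage: a UTF-8 message that only needs Latin-1 gets relabelled.
demo = EmailMessage()
demo.set_content("Meet me at the café.\n", charset="utf-8")
transcode_text_parts(demo)
print(demo.get_content_charset())
```

A real exploder would also have to decide what to do with text/html parts and encoded headers, which this sketch ignores.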
Yes, I know that it's not a good thing to rewrite messages... but most mailing lists do a fair amount of message munging anyway (to the point that PGP becomes useless, for instance).
What do you all think?
It doesn't have to be the default behavior... But it would definitely be handy to have as an option.
Thanks,
-Philip
On Tue, May 04, 2010 at 12:32:42PM -0600, Philip A. Prindeville wrote:
> And thereby, it would be trivial to bounce a message sent to an English-language-only mailing list whose declared charset wasn't US-ASCII or Latin-1 (ISO-8859-1).
> But alas they don't.
It still wouldn't be trivial even if they did. What about people who put their normal, proper names in their signatures? Maybe they're Greek. Maybe they're Taiwanese....
Cheers,
Cristóbal Palmer ibiblio.org metalab.unc.edu
Cristóbal Palmer writes:
> On Tue, May 04, 2010 at 12:32:42PM -0600, Philip A. Prindeville wrote:
> > And thereby, it would be trivial to bounce a message sent to an English-language-only mailing list whose declared charset wasn't US-ASCII or Latin-1 (ISO-8859-1).
> > But alas they don't.
> It still wouldn't be trivial even if they did. What about people who put their normal, proper names in their signatures? Maybe they're Greek. Maybe they're Taiwanese....
Or maybe they're returned ex-pats with fond memories (and perhaps family) of their former host countries. There is also the problem of Windose-1252, with various punctuation marks not available in ISO 8859-1, not to mention the even further extended set available in Unicode. Let's not annoy the punctuation pedants!
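To make the punctuation point concrete, here is a small check (a sketch only; the helper names `encodable` and `report` are invented for illustration) of which charsets can represent a few sample strings. It relies on nothing beyond Python's built-in codecs, where `cp1252` is the codec name for Windows-1252.

```python
CHARSETS = ("us-ascii", "iso-8859-1", "cp1252", "utf-8")


def encodable(text, charset):
    """True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def report(text):
    """Map each charset in CHARSETS to whether it can represent text."""
    return {charset: encodable(text, charset) for charset in CHARSETS}


samples = {
    "curly quotes": "\u201cquoted\u201d",
    "en dash": "2009\u20132010",
    "Greek name": "\u0393\u03b9\u03ce\u03c1\u03b3\u03bf\u03c2",
}

for label, text in samples.items():
    print(label, report(text))
```

The curly quotes and the en dash encode in Windows-1252 but not in ISO-8859-1, so a transcode-or-reject filter limited to US-ASCII and Latin-1 would reject them along with the Greek name.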
But Philip already addressed these issues by saying make it configurable. I think that's reasonable as long as the list owner is made aware that she's probably going to trash posts from some of her members, and sooner, rather than later. If she wants to blame the victims for not following the rules and the spammers for the rules, that's her problem, no?
In any case, people with horked MUAs are surely used to being maltreated by software; they're almost certainly using MS or Apple products, no?<wink>
participants (3): Cristóbal Palmer, Philip A. Prindeville, Stephen J. Turnbull