Re: [Mailman-Developers] Missing footers with latest CVS
"Dan" == Dan Mick <dmick@utopia.West.Sun.COM> writes:
Dan> Installed latest CVS yesterday, and notice now that some
Dan> posters are not getting footers added to their messages. I
Dan> suspect the footer-add-based-on-language code is at fault,
Dan> but I haven't had time to isolate a pattern yet. Anyway,
Dan> consider this Distant Early Warning that that code isn't
Dan> working for some otherwise-untroubled list members.
Maybe it's working as intended? ;)
well, then, how it's intended has a bug. :)
Are they posting messages with a charset that is not the same as the mailing list's charset? The default charset is us-ascii; I will hazard a guess that they are posting messages flagged as UTF-8 or ISO-8859-1 but that are really us-ascii.
Well, so, one of them has no charset expressed at all that I can see.
We can add a work around for this, but it will be hard.
Not really, if appropriate workaround is "ignore the incoming charset and add this footer unconditionally please".
"Dan" == Dan Mick <dmick@utopia.west.sun.com> writes:
Dan> Well, so, one of them has no charset expressed at all that I
Dan> can see.
That means their charset is us-ascii. Is the list set to some other language? Could you please post the configuration of the list, and an example message without footer that was sent to the list?
Basically, we need to deal with the case where a list is configured for something like iso-8859-2, but a user sends a message in iso-8859-1, or utf-8, etc. In these cases, we can't just tack the footer on -- we'll get a garbage message! We have to avoid adding a footer if the charsets mismatch; no other way about it.
Dan> Not really, if appropriate workaround is "ignore the incoming
Dan> charset and add this footer unconditionally please".
But this is the worst thing you can do. What happens when I post a message in UTF-8 and then a Japanese ISO-2022-JP footer gets tacked on? Not good.
Ben
-- Brought to you by the letters X and G and the number 15. "Tahiti is not in Europe." Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/
Hi,
Ben Gertzfield wrote:
"Dan" == Dan Mick <dmick@utopia.west.sun.com> writes:
Dan> Well, so, one of them has no charset expressed at all that I
Dan> can see.
That means their charset is us-ascii.
Well, people around in Japan still use older MUAs and don't add charset for Japanese messages.
So, the meaning of no charset should depend on the mm_cfg.DEFAULT_SERVER_LANGUAGE, I suppose.
Dan> Not really, if appropriate workaround is "ignore the incoming
Dan> charset and add this footer unconditionally please".
But this is the worst thing you can do. What happens when I post a message in UTF-8 and then a Japanese ISO-2022-JP footer gets tacked on? Not good.
That's why you need to 'normalize' the charsets of incoming mail (into Unicode).
-- Tokio Kikuchi, tkikuchi@is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/
"Tokio" == Tokio Kikuchi <tkikuchi@is.kochi-u.ac.jp> writes:
Dan> Well, so, one of them has no charset expressed at all that I
Dan> can see.
Ben> That means their charset is us-ascii.
Tokio> Well, people around in Japan still use older MUAs and don't
Tokio> add charset for Japanese messages.
Tokio> So, the meaning of no charset should depend on the
Tokio> mm_cfg.DEFAULT_SERVER_LANGUAGE, I suppose.
This would violate RFC 1522:
"5.2. Content-Type Defaults
Default RFC 822 messages without a MIME Content-Type header are taken by this protocol to be plain text in the US-ASCII character set, which can be explicitly specified as:
Content-type: text/plain; charset=us-ascii"
Are we sure we really want to do that?
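As an illustration of the default quoted above (the same wording appears in section 5.2 of RFC 2045), here is a minimal sketch using the present-day Python standard-library `email` package, not the Mailman code of the time. The helper name `declared_charset` is hypothetical. Tokio's suggestion would amount to passing a list-specific value as `default` instead of us-ascii.

```python
from email import message_from_string


def declared_charset(msg, default="us-ascii"):
    """Return the charset named in Content-Type, or `default` if none is given."""
    return msg.get_content_charset(default)


# A message with no Content-Type header at all.
no_header = message_from_string("Subject: hi\n\nplain body\n")

# A message that labels its body explicitly.
latin1 = message_from_string(
    "Content-Type: text/plain; charset=ISO-8859-1\n\nbody\n"
)

print(declared_charset(no_header))  # us-ascii
print(declared_charset(latin1))     # iso-8859-1
```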
Dan> Not really, if appropriate workaround is "ignore the incoming
Dan> charset and add this footer unconditionally please".
Ben> But this is the worst thing you can do. What happens when I
Ben> post a message in UTF-8 and then a Japanese ISO-2022-JP
Ben> footer gets tacked on? Not good.
Tokio> That's why you need to 'normalize' the charsets of incoming
Tokio> mail (into Unicode).
That would work in the best of all possible worlds (i.e. NOT the real world :) but I think there is still information lost when converting some charsets into Unicode, like Big5 and EUC-TW. Also, who knows if we would corrupt PGP signatures by doing something like this?
I would like to have Mailman move to this model eventually, when all mail readers support UTF-8 natively.
Ben
-- Brought to you by the letters N and R and the number 15. "Tahiti is not in Europe." Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/
"Ben" == Ben Gertzfield <che@debian.org> writes:
Ben> This would violate RFC 1522:
That's right. People with broken mailers have broken mailers. Make sure that things are robust for those with decent software, and then do what we can for the former poor souls.
Anyway, even my boss---who used to send mail in NEC JIS---has finally converted to a MIME-capable mailer (and uses it to send me Word attachments :-( ).
Ben> That would work in the best of all possible worlds (i.e. NOT
Ben> the real world :) but I think there is still information lost
Ben> when converting some charsets into Unicode, like Big5 and
Ben> EUC-TW.
Big5, AFAIK, not. CNS yes (I don't know about EUC-TW). But you could handle this by converting to multipart/alternative. (Of course that leaves the people with obsolete mailers out in the cold.)
Ben> Also, who knows if we would corrupt PGP signatures by doing
Ben> something like this?
You do, now. It's corrupted. What you'd have to do is convert to multipart/alternative, you couldn't just transcode. This would also have broader semantic implications because people would not necessarily know in that case that the "real" message was signed--- they'd have to go look.
-- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Don't ask how you can "do" free software business; ask what your business can "do for" free software.
"SJT" == Stephen J Turnbull <stephen@xemacs.org> writes:
"Ben" == Ben Gertzfield <che@debian.org> writes:
Ben> This would violate RFC 1522:
SJT> That's right. People with broken mailers have broken
SJT> mailers. Make sure that things are robust for those with
SJT> decent software, and then do what we can for the former poor
SJT> souls.
Totally agreed. I mean, look at me, a "dinosaur" who uses a MIME-aware MUA in a system that was never originally designed to support the stuff you get in email these days. And it's mostly bug free <wink>. (Aside to Stephen: do you know if Kyle still handles VM bug reports these days? ;).
The only hope we have of interoperating is to support the standards, or at least not willfully break them <Hippocratic oath wink>. Which means if the charsets don't match, we can't simply tack on headers and footers. So we either don't add them or we add some multipart/mixed chrome and do it in a MIME-compliant way.
I really don't want to think about PGP right now. Mailcrypt w/GnuPG seems to only sign or encrypt the body, and in a non-MIME way, so if we wanted to add headers and footers it seems like we'd be safe by wrapping the original body in multipart/mixed chrome. Of course you'd have to unpack the parts to verify (read) the signed (encrypted) part. Oh well, there's not really much more you /can/ do.
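The "multipart/mixed chrome" idea can be sketched roughly as follows. This is only an illustration with the present-day standard-library `email` package, not what Mailman does; `wrap_with_footer` and the sample addresses are made up for the example. The original body is copied without decoding, and the footer travels as its own part with its own charset label.

```python
from email import message_from_string
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def wrap_with_footer(original, footer_text, footer_charset):
    """Return a multipart/mixed message: original body part plus a footer part."""
    outer = MIMEMultipart("mixed")
    # Carry over a few headers (a real handler would need to copy more).
    for name in ("From", "To", "Subject"):
        if original[name] is not None:
            outer[name] = original[name]

    # First part: the original body with its own type, charset and
    # transfer encoding, copied without decoding or re-encoding.
    inner = Message()
    for name in ("Content-Type", "Content-Transfer-Encoding"):
        if original[name] is not None:
            inner[name] = original[name]
    inner.set_payload(original.get_payload())
    outer.attach(inner)

    # Second part: the footer, labelled with the list's charset.
    outer.attach(MIMEText(footer_text, "plain", footer_charset))
    return outer


original = message_from_string(
    "From: poster@example.org\n"
    "Subject: test\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "message body\n"
)
wrapped = wrap_with_footer(original, "-- list footer\n", "iso-8859-2")
print(wrapped.as_string())
```

The cost, as noted above, is that a reader has to open the first part to see (or verify) the original body.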
-Barry
"BAW" == Barry A Warsaw <barry@zope.com> writes:
"SJT" == Stephen J Turnbull <stephen@xemacs.org> writes:
"Ben" == Ben Gertzfield <che@debian.org> writes:
Ben> This would violate RFC 1522:
SJT> That's right. People with broken mailers have broken
SJT> mailers. Make sure that things are robust for those with
SJT> decent software, and then do what we can for the former poor
SJT> souls.
BAW> The only hope we have of interoperating is to support the
BAW> standards, or at least not willfully break them <Hippocratic
BAW> oath wink>. Which means if the charsets don't match, we
BAW> can't simply tack on headers and footers. So we either don't
BAW> add them or we add some multipart/mixed chrome and do it in a
BAW> MIME-compliant way.
Can we safely assume that all charsets folks will use with Mailman will have us-ascii as a subset? It seems to be the case so far.
If so, I think the code I wrote can be loosened up a bit -- us-ascii headers/footers can *always* be added to a message, so if the list's language is US English, adding headers and footers should be fine no matter what the charset.
If the list's language is not US English, though, we just simply cannot add a header/footer blindly unless the message explicitly says it's the same charset as the header/footer will be. Any other way lies madness. :)
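The rule proposed in the two paragraphs above can be written down as a small predicate. This is a sketch only: `can_decorate` is a hypothetical name, and it rests on the stated assumption that every message charset in use is a superset of us-ascii.

```python
def can_decorate(list_charset, msg_charset):
    """Apply the proposed rule for adding a plain-text header/footer.

    A us-ascii header/footer is treated as always safe (assuming the
    message charset is an ASCII superset); otherwise the message must
    declare the same charset as the list.
    """
    lcset = (list_charset or "us-ascii").lower()
    mcset = (msg_charset or "us-ascii").lower()
    return lcset == "us-ascii" or lcset == mcset


print(can_decorate("us-ascii", "utf-8"))       # True
print(can_decorate("iso-2022-jp", "utf-8"))    # False
print(can_decorate("iso-8859-2", "ISO-8859-2"))  # True
```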
BAW> I really don't want to think about PGP right now. Mailcrypt
BAW> w/GnuPG seems to only sign or encrypt the body, and in a
BAW> non-MIME way, so if we wanted to add headers and footers it
BAW> seems like we'd be safe by wrapping the original body in
BAW> multipart/mixed chrome. Of course you'd have to unpack the
BAW> parts to verify (read) the signed (encrypted) part. Oh well,
BAW> there's not really much more you /can/ do.
Well, remember that old-style PGP bodies will have the whole:
===BEGIN PGP SIGNED MESSAGE===
blah blah
===BEGIN PGP SIGNATURE===
abcdef123456
===END PGP SIGNED MESSAGE===
thing going, so we can safely add headers/footers to these kinds of messages, as they will be outside these ===VERY LOUD AREAS===.
I don't know how it works for S/MIME, frankly.
Ben
-- Brought to you by the letters T and G and the number 17. "Ooh, don't touch him, HE'S got the wall sconses." Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/
"Ben" == Ben Gertzfield <che@debian.org> writes:
Ben> Can we safely assume that all charsets folks will use with
Ben> Mailman will have us-ascii as a subset? It seems to be the
Ben> case so far.
Depends on how you define "subset". UTF-16 is what I have in mind.
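One way to make "subset" concrete is to ask whether every 7-bit ASCII character encodes to the same single byte in a given charset. The check below is a rough sketch using Python's built-in codecs; `is_ascii_superset` is a hypothetical helper, and Python's codec tables may differ in detail from what a given mailer does.

```python
def is_ascii_superset(codec):
    """True if all 128 ASCII characters encode to the same bytes as in us-ascii."""
    ascii_text = "".join(chr(i) for i in range(128))
    try:
        return ascii_text.encode(codec) == ascii_text.encode("ascii")
    except (UnicodeEncodeError, LookupError):
        # Unknown codec, or a codec that cannot encode some ASCII character.
        return False


for name in ("utf-8", "iso-8859-2", "utf-16"):
    print(name, is_ascii_superset(name))
```

UTF-16 fails this test because each ASCII character becomes two bytes, so a footer appended as raw us-ascii bytes would not line up with the body.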
-- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Don't ask how you can "do" free software business; ask what your business can "do for" free software.
"Stephen" == Stephen J Turnbull <stephen@xemacs.org> writes:
"Ben" == Ben Gertzfield <che@debian.org> writes:
Ben> Can we safely assume that all charsets folks will use with
Ben> Mailman will have us-ascii as a subset? It seems to be the
Ben> case so far.
Stephen> Depends on how you define "subset". UTF-16 is what I
Stephen> have in mind.
Mein Gott. If people start sending mail in UTF-16.. wow.
You gotta love the idea of architecture-dependent email.
Ben
-- Brought to you by the letters V and K and the number 9. "Ha ha! I have evaded you with the aid of these pasty white mints!" Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/
"Ben" == Ben Gertzfield <che@debian.org> writes:
Ben> Mein Gott. If people start sending mail in UTF-16.. wow.
Outhouse Abcess. <wink, wink, nudge, nudge> Say no more, eh?
And yes, I have gotten such. (A _very_ broken beta of Outhouse for Lose2k beta was responsible: the HTML tags were in ASCII ... surely I repeat myself? or aren't you reading TLUG these days, once-and-future listmaster, sir?)
-- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Don't ask how you can "do" free software business; ask what your business can "do for" free software.
Thread-relevant<wink>, so answered here.
"BAW" == Barry A Warsaw <barry@zope.com> writes:
BAW> (Aside to Stephen: do you know if Kyle still handles VM bug
BAW> reports these days? ;).
That depends. If you suggest providing support for correspondence with non-RFC-compliant MUAs, you get Belch.au as the bounce message. Otherwise, you get an answer, usually a fix.<wink>
<OT> Tit-for-tat: what's the favored Pythonic wiki (software) these days? Preferably with full support for Japanese, if not, I'll add it. :-þ </OT>
-- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Don't ask how you can "do" free software business; ask what your business can "do for" free software.
Hello Barry,
Here's my RUR .02:
On Thu, Mar 07, 2002 at 12:38:33AM -0500, Barry A. Warsaw wrote:
Ben> This would violate RFC 1522:
SJT> That's right. People with broken mailers have broken
SJT> mailers. Make sure that things are robust for those with
SJT> decent software, and then do what we can for the former poor
SJT> souls.
Totally agreed. I mean, look at me, a "dinosaur" who uses a MIME-aware MUA in a system that was never originally designed to support the stuff you get in email these days. And it's mostly bug free <wink>. (Aside to Stephen: do you know if Kyle still handles VM bug reports these days? ;).
The only hope we have of interoperating is to support the standards, or at least not willfully break them <Hippocratic oath wink>. Which means if the charsets don't match, we can't simply tack on headers and footers.
This steps on my pet peeve with Mailman: Does this matching regard Content-Transfer-Encoding as well? Tacking on text strings to a base64 text/plain body is a recipe for disaster, and such things happen, believe it or not.
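To see why tacking text onto a base64 body goes wrong, here is a small sketch with Python's standard `base64` module. The helper names and sample strings are made up for the example. Blind concatenation leaves non-base64 characters in the encoded stream; adding the footer correctly would mean decoding, appending, and re-encoding.

```python
import base64
import binascii


def strictly_valid_base64(data):
    """True if `data` is clean base64 with no stray characters."""
    try:
        base64.b64decode(data, validate=True)
        return True
    except binascii.Error:
        return False


def append_footer_base64(encoded_body, footer, charset):
    """Decode the body, append the footer as text, and re-encode as base64."""
    text = base64.b64decode(encoded_body).decode(charset)
    return base64.b64encode((text + footer).encode(charset)).decode("ascii")


body = "Hello, list!\n"
footer = "-- \nlist footer\n"
encoded = base64.b64encode(body.encode("us-ascii")).decode("ascii")

naive = encoded + "\n" + footer  # what blind concatenation produces
careful = append_footer_base64(encoded, footer, "us-ascii")

print(strictly_valid_base64(naive))    # False
print(strictly_valid_base64(careful))  # True
```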
So we either don't add them or we add some multipart/mixed chrome and do it in a MIME-compliant way.
Continuing the Hippocratic theme, I'd suggest a rule: don't meddle if it could hamper someone's reading capabilities. In this case, don't make multipart/mixed embellishments unless it IS multipart/mixed already. All other conversions would break some client's subtle neck or make things look uglier. God forbid messing with multipart/alternative or multipart/signed. It's only bulk informational add-ons, why shove it down everyone's throat? For the same reason, I would object to things like recoding to and from base64 to modify content. Above all, that would put an unnecessary load on the mail processor.
I really don't want to think about PGP right now. Mailcrypt w/GnuPG seems to only sign or encrypt the body, and in a non-MIME way, so if we wanted to add headers and footers it seems like we'd be safe by wrapping the original body in multipart/mixed chrome. Of course you'd have to unpack the parts to verify (read) the signed (encrypted) part. Oh well, there's not really much more you /can/ do.
I second the opinion that for the MUAs that use "magic" PGP tags in text/plain bodies, it would be safe to add text above and below.
-- Stay tuned, MhZ JID: mookid@jabber.org
In the misfortune of our friends we find something that is not displeasing to us. -- La Rochefoucauld, "Maxims"
"MZ" == Mikhail Zabaluev <mhz@alt-linux.org> writes:
MZ> This steps on my pet peeve with Mailman: Does this matching
MZ> regard Content-Transfer-Encoding as well? Tacking on text
MZ> strings to a base64 text/plain body is a recipe for disaster,
MZ> and such things happen, believe it or not.
It doesn't, but that's a good point, and I think, fairly easy to add (see attached).
MZ> Continuing the Hippocratic theme, I'd suggest a rule: don't
MZ> meddle if it could hamper someone's reading capabilities. In
MZ> this case, don't make multipart/mixed embellishments unless it
MZ> IS multipart/mixed already.
This is the case, currently. It means that if the message doesn't meet the text/plain criteria (+charset, +cte), and it's not already a multipart/mixed, the header and footer are not added.
-Barry
-------------------- snip snip --------------------
Index: Decorate.py
===================================================================
RCS file: /cvsroot/mailman/mailman/Mailman/Handlers/Decorate.py,v
retrieving revision 2.13
diff -u -r2.13 Decorate.py
--- Decorate.py 12 Mar 2002 00:49:40 -0000 2.13
+++ Decorate.py 12 Mar 2002 20:29:12 -0000
@@ -72,6 +72,7 @@
     # BAW: If the charsets don't match, should we add the header and footer by
     # MIME multipart chroming the message?
     if not msg.is_multipart() and msgtype == 'text/plain' and \
+           not msg.get('content-transfer-encoding').lower() == 'base64' and \
            (lcset == 'us-ascii' or mcset == lcset):
         payload = header + msg.get_payload() + footer
         msg.set_payload(payload)
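One thing to watch in the added line: if `msg.get()` behaves like `Message.get()` in the present-day standard-library `email` package, it returns None when the header is absent, and calling `.lower()` on None raises AttributeError. Many plain-text messages carry no Content-Transfer-Encoding header at all. A hedged sketch of a more defensive test follows; `is_base64` is a hypothetical helper, not part of Decorate.py.

```python
from email import message_from_string


def is_base64(msg):
    """True if Content-Transfer-Encoding is base64; a missing header counts as no."""
    cte = msg.get("content-transfer-encoding", "")
    return cte.strip().lower() == "base64"


# No Content-Transfer-Encoding header: msg.get(...) without a default is None.
no_cte = message_from_string("Subject: plain\n\nhello\n")

with_cte = message_from_string(
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: BASE64\n"
    "\n"
    "aGVsbG8K\n"
)

print(is_base64(no_cte))    # False
print(is_base64(with_cte))  # True
```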
Hello Barry,
On Tue, Mar 12, 2002 at 03:29:51PM -0500, Barry A. Warsaw wrote:
Index: Decorate.py
===================================================================
RCS file: /cvsroot/mailman/mailman/Mailman/Handlers/Decorate.py,v
retrieving revision 2.13
diff -u -r2.13 Decorate.py
--- Decorate.py 12 Mar 2002 00:49:40 -0000 2.13
+++ Decorate.py 12 Mar 2002 20:29:12 -0000
@@ -72,6 +72,7 @@
     # BAW: If the charsets don't match, should we add the header and footer by
     # MIME multipart chroming the message?
     if not msg.is_multipart() and msgtype == 'text/plain' and \
+           not msg.get('content-transfer-encoding').lower() == 'base64' and \
            (lcset == 'us-ascii' or mcset == lcset):
         payload = header + msg.get_payload() + footer
         msg.set_payload(payload)
That (lcset == 'us-ascii') alternative assumes that the body charset is always ASCII-friendly. Close, but not exactly true. Additionally, I would handle quoted-printable by encoding the header and the footer into it.
-- Stay tuned, MhZ JID: mookid@jabber.org
To be great is to be misunderstood. -- Ralph Waldo Emerson
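The quoted-printable suggestion could look roughly like this, using Python's standard `quopri` module. This is a sketch under assumptions: the body and footer share one charset, the body ends with a line break rather than a soft line break, and `qp_encode_text` and the sample strings are invented for the example.

```python
import quopri


def qp_encode_text(text, charset):
    """Encode text in `charset`, then as quoted-printable; returns an ASCII str."""
    return quopri.encodestring(text.encode(charset)).decode("ascii")


charset = "iso-8859-1"
body = "Café au lait\n"
footer = "Grüße von der Liste\n"

body_qp = qp_encode_text(body, charset)      # as it would arrive on the wire
footer_qp = qp_encode_text(footer, charset)  # footer encoded the same way

# Both pieces are quoted-printable, so they can be concatenated safely.
decorated_qp = body_qp + footer_qp
decoded = quopri.decodestring(decorated_qp.encode("ascii")).decode(charset)
print(decoded)
```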
"Mikhail" == Mikhail Zabaluev <mhz@alt-linux.org> writes:
Mikhail> That (lcset == 'us-ascii') alternative assumes that the
Mikhail> body charset is always ASCII-friendly. Close, but not
Mikhail> exactly true. Additionally, I would handle
Mikhail> quoted-printable by encoding the header and the footer
Mikhail> into it.
I brought this point up a week or so ago; can you come up with any examples of charsets besides UTF-16 that Mailman supports that are not strict supersets of 7-bit us-ascii?
Ben
-- Brought to you by the letters Q and G and the number 2. "Ha ha! I have evaded you with the aid of these pasty white mints!" Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/
Hello Ben,
On Wed, Mar 13, 2002 at 08:03:34AM +0900, Ben Gertzfield wrote:
"Mikhail" == Mikhail Zabaluev <mhz@alt-linux.org> writes:
Mikhail> That (lcset == 'us-ascii') alternative assumes that the
Mikhail> body charset is always ASCII-friendly. Close, but not
Mikhail> exactly true. Additionally, I would handle
Mikhail> quoted-printable by encoding the header and the footer
Mikhail> into it.
I brought this point up a week or so ago; can you come up with any examples of charsets besides UTF-16 that Mailman supports that are not strict supersets of 7-bit us-ascii?
Umm... UCS-4? :) Do you imply that these two could be left out as special cases?
-- Stay tuned, MhZ JID: mookid@jabber.org
manic-depressive, adj.: Easy glum, easy glow.
"Mikhail" == Mikhail Zabaluev <mhz@alt-linux.org> writes:
Ben> I brought this point up a week or so ago; can you come up
Ben> with any examples of charsets besides UTF-16 that Mailman
Ben> supports that are not strict supersets of 7-bit us-ascii?
Mikhail> Umm... UCS-4? :) Do you imply that these two could be
Mikhail> left out as special cases?
I think if people are sending architecture-specific email, we have bigger problems, so yes. :)
Ben
-- Brought to you by the letters O and Z and the number 11. "Bill Gates is a talented evil man." Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/
Hello Ben,
On Wed, Mar 13, 2002 at 12:33:35PM +0900, Ben Gertzfield wrote:
"Mikhail" == Mikhail Zabaluev <mhz@alt-linux.org> writes:
Ben> I brought this point up a week or so ago; can you come up
Ben> with any examples of charsets besides UTF-16 that Mailman
Ben> supports that are not strict supersets of 7-bit us-ascii?
Mikhail> Umm... UCS-4? :) Do you imply that these two could be
Mikhail> left out as special cases?
I think if people are sending architecture-specific email, we have bigger problems, so yes. :)
UTF-16 architecture-specific? Whatever happened to BOM characters?
-- Stay tuned, MhZ JID: mookid@jabber.org
If you think the system is working, ask someone who's waiting for a prompt.
"Mikhail" == Mikhail Zabaluev <mhz@alt-linux.org> writes:
Mikhail> UTF-16 architecture-specific? Whatever happened to BOM
Mikhail> characters?
Yes, there is the BOM to flag which endianness you're using. I'm mostly being facetious, though, because I don't think MUAs aside from broken beta versions of Outlook Express would even dare to use UTF-16 in real email for the real world.
If we run into any, then we can rethink adding the footer in us-ascii, but for now I think it's acceptable to keep people from saying "where have my footers gone?"
Currently we're always adding the footer anyway, so this is a step in the right direction.
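For reference, the BOM point can be illustrated with Python's built-in codecs (a sketch only; the sample text is made up). The two byte orders produce different bytes, the BOM lets a decoder tell them apart, and a us-ascii footer appended to a UTF-16 body still decodes to garbage.

```python
import codecs

text = "footer"

# The same text in the two byte orders gives different bytes.
le = text.encode("utf-16-le")
be = text.encode("utf-16-be")

# With a BOM in front, the generic "utf-16" decoder picks the right order.
with_bom_le = codecs.BOM_UTF16_LE + le
with_bom_be = codecs.BOM_UTF16_BE + be
print(with_bom_le.decode("utf-16") == with_bom_be.decode("utf-16"))  # True

# Appending raw us-ascii bytes to a UTF-16 body pairs them up into
# unrelated 16-bit code units instead of the intended characters.
mangled = (with_bom_le + b"-- \n").decode("utf-16", errors="replace")
print(mangled == text + "-- \n")  # False
```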
Ben
-- Brought to you by the letters N and Y and the number 1. "I choose YOU! Pikachu!" Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/
On Tue, Mar 05, 2002 at 11:22:27AM +0900, Ben Gertzfield wrote:
That would work in the best of all possible worlds (i.e. NOT the real world :) but I think there is still information lost when converting some charsets into Unicode, like Big5 and EUC-TW. Also, who knows if we would corrupt PGP signatures by doing something like this?
I'm pretty sure that the answers there are a) I do and b) we would.
PGP almost certainly won't tolerate rewriting of the body, in much the same way that IPsec won't tolerate NAT rewrites on the channel.
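The underlying reason is that a signature is computed over bytes, not over characters. The sketch below uses a SHA-256 digest from Python's `hashlib` as a stand-in for the signed hash (it is not a real PGP operation, and the sample text is made up): transcoding keeps the text identical but changes the bytes, so any digest over them changes too.

```python
import hashlib

body = "Grüße\n"

# The bytes the sender signed, in the charset the message was sent in.
original_bytes = body.encode("iso-8859-1")

# The bytes after a list server "normalizes" the body to UTF-8.
transcoded_bytes = original_bytes.decode("iso-8859-1").encode("utf-8")

same_text = original_bytes.decode("iso-8859-1") == transcoded_bytes.decode("utf-8")
same_digest = (
    hashlib.sha256(original_bytes).hexdigest()
    == hashlib.sha256(transcoded_bytes).hexdigest()
)

print(same_text)    # True: the characters are unchanged
print(same_digest)  # False: the signed bytes are not
```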
Cheers, -- jra
Jay R. Ashworth jra@baylink.com Member of the Technical Staff Baylink RFC 2100 The Suncoast Freenet The Things I Think Tampa Bay, Florida http://baylink.pitas.com +1 727 647 1274
"If you don't have a dream; how're you gonna have a dream come true?" -- Captain Sensible, The Damned (from South Pacific's "Happy Talk")
participants (7)
- barry@zope.com
- Ben Gertzfield
- Dan Mick
- Jay R. Ashworth
- Mikhail Zabaluev
- Stephen J. Turnbull
- Tokio Kikuchi