Hi!

I'm sending this here because it is language related; sorry if this is not the right list for it.

First, I see that for the Spanish language we are using iso-8859-1 instead of iso-8859-15, which would allow us to use the euro symbol and all that.

Second, I see an annoyance when running lists in a country where different languages are used, or even with certain mailers alone. The problem is this: suppose I have a list whose default language uses the charset iso-8859-1, then somebody comes and posts in iso-8859-15, and you get it all in three MIME parts: one for the header in iso-8859-1, one for the body in iso-8859-15, and then the footer in iso-8859-1 again. This seems quite annoying, at least when you look at it. I was wondering if there is, or if it would make sense to add, something so that this doesn't happen.

Also, I see something that is surely doable: when the same person writes with a mailer like mutt and doesn't use any 8-bit characters, the mailer encodes the text as us-ascii, which, if I'm not wrong, should mix fine with iso-8859-1. I mean that the whole message could come as one part using iso-8859-1, and that is not done right now.

Well, just a couple of ideas, that's all.

Regards...
--
Manty/BestiaTester -> http://manty.net
Santiago Garcia Mantinan <mailman-i18n@manty.net> writes:
First I see that for the Spanish language we are having iso-8859-1 instead of iso-8859-15 which would allow us to use the euro symbol and all that.
That should not cause any annoyance, as none of the mailman messages ever uses a currency symbol. Users can happily send email messages in any encoding they like.
Suppose I have a list which default language is using charset iso-8859-1, then somebody comes and posts in iso-8859-15 and you get it all in three mime parts, one for the header, in iso-8859-1, one for the body in iso-8859-15 and then the footer in iso-8859-1 again. This seems quite annoying, at least when you look at it.
Why is this annoying? All three parts are plain text, so a capable mail reader should be able to render it all in a single message.
I was wondering if there is, or if it would make sense to add, something so that this doesn't happen.
Whether it would make sense, I don't know, but it would be possible. Of course, some people actually prefer to have the messages in three parts, since it allows users to retrieve the original body of the message.

If you want to add it, please look at Mailman/Handlers/Decorate.py. The test

    if not msg.is_multipart() and msgtype == 'text/plain' and \
       msg.get('content-transfer-encoding', '').lower() <> 'base64' and \
       (lcset == 'us-ascii' or mcset == lcset):

is the one that allows direct concatenation of the header. You could extend this to recode the header into the message charset.
Also I see something that is for sure doable, and it is that when the same person writes with some mailers like mutt and doesn't use any 8-bit characters, then the mailer would encode the text as us-ascii, which if I'm not wrong, should mix ok with iso-8859-1, I mean that the whole message could come as one using iso-8859-1, and that is not done right now.
If the mailer already chooses us-ascii as the body encoding, then mailman will do exactly that. However, it seems you assume a case where the body is in iso-8859-15, yet uses only ASCII characters. I don't think mailman should recode the body even in that case. Instead, recoding the headers to the message's encoding should work fine, again.

Regards,
Martin
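The idea of recoding the list header into the message's charset, rather than touching the body, can be illustrated with a small Python 3 sketch. This is not Mailman code: the function name `recode_header` and its arguments are hypothetical, and the header is assumed to be already available as a Unicode string.

```python
def recode_header(header_text, mcset):
    """Return header_text encoded in the message charset mcset.

    Hypothetical helper, not part of Mailman. Returns None when the
    header contains characters that mcset cannot represent (or when
    the charset name is unknown), in which case the caller would keep
    the header as a separate MIME part.
    """
    try:
        return header_text.encode(mcset)
    except (UnicodeEncodeError, LookupError):
        return None


# A Spanish header is representable in iso-8859-15 but not in us-ascii.
print(recode_header("Lista en espa\u00f1ol\n", "iso-8859-15"))
print(recode_header("Lista en espa\u00f1ol\n", "us-ascii"))
```

The body bytes are never modified here, which matches the point above that the original message should stay retrievable.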
On Oct 29, 2003, at 2:27 PM, Martin v. Löwis wrote:
Why is this annoying? All three parts are plain text, so a capable mail reader should be able to render it all in a single message.
Should be, yes -- but it turns out only Mozilla and a few other mail readers actually do the right thing and render the plain text attachments inline. Almost all other mail readers render the separate header and footer as "attachments", which makes them useless. Ben
On Wed, 2003-10-29 at 17:27, Martin v. Löwis wrote:
If you want to add it, please look at Mailman/Handlers/Decorate.py. The test
    if not msg.is_multipart() and msgtype == 'text/plain' and \
       msg.get('content-transfer-encoding', '').lower() <> 'base64' and \
       (lcset == 'us-ascii' or mcset == lcset):
is the one that allows direct concatenation of the header. You could extend this to recode the header into the message charset.
Are Latin 1 and Latin 9 essentially compatible? Would something like the following work:

    def compatible(mcset, lcset):
        # If the list's preferred charset is us-ascii, we can always safely add
        # the header/footer to a plain text message since all email charsets
        # Mailman supports are strict supersets of us-ascii -- no, UTF-16 emails
        # are not supported.
        if lcset == 'us-ascii':
            return True
        if mcset == lcset:
            return True
        # Latin 1 and Latin 9 are basically compatible
        if lcset in LATIN_1_15 and mcset in LATIN_1_15:
            return True
        # For now, nothing else is compatible.
        return False

...and then replace the last conditional in the above with:

    compatible(mcset, lcset)

?

-Barry
On Sunday 30 November 2003 23:04, Barry Warsaw wrote:
Are Latin 1 and Latin 9 essentially compatible? Would something like the following work:
latin1 (iso-8859-1) and latin9 (iso-8859-15) differ in at least 8 codepoints, not just in the euro sign:

--- latin1 2003-11-30 23:35:06.485411504 +0100
+++ latin9 2003-11-30 23:35:11.354671264 +0100
@@ -52,34 +53,34 @@
 161 INVERTED EXCLAMATION MARK
 162 CENT SIGN
 163 POUND SIGN
-164 CURRENCY SIGN
+164 EURO SIGN
 165 YEN SIGN
-166 BROKEN BAR
+166 LATIN CAPITAL LETTER S WITH CARON
 167 SECTION SIGN
-168 DIAERESIS
+168 LATIN SMALL LETTER S WITH CARON
 169 COPYRIGHT SIGN
 170 FEMININE ORDINAL INDICATOR
 171 LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
 172 NOT SIGN
 173 SOFT HYPHEN
 174 REGISTERED SIGN
 175 MACRON
 176 DEGREE SIGN
 177 PLUS-MINUS SIGN
 178 SUPERSCRIPT TWO
 179 SUPERSCRIPT THREE
-180 ACUTE ACCENT
+180 LATIN CAPITAL LETTER Z WITH CARON
 181 MICRO SIGN
 182 PILCROW SIGN
 183 MIDDLE DOT
-184 CEDILLA
+184 LATIN SMALL LETTER Z WITH CARON
 185 SUPERSCRIPT ONE
 186 MASCULINE ORDINAL INDICATOR
 187 RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
-188 VULGAR FRACTION ONE QUARTER
-189 VULGAR FRACTION ONE HALF
-190 VULGAR FRACTION THREE QUARTERS
+188 LATIN CAPITAL LIGATURE OE
+189 LATIN SMALL LIGATURE OE
+190 LATIN CAPITAL LETTER Y WITH DIAERESIS
 191 INVERTED QUESTION MARK
 192 LATIN CAPITAL LETTER A WITH GRAVE
 193 LATIN CAPITAL LETTER A WITH ACUTE

--
Adde parvum parvo magnus acervus erit -- Ovidio
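The comparison above can be reproduced with Python 3's built-in codecs. The following is only a sketch for checking the claim; the helper name `differing_bytes` is made up for illustration and has nothing to do with Mailman.

```python
def differing_bytes(a="iso-8859-1", b="iso-8859-15"):
    """List the byte values that decode to different characters in a and b."""
    return [n for n in range(256)
            if bytes([n]).decode(a) != bytes([n]).decode(b)]


# Print each differing byte value with its character in both charsets.
for n in differing_bytes():
    raw = bytes([n])
    print(n, repr(raw.decode("iso-8859-1")), repr(raw.decode("iso-8859-15")))
```

Any byte in that list means something different depending on the declared charset, which is why treating the two charsets as interchangeable without recoding is risky.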
On Mon, 2003-12-01 at 02:15, Martin v. Löwis wrote:
Barry Warsaw <barry@python.org> writes:
Are Latin 1 and Latin 9 essentially compatible? Would something like the following work:
It would be too dangerous. It *would* help if you could try to perform the recoding of the actual data.
Which would be too much for a patch release I think. Okay, thanks all, I'll punt on this one for 2.1.4. -Barry
Hi, Folks. Barry Warsaw wrote:
On Mon, 2003-12-01 at 02:15, Martin v. Löwis wrote:
Barry Warsaw <barry@python.org> writes:
Are Latin 1 and Latin 9 essentially compatible? Would something like the following work:
It would be too dangerous. It *would* help if you could try to perform the recoding of the actual data.
Which would be too much for a patch release I think. Okay, thanks all, I'll punt on this one for 2.1.4.
Do you remember this? I have uploaded a patch to improve the i18n features of mailman at
http://sourceforge.net/tracker/index.php?func=detail&aid=865661&group_id=103&atid=300103

This includes a patch for Decorate.py that does its best to keep a plain text message from becoming multipart. The strategy is:

1. First convert the header, body and footer into unicode.
2. Then try to encode the result with the list charset.
3. If that fails, the message body may have contained characters like the euro sign. So, try to encode it with the message charset.
4. If both fail, fall back to the multipart message.

If you keep the footer in us-ascii only for a French or German list, you will always get a non-multipart message even if the users post with the euro sign.

Cheers,
--
Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/
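The four steps above can be sketched roughly as follows in Python 3. This is not the uploaded patch: the function name `decorate_plain`, its signature, and its return convention are assumptions made for illustration, with the header and footer taken as Unicode strings and the body as raw bytes in the message charset.

```python
def decorate_plain(header, body, footer, mcset, lcset):
    """Try to build a single-part payload from header + body + footer.

    Hypothetical sketch, not Mailman's Decorate.py. Returns a tuple
    (payload_bytes, charset) on success, or None when the caller
    should fall back to a multipart message.
    """
    # Step 1: bring everything into unicode.
    try:
        text = header + body.decode(mcset) + footer
    except (UnicodeDecodeError, LookupError):
        return None
    # Steps 2 and 3: try the list charset first, then the message charset.
    for charset in (lcset, mcset):
        try:
            return text.encode(charset), charset
        except (UnicodeEncodeError, LookupError):
            continue
    # Step 4: neither charset can represent the combined text.
    return None


# A post containing the euro sign on a list whose charset is iso-8859-1.
post = "Precio: 10 \u20ac\n".encode("iso-8859-15")
print(decorate_plain("[lista]\n", post, "-- \nfooter\n",
                     "iso-8859-15", "iso-8859-1"))
```

In this example the list charset cannot represent the euro sign, so the combined text is emitted in the message charset instead, and the message stays in one part because the header and footer are plain ASCII.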
Santiago Garcia Mantinan wrote:
Suppose I have a list which default language is using charset iso-8859-1, then somebody comes and posts in iso-8859-15 and you get it all in three mime parts, one for the header, in iso-8859-1, one for the body in iso-8859-15 and then the footer in iso-8859-1 again.
This is a big problem, one I didn't address with my initial charset support implementation for Mailman. I'll add this functionality as soon as I can, hopefully tonight or tomorrow. Sorry for all the inconvenience. Basically, we need to have Mailman's Charset.py treat iso-8859-1 exactly the same as iso-8859-15 as a special case. Ben
participants (6)
- Barry Warsaw
- Ben Gertzfield
- martin@v.loewis.de
- Santiago Garcia Mantinan
- Simone Piunno
- Tokio Kikuchi