[Mailman-Developers] [ mailman-Patches-634303 ] Recode pipermail templates

Mon Nov 18 23:25:18 2002

Patches item #634303, was opened at 2002-11-06 09:28
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=300103&aid=634303&group_id=103

Category: Pipermail
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Martin v. Löwis (loewis)
Assigned to: Nobody/Anonymous (nobody)
Summary: Recode pipermail templates

Initial Comment:
This patch transparently recodes the pipermail
templates and messages if the article and index files
have a different encoding than the template language. If
necessary, HTML character references are generated. If
recoding fails, it will proceed without raising an
exception, but produce garbage.
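
A minimal sketch of the recoding step described above, written for
current Python rather than taken from the patch. The function name
`recode` and its parameters are hypothetical; the built-in
"xmlcharrefreplace" error handler is what generates the character
references here.

```python
def recode(text: bytes, source_charset: str, target_charset: str) -> bytes:
    """Recode template bytes from source_charset to target_charset.

    Hypothetical helper, not the patch's code. Characters the target
    charset cannot represent become HTML character references. If the
    input cannot be decoded, it is returned unchanged, which may show
    up as garbage in the page.
    """
    if source_charset.lower() == target_charset.lower():
        return text
    try:
        decoded = text.decode(source_charset)
        return decoded.encode(target_charset, "xmlcharrefreplace")
    except (UnicodeError, LookupError):
        # Proceed without raising, as the comment above describes.
        return text


template = "Martin v. Löwis".encode("iso-8859-1")
print(recode(template, "iso-8859-1", "us-ascii"))
```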

For messages, this uses a method self._, which first
invokes _, then does the recoding. This makes it
necessary for _ to go back more than one frame, so the
number of frames is now an optional parameter
(defaulting to 1).

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-11-19 00:25

Message:
Logged In: YES 
user_id=21627

My concern with this approach is that we might not have a
codec for the encoding of the message. That means we cannot
decode the message to unicode, which means we cannot htmlify
unsupported characters. OTOH, we do have codecs for the
preferred encodings of all supported languages.
[Sorry I didn't think of this issue when we were talking
about it recently]

If we consider this case unlikely to occur in real life, and
are prepared to produce some error message in the page if it
does happen, then the approach is fine.
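
The fallback could look roughly like this. It is only a sketch: the
function name and the wording of the notice are assumptions, and
`codecs.lookup` is used simply to test whether a codec is available.

```python
import codecs
import html


def to_unicode_or_notice(body: bytes, charset: str) -> str:
    """Decode body, or return an error notice if no codec exists.

    Hypothetical helper. The notice text is illustrative only.
    """
    try:
        codecs.lookup(charset)
    except LookupError:
        # No codec for this charset, so the text cannot be decoded
        # and its characters cannot be htmlified.
        return "[message in unsupported charset %s not shown]" % (
            html.escape(charset),
        )
    return body.decode(charset, "replace")


print(to_unicode_or_notice(b"abc", "x-unknown-charset"))
```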

As for PEP 293: Yes, this is precisely the intention: it
simplifies generating html-escaped strings, and it is more
efficient than any other strategy (since the codec will call
the error handling only when an encoding error occurs).
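
As a sketch of the mechanism, a PEP 293 error handler can be
registered with `codecs.register_error`. The handler name
"html_charref" and the `calls` list are illustrative assumptions; the
built-in "xmlcharrefreplace" handler already produces the same
replacement text.

```python
import codecs

calls = []  # records each (start, end) range the codec reported


def html_charref(error):
    """Replace unencodable characters with &#N; references."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    calls.append((error.start, error.end))
    bad = error.object[error.start:error.end]
    replacement = "".join("&#%d;" % ord(ch) for ch in bad)
    # Return the replacement and the position where encoding resumes.
    return replacement, error.end


codecs.register_error("html_charref", html_charref)

encoded = "Martin v. Löwis".encode("ascii", "html_charref")
print(encoded)
print(len(calls))  # the handler ran only for the unencodable span
```

Encoding a string that fits the target charset never enters the
handler, which is where the efficiency gain comes from.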

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2002-11-18 21:46

Message:
Logged In: YES 
user_id=12800

Feel free to shoot holes in this, but I think it might
be simpler to just get rid of the multiple _charsets stuff
in HyperArch.HyperArchive.  Say we always encode the index
pages with the character set of the list's preferred
language, except that we also html-ify any bogus characters
outside that charset.  Seems to me we can chop a bunch of
code and still get the results we want, even if at the
expense of potentially bigger pages (which I don't care about).

I'm going to commit some changes and test them out on the
playground list.  This seems to work well here for my test lists.

Semi-related: would PEP 293 allow us to get rid of
unicode_quote? 

----------------------------------------------------------------------
