[Mailman-i18n] Subject lines in Archives

Ben Gertzfield che@debian.org
Tue, 02 Apr 2002 21:58:39 +0900

    Ben> Yes, I agree.  We just don't want to create files with
    Ben> invalid encodings; mixing encodings in a single HTML file is
    Ben> a recipe for disaster!

    Martin> If that is your concern, then things can remain as they
    Martin> are (or will be, after the patch) - it will just print the
    Martin> mime-encoded subject of the original message. If the
    Martin> original message had non-ASCII text in the subject that
    Martin> was not MIME-encoded, I still think it should be copied
    Martin> as-is to the HTML - proper display will then be the task
    Martin> of the Web browser.

Unfortunately, I have to disagree.  The main problem will come
with any multibyte or modal encoding -- like UTF-8!

If we copy random 8-bit text that isn't MIME-encoded (very common these
days) into an HTML page containing UTF-8 text (say, because most posts
to this list were in UTF-8), we will not only produce invalid UTF-8,
but we could quite possibly shift the user's terminal into a garbage
state with the stray 8-bit bytes, making further display impossible.

Not everyone views these archives with a GUI web browser that contains
workarounds for all the invalidly encoded text in the world; we need to
be liberal in what we accept, but conservative in what we emit.

I love the idea of using Unicode escapes for all text that we can
convert to Unicode, but any text we can't convert is simply not safe to
include verbatim.  Perhaps we should make it an option for those who
really want to include possibly dangerous text directly in the archives.
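For text that has already been converted to Unicode, one way to get such escapes is to emit HTML numeric character references, so the archive page itself stays pure ASCII and no raw multibyte sequences ever reach a terminal. This is a sketch using only the Python standard library; the name `to_ascii_html` is illustrative, not Mailman's actual API.

```python
import html

def to_ascii_html(text: str) -> str:
    """Return HTML-safe, pure-ASCII markup for a Unicode string."""
    # Escape &, <, > and quotes so the subject can't inject markup.
    escaped = html.escape(text)
    # Replace each non-ASCII character with &#NNN; (its decimal code point).
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_ascii_html("Grüße <Tokyo>"))  # Gr&#252;&#223;e &lt;Tokyo&gt;
```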

I know I would prefer a message like "(text with unknown encoding)"
over a garbled Japanese terminal any day.
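A sketch of that conservative fallback, assuming the archiver has already split the subject into raw bytes plus whatever charset the message declared (for example, the `(bytes, charset)` pairs that `email.header.decode_header` returns for encoded-words). `safe_subject_text` and `UNKNOWN_TEXT` are hypothetical names. The policy shown: decode strictly with the declared charset, treat an unlabeled subject as ASCII-only, and emit the placeholder on any failure.

```python
from typing import Optional

UNKNOWN_TEXT = "(text with unknown encoding)"

def safe_subject_text(raw: bytes, charset: Optional[str]) -> str:
    """Return decoded subject text, or a placeholder if it can't be trusted.

    `charset` is the label the message declared; None means no label.
    """
    try:
        # Without a label, only plain 7-bit ASCII is accepted.
        # "strict" errors reject byte sequences invalid for the charset.
        return raw.decode(charset or "ascii", "strict")
    except (LookupError, UnicodeDecodeError):
        # LookupError: unknown charset name; UnicodeDecodeError: bad bytes.
        return UNKNOWN_TEXT

print(safe_subject_text(b"Meeting notes", None))        # Meeting notes
print(safe_subject_text(b"Gr\xfc\xdfe", None))          # (text with unknown encoding)
print(safe_subject_text(b"Gr\xfc\xdfe", "iso-8859-1"))  # Grüße
```

A label such as ISO-8859-1 accepts every byte value, so a mislabeled subject still decodes; strict decoding only catches sequences that are invalid for the declared charset.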


