[Mailman-i18n] Pipermail and non-English lists
Tokio Kikuchi
tkikuchi@is.kochi-u.ac.jp
Fri Nov 22 00:19:17 2002
Hi,
> Why does it need to be part of the global pipeline? I'd imagine its
> only necessary where email messages might get displayed in http, so
> that'd mean Pipermail and the admindb.
Because there will be many occasions to display or decorate messages in
the http/mail services, I want to convert the charset only once.
I also use namazu to index the archive, and it is convenient if the
archive uses a single charset and contains no unicode.
> Most languages aren't going to need alternative charsets, right? I'm
> guessing Japanese and perhaps other Asian languages (but I don't know
> for sure). What would be the value of altcharsets for Japanese?
Since we set 'euc-jp' as the standard charset for Japanese, the remaining
(alternate) charsets are 'iso-2022-jp', 'shift_jis' and 'cp932'.
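
The lookup this implies can be sketched as follows. This is only an illustration in Python 3: the table below is a hypothetical stand-in for mm_cfg.LC_DESCRIPTIONS, the optional third element (alternate charsets) is the extension proposed in this message rather than stock Mailman, and charsets_for is a made-up helper name.

```python
# Hypothetical stand-in for mm_cfg.LC_DESCRIPTIONS. The optional third
# element (alternate charsets) is the extension discussed in this message.
LC_DESCRIPTIONS = {
    "en": ("English (USA)", "us-ascii"),
    "ja": ("Japanese", "euc-jp", ("iso-2022-jp", "shift_jis", "cp932")),
}


def charsets_for(lang):
    """Return (standard charset, tuple of alternate charsets) for a language."""
    entry = LC_DESCRIPTIONS[lang]
    std = entry[1]
    # Languages without a third element simply have no alternates.
    alt = entry[2] if len(entry) > 2 else ()
    return std, alt


std, alt = charsets_for("ja")
print(std, alt)
```

Returning an empty tuple for languages without alternates keeps a later "charset in alt" test safe without a separate None check.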
Here is my version of Entry.py, which is going to be part of a Japanese
patch I am going to release. I still need to write Exit.py (or something
like it) because 'iso-2022-jp' is the de facto standard for Japanese mail.
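
What such an exit step might do to a message body can be sketched like this. It is a guess at the idea in Python 3, not the Exit.py that was eventually written; to_mail_charset and the two constants are hypothetical names.

```python
STD_CHARSET = "euc-jp"        # list-internal charset named in this message
MAIL_CHARSET = "iso-2022-jp"  # charset assumed here for outgoing mail


def to_mail_charset(payload):
    """Re-encode a STD_CHARSET byte string as MAIL_CHARSET."""
    text = payload.decode(STD_CHARSET, "replace")
    return text.encode(MAIL_CHARSET, "replace")


stored = "日本語のメール".encode(STD_CHARSET)   # as kept in the archive
outgoing = to_mail_charset(stored)             # as sent back out by mail
print(outgoing)
```

The result is 7-bit clean, whereas the euc-jp form uses bytes with the high bit set.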
===========================================
from Mailman import Message, mm_cfg, i18n
from email.Header import decode_header
import re

_ = i18n._


def get_header_decoded(h):
    # Decode a MIME header AND convert it to the standard charset if
    # it arrived in one of the alternate charsets.
    h = decode_header(h)
    hs = ''
    spc = ''
    for (s, c) in h:
        if c and altcharsets and c in altcharsets:
            s = unicode(s, c, 'replace').encode(stdcharset)
        hs = hs + spc + s
        # Put a space after a plain ascii chunk, none after an encoded one.
        if c is None or c == 'us-ascii':
            spc = ' '
        else:
            spc = ''
    return hs


# Mailman handler entry point.
def process(mlist, msg, msgdata):
    global stdcharset, altcharsets
    lang = mlist.preferred_language
    stdcharset = mm_cfg.LC_DESCRIPTIONS[lang][1]
    try:
        altcharsets = mm_cfg.LC_DESCRIPTIONS[lang][2]
    except IndexError:
        # No alternate charsets are configured for this language.
        altcharsets = None
    #
    hs = get_header_decoded(msg.get('subject', _('no subject')))
    msgdata['subject'] = hs
    del msg['subject']
    msg['Subject'] = hs
    #
    hs = get_header_decoded(msg['from'])
    msgdata['from'] = hs
    # Keep the original (still encoded) From header for reference.
    msg['X-MMOriginal-From'] = msg['from']
    del msg['from']
    msg['From'] = hs
    #
    if altcharsets:
        for part in msg.walk():
            ctype = part.get_type()
            # Read the charset from this part's own Content-Type header.
            m = re.search(r'charset=["\']?([\w_-]+)',
                          part.get('content-type', ''), re.I)
            if m:
                charset = m.group(1).lower()
            else:
                charset = 'us-ascii'
            # charset = part.get_charset()
            if charset in altcharsets:
                u = unicode(part.get_payload(decode=1), charset, 'replace')
                part.set_payload(u.encode(stdcharset))
                # set_charset cannot be used here for it may automatically
                # convert to mail-standard charset.
                del part['content-type']
                part['Content-Type'] = 'text/plain; charset=%s' % stdcharset
=================================================
With this Entry.py, you only have to check for 'no subject' once.
I am going to upload this to SF when the exit part is done. It may be
Japanese-specific, but I will try to make it as i18n-friendly as possible.
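
For anyone reading this on Python 3, the header step of Entry.py could be approximated as below. This is a sketch under assumptions, not a port of the patch: it relies on the standard email.header.decode_header, which returns str for plain headers and bytes for encoded words, and the std_charset parameter and sample subject are made up for illustration.

```python
import base64
from email.header import decode_header


def get_header_decoded(value, std_charset="euc-jp"):
    """Decode a MIME header; return (text, text encoded in std_charset)."""
    chunks = []
    for chunk, charset in decode_header(value):
        # Encoded words (and their neighbours) come back as bytes.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "us-ascii", "replace")
        chunks.append(chunk)
    text = "".join(chunks)
    return text, text.encode(std_charset, "replace")


# Build a sample iso-2022-jp subject as an RFC 2047 encoded word.
raw = "テスト".encode("iso-2022-jp")
subject = "=?iso-2022-jp?b?" + base64.b64encode(raw).decode("ascii") + "?="
text, std_bytes = get_header_decoded(subject)
print(text)
```

Decoding to text first and encoding once at the end means the source charset never needs to be compared against a list of alternates.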
--
Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp
http://weather.is.kochi-u.ac.jp/