[Mailman-Developers] [PATCH] Header q-p/base64 RFC 2047 encoding for email module
Ben Gertzfield
che@debian.org
Wed, 14 Nov 2001 19:21:43 +0900
The following patch to the email module implements the RFC
2047-specified Base64 and quoted-printable (called "B" and "Q"
encoding by the RFC) for header-safe encoding of 8-bit strings, for
From:, To:, Subject:, and other fields.
It embeds the charset information in the encoded strings themselves
and implements the special line-wrapping algorithm that B and Q
encoding require, which together make this a very useful general
feature for internationalized Python email programs.
Most MIME-aware mail readers in use today understand the RFC 2047
convention, and in the East Asian world, sending subject and address
fields in Base64 encoding is effectively mandatory.
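To illustrate the convention, here is a minimal sketch of how an RFC 2047 encoded-word (=?charset?encoding?text?=) is read back by a mail client. It assumes Python 3's standard email.header.decode_header rather than the email 0.95 code patched below, and the Japanese subject text is made up for the example.

```python
import base64
from email.header import decode_header

# Q encoded-word: the docstring example from the patch with the
# charset placeholder filled in as iso-8859-1.
q_word = "=?iso-8859-1?q?Kevin_Phillips_B=F6ng?="

# B encoded-word built by hand for an illustrative Japanese subject.
subject = "日本語の件名"
payload = base64.b64encode(subject.encode("iso-2022-jp")).decode("ascii")
b_word = "=?iso-2022-jp?b?%s?=" % payload


def decode_word(word):
    """Decode a single encoded-word back to text."""
    raw, charset = decode_header(word)[0]
    return raw.decode(charset)


print(decode_word(q_word))
print(decode_word(b_word))
```

Both words are pure 7-bit ASCII on the wire; the charset needed to interpret them travels inside the word itself.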
Mailman needs this functionality in order to send out localized
emails from the virgin queue; without it, it's very possible that
8-bit characters will be blindly placed into the Subject: and To:
fields. This also allows localized List-Id fields, as a bonus!
This patch adds the following functions to email.Utils:

encode_address(real_name, address, charset="iso-8859-1", encoding=QP):
    MIME-encode a header field intended for an address (from, to, cc, etc.)

encode_header(header, charset="iso-8859-1", encoding=QP):
    MIME-encode a general email header field (e.g. Subject).

encode_header_chunks(header_chunks):
    MIME-encode a header with many different charsets and/or encodings.
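For comparison only, and not part of this patch: Python 3's standard library covers similar ground. The sketch below assumes email.utils.formataddr (with its charset argument) for the address case and email.header.Header.append for a header that mixes charsets; the Russian sample text is illustrative.

```python
from email.header import Header, decode_header
from email.utils import formataddr, parseaddr

# Address header: only the real name is encoded, the address stays as-is.
to_value = formataddr(
    ("Kevin Phillips Böng", "philips@slightly.silly.party.go.uk"),
    charset="iso-8859-1",
)

# General header built from parts in different charsets, similar in
# spirit to the [string, charset, encoding] triplets described above.
subject = Header()
subject.append("María González Alonso", charset="iso-8859-1")
subject.append("Привет", charset="koi8-r")
subject_value = subject.encode()


def decode_value(value):
    """Join every decoded part of a header value back into text."""
    parts = []
    for raw, charset in decode_header(value):
        if isinstance(raw, bytes):
            raw = raw.decode(charset or "ascii")
        parts.append(raw)
    return "".join(parts)


print(to_value)
print(subject_value)
print(decode_value(subject_value))
```

Note that the standard library picks Q or B per charset on its own, whereas the functions in this patch take the encoding as an explicit argument.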
It also adds the following support functions to email.Encoders:

header_qencode(header, charset="iso-8859-1", maxlinelen=75):
    Encode a header line with quoted-printable (like) encoding.

header_bencode(header, charset, maxlinelen=75):
    Encode a header line with Base64 encoding and a charset specification.
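The wrapping idea behind these two helpers can be sketched in Python 3 as follows. This is a rough port for illustration, not the patch itself: the names q_encode, b_encode and wrap_words are hypothetical, and it assumes a single-byte charset, since it may split between the bytes of a multi-byte character.

```python
import base64
from email.header import decode_header

CRLFSPACE = "\r\n "
# Characters the patch passes through unescaped inside a Q encoded-word.
SAFE = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789!*+-/"
)


def wrap_words(charset, letter, payloads):
    """Wrap each payload as an encoded-word and fold with CRLF + space."""
    return CRLFSPACE.join(
        "=?%s?%s?%s?=" % (charset, letter, p) for p in payloads
    )


def q_encode(text, charset="iso-8859-1", maxlinelen=75):
    room = maxlinelen - len(charset) - 7  # "=?" + "?q?" + "?=" is 7 chars
    payloads = [""]
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch == " ":
            token = "_"
        elif ch in SAFE:
            token = ch
        else:
            token = "=%02X" % byte
        # Start a new encoded-word rather than split an =XX escape.
        if len(payloads[-1]) + len(token) > room:
            payloads.append("")
        payloads[-1] += token
    return wrap_words(charset, "q", payloads)


def b_encode(text, charset, maxlinelen=75):
    room = maxlinelen - len(charset) - 7
    step = (room // 4) * 3  # input bytes that fit in `room` Base64 chars
    raw = text.encode(charset)
    pieces = [raw[i:i + step] for i in range(0, len(raw), step)]
    payloads = [base64.b64encode(p).decode("ascii") for p in pieces]
    return wrap_words(charset, "b", payloads)


name = ("Tärquin Fintimlinbinhinbimlim Bus Stöp "
        "Poontang Poontang Olé Biscuit-Barrel")
print(q_encode(name))
print(b_encode("Привет, мир! " * 8, "koi8-r"))
```

Each chunk is encoded on its own, so every encoded-word decodes independently and no =XX escape or Base64 quantum is cut in half by a line break.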
I needed to re-implement the quoted-printable algorithm in
header_qencode because the "Q" encoding specified by RFC 2047
differs in a few key areas from the one implemented in quopri.py,
and the line-wrapping at 75 characters got too hairy with just
quopri.py.
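A small sketch of one such difference, assuming Python 3's quopri module with its default arguments (the sample bytes are made up): body-style quoted-printable leaves "_", "?" and interior spaces alone, while inside an encoded-word "_" means a space and "?" is a delimiter, so both literals have to be escaped.

```python
import quopri
from email.header import decode_header

sample = b"Why_not? ok"

# Body-style quoted-printable: "_", "?" and the inner space pass through.
body_qp = quopri.encodestring(sample)
print(body_qp)

# Header "Q" form of the same bytes, written by hand: the literal "_"
# becomes =5F, the literal "?" becomes =3F, and the space becomes "_".
word = "=?iso-8859-1?q?Why=5Fnot=3F_ok?="
raw, charset = decode_header(word)[0]
print(raw, charset)
```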
Patch follows, against email 0.95. (Sorry, I tried CVS, but I didn't
want to install Python 2.2 beta just yet.)
I will work on integrating this into Mailman tomorrow.
diff -ruN email.orig/Encoders.py email/Encoders.py
--- email.orig/Encoders.py Tue Oct 2 04:29:38 2001
+++ email/Encoders.py Wed Nov 14 19:07:24 2001
@@ -6,8 +6,10 @@
import base64
import quopri
+from binascii import b2a_base64
from cStringIO import StringIO
+CRLFSPACE = "\015\012 "
# Helpers
@@ -24,6 +26,15 @@
return value[:-1]
return value
+def _max_append(list, str, maxlen):
+    if len(list) == 0:
+        list.append(str)
+        return
+
+    if len(list[-1] + str) < maxlen:
+        list[-1] += str
+    else:
+        list.append(str)
def _bencode(s):
# We can't quite use base64.encodestring() since it tacks on a "courtesy
@@ -78,3 +89,91 @@
def encode_noop(msg):
"""Do nothing."""
+
+
+def header_qencode(header, charset="iso-8859-1", maxlinelen=75):
+    """Encode a header line with quoted-printable (like) encoding.
+
+    Defined in RFC 2047, this "Q" encoding is similar to
+    quoted-printable, but used specifically for email header fields to
+    allow charsets with mostly 7 bit characters (and some 8 bit) to
+    remain more or less readable in non-RFC 2047 aware mail clients.
+
+    The resulting string will be in the form:
+
+    "=?charset?q?I_f=E2rt_in_your_g=E8n=E8ral_dire=E7tion?=\r\n
+     =?charset?q?Silly_=C8nglish_Kn=EEghts?="
+
+    with each line wrapped safely at, at most, maxlinelen characters.
+    It is safe to use verbatim in any email header field, as the
+    wrapping is performed in a quoted-printable aware way and each
+    linefeed is a \r\n.
+
+    charset defaults to "iso-8859-1", and maxlinelen defaults to 75
+    characters.
+    """
+    quoted = []
+
+    # =? plus ?q? plus ?= is 7 characters
+    maxlen = maxlinelen - len(charset) - 7
+
+    for c in header:
+        # Space may be represented as _ instead of =20 for readability
+        if c == ' ':
+            _max_append(quoted, "_", maxlen)
+        # These characters can be included verbatim
+        elif ((c >= 'a' and c <= 'z') or (c >= 'A' and c <= 'Z') or
+              (c >= '0' and c <= '9') or (c in ('!', '*', '+', '-', '/'))):
+            _max_append(quoted, c, maxlen)
+        # Otherwise, replace with hex value like =E2
+        else:
+            _max_append(quoted, "=%02X" % (ord(c)), maxlen)
+
+    encoded = ""
+
+    for q in quoted:
+        # Any chunk past the first must start with "\r\n "
+        if len(encoded) > 0:
+            encoded += CRLFSPACE
+        encoded += "=?%s?q?%s?=" % (charset, q)
+
+    return encoded
+
+def header_bencode(header, charset, maxlinelen=75):
+    """Encode a header line with Base64 encoding and a charset specification.
+
+    Defined in RFC 2047, this Base64 encoding is identical to normal
+    Base64 encoding, except that each line must be intelligently
+    wrapped (respecting the Base64 encoding), and subsequent lines must
+    start with a space.
+
+    The resulting string will be in the form:
+
+    "=?charset?b?WW/5ciBtYXp66XLrIHf8eiBhIGhhbXBzdGHuciBBIFlv+XIgbWF6euly?=\r\n
+     =?charset?b?6yB3/HogYSBoYW1wc3Rh7nIgQkMgWW/5ciBtYXp66XLrIHf8eiBhIGhh?="
+
+    with each line wrapped at, at most, maxlinelen characters. It is
+    safe to use verbatim in any email header field, as the wrapping is
+    performed in a Base64-aware way and each linefeed is a
+    \r\n.
+
+    charset is required (it has no default), and maxlinelen defaults to 75
+    characters.
+    """
+    base64ed = []
+
+    maxlen = ((maxlinelen - len(charset) - 7) / 4) * 3
+    num_lines = (len(header) + maxlen - 1) / maxlen  # ceiling division
+
+    for i in xrange(0, num_lines):
+        base64ed.append(b2a_base64(header[i*maxlen:(i+1)*maxlen]))
+
+    encoded = ""
+
+    for b in base64ed:
+        if len(encoded) > 0:
+            encoded += CRLFSPACE
+        # We ignore the last character of each line, which is a \n.
+        encoded += "=?%s?b?%s?=" % (charset, b[:-1])
+
+    return encoded
diff -ruN email.orig/Utils.py email/Utils.py
--- email.orig/Utils.py Sat Nov 10 02:07:44 2001
+++ email/Utils.py Wed Nov 14 19:16:00 2001
@@ -17,11 +17,16 @@
import base64
# Intrapackage imports
-from Encoders import _bencode, _qencode
+from Encoders import _bencode, _qencode, header_qencode, header_bencode
COMMASPACE = ', '
UEMPTYSTRING = u''
+CRLFSPACE = "\015\012 "
+
+# Flags for types of header encodings
+QP = 1 # Quoted-Printable
+BASE64 = 2 # Base64
# Helpers
@@ -56,6 +61,16 @@
return value[:-1]
return value
+def _chunk_append(chunks, header, goodlinelen=75):
+    if len(chunks) == 0:
+        chunks.append(header)
+        return
+
+    for chunk in header.split(CRLFSPACE):
+        if len(chunks[-1] + chunk) < goodlinelen:
+            chunks[-1] += " " + chunk
+        else:
+            chunks.append(chunk)
def getaddresses(fieldvalues):
@@ -156,3 +171,90 @@
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'][now[1] - 1],
now[0], now[3], now[4], now[5],
zone)
+
+def encode_address(real_name, address, charset="iso-8859-1", encoding=QP):
+    """MIME-encode a header field intended for an address (from, to, cc, etc.)
+
+    Given an 8-bit string containing a real name, an email address,
+    and optionally the real name's character set, and the encoding
+    you wish to use with it, return a 7-bit MIME-encoded string
+    suitable for use in a From, To, Cc, or other email header
+    field.
+
+    The encoding can be email.Utils.QP (quoted-printable, for
+    ASCII-like character sets like iso-8859-1), email.Utils.BASE64
+    (Base64, for non-ASCII like character sets like KOI8-R and
+    iso-2022-jp), or None (no encoding).
+
+    The charset defaults to "iso-8859-1", and the encoding defaults
+    to email.Utils.QP.
+
+    The resulting string will be in the format:
+
+    "=?charset?q?Kevin_Phillips_B=F6ng?= <philips@slightly.silly.party.go.uk>"
+
+    and can be included verbatim in an email header field. Even
+    very long addresses are handled properly with this method:
+
+    "=?charset?q?T=E4rquin_Fintimlinbinhinbimlim_Bus_St=F6p_Poontang_Poont?=\r\n
+     =?charset?q?ang_Ol=E9_Biscuit-Barrel?=\r\n
+     <tarquin@very.silly.party.go.uk>"
+    """
+
+    return encode_header_chunks([ [real_name, charset, encoding],
+                                  ["<%s>" % address, None, None] ])
+
+def encode_header(header, charset="iso-8859-1", encoding=QP):
+    """MIME-encode a general email header field (e.g. Subject).
+
+    Given an 8-bit header string, and optionally its charset and the
+    encoding you wish to use, return a 7-bit MIME-encoded string
+    suitable for use in a general email header (but most useful for
+    the Subject: line).
+
+    The encoding can be email.Utils.QP (quoted-printable, for
+    ASCII-like character sets like iso-8859-1), email.Utils.BASE64
+    (Base64, for non-ASCII like character sets like KOI8-R and
+    iso-2022-jp), or None (no encoding).
+
+    The charset defaults to "iso-8859-1", and the encoding defaults
+    to email.Utils.QP.
+    """
+    return encode_header_chunks([[header, charset, encoding]])
+
+def encode_header_chunks(header_chunks):
+    """MIME-encode a header with many different charsets and/or encodings.
+
+    Given a list of triplets [ [string, charset, encoding] ], return a
+    MIME-encoded string suitable for use in a header field. Each triplet
+    may have different charsets and/or encodings, and the resulting header
+    will accurately reflect each setting.
+
+    Each encoding can be email.Utils.QP (quoted-printable, for
+    ASCII-like character sets like iso-8859-1), email.Utils.BASE64
+    (Base64, for non-ASCII like character sets like KOI8-R and
+    iso-2022-jp), or None (no encoding).
+
+    Triplets are joined with a space while they fit on one line; otherwise
+    a new line is started. A folded result will be in the format:
+
+    "=?charset1?q?Mar=EDa_Gonz=E1lez_Alonso?=\r\n
+     =?charset2?b?SvxyZ2VuIEL2aW5n?="
+    """
+    chunks = []
+
+    for header, charset, encoding in header_chunks:
+        encoded = ""
+        encoding_char = ""
+
+        if encoding is None:
+            _chunk_append(chunks, header)
+        else:
+            if encoding == QP:
+                _chunk_append(chunks, header_qencode(header, charset))
+
+            elif encoding == BASE64:
+                _chunk_append(chunks, header_bencode(header, charset))
+
+    return CRLFSPACE.join(chunks)
+
--
Brought to you by the letters A and H and the number 10.
"Wuzzle means to mix."
Debian GNU/Linux maintainer of Gimp and Nethack -- http://www.debian.org/