[Python-checkins] r57618 - in sandbox/trunk/emailpkg/5_0-exp/email: base64mime.py charset.py generator.py header.py message.py quoprimime.py test/test_email.py utils.py

barry.warsaw python-checkins at python.org
Tue Aug 28 15:42:06 CEST 2007

Author: barry.warsaw
Date: Tue Aug 28 15:42:06 2007
New Revision: 57618

Several more package fixes, reducing the failure rate to 8E/8F.

- Use bytes(s, 'raw-unicode-escape') as the idiom instead of
  bytes(ord(c) for c in s) in the few places where this is still necessary.
  Its use at all certainly doesn't feel ultimately correct.
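
  As a rough illustration of the idiom in this item, the sketch below compares
  the two spellings on a current Python 3 (not the sandbox branch).  The helper
  names to_bytes_old() and to_bytes_new() are hypothetical.  Both spellings map
  code points below 256 to single bytes; they differ only above U+00FF.

```python
def to_bytes_old(s):
    # Older spelling: one byte per character; fails for code points > 255.
    return bytes(ord(c) for c in s)


def to_bytes_new(s):
    # Idiom adopted in this revision.
    return bytes(s, 'raw-unicode-escape')


latin = 'caf\xe9'
same_for_latin1 = to_bytes_old(latin) == to_bytes_new(latin) == b'caf\xe9'

# Outside Latin-1 the two differ: the old form raises ValueError, while the
# new form quietly emits a \uXXXX escape sequence.
euro = '\u20ac'
try:
    to_bytes_old(euro)
except ValueError:
    old_failed = True
else:
    old_failed = False
new_result = to_bytes_new(euro)
```

  The silent escape for non-Latin-1 input is one plausible reason the idiom
  "doesn't feel ultimately correct".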

- Headers now do a much better job of three important things:
  1. Properly handling linear whitespace between RFC 2047 encoded words.
  2. Preserving existing continuation whitespace as per RFC 2822 and only
     using the continuation_ws argument hint when nothing else is available
     (i.e. when wrapping lines full of encoded words).
  3. Completely filling header lines to the maxlinelen, even in the face of
     encoded words, quoted printable, and encoded whitespace.

  Some of the necessary changes caused updates to the test cases as the
  packing algorithm got more accurate.  The packing algorithm does more brute
  force checking now, so it is potentially less efficient, but it's more
  accurate and at the rate and size of encoded headers in the wild, I'm hoping
  the efficiency issues will not be important.
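
  Point 1 can be observed with the email.header module shipped in a current
  Python 3 standard library.  This is an assumption about today's stdlib
  behavior, not a test of the sandbox code in this revision: whitespace that
  separates two adjacent encoded words is dropped on decoding, as RFC 2047
  requires.

```python
from email.header import decode_header, make_header

# Two adjacent encoded words separated only by linear whitespace.  The
# underscore inside the second word is an encoded space and must survive.
raw = '=?utf-8?q?caf=C3=A9?= =?utf-8?q?_noir?='
decoded = str(make_header(decode_header(raw)))
```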

- utils.fix_eols() is gone, as is the manifestation of this function in the
  various other APIs.

- base64mime.base64_len() is renamed .header_length() as is the similar method
  in quoprimime.  Similarly, the header encoding method in both modules is now
  called .header_encode().  Other than the large benefit in making these
  consistent, it also helps with several internal implementations by letting
  us use the modules as quasi-interfaces.  Also the aliases for .encode() and
  .body_encode() are removed; only the latter is used now.
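
  A minimal sketch of the Base64 length calculation behind .header_length(),
  mirroring the arithmetic in the base64mime.py diff of this revision.  The
  parameter is called data here (the diff calls it bytearray), and the
  cross-check against base64.b64encode is an addition for illustration.

```python
from base64 import b64encode


def header_length(data):
    """Length of `data` after Base64 encoding, without RFC 2047 chrome."""
    groups_of_3, leftover = divmod(len(data), 3)
    # 4 output characters for each 3 input bytes, or nonzero fraction thereof.
    n = groups_of_3 * 4
    if leftover:
        n += 4
    return n


# Cross-check the arithmetic against the standard library's encoder.
mismatches = [size for size in range(32)
              if header_length(bytes(size)) != len(b64encode(bytes(size)))]
```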

- header_encode() no longer deals in lines.  \r and \n get encoded just like
  anything else.  If you want to split lines you must do so above the call to
  header_encode(), e.g. use a Header instance.  This makes the function much
  simpler, and thus more useable and appropriate as a lower-level function.
  Likewise, the maxlinelen and eol arguments are gone.
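
  The simplified contract can be sketched as follows for the Base64 case.
  This is written for a current Python 3, so the explicit .decode('ascii')
  and the empty-string return are assumptions that differ from the diff in
  this revision.

```python
from base64 import b64encode


def header_encode(header_bytes, charset='iso-8859-1'):
    """Wrap `header_bytes` in a single Base64 RFC 2047 encoded word."""
    if not header_bytes:
        return ''
    encoded = b64encode(header_bytes).decode('ascii')
    return '=?%s?b?%s?=' % (charset, encoded)


# CR and LF are encoded like any other octet; no line splitting happens here.
word = header_encode(b'a\r\nb')
```

  Anything that needs folding has to split the text before calling the
  function, which is what a Header instance does.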

- Along the same lines, header_quopri_check() and body_quopri_check() are
  renamed header_check() and body_check().
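
  A sketch of how the renamed header_check() can sit on top of a per-octet
  lookup table, following the approach in the quoprimime.py diff of this
  revision.  HEADER_MAP is a stand-in name for the module-private map, and the
  code targets a current Python 3.

```python
from string import ascii_letters, digits

# Start with the full =XX expansion for every octet...
HEADER_MAP = {octet: '=%02X' % octet for octet in range(256)}
# ...then override the octets that are safe in headers.
for octet in b'-!*+/' + ascii_letters.encode('ascii') + digits.encode('ascii'):
    HEADER_MAP[octet] = chr(octet)
# Headers have one other special encoding: spaces become underscores.
HEADER_MAP[ord(' ')] = '_'


def header_check(octet):
    """Return True if the octet is changed by header quoted-printable."""
    return chr(octet) != HEADER_MAP[octet]


def header_length(data):
    """Encoded length of `data`, not counting RFC 2047 chrome."""
    return sum(len(HEADER_MAP[octet]) for octet in data)


def header_encode(data, charset='iso-8859-1'):
    return '=?%s?q?%s?=' % (charset, ''.join(HEADER_MAP[o] for o in data))


example = header_encode('caf\xe9 au lait'.encode('latin-1'))
```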

- Charset.to_splittable() and Charset.from_splittable() are also both
  removed.  Because we're encoding unicodes, we know they're splittable so we
  don't need these.  It makes the algorithm for fitting encoded words into
  header lengths a little more complicated, but we can brute force that for
  accurate albeit slower operation.  The previous algorithm was incorrect in
  several respects, so many of the test cases were cosmetically changed.
  Charset.convert() is also no longer necessary.  Charset does grow a
  header_encode_lines() method which splits and header-encodes a string.
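
  The brute-force fitting described in this item might look roughly like the
  sketch below.  It is a simplification, not the Charset.header_encode_lines()
  in the diff: it handles Base64 and a fixed charset only, the line lengths
  here include the RFC 2047 chrome, and there is no None placeholder for an
  empty first line.  encode_word() is a hypothetical helper.

```python
from base64 import b64decode, b64encode
from itertools import chain, repeat

CHARSET = 'utf-8'
PREFIX = '=?%s?b?' % CHARSET
SUFFIX = '?='


def encode_word(text):
    """One Base64 RFC 2047 encoded word for `text`."""
    return PREFIX + b64encode(text.encode(CHARSET)).decode('ascii') + SUFFIX


def header_encode_lines(string, maxlengths):
    """Split on characters, measuring each candidate by encoding it."""
    lines = []
    current = ''
    maxlen = next(maxlengths)
    for character in string:
        candidate = current + character
        if current and len(encode_word(candidate)) > maxlen:
            # The new character does not fit; flush what we have so far.
            lines.append(encode_word(current))
            current = character
            maxlen = next(maxlengths)
        else:
            current = candidate
    if current:
        lines.append(encode_word(current))
    return lines


text = 'Gr\xfc\xdfe aus M\xfcnchen, sch\xf6ne Gr\xfc\xdfe'
lines = header_encode_lines(text, chain([30], repeat(40)))
# Splitting on characters means every encoded word decodes on its own.
decoded = ''.join(
    b64decode(line[len(PREFIX):-len(SUFFIX)]).decode(CHARSET)
    for line in lines)
```

  Re-encoding the growing candidate for every character is quadratic in the
  line length, which is the efficiency cost this item accepts.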

- Header.encode() now has a maxlinelen argument.  While the Header constructor
  has a similar argument, it's needed here because of the way that the
  Generator calls .encode() and passes its own maximum header line length to
  the .encode() method (Generators don't create Header instances so it can't
  pass them in then).  The Generator never calls str() on the Header because
  of the change needed to consolidate Header.__str__() and

- I realized the previous 'splitting' code is only appropriate for ASCII
  headers so rename the method _ascii_split() and refactor.  The _Accumulator
  no longer needs its transform function, but it does need an is_onlyws()
  method to tell us whether the current line is empty or contains only
  whitespace (in which case, newlines should be suppressed).
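
  A standalone sketch of the accumulator after this change, reconstructed from
  the header.py diff.  The class is renamed Accumulator here, and the body of
  reset() after the None check is an assumption because the diff does not show
  it.

```python
class Accumulator:
    """Collect the string fragments that make up one header line."""

    def __init__(self, initial_size=0):
        self._initial_size = initial_size
        self._current = []

    def push(self, string):
        self._current.append(string)

    def pop(self):
        return self._current.pop()

    def __len__(self):
        return sum((len(s) for s in self._current), self._initial_size)

    def __str__(self):
        return ''.join(self._current)

    def reset(self, string=None):
        self._current = []
        self._initial_size = 0
        if string is not None:
            # Assumed: seed the new line, e.g. with continuation whitespace.
            self.push(string)

    def is_onlyws(self):
        # True for an empty line or one holding only whitespace.
        return len(self) == 0 or str(self).isspace()


acc = Accumulator()
empty_is_ws = acc.is_onlyws()
acc.push(' \t')
blank_is_ws = acc.is_onlyws()
acc.push('x')
text_is_ws = acc.is_onlyws()
```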

- Make the quopri header and body algorithm more efficient by using a

There's still a problem with splitting at the highest syntactic break (as per
RFC 2822) for ascii splits, because this cannot be accurately modeled with
just a sequence of split characters.  I.e. for some headers you want to split
on space then comma and for other headers it's comma then space. :(
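
The limitation is easy to see in a sketch of the selection step.  The function
pick_split_char() is a hypothetical name for the loop over splitchars in the
header.py diff: it always honors one fixed priority order, regardless of which
header is being folded.

```python
def pick_split_char(string, splitchars=';, \t'):
    """Return the first character of `splitchars` that occurs in `string`."""
    for ch in splitchars:
        if ch in string:
            return ch
    return None


# The same priority order applies to every header, so a value containing
# both a comma and spaces is always split on the comma first.
content_type = pick_split_char('text/plain; charset="utf-8"')
address_list = pick_split_char('a@example.com, b@example.com')
plain_words = pick_split_char('one two three')
unsplittable = pick_split_char('nosplitshere')
```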

Modified: sandbox/trunk/emailpkg/5_0-exp/email/base64mime.py
--- sandbox/trunk/emailpkg/5_0-exp/email/base64mime.py	(original)
+++ sandbox/trunk/emailpkg/5_0-exp/email/base64mime.py	Tue Aug 28 15:42:06 2007
@@ -25,7 +25,6 @@
 __all__ = [
-    'base64_len',
@@ -33,12 +32,13 @@
+    'header_length',
 import re
+from base64 import b64encode
 from binascii import b2a_base64, a2b_base64
-from email.utils import fix_eols
 CRLF = '\r\n'
 NL = '\n'
@@ -50,11 +50,10 @@
 # Helpers
-def base64_len(s):
+def header_length(bytearray):
     """Return the length of s when it is encoded with base64."""
-    groups_of_3, leftover = divmod(len(s), 3)
+    groups_of_3, leftover = divmod(len(bytearray), 3)
     # 4 bytes out for each 3 bytes (or nonzero fraction thereof) in.
-    # Thanks, Tim!
     n = groups_of_3 * 4
     if leftover:
         n += 4
@@ -62,65 +61,21 @@
-def header_encode(header, charset='iso-8859-1', keep_eols=False,
-                  maxlinelen=76, eol=NL):
+def header_encode(header_bytes, charset='iso-8859-1'):
     """Encode a single header line with Base64 encoding in a given charset.
-    Defined in RFC 2045, this Base64 encoding is identical to normal Base64
-    encoding, except that each line must be intelligently wrapped (respecting
-    the Base64 encoding), and subsequent lines must start with a space.
     charset names the character set to use to encode the header.  It defaults
-    to iso-8859-1.
-    End-of-line characters (\\r, \\n, \\r\\n) will be automatically converted
-    to the canonical email line separator \\r\\n unless the keep_eols
-    parameter is True (the default is False).
-    Each line of the header will be terminated in the value of eol, which
-    defaults to "\\n".  Set this to "\\r\\n" if you are using the result of
-    this function directly in email.
-    The resulting string will be in the form:
-    "=?charset?b?WW/5ciBtYXp66XLrIHf8eiBhIGhhbXBzdGHuciBBIFlv+XIgbWF6euly?=\\n
-      =?charset?b?6yB3/HogYSBoYW1wc3Rh7nIgQkMgWW/5ciBtYXp66XLrIHf8eiBhIGhh?="
-    with each line wrapped at, at most, maxlinelen characters (defaults to 76
-    characters).
+    to iso-8859-1.  Base64 encoding is defined in RFC 2045.
     # Return empty headers unchanged
-    if not header:
-        return header
-    if not keep_eols:
-        header = fix_eols(header)
-    # Base64 encode each line, in encoded chunks no greater than maxlinelen in
-    # length, after the RFC chrome is added in.
-    base64ed = []
-    max_encoded = maxlinelen - len(charset) - MISC_LEN
-    max_unencoded = max_encoded * 3 // 4
-    for i in range(0, len(header), max_unencoded):
-        base64ed.append(b2a_base64(header[i:i+max_unencoded]))
-    # Now add the RFC chrome to each encoded chunk
-    lines = []
-    for line in base64ed:
-        # Ignore the last character of each line if it is a newline
-        if line.endswith(NL):
-            line = line[:-1]
-        # Add the chrome
-        lines.append('=?%s?b?%s?=' % (charset, line))
-    # Glue the lines together and return it.  BAW: should we be able to
-    # specify the leading whitespace in the joiner?
-    joiner = eol + ' '
-    return joiner.join(lines)
+    if not header_bytes:
+        return str(header_bytes)
+    encoded = b64encode(header_bytes)
+    return '=?%s?b?%s?=' % (charset, encoded)
-def encode(s, binary=True, maxlinelen=76, eol=NL):
+def body_encode(s, binary=True, maxlinelen=76, eol=NL):
     """Encode a string with base64.
     Each line will be wrapped at, at most, maxlinelen characters (defaults to
@@ -152,11 +107,6 @@
     return EMPTYSTRING.join(encvec)
-# For convenience and backwards compatibility w/ standard base64 module
-body_encode = encode
-encodestring = encode
 def decode(s, convert_eols=False):
     """Decode a raw base64 string, returning a bytes object.

Modified: sandbox/trunk/emailpkg/5_0-exp/email/charset.py
--- sandbox/trunk/emailpkg/5_0-exp/email/charset.py	(original)
+++ sandbox/trunk/emailpkg/5_0-exp/email/charset.py	Tue Aug 28 15:42:06 2007
@@ -9,6 +9,8 @@
+from functools import partial
 import email.base64mime
 import email.quoprimime
@@ -23,9 +25,10 @@
 SHORTEST    = 3 # the shorter of QP and base64, but only for headers
 # In "=?charset?q?hello_world?=", the =?, ?q?, and ?= add up to 7
 DEFAULT_CHARSET = 'us-ascii'
@@ -259,63 +262,6 @@
             return encode_7or8bit
-    def convert(self, s):
-        """Convert a string from the input_codec to the output_codec."""
-        if self.input_codec != self.output_codec:
-            rawbytes = bytes(ord(c) for c in s)
-            decoded = rawbytes.decode(self.input_codec)
-            encoded = decoded.encode(self.output_codec)
-            return str(encoded)
-        else:
-            return s
-    def to_splittable(self, s):
-        """Convert a possibly multibyte string to a safely splittable format.
-        Uses the input_codec to try and convert the string to Unicode, so it
-        can be safely split on character boundaries (even for multibyte
-        characters).
-        Returns the string as-is if it isn't known how to convert it to
-        Unicode with the input_charset.
-        Characters that could not be converted to Unicode will be replaced
-        with the Unicode replacement character U+FFFD.
-        """
-        if isinstance(s, str) or self.input_codec is None:
-            return s
-        try:
-            return str(s, self.input_codec, 'replace')
-        except LookupError:
-            # Input codec not installed on system, so return the original
-            # string unchanged.
-            return s
-    def from_splittable(self, ustr, to_output=True):
-        """Convert a splittable string back into an encoded string.
-        Uses the proper codec to try and convert the string from Unicode back
-        into an encoded format.  Return the string as-is if it is not Unicode,
-        or if it could not be converted from Unicode.
-        Characters that could not be converted from Unicode will be replaced
-        with an appropriate character (usually '?').
-        If to_output is True (the default), uses output_codec to convert to an
-        encoded format.  If to_output is False, uses input_codec.
-        """
-        if to_output:
-            codec = self.output_codec
-        else:
-            codec = self.input_codec
-        if not isinstance(ustr, str) or codec is None:
-            return ustr
-        try:
-            return str(ustr.encode(codec, 'replace'))
-        except LookupError:
-            # Output codec not installed
-            return ustr
     def get_output_charset(self):
         """Return the output character set.
@@ -324,66 +270,114 @@
         return self.output_charset or self.input_charset
-    def encoded_header_len(self, s):
-        """Return the length of the encoded header string."""
-        cset = self.get_output_charset()
-        # The len(s) of a 7bit encoding is len(s)
-        if self.header_encoding == BASE64:
-            return email.base64mime.base64_len(s) + len(cset) + MISC_LEN
-        elif self.header_encoding == QP:
-            return email.quoprimime.header_quopri_len(s) + len(cset) + MISC_LEN
-        elif self.header_encoding == SHORTEST:
-            lenb64 = email.base64mime.base64_len(s)
-            lenqp = email.quoprimime.header_quopri_len(s)
-            return min(lenb64, lenqp) + len(cset) + MISC_LEN
-        else:
-            return len(s)
     def header_encode(self, string):
         """Header-encode a string by converting it first to bytes.
-        :param string: A unicode string for the header.  This must be
-        encodable to bytes using the current character set's `output_codec`.
         The type of encoding (base64 or quoted-printable) will be based on
         this charset's `header_encoding`.
+        :param string: A unicode string for the header.  It must be possible
+            to encode this string to bytes using the character set's
+            output codec.
+        :return: The encoded string, with RFC 2047 chrome.
         codec = self.output_codec or 'us-ascii'
         charset = self.get_output_charset()
         header_bytes = string.encode(codec)
         # 7bit/8bit encodings return the string unchanged (modulo conversions)
+        encoder_module = self._get_encoder(header_bytes)
+        if encoder_module is None:
+            return string
+        return encoder_module.header_encode(header_bytes, codec)
+    def header_encode_lines(self, string, maxlengths):
+        """Header-encode a string by converting it first to bytes.
+        This is similar to `header_encode()` except that the string is fit
+        into maximum line lengths as given by the arguments.
+        :param string: A unicode string for the header.  It must be possible
+            to encode this string to bytes using the character set's
+            output codec.
+        :param maxlengths: Maximum line length iterator.  Each element
+            returned from this iterator will provide the next maximum line
+            length.  This parameter is used as an argument to built-in next()
+            and should never be exhausted.  The maximum line lengths should
+            not count the RFC 2047 chrome.  These line lengths are only a
+            hint; the splitter does the best it can.
+        :param firstmaxlen: The maximum line length of the first line.  If
+            None (the default), then `maxlen` is used for the first line.
+        :return: Lines of encoded strings, each with RFC 2047 chrome.
+        """
+        # See which encoding we should use.
+        codec = self.output_codec or 'us-ascii'
+        header_bytes = string.encode(codec)
+        encoder_module = self._get_encoder(header_bytes)
+        encoder = partial(encoder_module.header_encode, charset=str(self))
+        # Calculate the number of characters that the RFC 2047 chrome will
+        # contribute to each line.
+        charset = self.get_output_charset()
+        extra = len(charset) + RFC2047_CHROME_LEN
+        # Now comes the hard part.  We must encode bytes but we can't split on
+        # bytes because some character sets are variable length and each
+        # encoded word must stand on its own.  So the problem is you have to
+        # encode to bytes to figure out this word's length, but you must split
+        # on characters.  This causes two problems: first, we don't know how
+        # many octets a specific substring of unicode characters will get
+        # encoded to, and second, we don't know how many ASCII characters
+        # those octets will get encoded to.  Unless we try it.  Which seems
+        # inefficient.  In the interest of being correct rather than fast (and
+        # in the hope that there will be few encoded headers in any such
+        # message), brute force it. :(
+        lines = []
+        current_line = []
+        maxlen = next(maxlengths) - extra
+        for character in string:
+            current_line.append(character)
+            this_line = EMPTYSTRING.join(current_line)
+            length = encoder_module.header_length(this_line.encode(charset))
+            if length > maxlen:
+                # This last character doesn't fit so pop it off.
+                current_line.pop()
+                # Does nothing fit on the first line?
+                if not lines and not current_line:
+                    lines.append(None)
+                else:
+                    joined_line = EMPTYSTRING.join(current_line)
+                    header_bytes = joined_line.encode(codec)
+                    lines.append(encoder(header_bytes))
+                current_line = [character]
+                maxlen = next(maxlengths) - extra
+        joined_line = EMPTYSTRING.join(current_line)
+        header_bytes = joined_line.encode(codec)
+        lines.append(encoder(header_bytes))
+        return lines
+    def _get_encoder(self, header_bytes):
         if self.header_encoding == BASE64:
-            encoder = email.base64mime.header_encode
+            return email.base64mime
         elif self.header_encoding == QP:
-            encoder = email.quoprimime.header_encode
+            return email.quoprimime
         elif self.header_encoding == SHORTEST:
-            lenb64 = email.base64mime.base64_len(header_bytes)
-            lenqp = email.quoprimime.header_quopri_len(header_bytes)
-            if lenb64 < lenqp:
-                encoder = email.base64mime.header_encode
+            len64 = email.base64mime.header_length(header_bytes)
+            lenqp = email.quoprimime.header_length(header_bytes)
+            if len64 < lenqp:
+                return email.base64mime
-                encoder = email.quoprimime.header_encode
+                return email.quoprimime
-            return string
-        return encoder(header_bytes, codec)
+            return None
-    def body_encode(self, s, convert=True):
-        """Body-encode a string and convert it to output_charset.
-        If convert is True (the default), the string will be converted from
-        the input charset to output charset automatically.  Unlike
-        header_encode(), there are no issues with byte boundaries and
-        multibyte charsets in email bodies, so this is usually pretty safe.
+    def body_encode(self, string):
+        """Body-encode a string by converting it first to bytes.
         The type of encoding (base64 or quoted-printable) will be based on
-        if convert:
-            s = self.convert(s)
         # 7bit/8bit encodings return the string unchanged (module conversions)
         if self.body_encoding is BASE64:
-            return email.base64mime.body_encode(s)
+            return email.base64mime.body_encode(string)
         elif self.body_encoding is QP:
-            return email.quoprimime.body_encode(s)
+            return email.quoprimime.body_encode(string)
-            return s
+            return string

Modified: sandbox/trunk/emailpkg/5_0-exp/email/generator.py
--- sandbox/trunk/emailpkg/5_0-exp/email/generator.py	(original)
+++ sandbox/trunk/emailpkg/5_0-exp/email/generator.py	Tue Aug 28 15:42:06 2007
@@ -133,12 +133,8 @@
     def _write_headers(self, msg):
         for h, v in msg.items():
             print('%s:' % h, end=' ', file=self._fp)
-            if self._maxheaderlen == 0:
-                # Explicit no-wrapping
-                print(v, file=self._fp)
-            elif isinstance(v, Header):
-                # Header instances know what to do
-                print(v.encode(), file=self._fp)
+            if isinstance(v, Header):
+                print(v.encode(maxlinelen=self._maxheaderlen), file=self._fp)
                 # Header's got lots of smarts, so use it.
                 header = Header(v, maxlinelen=self._maxheaderlen,

Modified: sandbox/trunk/emailpkg/5_0-exp/email/header.py
--- sandbox/trunk/emailpkg/5_0-exp/email/header.py	(original)
+++ sandbox/trunk/emailpkg/5_0-exp/email/header.py	Tue Aug 28 15:42:06 2007
@@ -109,7 +109,7 @@
     last_word = last_charset = None
     for word, charset in decoded_words:
         if isinstance(word, str):
-            word = bytes(ord(c) for c in word)
+            word = bytes(word, 'raw-unicode-escape')
         if last_word is None:
             last_word = word
             last_charset = charset
@@ -267,7 +267,7 @@
         output_string = input_bytes.decode(output_charset, errors)
         self._chunks.append((output_string, charset))
-    def encode(self, splitchars=';, \t'):
+    def encode(self, splitchars=';, \t', maxlinelen=None):
         """Encode a message header into an RFC-compliant format.
         There are many issues involved in converting a given string for use in
@@ -290,7 +290,14 @@
         syntactic breaks'.  This doesn't affect RFC 2047 encoded lines.
-        formatter = _ValueFormatter(self._headerlen, self._maxlinelen,
+        if maxlinelen is None:
+            maxlinelen = self._maxlinelen
+        # A maxlinelen of 0 means don't wrap.  For all practical purposes,
+        # choosing a huge number here accomplishes that and makes the
+        # _ValueFormatter algorithm much simpler.
+        if maxlinelen == 0:
+            maxlinelen = 1000000
+        formatter = _ValueFormatter(self._headerlen, maxlinelen,
                                     self._continuation_ws, splitchars)
         for string, charset in self._chunks:
             lines = string.splitlines()
@@ -348,24 +355,65 @@
         if len(encoded_string) + len(self._current_line) <= self._maxlen:
-        # Attempt to split the line at the highest-level syntactic break
-        # possible.  Note that we don't have a lot of smarts about field
+        # If the charset has no header encoding (i.e. it is an ASCII encoding)
+        # then we must split the header at the "highest level syntactic break"
+        # possible. Note that we don't have a lot of smarts about field
         # syntax; we just try to break on semi-colons, then commas, then
-        # whitespace.  Eventually, we'll allow this to be pluggable.
-        for ch in self._splitchars:
-            if ch in string:
-                break
-        else:
-            # We can't split the string to fit on the current line, so just
-            # put it on a line by itself.
-            self._lines.append(str(self._current_line))
-            self._current_line.reset(self._continuation_ws)
-            self._current_line.push(encoded_string)
+        # whitespace.  Eventually, this should be pluggable.
+        if charset.header_encoding is None:
+            for ch in self._splitchars:
+                if ch in string:
+                    break
+            else:
+                ch = None
+            # If there's no available split character then regardless of
+            # whether the string fits on the line, we have to put it on a line
+            # by itself.
+            if ch is None:
+                if not self._current_line.is_onlyws():
+                    self._lines.append(str(self._current_line))
+                    self._current_line.reset(self._continuation_ws)
+                self._current_line.push(encoded_string)
+            else:
+                self._ascii_split(string, ch)
+            return
+        # Otherwise, we're doing either a Base64 or a quoted-printable
+        # encoding which means we don't need to split the line on syntactic
+        # breaks.  We can basically just find enough characters to fit on the
+        # current line, minus the RFC 2047 chrome.  What makes this trickier
+        # though is that we have to split at octet boundaries, not character
+        # boundaries but it's only safe to split at character boundaries so at
+        # best we can only get close.
+        encoded_lines = charset.header_encode_lines(string, self._maxlengths())
+        # The first element extends the current line, but if it's None then
+        # nothing more fit on the current line so start a new line.
+        try:
+            first_line = encoded_lines.pop(0)
+        except IndexError:
+            # There are no encoded lines, so we're done.
+            return
+        if first_line is not None:
+            self._current_line.push(first_line)
+        self._lines.append(str(self._current_line))
+        self._current_line.reset(self._continuation_ws)
+        try:
+            last_line = encoded_lines.pop()
+        except IndexError:
+            # There was only one line.
-        self._spliterate(string, ch, charset)
+        self._current_line.push(last_line)
+        # Everything else are full lines in themselves.
+        for line in encoded_lines:
+            self._lines.append(self._continuation_ws + line)
+    def _maxlengths(self):
+        # The first line's length.
+        yield self._maxlen - len(self._current_line)
+        while True:
+            yield self._maxlen - self._continuation_ws_len
-    def _spliterate(self, string, ch, charset):
-        holding = _Accumulator(transformfunc=charset.header_encode)
+    def _ascii_split(self, string, ch):
+        holding = _Accumulator()
         # Split the line on the split character, preserving it.  If the split
         # character is whitespace RFC 2822 $2.2.3 requires us to fold on the
         # whitespace, so that the line leads with the original whitespace we
@@ -387,8 +435,7 @@
                     # line, watch out for the current line containing only
                     # whitespace.
-                    if len(self._current_line) == 0 and (
-                        len(holding) == 0 or str(holding).isspace()):
+                    if self._current_line.is_onlyws() and holding.is_onlyws():
                         # Don't start a new line.
                         part = None
@@ -492,12 +539,8 @@
 class _Accumulator:
-    def __init__(self, initial_size=0, transformfunc=None):
+    def __init__(self, initial_size=0):
         self._initial_size = initial_size
-        if transformfunc is None:
-            self._transformfunc = lambda string: string
-        else:
-            self._transformfunc = transformfunc
         self._current = []
     def push(self, string):
@@ -507,14 +550,18 @@
         return self._current.pop()
     def __len__(self):
-        return len(str(self)) + self._initial_size
+        return sum((len(string)
+                    for string in self._current),
+                   self._initial_size)
     def __str__(self):
-        return self._transformfunc(EMPTYSTRING.join(self._current))
+        return EMPTYSTRING.join(self._current)
     def reset(self, string=None):
         self._current = []
-        self._current_len = 0
         self._initial_size = 0
         if string is not None:
+    def is_onlyws(self):
+        return len(self) == 0 or str(self).isspace()

Modified: sandbox/trunk/emailpkg/5_0-exp/email/message.py
--- sandbox/trunk/emailpkg/5_0-exp/email/message.py	(original)
+++ sandbox/trunk/emailpkg/5_0-exp/email/message.py	Tue Aug 28 15:42:06 2007
@@ -13,9 +13,9 @@
 from io import BytesIO, StringIO
 # Intrapackage imports
-import email.charset
 from email import utils
 from email import errors
+from email.charset import Charset
@@ -211,7 +211,7 @@
         # Is there a better way to do this?  We can't use the bytes
         # constructor.
-        return bytes(ord(c) for c in payload)
+        return bytes(payload, 'raw-unicode-escape')
     def set_payload(self, payload, charset=None):
         """Set the payload to the given value.
@@ -236,18 +236,13 @@
         and encoded properly, if needed, when generating the plain text
         representation of the message.  MIME headers (MIME-Version,
         Content-Type, Content-Transfer-Encoding) will be added as needed.
         if charset is None:
             self._charset = None
-        if isinstance(charset, basestring):
-            charset = email.charset.Charset(charset)
-        if not isinstance(charset, email.charset.Charset):
-            raise TypeError(charset)
-        # BAW: should we accept strings that can serve as arguments to the
-        # Charset constructor?
+        if not isinstance(charset, Charset):
+            charset = Charset(charset)
         self._charset = charset
         if 'MIME-Version' not in self:
             self.add_header('MIME-Version', '1.0')
@@ -256,7 +251,7 @@
             self.set_param('charset', charset.get_output_charset())
-        if str(charset) != charset.get_output_charset():
+        if charset != charset.get_output_charset():
             self._payload = charset.body_encode(self._payload)
         if 'Content-Transfer-Encoding' not in self:
             cte = charset.get_body_encoding()

Modified: sandbox/trunk/emailpkg/5_0-exp/email/quoprimime.py
--- sandbox/trunk/emailpkg/5_0-exp/email/quoprimime.py	(original)
+++ sandbox/trunk/emailpkg/5_0-exp/email/quoprimime.py	Tue Aug 28 15:42:06 2007
@@ -29,16 +29,14 @@
 __all__ = [
-    'body_quopri_check',
-    'body_quopri_len',
+    'body_length',
-    'header_quopri_check',
-    'header_quopri_len',
+    'header_length',
@@ -46,51 +44,65 @@
 import re
 from string import ascii_letters, digits, hexdigits
-from email.utils import fix_eols
 CRLF = '\r\n'
 NL = '\n'
-# See also Charset.py
-HEADER_SAFE_BYTES = b'-!*+/ ' + bytes(ascii_letters) + bytes(digits)
-BODY_SAFE_BYTES   = (b' !"#$%&\'()*+,-./0123456789:;<>'
-                     b'?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`'
-                     b'abcdefghijklmnopqrstuvwxyz{|}~\t')
+# Build a mapping of octets to the expansion of that octet.  Since we're only
+# going to have 256 of these things, this isn't terribly inefficient
+# space-wise.  Remember that headers and bodies have different sets of safe
+# characters.  Initialize both maps with the full expansion, and then override
+# the safe bytes with the more compact form.
+_QUOPRI_HEADER_MAP = dict((c, '=%02X' % c) for c in range(256))
+# Safe header bytes which need no encoding.
+for c in b'-!*+/' + bytes(ascii_letters) + bytes(digits):
+    _QUOPRI_HEADER_MAP[c] = chr(c)
+# Headers have one other special encoding; spaces become underscores.
+_QUOPRI_HEADER_MAP[ord(' ')] = '_'
+# Safe body bytes which need no encoding.    
+for c in (b' !"#$%&\'()*+,-./0123456789:;<>'
+          b'abcdefghijklmnopqrstuvwxyz{|}~\t'):
+    _QUOPRI_BODY_MAP[c] = chr(c)
 # Helpers
-def header_quopri_check(c):
-    """Return True if the character should be escaped with header quopri."""
-    return c not in HEADER_SAFE_BYTES
+def header_check(octet):
+    """Return True if the octet should be escaped with header quopri."""
+    return chr(octet) != _QUOPRI_HEADER_MAP[octet]
-def body_quopri_check(c):
-    """Return True if the character should be escaped with body quopri."""
-    return c not in BODY_SAFE_BYTES
+def body_check(octet):
+    """Return True if the octet should be escaped with body quopri."""
+    return chr(octet) != _QUOPRI_BODY_MAP[octet]
-def header_quopri_len(bytearray):
-    """Return the length of bytearray when it is encoded with header quopri.
+def header_length(bytearray):
+    """Return a header quoted-printable encoding length.
     Note that this does not include any RFC 2047 chrome added by
+    :param bytearray: An array of bytes (a.k.a. octets).
+    :return: The length in bytes of the byte array when it is encoded with
+        quoted-printable for headers.
-    count = 0
-    for c in bytearray:
-        count += (3 if header_quopri_check(c) else 1)
-    return count
+    return sum(len(_QUOPRI_HEADER_MAP[octet]) for octet in bytearray)
-def body_quopri_len(bytearray):
-    """Return the length of bytearray when it is encoded with body quopri."""
-    count = 0
-    for c in bytearray:
-        count += (3 if body_quopri_check(c) else 1)
-    return count
+def body_length(bytearray):
+    """Return a body quoted-printable encoding length.
+    :param bytearray: An array of bytes (a.k.a. octets).
+    :return: The length in bytes of the byte array when it is encoded with
+        quoted-printable for bodies.
+    """
+    return sum(len(_QUOPRI_BODY_MAP[octet]) for octet in bytearray)
 def _max_append(L, s, maxlen, extra=''):
@@ -130,29 +142,17 @@
         return str(header_bytes)
     # Iterate over every byte, encoding if necessary.
     encoded = []
-    for character in header_bytes:
-        # Space may be represented as _ instead of =20 for readability
-        if character == ord(' '):
-            encoded.append('_')
-        # These characters can be included verbatim.
-        elif not header_quopri_check(character):
-            encoded.append(chr(character))
-        # Otherwise, replace with hex value like =E2
-        else:
-            encoded.append('=%02X' % character)
+    for octet in header_bytes:
+        encoded.append(_QUOPRI_HEADER_MAP[octet])
     # Now add the RFC chrome to each encoded chunk and glue the chunks
     # together.
     return '=?%s?q?%s?=' % (charset, EMPTYSTRING.join(encoded))
-def encode(body, binary=False, maxlinelen=76, eol=NL):
+def body_encode(body, maxlinelen=76, eol=NL):
     """Encode with quoted-printable, wrapping at maxlinelen characters.
-    If binary is False (the default), end-of-line characters will be converted
-    to the canonical email end-of-line sequence \\r\\n.  Otherwise they will
-    be left verbatim.
     Each line of encoded text will end with eol, which defaults to "\\n".  Set
     this to "\\r\\n" if you will be using the result of this function directly
     in an email.
@@ -165,9 +165,6 @@
     if not body:
         return body
-    if not binary:
-        body = fix_eols(body)
     # BAW: We're accumulating the body text by string concatenation.  That
     # can't be very efficient, but I don't have time now to rewrite it.  It
     # just feels like this algorithm could be more efficient.
@@ -192,7 +189,7 @@
         for j in range(linelen):
             c = line[j]
             prev = c
-            if body_quopri_check(c):
+            if body_check(c):
                 c = quote(c)
             elif j+1 == linelen:
                 # Check for whitespace at end of line; special case
@@ -228,11 +225,6 @@
     return encoded_body
-# For convenience and backwards compatibility w/ standard base64 module
-body_encode = encode
-encodestring = encode
 # BAW: I'm not sure if the intent was for the signature of this function to be
 # the same as base64MIME.decode() or not...
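
A rough sketch of the table-driven approach the quoprimime.py hunks above
move to: one precomputed entry per octet, so the length check and the encoder
are both plain lookups. The names HEADER_MAP and _SAFE are illustrative
stand-ins, and the literal set (letters, digits and "!*+-/") is taken from the
test suite; the real _QUOPRI_HEADER_MAP may be built differently.

```python
from string import ascii_letters, digits

# Octets that may appear verbatim in a Q-encoded header word. This mirrors
# the literal set used by the tests; it is an assumption about the contents
# of the real _QUOPRI_HEADER_MAP, not a copy of it.
_SAFE = (ascii_letters + digits + '!*+-/').encode('ascii')

# One entry per octet value: default to "=XX", then patch the exceptions.
HEADER_MAP = ['=%02X' % octet for octet in range(256)]
for octet in _SAFE:
    HEADER_MAP[octet] = chr(octet)
HEADER_MAP[ord(' ')] = '_'  # space is shortened to "_" in headers


def header_length(octets):
    """Length of the Q-encoded payload, without the RFC 2047 chrome."""
    return sum(len(HEADER_MAP[octet]) for octet in octets)


def header_encode(octets, charset='iso-8859-1'):
    """Q-encode the octets and wrap them in =?charset?q?...?= chrome."""
    payload = ''.join(HEADER_MAP[octet] for octet in octets)
    return '=?%s?q?%s?=' % (charset, payload)


print(header_length(b'hello'))             # 5
print(header_encode(b'gr\xfcnes Licht'))   # =?iso-8859-1?q?gr=FCnes_Licht?=
```

Because space maps to a one-character entry, a length function built this way
reports 1 for b' ' rather than 3, which is why the updated length test treats
space as a special case.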

Modified: sandbox/trunk/emailpkg/5_0-exp/email/test/test_email.py
--- sandbox/trunk/emailpkg/5_0-exp/email/test/test_email.py	(original)
+++ sandbox/trunk/emailpkg/5_0-exp/email/test/test_email.py	Tue Aug 28 15:42:06 2007
@@ -482,7 +482,7 @@
         msg['content-transfer-encoding'] = 'base64'
-                         bytes(ord(c) for c in x))
+                         bytes(x, 'raw-unicode-escape'))
@@ -674,9 +674,14 @@
     def test_no_split_long_header(self):
         eq = self.ndiffAssertEqual
         hstr = 'References: ' + 'x' * 80
-        h = Header(hstr, continuation_ws='\t')
+        h = Header(hstr)
+        # These come on two lines because Headers are really field value
+        # classes and don't really know about their field names.
         eq(h.encode(), """\
-References: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx""")
+ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx""")
+        h = Header('x' * 80)
+        eq(h.encode(), 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
     def test_splitting_multiple_long_lines(self):
         eq = self.ndiffAssertEqual
@@ -722,10 +727,17 @@
         h = Header('Britische Regierung gibt', 'iso-8859-1',
         h.append('gr\xfcnes Licht f\xfcr Offshore-Windkraftprojekte')
+        eq(h.encode(), """\
+ =?iso-8859-1?q?hore-Windkraftprojekte?=""")
         msg['Subject'] = h
-        eq(msg.as_string(), """\
-Subject: =?iso-8859-1?q?Britische_Regierung_gibt_gr=FCnes_Licht_f=FCr?=
- =?iso-8859-1?q?Offshore-Windkraftprojekte?=
+        eq(msg.as_string(maxheaderlen=76), """\
+Subject: =?iso-8859-1?q?Britische_Regierung_gibt_gr=FCnes_Licht_f=FCr_Offs?=
+ =?iso-8859-1?q?hore-Windkraftprojekte?=
+        eq(msg.as_string(maxheaderlen=0), """\
+Subject: =?iso-8859-1?q?Britische_Regierung_gibt_gr=FCnes_Licht_f=FCr_Offshore-Windkraftprojekte?=
@@ -748,10 +760,10 @@
         msg = Message()
         msg['To'] = to
         eq(msg.as_string(maxheaderlen=78), '''\
-To: "Someone Test #A" <someone@eecs.umich.edu>, <someone@eecs.umich.edu>,
+To: "Someone Test #A" <someone@eecs.umich.edu>,<someone@eecs.umich.edu>,
 \t"Someone Test #B" <someone@umich.edu>,
-\t"Someone Test #C" <someone@eecs.umich.edu>,
-\t"Someone Test #D" <someone@eecs.umich.edu>
+ "Someone Test #C" <someone@eecs.umich.edu>,
+ "Someone Test #D" <someone@eecs.umich.edu>
@@ -775,14 +787,17 @@
     def test_long_field_name(self):
         eq = self.ndiffAssertEqual
         fn = 'X-Very-Very-Very-Long-Header-Name'
-        gs = "Die Mieter treten hier ein werden mit einem Foerderband komfortabel den Korridor entlang, an s\xfcdl\xfcndischen Wandgem\xe4lden vorbei, gegen die rotierenden Klingen bef\xf6rdert. "
+        gs = ('Die Mieter treten hier ein werden mit einem Foerderband '
+              'komfortabel den Korridor entlang, an s\xfcdl\xfcndischen '
+              'Wandgem\xe4lden vorbei, gegen die rotierenden Klingen '
+              'bef\xf6rdert. ')
         h = Header(gs, 'iso-8859-1', header_name=fn)
         # BAW: this seems broken because the first line is too long
         eq(h.encode(), """\
- =?iso-8859-1?q?ein_werden_mit_einem_Foerderband_komfortabel_den_Korridor_?=
- =?iso-8859-1?q?entlang=2C_an_s=FCdl=FCndischen_Wandgem=E4lden_vorbei=2C_g?=
- =?iso-8859-1?q?egen_die_rotierenden_Klingen_bef=F6rdert=2E_?=""")
+ =?iso-8859-1?q?in_werden_mit_einem_Foerderband_komfortabel_den_Korridor_e?=
+ =?iso-8859-1?q?ntlang=2C_an_s=FCdl=FCndischen_Wandgem=E4lden_vorbei=2C_ge?=
+ =?iso-8859-1?q?gen_die_rotierenden_Klingen_bef=F6rdert=2E_?=""")
     def test_long_received_header(self):
         h = ('from FOO.TLD (vizworld.acl.foo.tld [123.452.678.9]) '
@@ -811,9 +826,9 @@
         msg['Received-2'] = h
         self.ndiffAssertEqual(msg.as_string(maxheaderlen=78), """\
 Received-1: <15975.17901.207240.414604@sgigritzmann1.mathematik.tu-muenchen.de>
-\t(David Bremner's message of "Thu, 6 Mar 2003 13:58:21 +0100")
+ (David Bremner's message of \"Thu, 6 Mar 2003 13:58:21 +0100\")
 Received-2: <15975.17901.207240.414604@sgigritzmann1.mathematik.tu-muenchen.de>
-\t(David Bremner's message of "Thu, 6 Mar 2003 13:58:21 +0100")
+ (David Bremner's message of \"Thu, 6 Mar 2003 13:58:21 +0100\")
@@ -837,12 +852,12 @@
         eq = self.ndiffAssertEqual
         m = ('Received: from siimage.com '
              '([]) by zima.siliconimage.com with '
-             'Microsoft SMTPSVC(5.0.2195.4905);'
-             '\tWed, 16 Oct 2002 07:41:11 -0700')
+             'Microsoft SMTPSVC(5.0.2195.4905); '
+             'Wed, 16 Oct 2002 07:41:11 -0700')
         msg = email.message_from_string(m)
         eq(msg.as_string(maxheaderlen=78), '''\
 Received: from siimage.com ([]) by zima.siliconimage.com with
-\tMicrosoft SMTPSVC(5.0.2195.4905); Wed, 16 Oct 2002 07:41:11 -0700
+ Microsoft SMTPSVC(5.0.2195.4905); Wed, 16 Oct 2002 07:41:11 -0700
@@ -1519,7 +1534,7 @@
 # Test RFC 2047 header encoding and decoding
-class TestRFC2047(unittest.TestCase):
+class TestRFC2047(TestEmailBase):
     def test_rfc2047_multiline(self):
         eq = self.assertEqual
         s = """Re: =?mac-iceland?q?r=8Aksm=9Arg=8Cs?= baz
@@ -1533,9 +1548,9 @@
         header = make_header(dh)
            'Re: r\xe4ksm\xf6rg\xe5s baz foo bar r\xe4ksm\xf6rg\xe5s')
-        eq(header.encode(),
-           """Re: =?mac-iceland?q?r=8Aksm=9Arg=8Cs?= baz foo bar
- =?mac-iceland?q?r=8Aksm=9Arg=8Cs?=""")
+        self.ndiffAssertEqual(header.encode(), """\
+Re: =?mac-iceland?q?r=8Aksm=9Arg=8Cs?= baz foo bar =?mac-iceland?q?r=8Aksm?=
+ =?mac-iceland?q?=9Arg=8Cs?=""")
     def test_whitespace_eater_unicode(self):
         eq = self.assertEqual
@@ -2185,14 +2200,6 @@
             utils.formataddr(('A Silly; Person', 'person@dom.ain')),
             r'"A Silly; Person" <person@dom.ain>')
-    def test_fix_eols(self):
-        eq = self.assertEqual
-        eq(utils.fix_eols('hello'), 'hello')
-        eq(utils.fix_eols('hello\n'), 'hello\r\n')
-        eq(utils.fix_eols('hello\r'), 'hello\r\n')
-        eq(utils.fix_eols('hello\r\n'), 'hello\r\n')
-        eq(utils.fix_eols('hello\n\r'), 'hello\r\n\r\n')
     def test_charset_richcomparisons(self):
         eq = self.assertEqual
         ne = self.failIfEqual
@@ -2518,8 +2525,8 @@
 class TestBase64(unittest.TestCase):
     def test_len(self):
         eq = self.assertEqual
-        eq(base64mime.base64_len('hello'),
-           len(base64mime.encode('hello', eol='')))
+        eq(base64mime.header_length('hello'),
+           len(base64mime.body_encode('hello', eol='')))
         for size in range(15):
             if   size == 0 : bsize = 0
             elif size <= 3 : bsize = 4
@@ -2527,7 +2534,7 @@
             elif size <= 9 : bsize = 12
             elif size <= 12: bsize = 16
             else           : bsize = 20
-            eq(base64mime.base64_len('x'*size), bsize)
+            eq(base64mime.header_length('x' * size), bsize)
     def test_decode(self):
         eq = self.assertEqual
@@ -2538,13 +2545,13 @@
     def test_encode(self):
         eq = self.assertEqual
-        eq(base64mime.encode(''), '')
-        eq(base64mime.encode('hello'), 'aGVsbG8=\n')
+        eq(base64mime.body_encode(''), '')
+        eq(base64mime.body_encode('hello'), 'aGVsbG8=\n')
         # Test the binary flag
-        eq(base64mime.encode('hello\n'), 'aGVsbG8K\n')
-        eq(base64mime.encode('hello\n', 0), 'aGVsbG8NCg==\n')
+        eq(base64mime.body_encode('hello\n'), 'aGVsbG8K\n')
+        eq(base64mime.body_encode('hello\n', 0), 'aGVsbG8NCg==\n')
         # Test the maxlinelen arg
-        eq(base64mime.encode('xxxx ' * 20, maxlinelen=40), """\
+        eq(base64mime.body_encode('xxxx ' * 20, maxlinelen=40), """\
@@ -2562,26 +2569,11 @@
         eq = self.assertEqual
         he = base64mime.header_encode
         eq(he('hello'), '=?iso-8859-1?b?aGVsbG8=?=')
-        eq(he('hello\nworld'), '=?iso-8859-1?b?aGVsbG8NCndvcmxk?=')
+        eq(he('hello\r\nworld'), '=?iso-8859-1?b?aGVsbG8NCndvcmxk?=')
+        eq(he('hello\nworld'), '=?iso-8859-1?b?aGVsbG8Kd29ybGQ=?=')
         # Test the charset option
         eq(he('hello', charset='iso-8859-2'), '=?iso-8859-2?b?aGVsbG8=?=')
         eq(he('hello\nworld'), '=?iso-8859-1?b?aGVsbG8Kd29ybGQ=?=')
-        # Test the maxlinelen argument
-        eq(he('xxxx ' * 20, maxlinelen=40), """\
- =?iso-8859-1?b?eHggeHh4eCB4eHh4IHh4eHg=?=
- =?iso-8859-1?b?IHh4eHggeHh4eCB4eHh4IHg=?=
- =?iso-8859-1?b?eHh4IHh4eHggeHh4eCB4eHg=?=
- =?iso-8859-1?b?eCB4eHh4IHh4eHggeHh4eCA=?=
- =?iso-8859-1?b?eHh4eCB4eHh4IHh4eHgg?=""")
-        # Test the eol argument
-        eq(he('xxxx ' * 20, maxlinelen=40, eol='\r\n'), """\
- =?iso-8859-1?b?eHggeHh4eCB4eHh4IHh4eHg=?=\r
- =?iso-8859-1?b?IHh4eHggeHh4eCB4eHh4IHg=?=\r
- =?iso-8859-1?b?eHh4IHh4eHggeHh4eCB4eHg=?=\r
- =?iso-8859-1?b?eCB4eHh4IHh4eHggeHh4eCA=?=\r
- =?iso-8859-1?b?eHh4eCB4eHh4IHh4eHgg?=""")
@@ -2593,7 +2585,7 @@
             range(ord('a'), ord('z') + 1),
             range(ord('A'), ord('Z') + 1),
             range(ord('0'), ord('9') + 1),
-            (c for c in b'!*+-/ ')))
+            (c for c in b'!*+-/')))
         # Set of characters (as byte integers) that do need to be encoded in
         # headers.
         self.hnon = [c for c in range(256) if c not in self.hlit]
@@ -2608,46 +2600,53 @@
         self.bnon = [c for c in range(256) if c not in self.blit]
         assert len(self.blit) + len(self.bnon) == 256
-    def test_header_quopri_check(self):
+    def test_quopri_header_check(self):
         for c in self.hlit:
-            self.failIf(quoprimime.header_quopri_check(c))
+            self.failIf(quoprimime.header_check(c),
+                        'Should not be header quopri encoded: %s' % chr(c))
         for c in self.hnon:
-            self.failUnless(quoprimime.header_quopri_check(c))
+            self.failUnless(quoprimime.header_check(c),
+                            'Should be header quopri encoded: %s' % chr(c))
-    def test_body_quopri_check(self):
+    def test_quopri_body_check(self):
         for c in self.blit:
-            self.failIf(quoprimime.body_quopri_check(c))
+            self.failIf(quoprimime.body_check(c),
+                        'Should not be body quopri encoded: %s' % chr(c))
         for c in self.bnon:
-            self.failUnless(quoprimime.body_quopri_check(c))
+            self.failUnless(quoprimime.body_check(c),
+                            'Should be body quopri encoded: %s' % chr(c))
     def test_header_quopri_len(self):
         eq = self.assertEqual
-        eq(quoprimime.header_quopri_len(b'hello'), 5)
-        # RFC 2047 chrome is not included in header_quopri_len().
+        eq(quoprimime.header_length(b'hello'), 5)
+        # RFC 2047 chrome is not included in header_length().
         eq(len(quoprimime.header_encode(b'hello', charset='xxx')),
-           quoprimime.header_quopri_len(b'hello') +
+           quoprimime.header_length(b'hello') +
            # =?xxx?q?...?= means 10 extra characters
-        eq(quoprimime.header_quopri_len(b'h@e@l@l@o@'), 20)
-        # RFC 2047 chrome is not included in header_quopri_len().
+        eq(quoprimime.header_length(b'h@e@l@l@o@'), 20)
+        # RFC 2047 chrome is not included in header_length().
         eq(len(quoprimime.header_encode(b'h@e@l@l@o@', charset='xxx')),
-           quoprimime.header_quopri_len(b'h@e@l@l@o@') +
+           quoprimime.header_length(b'h@e@l@l@o@') +
            # =?xxx?q?...?= means 10 extra characters
         for c in self.hlit:
-            eq(quoprimime.header_quopri_len(bytes([c])), 1,
+            eq(quoprimime.header_length(bytes([c])), 1,
                'expected length 1 for %r' % chr(c))
         for c in self.hnon:
-            eq(quoprimime.header_quopri_len(bytes([c])), 3,
+            # Space is special; it's encoded to _
+            if c == ord(' '):
+                continue
+            eq(quoprimime.header_length(bytes([c])), 3,
                'expected length 3 for %r' % chr(c))
+        eq(quoprimime.header_length(b' '), 1)
     def test_body_quopri_len(self):
         eq = self.assertEqual
-        bql = quoprimime.body_quopri_len
         for c in self.blit:
-            eq(bql(c), 1)
+            eq(quoprimime.body_length(bytes([c])), 1)
         for c in self.bnon:
-            eq(bql(c), 3)
+            eq(quoprimime.body_length(bytes([c])), 3)
     def test_quote_unquote_idempotent(self):
         for x in range(256):
@@ -2672,22 +2671,23 @@
     def test_encode(self):
         eq = self.assertEqual
-        eq(quoprimime.encode(''), '')
-        eq(quoprimime.encode('hello'), 'hello')
+        eq(quoprimime.body_encode(''), '')
+        eq(quoprimime.body_encode('hello'), 'hello')
         # Test the binary flag
-        eq(quoprimime.encode('hello\r\nworld'), 'hello\nworld')
-        eq(quoprimime.encode('hello\r\nworld', 0), 'hello\nworld')
+        eq(quoprimime.body_encode('hello\r\nworld'), 'hello\nworld')
+        eq(quoprimime.body_encode('hello\r\nworld', 0), 'hello\nworld')
         # Test the maxlinelen arg
-        eq(quoprimime.encode('xxxx ' * 20, maxlinelen=40), """\
+        eq(quoprimime.body_encode('xxxx ' * 20, maxlinelen=40), """\
 xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx=
  xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxx=
 x xxxx xxxx xxxx xxxx=20""")
         # Test the eol argument
-        eq(quoprimime.encode('xxxx ' * 20, maxlinelen=40, eol='\r\n'), """\
+        eq(quoprimime.body_encode('xxxx ' * 20, maxlinelen=40, eol='\r\n'),
+           """\
 xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx=\r
  xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxx=\r
 x xxxx xxxx xxxx xxxx=20""")
-        eq(quoprimime.encode("""\
+        eq(quoprimime.body_encode("""\
 one line
 two line"""), """\
@@ -2706,17 +2706,16 @@
         except KeyError:
-    def test_idempotent(self):
+    def test_codec_encodeable(self):
         eq = self.assertEqual
         # Make sure us-ascii = no Unicode conversion
         c = Charset('us-ascii')
-        s = 'Hello World!'
-        sp = c.to_splittable(s)
-        eq(s, c.from_splittable(sp))
-        # test 8-bit idempotency with us-ascii
+        eq(c.header_encode('Hello World!'), 'Hello World!')
+        # Test 8-bit idempotency with us-ascii
         s = '\xa4\xa2\xa4\xa4\xa4\xa6\xa4\xa8\xa4\xaa'
-        sp = c.to_splittable(s)
-        eq(s, c.from_splittable(sp))
+        self.assertRaises(UnicodeError, c.header_encode, s)
+        c = Charset('utf-8')
+        eq(c.header_encode(s), '=?utf-8?b?wqTCosKkwqTCpMKmwqTCqMKkwqo=?=')
     def test_body_encode(self):
         eq = self.assertEqual
@@ -2805,20 +2804,22 @@
         h.append(utf8_head, utf8)
         enc = h.encode()
         eq(enc, """\
- =?iso-8859-1?q?mfortabel_den_Korridor_entlang=2C_an_s=FCdl=FCndischen_Wan?=
- =?iso-8859-1?q?dgem=E4lden_vorbei=2C_gegen_die_rotierenden_Klingen_bef=F6?=
- =?iso-8859-1?q?rdert=2E_?= =?iso-8859-2?q?Finan=E8ni_metropole_se_hroutily?=
+ =?iso-8859-1?q?fortabel_den_Korridor_entlang=2C_an_s=FCdl=FCndischen_Wand?=
+ =?iso-8859-1?q?gem=E4lden_vorbei=2C_gegen_die_rotierenden_Klingen_bef=F6r?=
+ =?iso-8859-1?q?dert=2E_?= =?iso-8859-2?q?Finan=E8ni_metropole_se_hroutily?=
  =?iso-8859-2?q?_pod_tlakem_jejich_d=F9vtipu=2E=2E_?= =?utf-8?b?5q2j56K6?=
- =?utf-8?q?_Nunstuck_git_und_Slotermeyer=3F_Ja!_Beiherhund_das_Oder_die_Fl?=
- =?utf-8?b?aXBwZXJ3YWxkdCBnZXJzcHV0LuOAjeOBqOiogOOBo+OBpuOBhOOBvuOBmQ==?=
- =?utf-8?b?44CC?=""")
-        eq(decode_header(enc),
-           [(g_head, "iso-8859-1"), (cz_head, "iso-8859-2"),
-            (utf8_head, "utf-8")])
+ =?utf-8?b?IE51bnN0dWNrIGdpdCB1bmQgU2xvdGVybWV5ZXI/IEphISBCZWloZXJodW5k?=
+ =?utf-8?b?IGRhcyBPZGVyIGRpZSBGbGlwcGVyd2FsZHQgZ2Vyc3B1dC7jgI3jgajoqIA=?=
+ =?utf-8?b?44Gj44Gm44GE44G+44GZ44CC?=""")
+        decoded = decode_header(enc)
+        eq(len(decoded), 3)
+        eq(decoded[0], (g_head, 'iso-8859-1'))
+        eq(decoded[1], (cz_head, 'iso-8859-2'))
+        eq(decoded[2], (utf8_head, 'utf-8'))
         ustr = str(h)
            'Die Mieter treten hier ein werden mit einem Foerderband '
@@ -2871,39 +2872,128 @@
         eq(h.encode(), hstr)
         eq(str(h), hstr)
-    def test_long_splittables_with_trailing_spaces(self):
+    def test_quopri_splittable(self):
         eq = self.ndiffAssertEqual
         h = Header(charset='iso-8859-1', maxlinelen=20)
-        h.append('xxxx ' * 20)
-        eq(h.encode(), """\
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx?=
- =?iso-8859-1?q?xxxx_?=""")
+        x = 'xxxx ' * 20
+        h.append(x)
+        s = h.encode()
+        eq(s, """\
+ =?iso-8859-1?q?x_?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?_x?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?x_?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?_x?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?x_?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?_x?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?x_?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?_x?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?x_?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?_x?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?x_?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?_x?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?x_?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?_x?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?x_?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?_x?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?x_?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?_x?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?x_?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?xx?=
+ =?iso-8859-1?q?_?=""")
+        eq(x, str(make_header(decode_header(s))))
         h = Header(charset='iso-8859-1', maxlinelen=40)
         h.append('xxxx ' * 20)
-        eq(h.encode(), """\
- =?iso-8859-1?q?xxxx_xxxx_xxxx_xxxx?=
- =?iso-8859-1?q?xxxx_xxxx_xxxx_xxxx?=
- =?iso-8859-1?q?xxxx_xxxx_xxxx_xxxx?=
- =?iso-8859-1?q?xxxx_xxxx_xxxx_xxxx_?=""")
+        s = h.encode()
+        eq(s, """\
+ =?iso-8859-1?q?x_xxxx_xxxx_xxxx_xxxx_?=
+ =?iso-8859-1?q?xxxx_xxxx_xxxx_xxxx_xx?=
+ =?iso-8859-1?q?xx_xxxx_xxxx_xxxx_xxxx?=
+ =?iso-8859-1?q?_xxxx_xxxx_?=""")
+        eq(x, str(make_header(decode_header(s))))
+    def test_base64_splittable(self):
+        eq = self.ndiffAssertEqual
+        h = Header(charset='koi8-r', maxlinelen=20)
+        x = 'xxxx ' * 20
+        h.append(x)
+        s = h.encode()
+        eq(s, """\
+ =?koi8-r?b?eCB4?=
+ =?koi8-r?b?eHh4?=
+ =?koi8-r?b?IHh4?=
+ =?koi8-r?b?eHgg?=
+ =?koi8-r?b?eHh4?=
+ =?koi8-r?b?eCB4?=
+ =?koi8-r?b?eHh4?=
+ =?koi8-r?b?IHh4?=
+ =?koi8-r?b?eHgg?=
+ =?koi8-r?b?eHh4?=
+ =?koi8-r?b?eCB4?=
+ =?koi8-r?b?eHh4?=
+ =?koi8-r?b?IHh4?=
+ =?koi8-r?b?eHgg?=
+ =?koi8-r?b?eHh4?=
+ =?koi8-r?b?eCB4?=
+ =?koi8-r?b?eHh4?=
+ =?koi8-r?b?IHh4?=
+ =?koi8-r?b?eHgg?=
+ =?koi8-r?b?eHh4?=
+ =?koi8-r?b?eCB4?=
+ =?koi8-r?b?eHh4?=
+ =?koi8-r?b?IHh4?=
+ =?koi8-r?b?eHgg?=
+ =?koi8-r?b?eHh4?=
+ =?koi8-r?b?eCB4?=
+ =?koi8-r?b?eHh4?=
+ =?koi8-r?b?IHh4?=
+ =?koi8-r?b?eHgg?=
+ =?koi8-r?b?eHh4?=
+ =?koi8-r?b?eCB4?=
+ =?koi8-r?b?eHh4?=
+ =?koi8-r?b?IA==?=""")
+        eq(x, str(make_header(decode_header(s))))
+        h = Header(charset='koi8-r', maxlinelen=40)
+        h.append(x)
+        s = h.encode()
+        eq(s, """\
+ =?koi8-r?b?eCB4eHh4IHh4eHggeHh4eCB4?=
+ =?koi8-r?b?eHh4IHh4eHggeHh4eCB4eHh4?=
+ =?koi8-r?b?IHh4eHggeHh4eCB4eHh4IHh4?=
+ =?koi8-r?b?eHggeHh4eCB4eHh4IHh4eHgg?=
+ =?koi8-r?b?eHh4eCB4eHh4IA==?=""")
+        eq(x, str(make_header(decode_header(s))))
     def test_us_ascii_header(self):
         eq = self.assertEqual
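
To illustrate the header_encode() change exercised by the base64 tests above
(line endings are no longer normalized, so \r\n and \n produce different
encoded words), here is a small sketch using only the standard base64 module.
The helper b_encode_word() is a hypothetical name for this example, not part
of base64mime.

```python
import base64


def b_encode_word(text, charset='iso-8859-1'):
    """Build an RFC 2047 "b" encoded word; \\r and \\n are ordinary octets."""
    payload = base64.b64encode(text.encode(charset)).decode('ascii')
    return '=?%s?b?%s?=' % (charset, payload)


# The two inputs differ by one octet, so the payloads differ as well.
print(b_encode_word('hello\r\nworld'))  # =?iso-8859-1?b?aGVsbG8NCndvcmxk?=
print(b_encode_word('hello\nworld'))    # =?iso-8859-1?b?aGVsbG8Kd29ybGQ=?=
```

A caller that still wants canonical CRLF line endings would have to convert
them before encoding, since the encoder no longer does it.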

Modified: sandbox/trunk/emailpkg/5_0-exp/email/utils.py
--- sandbox/trunk/emailpkg/5_0-exp/email/utils.py	(original)
+++ sandbox/trunk/emailpkg/5_0-exp/email/utils.py	Tue Aug 28 15:42:06 2007
@@ -71,16 +71,6 @@
-def fix_eols(s):
-    """Replace all line-ending characters with \r\n."""
-    # Fix newlines with no preceding carriage return
-    s = re.sub(r'(?<!\r)\n', CRLF, s)
-    # Fix carriage returns with no following newline
-    s = re.sub(r'\r(?!\n)', CRLF, s)
-    return s
 def formataddr(pair):
     """The inverse of parseaddr(), this takes a 2-tuple of the form
     (realname, email_address) and returns the string value suitable
@@ -317,7 +307,7 @@
     # object.  We do not want bytes() normal utf-8 decoder, we want a straight
     # interpretation of the string as character bytes.
     charset, language, text = value
-    rawbytes = bytes(ord(c) for c in text)
+    rawbytes = bytes(text, 'raw-unicode-escape')
         return str(rawbytes, charset, errors)
     except LookupError:
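
For readers comparing the two idioms swapped in the utils.py hunk above, the
following sketch shows where they agree and where they differ. It assumes
nothing beyond the built-in codecs; raw_bytes() is a hypothetical wrapper used
only for this example.

```python
def raw_bytes(text):
    """Interpret each character below U+0100 as a single byte."""
    return bytes(text, 'raw-unicode-escape')


latin = 'gr\xfcnes Licht'

# For code points 0..255 the new idiom matches the old generator expression
# (and a plain latin-1 encode).
assert raw_bytes(latin) == bytes(ord(c) for c in latin)
assert raw_bytes(latin) == latin.encode('latin-1')

# Above U+00FF the two diverge: the old idiom raises ValueError, while
# raw-unicode-escape emits a literal backslash-u escape sequence.
euro = '\u20ac'
try:
    bytes(ord(c) for c in euro)
    old_idiom_failed = False
except ValueError:
    old_idiom_failed = True

print(old_idiom_failed)   # True
print(raw_bytes(euro))    # b'\\u20ac'
```

The silent escaping of characters above U+00FF may be part of why the log
message says the idiom "doesn't feel ultimately correct".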
