[issue9124] Mailbox module should use binary I/O, not text I/O

STINNER Victor report at bugs.python.org
Wed Jan 26 22:49:57 CET 2011


STINNER Victor <victor.stinner at haypocalc.com> added the comment:

pitrou> There's a missing conversion in mailbox.patch.
pitrou> Running with -bb shows the issue.
pitrou> Here is an updated patch.

Good catch: test_mailbox now pass on Windows.

--

Some remarks on mailbox2.patch.

get_string() returns a bytes object: I propose to rename it to get_bytes(): """Return a *byte* string representation or raise a KeyError."""

The following comment is outdated, target have to be a *binary* file:
    def _dump_message(self, message, target, mangle_from_=False):
        # This assumes the target file is open in *text* mode ...

get_file(): should we specify that the file-like object is a binary file?

MH.get_sequences() and MH.set_sequences() opens .mh_sequences file in text mode from the locale encoding. I don't know if the locale encoding is a good choice. Does this file contain non-ASCII characters? Should we use ASCII or UTF-8 encoding instead, or parse the file in binary, and only decode requested values from ASCII?

Since all tests of test_mailbox now pass on Windows, it looks like the "universal newline" thing still work. But how can I be sure?

-            from_line = 'From MAILER-DAEMON %s' % time.asctime(time.gmtime())
+            from_line = b'From MAILER-DAEMON ' + time.asctime(time.gmtime()).encode()

Is UTF-8 the right encoding to encode a timestamp? Or should we use something like "=?UTF-8?q?...?=" ?

MH.set_sequences() does... sometimes... decode the sequence name from UTF-8. I don't understand why I had to add the following if:

-                f.write('%s:' % name)
+                if isinstance(name, bytes):
+                    name = name.decode()
+                f.write(name + ':')

Is it correct to decode the timestamp from UTF-8? And is the following change correct?
***********
-            maybe_date = ' '.join(self.get_from().split()[-5:])
+            maybe_date = b' '.join(self.get_from().split()[-5:])
             try:
+                maybe_date = maybe_date.decode('utf-8')
                 message.set_date(calendar.timegm(time.strptime(maybe_date,
                                                       '%a %b %d %H:%M:%S %Y')))
-            except (ValueError, OverflowError):
+            except (ValueError, OverflowError, UnicodeDecodeError):
                 pass
***********


The following change is just enough to fix mailbox. But it would maybe be better to inherit from RawIOBase instead and implement all methods. _PartialFile class might be moved into the io module. All of this can be done later.
****** 
+    def readable(self):
+        return self._file.readable()
 
+    def writable(self):
+        return self._file.writable()
+
+    def seekable(self):
+        return self._file.seekable()
+
+    def flush(self):
+        return self._file.flush()
+
+    @property
+    def closed(self):
+        return self._file.closed
******

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9124>
_______________________________________


More information about the Python-bugs-list mailing list