[issue9124] Mailbox module should use binary I/O, not text I/O
STINNER Victor
report at bugs.python.org
Wed Jan 26 22:49:57 CET 2011
STINNER Victor <victor.stinner at haypocalc.com> added the comment:
pitrou> There's a missing conversion in mailbox.patch.
pitrou> Running with -bb shows the issue.
pitrou> Here is an updated patch.
Good catch: test_mailbox now pass on Windows.
--
Some remarks on mailbox2.patch.
get_string() returns a bytes object: I propose to rename it to get_bytes(): """Return a *byte* string representation or raise a KeyError."""
The following comment is outdated, target have to be a *binary* file:
def _dump_message(self, message, target, mangle_from_=False):
# This assumes the target file is open in *text* mode ...
get_file(): should we specify that the file-like object is a binary file?
MH.get_sequences() and MH.set_sequences() opens .mh_sequences file in text mode from the locale encoding. I don't know if the locale encoding is a good choice. Does this file contain non-ASCII characters? Should we use ASCII or UTF-8 encoding instead, or parse the file in binary, and only decode requested values from ASCII?
Since all tests of test_mailbox now pass on Windows, it looks like the "universal newline" thing still work. But how can I be sure?
- from_line = 'From MAILER-DAEMON %s' % time.asctime(time.gmtime())
+ from_line = b'From MAILER-DAEMON ' + time.asctime(time.gmtime()).encode()
Is UTF-8 the right encoding to encode a timestamp? Or should we use something like "=?UTF-8?q?...?=" ?
MH.set_sequences() does... sometimes... decode the sequence name from UTF-8. I don't understand why I had to add the following if:
- f.write('%s:' % name)
+ if isinstance(name, bytes):
+ name = name.decode()
+ f.write(name + ':')
Is it correct to decode the timestamp from UTF-8? And is the following change correct?
***********
- maybe_date = ' '.join(self.get_from().split()[-5:])
+ maybe_date = b' '.join(self.get_from().split()[-5:])
try:
+ maybe_date = maybe_date.decode('utf-8')
message.set_date(calendar.timegm(time.strptime(maybe_date,
'%a %b %d %H:%M:%S %Y')))
- except (ValueError, OverflowError):
+ except (ValueError, OverflowError, UnicodeDecodeError):
pass
***********
The following change is just enough to fix mailbox. But it would maybe be better to inherit from RawIOBase instead and implement all methods. _PartialFile class might be moved into the io module. All of this can be done later.
******
+ def readable(self):
+ return self._file.readable()
+ def writable(self):
+ return self._file.writable()
+
+ def seekable(self):
+ return self._file.seekable()
+
+ def flush(self):
+ return self._file.flush()
+
+ @property
+ def closed(self):
+ return self._file.closed
******
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9124>
_______________________________________
More information about the Python-bugs-list
mailing list