[Python-checkins] python/dist/src/Lib/email Header.py,1.15,1.16

bwarsaw@users.sourceforge.net bwarsaw@users.sourceforge.net
Mon, 14 Oct 2002 09:52:43 -0700


Update of /cvsroot/python/python/dist/src/Lib/email
In directory usw-pr-cvs1:/tmp/cvs-serv27982

Modified Files:
	Header.py 
Log Message:
append(): Fix the test for convertibility after consultation with
Ben.  If s is a byte string, make sure it can be converted to unicode
with the input codec, and back from unicode with the output codec, or
raise a UnicodeError exception early.  Skip this test (and the
unicode->byte string conversion) when the charset is our faux 8bit raw
charset.
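
The byte-string check described above can be sketched in isolation as
follows.  This is only an illustration written for a current Python 3
interpreter, not the code in Header.py: the function name
check_convertible() and its three parameters are hypothetical, and the
fallback to 'us-ascii' mirrors the "or 'us-ascii'" default in the diff.

```python
def check_convertible(data, input_codec=None, output_codec=None,
                      charset_name=None):
    """Return data unchanged, or raise UnicodeError if it cannot be
    decoded with the input codec and re-encoded with the output codec.

    Hypothetical helper; the names here are not part of the email package.
    """
    # The faux '8bit' raw charset skips the test entirely.
    if charset_name == '8bit':
        return data
    incodec = input_codec or 'us-ascii'
    outcodec = output_codec or 'us-ascii'
    # Byte string -> unicode with the input codec (may raise UnicodeError).
    text = data.decode(incodec)
    # Unicode -> byte string with the output codec (may raise UnicodeError).
    text.encode(outcodec)
    # The original byte string is kept; the conversions are only a test.
    return data
```

A byte string such as b'\xe9' passes when both codecs are iso-8859-1,
but raises UnicodeError when either side is left at us-ascii.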


Index: Header.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/email/Header.py,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -d -r1.15 -r1.16
*** Header.py	14 Oct 2002 15:13:17 -0000	1.15
--- Header.py	14 Oct 2002 16:52:41 -0000	1.16
***************
*** 219,236 ****
          elif not isinstance(charset, Charset):
              charset = Charset(charset)
!         # Normalize and check the string
!         if isinstance(s, StringType):
!             # Possibly raise UnicodeError if it can't be encoded
!             unicode(s, charset.get_output_charset())
!         elif isinstance(s, UnicodeType):
!             # Convert Unicode to byte string for later concatenation
!             for charset in USASCII, charset, UTF8:
!                 try:
!                     s = s.encode(charset.get_output_charset())
!                     break
!                 except UnicodeError:
!                     pass
!             else:
!                 assert False, 'Could not encode to utf-8'
          self._chunks.append((s, charset))
  
--- 219,250 ----
          elif not isinstance(charset, Charset):
              charset = Charset(charset)
!         # If the charset is our faux 8bit charset, leave the string unchanged
!         if charset <> '8bit':
!             # We need to test that the string can be converted to unicode and
!             # back to a byte string, given the input and output codecs of the
!             # charset.
!             if isinstance(s, StringType):
!                 # Possibly raise UnicodeError if the byte string can't be
!                 # converted to a unicode with the input codec of the charset.
!                 incodec = charset.input_codec or 'us-ascii'
!                 ustr = unicode(s, incodec)
!                 # Now make sure that the unicode could be converted back to a
!                 # byte string with the output codec, which may be different
!                 # than the input codec.  Still, use the original byte string.
!                 outcodec = charset.output_codec or 'us-ascii'
!                 ustr.encode(outcodec)
!             elif isinstance(s, UnicodeType):
!                 # Now we have to be sure the unicode string can be converted
!                 # to a byte string with a reasonable output codec.  We want to
!                 # use the byte string in the chunk.
!                 for charset in USASCII, charset, UTF8:
!                     try:
!                         outcodec = charset.output_codec or 'us-ascii'
!                         s = s.encode(outcodec)
!                         break
!                     except UnicodeError:
!                         pass
!                 else:
!                     assert False, 'utf-8 conversion failed'
          self._chunks.append((s, charset))
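
For the unicode branch, the codec fallback order in the new code
(us-ascii, then the charset's own output codec, then utf-8) can be
sketched as a standalone function.  This is a rough Python 3
illustration under stated assumptions, not the Header.py
implementation: encode_with_fallback() and its parameter are
hypothetical names, and it returns the codec name rather than a
Charset instance.

```python
def encode_with_fallback(text, preferred_codec=None):
    """Encode text with the first codec that works and return
    (encoded_bytes, codec_name).

    Hypothetical helper mirroring the us-ascii -> charset -> utf-8 order.
    """
    candidates = ('us-ascii', preferred_codec or 'us-ascii', 'utf-8')
    for codec in candidates:
        try:
            return text.encode(codec), codec
        except UnicodeError:
            # This codec cannot represent the text; try the next one.
            continue
    # Reached only if even utf-8 fails, as with the assert in the diff.
    raise AssertionError('utf-8 conversion failed')
```

Trying us-ascii first means plain ASCII text is tagged with the most
widely understood charset, and utf-8 is used only as a last resort.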