[Python-Dev] Issue 3745 backwards incompatibility

Karen Tracey kmtracey at gmail.com
Tue Dec 15 01:28:08 CET 2009


In testing some existing code with the 2.7 alpha release, I've run into:

    TypeError: Unicode-objects must be encoded before hashing

when the existing code tries to pass unicode objects to hashlib.sha1 and
hashlib.md5.  This is, I believe, due to changes made for issue 3745:

http://bugs.python.org/issue3745

The issue states the need to reject unencoded strings based on the fact that
one backend implementation (openssl) refused to accept them while another
(_sha256) assumed a utf-8 encoding.  The thing is, I cannot observe any such
difference using Python 2.5 or 2.6.  The demonstration in the ticket was, I
believe, run on Python 3.  When I adjust it to use Python 2 syntax for
"unencoded strings", I see this instead:

Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import _hashlib
>>> _hashlib.openssl_sha256(u"\xff")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)
>>> import _sha256
>>> _sha256.sha256(u'\xff')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)
>>>

(The sample is from Windows because that's the only place I can get import
_sha256 to work.  The Ubuntu Linux system I tried behaves the same way as
above for the _hashlib version, but it doesn't appear to have _sha256
available.)
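
For illustration, the failure in the session above looks like what an
explicit ASCII encode of the same string produces.  This is only a sketch:
it assumes the implicit conversion in Python 2 uses the 'ascii' codec (as
the traceback indicates), and the variable names are made up for the
example.

```python
text = u"\xff"

# Pure-ASCII text encodes without trouble.
ascii_ok = u"abc".encode("ascii")

# A character outside the ASCII range cannot be encoded, which matches the
# UnicodeEncodeError shown in the interpreter session.
try:
    encoded = text.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    encoded = None
    error = exc
```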

So from what I can see, the behavior wasn't inconsistent from backend to
backend in Python 2 but rather fell in line with what I'm familiar with: if
you pass unicode to some code that only wants bytes, the unicode object will
get encoded to a bytestring using the system default encoding.  That causes
no problems if the data can in fact always be encoded using that encoding,
and produces the error above if it can't.  Changing these functions to now
require the caller to do the encoding explicitly ahead of time strikes me as
introducing an inconsistency with that general behavior.  Plus it introduces
a backwards incompatibility in Python 2.7.  Is this really necessary?
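
As a rough sketch of what "do the encoding explicitly ahead of time" would
mean for calling code: the helper name sha1_hexdigest and the choice of
utf-8 are assumptions made for this example only.  They are not part of
hashlib, and utf-8 is not what the implicit Python 2 conversion used.

```python
import hashlib

# type(u"") is unicode on Python 2 and str on Python 3.
TEXT_TYPE = type(u"")


def sha1_hexdigest(value, encoding="utf-8"):
    """Hypothetical helper: encode text explicitly, then hash the bytes."""
    if isinstance(value, TEXT_TYPE):
        value = value.encode(encoding)
    return hashlib.sha1(value).hexdigest()


# Text and its pre-encoded bytes hash to the same digest.
digest_from_text = sha1_hexdigest(u"\xff")
digest_from_bytes = sha1_hexdigest(u"\xff".encode("utf-8"))
```

Existing callers that relied on the implicit conversion would each need a
step like this, with an encoding chosen to suit their data.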

Karen