[Python-3000] Should str and bytes hash equally?

Alexandre Vassalotti alexandre at peadrop.com
Fri Dec 14 00:50:54 CET 2007


On Dec 13, 2007 6:03 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>
> > In Python 2.x, having the byte string and unicode hash equally was
> > desirable, since u'' == ''. But since bytes and str are always
> > considered unequal in Python 3k, I think it would be a good idea to make
> > their hashes unequal too. So, what do you think?
>
> To phrase Adam Olsen's observation in a different way: *Why* do you
> think it would be a good idea? Do you think it would make things more
> correct, or more efficient? If neither, what other desirable effect
> would that change have?
>

I first thought that it would avoid the somewhat odd behavior that appears
when mixing unicode and byte strings in dictionaries in Python 2.x:

   >>> d = {'spam': 0}
   >>> d[u'spam'] = 1
   >>> d
   {'spam': 1}

But then I realized this is no longer a problem in Python 3k, since a
unicode string (str) and a byte string (bytes) are always unequal.
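
For comparison, here is a minimal sketch of the same dictionary experiment
under Python 3 semantics. It only relies on the fact that str and bytes
never compare equal; the variable name keys_are_equal is mine, for
illustration only.

```python
# Python 3: a str key and a bytes key with the same ASCII content
# are different keys, because 'spam' == b'spam' is always False.
d = {'spam': 0}
d[b'spam'] = 1

keys_are_equal = ('spam' == b'spam')

print(keys_are_equal)  # False
print(len(d))          # 2: both entries are kept
```

Whether the two keys happen to share a hash value does not matter here: a
dictionary treats keys as the same only if they also compare equal.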

However, that is not why I proposed to make the hashes unequal. I was
worried that people would be tempted to use this equality property as an
easy (but wrong) way to compare strings:

   >>> hash('hello') == hash(b'hello')
   True

I realize now that this is a really weak argument, and I no longer think
that it justifies changing the hashing functions.
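
To illustrate the kind of misuse I had in mind, here is a rough sketch of
how the comparison could be written without relying on hashes at all. The
helper same_text and its encoding parameter are hypothetical names made up
for this example, and UTF-8 as the default encoding is an assumption.

```python
def same_text(text, raw, encoding="utf-8"):
    """Hypothetical helper: compare a str with bytes by decoding explicitly."""
    try:
        return text == raw.decode(encoding)
    except UnicodeDecodeError:
        # Bytes that are not valid in this encoding cannot match any str.
        return False


# Whether these two hashes are equal is an implementation detail,
# so nothing should depend on the result of this comparison.
same_hash = hash('hello') == hash(b'hello')

print(same_text('hello', b'hello'))  # True
print('hello' == b'hello')           # False
```

Equal hashes never imply equal objects, so the explicit decode is the only
part of this that says anything about the text itself.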

-- Alexandre

