[Python-3000] Should str and bytes hash equally?
Alexandre Vassalotti
alexandre at peadrop.com
Fri Dec 14 00:50:54 CET 2007
On Dec 13, 2007 6:03 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>
> > In Python 2.x, having byte strings and unicode strings hash equally was
> > desirable, since u'' == ''. But since bytes and str always compare
> > unequal in Python 3k, I think it would be a good idea to make their
> > hashes unequal too. So, what do you think?
>
> To phrase Adam Olsen's observation in a different way: *Why* do you
> think it would be a good idea? Do you think it would make things more
> correct, or more efficient? If neither, what other desirable effect
> would that change have?
>
I first thought that this would avoid the somewhat odd behavior that appears
when mixing unicode and byte strings as dictionary keys in Python 2.x:
>>> d = {'spam': 0}
>>> d[u'spam'] = 1
>>> d
{'spam': 1}
But then I realized this is no longer a problem in Python 3k, since
unicode strings (str) and byte strings (bytes) always compare unequal.
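The reason is that a dict treats two keys as the same only when their hashes match *and* they compare equal; equal hashes alone never merge keys. A minimal Python 3 sketch of this (the `Key` class is hypothetical, introduced only to force a hash collision):

```python
# Python 3 sketch: dict lookup requires equal hashes AND equality,
# so a hash collision by itself never merges two keys.

class Key:
    """Hypothetical key type: every instance hashes to the same value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # deliberate collision for every instance

    def __eq__(self, other):
        if not isinstance(other, Key):
            return NotImplemented
        return self.name == other.name


# str and bytes always compare unequal in Python 3, so both keys survive,
# whether or not hash('spam') == hash(b'spam') on this implementation.
d = {'spam': 0}
d[b'spam'] = 1
print(len(d))
print(d['spam'], d[b'spam'])

# Same principle with a forced collision: equal hashes, unequal keys.
collide = {Key('a'): 1, Key('b'): 2}
print(len(collide))
```

Both dicts end up with two entries, which is why equal str/bytes hashes cause no dictionary confusion in Python 3k.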
However, that is not why I proposed making the hashes unequal. I was
worried that people would be tempted to use this property of equal hashes
as an easy (but wrong) way to compare strings:
>>> hash('hello') == hash(b'hello')
True
I realize now that this is a really weak argument, and I no longer think
it justifies changing the hash functions.
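Because equal hashes do not imply equal values, a reliable way to compare text with bytes in Python 3 is to decode the bytes explicitly with a known encoding. A minimal sketch, where `same_text` is a hypothetical helper and UTF-8 is an assumed default encoding:

```python
def same_text(text: str, data: bytes, encoding: str = "utf-8") -> bool:
    """Hypothetical helper: compare a str with bytes by decoding explicitly.

    The encoding is an assumption the caller must supply; hash values say
    nothing about whether the two represent the same text.
    """
    try:
        return text == data.decode(encoding)
    except UnicodeDecodeError:
        # Bytes that are not valid in this encoding cannot equal the text.
        return False


print(same_text('hello', b'hello'))
# b'h\xe9llo' is not valid UTF-8, so this comparison is False.
print(same_text('héllo', 'héllo'.encode('latin-1')))
# With the matching encoding the same bytes do decode to 'héllo'.
print(same_text('héllo', 'héllo'.encode('latin-1'), 'latin-1'))
```

Making the encoding an explicit parameter keeps the decision visible at the call site instead of hiding it behind a hash comparison.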
-- Alexandre