unicode and hashlib
Terry Reedy
tjreedy at udel.edu
Fri Nov 28 15:03:32 EST 2008
Jeff H wrote:
> hashlib.md5 does not appear to like unicode,
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa6' in
> position 1650: ordinal not in range(128)
It is the (default) ascii encoder that does not like non-ascii chars.
I suspect that if you encode to bytes first with an encoder that does
work (latin-???), md5 will be happy.
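
A minimal sketch of the encode-first approach, for illustration only. The
helper name md5_hex is hypothetical, and UTF-8 as the default is my
assumption (the suggestion above only says "an encoder that does work").
Different encodings give different bytes and therefore different digests,
so everyone comparing hashes has to agree on one encoding.

```python
import hashlib

def md5_hex(text, encoding="utf-8"):
    """Hypothetical helper: encode text to bytes, then return the hex MD5 digest."""
    data = text.encode(encoding)  # explicit encoding; the ascii default is never used
    return hashlib.md5(data).hexdigest()

# Sample text containing the character from the traceback (u'\xa6').
sample = u"caf\xe9 \xa6"

digest_utf8 = md5_hex(sample)               # assumed default: UTF-8
digest_latin1 = md5_hex(sample, "latin-1")  # same text, different bytes

print(digest_utf8)
print(digest_latin1)
```

The two printed digests differ because the same text encodes to different
byte sequences under UTF-8 and Latin-1.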
Reports like this should include the Python version.
> After googling, I've found BDFL and others on Py3K talking about the
> problems of hashing non-bytes (i.e. buffers)
> http://www.mail-archive.com/python-3000@python.org/msg09824.html
>
> So what is the canonical way to hash unicode?
> * convert unicode to locale
> * hash in current locale
> ???
> but what if locale has ordinals outside of 128?
>
> Is this just a problem for md5 hashes that I would not encounter using
> a different method? i.e. Should I just use the built-in hash function?