
Hello everyone and Benjamin,
Currently, memoryview objects are unhashable:
>>> hash(memoryview(b""))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'memoryview'
Compare with Python 2.7:
>>> hash(buffer(""))
0
memoryviews already support equality comparison:
>>> b"" == memoryview(b"")
True
If the original object providing the buffer is hashable, then it seems to make sense for the memoryview object to be hashable. This came up while porting Twisted to Python 3.
What do you think?
Regards
Antoine.

Aren't memoryview objects mutable? I think that the underlying memory can change, so it shouldn't be hashable.
On Sat, Nov 12, 2011 at 4:23 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Hello everyone and Benjamin,
Currently, memoryview objects are unhashable:
>>> hash(memoryview(b""))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'memoryview'
Compare with Python 2.7:
>>> hash(buffer(""))
0
memoryviews already support equality comparison:
>>> b"" == memoryview(b"")
True
If the original object providing the buffer is hashable, then it seems to make sense for the memoryview object to be hashable. This came while porting Twisted to Python 3.
What do you think?
Regards
Antoine.
-- --Guido van Rossum (python.org/~guido)

On Sat, 12 Nov 2011 17:15:08 -0800 Guido van Rossum <guido@python.org> wrote:
Aren't memoryview objects mutable? I think that the underlying memory can change, so it shouldn't be hashable.
Only if the original object is itself mutable, otherwise the memoryview is read-only.
I would propose the following algorithm:
1) try to calculate the original object's hash; if it fails, consider the memoryview unhashable (the buffer is probably mutable)
2) otherwise, calculate the memoryview's hash with the same algorithm as bytes objects (so that it's compatible with equality comparisons)
Regards
Antoine.
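The two steps proposed above can be sketched in Python roughly as follows. This is only an illustrative model, not CPython's implementation: the function name `proposed_hash` is hypothetical, and it assumes the `memoryview.obj` attribute (available in Python 3.3 and later) to reach the exporting object.

```python
def proposed_hash(view):
    """Hypothetical model of the two-step proposal (not CPython's code)."""
    # Step 1: hashability of the exporter gates hashability of the view.
    # hash() raises TypeError for unhashable exporters such as bytearray.
    hash(view.obj)
    # Step 2: hash the logical contents with the bytes algorithm, so that
    # a view comparing equal to a bytes object also hashes like it.
    return hash(view.tobytes())


print(proposed_hash(memoryview(b"abc")) == hash(b"abc"))
```

Note that the exporter's hash value is discarded in step 1; only whether it can be computed matters.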

On Sun, Nov 13, 2011 at 11:19 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sat, 12 Nov 2011 17:15:08 -0800 Guido van Rossum <guido@python.org> wrote:
Aren't memoryview objects mutable? I think that the underlying memory can change, so it shouldn't be hashable.
Only if the original object is itself mutable, otherwise the memoryview is read-only.
I would propose the following algorithm:
1) try to calculate the original object's hash; if it fails, consider the memoryview unhashable (the buffer is probably mutable)
2) otherwise, calculate the memoryview's hash with the same algorithm as bytes objects (so that it's compatible with equality comparisons)
Having a memory view be hashable if the object it references is hashable seems analogous to the way tuples are hashable if everything they reference is hashable, so +0 from me.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
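For readers unfamiliar with the tuple behaviour used in this analogy, here is a small sketch. The helper name `is_hashable` is made up for illustration.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# A tuple is hashable only when every item it references is hashable.
print(is_hashable((1, "a", b"x")))  # all items are hashable
print(is_hashable((1, [2, 3])))     # the list item is not hashable
```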

On Sat, Nov 12, 2011 at 5:40 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sun, Nov 13, 2011 at 11:19 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sat, 12 Nov 2011 17:15:08 -0800 Guido van Rossum <guido@python.org> wrote:
Aren't memoryview objects mutable? I think that the underlying memory can change, so it shouldn't be hashable.
Only if the original object is itself mutable, otherwise the memoryview is read-only.
I would propose the following algorithm:
1) try to calculate the original object's hash; if it fails, consider the memoryview unhashable (the buffer is probably mutable)
2) otherwise, calculate the memoryview's hash with the same algorithm as bytes objects (so that it's compatible with equality comparisons)
Having a memory view be hashable if the object it references is hashable seems analogous to the way tuples are hashable if everything they reference is hashable, so +0 from me.
Yeah, that's ok with me too.
--
--Guido van Rossum (python.org/~guido)

Antoine Pitrou <solipsis@pitrou.net> wrote:
Only if the original object is itself mutable, otherwise the memoryview is read-only.
I would propose the following algorithm: 1) try to calculate the original object's hash; if it fails, consider the memoryview unhashable (the buffer is probably mutable)
With slices or the new casts (See: http://bugs.python.org/issue5231, implemented in http://hg.python.org/features/pep-3118#memoryview ), it is possible to have different hashes for equal objects:
>>> b1 = bytes([1,2,3,4])
>>> b2 = bytes([4,3,2,1])
>>> m1 = memoryview(b1)
>>> m2 = memoryview(b2)[::-1]
>>> m1 == m2
True
>>> hash(b1)
4154562130492273536
>>> hash(b2)
-1828484551660457336
Or:
>>> a = array.array('L', [0])
>>> b = b'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> m_array = memoryview(a)
>>> m_bytes = memoryview(b)
>>> m_cast = m_array.cast('B')
>>> m_bytes == m_cast
True
>>> hash(b) == hash(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'array.array'
Stefan Krah

On Sun, 13 Nov 2011 11:39:46 +0100 Stefan Krah <stefan@bytereef.org> wrote:
Antoine Pitrou <solipsis@pitrou.net> wrote:
Only if the original object is itself mutable, otherwise the memoryview is read-only.
I would propose the following algorithm: 1) try to calculate the original object's hash; if it fails, consider the memoryview unhashable (the buffer is probably mutable)
With slices or the new casts (See: http://bugs.python.org/issue5231, implemented in http://hg.python.org/features/pep-3118#memoryview ), it is possible to have different hashes for equal objects:
>>> b1 = bytes([1,2,3,4])
>>> b2 = bytes([4,3,2,1])
>>> m1 = memoryview(b1)
>>> m2 = memoryview(b2)[::-1]
I don't understand this feature. How do you represent a reversed buffer using the buffer API, and how do you ensure that consumers (especially those written in C) see the buffer reversed?
Regardless, it's simply a matter of getting the hash algorithm right (i.e. iterate in logical order rather than memory order).
>>> a = array.array('L', [0])
>>> b = b'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> m_array = memoryview(a)
>>> m_bytes = memoryview(b)
>>> m_cast = m_array.cast('B')
>>> m_bytes == m_cast
True
>>> hash(b) == hash(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'array.array'
In this case, the memoryview wouldn't be hashable either.
Regards
Antoine.

On Sun, Nov 13, 2011 at 8:49 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I don't understand this feature. How do you represent a reversed buffer using the buffer API, and how do you ensure that consumers (especially those written in C) see the buffer reversed?
The values in the strides array are signed, so presumably just by specifying a "-1" for the relevant dimension (triggering all the usual failures if you encounter a buffer API consumer that can only handle C contiguous arrays).
Cheers,
Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
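The signed stride described above can be observed from Python through the `strides` attribute of a memoryview. A short sketch, assuming Python 3.3 or later for the `c_contiguous` attribute:

```python
data = bytes([1, 2, 3, 4])
forward = memoryview(data)
backward = forward[::-1]  # same memory, walked from the last byte backwards

print(forward.strides)        # stride of the plain view
print(backward.strides)       # stride of the reversed view is negative
print(backward.tolist())      # elements in logical (reversed) order
print(backward.c_contiguous)  # consumers requiring C-contiguous data cannot use this view directly
```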

Antoine Pitrou <solipsis@pitrou.net> wrote:
I would propose the following algorithm: 1) try to calculate the original object's hash; if it fails, consider the memoryview unhashable (the buffer is probably mutable)
With slices or the new casts (See: http://bugs.python.org/issue5231, implemented in http://hg.python.org/features/pep-3118#memoryview ), it is possible to have different hashes for equal objects:
>>> b1 = bytes([1,2,3,4])
>>> b2 = bytes([4,3,2,1])
>>> m1 = memoryview(b1)
>>> m2 = memoryview(b2)[::-1]
I don't understand this feature. How do you represent a reversed buffer using the buffer API, and how do you ensure that consumers (especially those written in C) see the buffer reversed?
In this case, view->buf points to the last memory location and view->strides is -1. In general, any PEP-3118 compliant consumer must only access elements of a buffer either directly via PyBuffer_GetPointer() or in an equivalent manner.
Basically, this means that you start at view->buf (which may be *any* location in the memory block) and follow the strides until you reach the desired element.
Objects/abstract.c:
===================
void*
PyBuffer_GetPointer(Py_buffer *view, Py_ssize_t *indices)
{
    char* pointer;
    int i;
    pointer = (char *)view->buf;
    for (i = 0; i < view->ndim; i++) {
        pointer += view->strides[i]*indices[i];
        if ((view->suboffsets != NULL) && (view->suboffsets[i] >= 0)) {
            pointer = *((char**)pointer) + view->suboffsets[i];
        }
    }
    return (void*)pointer;
}
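The stride walk performed by PyBuffer_GetPointer() can be modelled in pure Python as below. This is a simplified sketch: the function name `element_offset` is hypothetical, the starting pointer is expressed as an offset into a bytes block, and suboffsets are left out entirely.

```python
def element_offset(start, strides, indices):
    """Byte offset of one element, following the strides from `start`.

    `start` stands in for view->buf, expressed as an offset into the
    memory block. Suboffsets are deliberately not modelled.
    """
    offset = start
    for stride, index in zip(strides, indices):
        offset += stride * index
    return offset


block = bytes([1, 2, 3, 4])
# A reversed one-dimensional view: start at the last byte, stride of -1.
start, strides = len(block) - 1, (-1,)
reversed_items = [
    block[element_offset(start, strides, (i,))] for i in range(len(block))
]
print(reversed_items)
```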
Regardless, it's simply a matter of getting the hash algorithm right (i.e. iterate in logical order rather than memory order).
If you know how the original object computes the hash then this would work. It's not obvious to me how this would work beyond bytes objects though.
>>> a = array.array('L', [0])
>>> b = b'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> m_array = memoryview(a)
>>> m_bytes = memoryview(b)
>>> m_cast = m_array.cast('B')
>>> m_bytes == m_cast
True
>>> hash(b) == hash(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'array.array'
In this case, the memoryview wouldn't be hashable either.
Hmm, the point was that one could take the hash of m_bytes but not of m_cast, even though they are equal. Perhaps I misunderstood your proposal. I assumed that hash requests would be redirected to the original exporting object.
As above, it would be possible to write a custom hash function for objects with type 'B'.
Stefan Krah

On Sun, Nov 13, 2011 at 8:39 PM, Stefan Krah <stefan@bytereef.org> wrote:
Antoine Pitrou <solipsis@pitrou.net> wrote:
Only if the original object is itself mutable, otherwise the memoryview is read-only.
I would propose the following algorithm: 1) try to calculate the original object's hash; if it fails, consider the memoryview unhashable (the buffer is probably mutable)
With slices or the new casts (See: http://bugs.python.org/issue5231, implemented in http://hg.python.org/features/pep-3118#memoryview ), it is possible to have different hashes for equal objects:
Note that Antoine isn't suggesting that the underlying hash be *used* as the memoryview's hash (that would be calculated according to the same rules as the equality comparison). Instead, the ability to hash the underlying object would just gate whether or not you could hash the memoryview at all.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan <ncoghlan@gmail.com> wrote:
With slices or the new casts (See: http://bugs.python.org/issue5231, implemented in http://hg.python.org/features/pep-3118#memoryview ), it is possible to have different hashes for equal objects:
Note that Antoine isn't suggesting that the underlying hash be *used* as the memoryview's hash (that would be calculated according to the same rules as the equality comparison). Instead, the ability to hash the underlying object would just gate whether or not you could hash the memoryview at all.
I think they necessarily have to use the same hash, since:
exporter = m1 ==> hash(exporter) = hash(m1)
m1 = m2 ==> hash(m1) = hash(m2)
Am I missing something?
Stefan Krah
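The implication above matters because dict and set lookups consult the hash before they ever call `__eq__`. The sketch below uses a made-up class, `Wrapper`, that compares equal to bytes but deliberately hashes differently, to show what goes wrong when equal objects disagree on their hash.

```python
class Wrapper:
    """Hypothetical type: equal to bytes with the same payload, wrong hash."""

    def __init__(self, payload):
        self.payload = payload

    def __eq__(self, other):
        if isinstance(other, Wrapper):
            return self.payload == other.payload
        if isinstance(other, bytes):
            return self.payload == other
        return NotImplemented

    def __hash__(self):
        # Deliberately never equal to hash(self.payload).
        return 1 if hash(self.payload) == 0 else 0


key = b"abc"
w = Wrapper(key)
table = {key: "found"}

equal = (w == key)                # equality holds
lookup = table.get(w, "missing")  # but the dict lookup misses the entry
print(equal, lookup)
```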

On Sun, 13 Nov 2011 13:05:24 +0100 Stefan Krah <stefan@bytereef.org> wrote:
Nick Coghlan <ncoghlan@gmail.com> wrote:
With slices or the new casts (See: http://bugs.python.org/issue5231, implemented in http://hg.python.org/features/pep-3118#memoryview ), it is possible to have different hashes for equal objects:
Note that Antoine isn't suggesting that the underlying hash be *used* as the memoryview's hash (that would be calculated according to the same rules as the equality comparison). Instead, the ability to hash the underlying object would just gate whether or not you could hash the memoryview at all.
I think they necessarily have to use the same hash, since:
exporter = m1 ==> hash(exporter) = hash(m1)
m1 = m2 ==> hash(m1) = hash(m2)
Am I missing something?
The hash must simply be calculated using the same algorithm (which can even be shared as a subroutine). It's already the case for more complicated types:
>>> hash(1) == hash(1.0) == hash(Decimal(1)) == hash(Fraction(1))
True
Also, I think it's reasonable to limit hashability to one-dimensional memoryviews.
Regards
Antoine.

Antoine Pitrou <solipsis@pitrou.net> wrote:
Stefan Krah <stefan@bytereef.org> wrote:
I think they necessarily have to use the same hash, since:
exporter = m1 ==> hash(exporter) = hash(m1)
m1 = m2 ==> hash(m1) = hash(m2)
Am I missing something?
The hash must simply be calculated using the same algorithm (which can even be shared as a subroutine). It's already the case for more complicated types:
>>> hash(1) == hash(1.0) == hash(Decimal(1)) == hash(Fraction(1))
True
Yes, but we control those types. I was thinking more about third-party exporters. Then again, it would be possible to publish the unified hash function as part of the PEP. Perhaps we could simply use:
PyBuffer_Hash = hash(obj.tobytes())
Since tobytes() follows the logical structure, it should work for non-contiguous and multidimensional arrays as well.
Stefan Krah
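The tobytes()-based idea can be tried out on the two examples from earlier in the thread. This is a sketch only: `buffer_hash` is a hypothetical helper name, and it leaves out the separate question of whether the exporter should gate hashability.

```python
import array


def buffer_hash(view):
    """Hypothetical helper: hash the logical contents of a memoryview."""
    return hash(view.tobytes())


# Reversed slice: tobytes() follows logical order, not memory order.
b1 = bytes([1, 2, 3, 4])
b2 = bytes([4, 3, 2, 1])
m1 = memoryview(b1)
m2 = memoryview(b2)[::-1]
print(buffer_hash(m1) == buffer_hash(m2) == hash(b1))

# Cast example: a byte view of an array versus a bytes object of zeros.
a = array.array('L', [0])
m_cast = memoryview(a).cast('B')
m_bytes = memoryview(bytes(a.itemsize))  # as many zero bytes as one 'L' item
print(buffer_hash(m_cast) == buffer_hash(m_bytes))
```

With this helper, views that compare equal in these two examples also hash equal, whatever the exporter's own hash function does.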

Stefan Krah, 13.11.2011 13:05:
Nick Coghlan wrote:
With slices or the new casts (See: http://bugs.python.org/issue5231, implemented in http://hg.python.org/features/pep-3118#memoryview ), it is possible to have different hashes for equal objects:
Note that Antoine isn't suggesting that the underlying hash be *used* as the memoryview's hash (that would be calculated according to the same rules as the equality comparison). Instead, the ability to hash the underlying object would just gate whether or not you could hash the memoryview at all.
I think they necessarily have to use the same hash, since:
exporter = m1 ==> hash(exporter) = hash(m1)
m1 = m2 ==> hash(m1) = hash(m2)
You can't expect the memoryview() to magically know what the underlying hash function is. The only guarantee you get is that iff two memoryview instances are looking at the same (subset of) data from two hashable objects (or the same object), you will get the same hash value for both. It may or may not correspond with the hash value that the buffer exporting objects would give you.
Stefan

You can't expect the memoryview() to magically know what the underlying hash function is.
Hashable objects implementing the buffer interface could be required to make their hash implementation consistent with bytes hashing. IMO, that wouldn't be asking too much.
There is already the issue that equality may not be transitive with respect to buffer objects (e.g. a == memoryview(a) == memoryview(b) == b, but a != b). As that would be a bug in either a or b, failure to hash consistently would be a bug as well.
Regards,
Martin
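One concrete instance of the non-transitive chain mentioned above can be built from the standard library, assuming Python 3.3 or later. The choice of `array.array` and `bytes` as `a` and `b` is an illustration picked here, not an example given in the thread.

```python
import array

a = array.array('B', [1, 2, 3])
b = bytes([1, 2, 3])

# Each link of the chain goes through a memoryview and compares equal...
chain = (
    (a == memoryview(a))
    and (memoryview(a) == memoryview(b))
    and (memoryview(b) == b)
)
# ...while the two exporters do not compare equal to each other directly.
direct = (a == b)
print(chain, direct)
```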

Thinking of it, an alternative would be to implement lazy slices of bytes objects (Twisted uses buffer() for zero-copy slices).
Regards
Antoine.
On Sun, 13 Nov 2011 01:23:59 +0100 Antoine Pitrou <solipsis@pitrou.net> wrote:
Hello everyone and Benjamin,
Currently, memoryview objects are unhashable:
>>> hash(memoryview(b""))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'memoryview'
Compare with Python 2.7:
>>> hash(buffer(""))
0
memoryviews already support equality comparison:
>>> b"" == memoryview(b"")
True
If the original object providing the buffer is hashable, then it seems to make sense for the memoryview object to be hashable. This came while porting Twisted to Python 3.
What do you think?
Regards
Antoine.
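The zero-copy slicing use case mentioned at the top of this last message (Twisted's use of buffer() in Python 2) looks roughly like this with memoryview. It is a minimal sketch with made-up data, assuming Python 3.3 or later for the `obj` attribute.

```python
data = b"header:payload"
view = memoryview(data)

payload = view[7:]                # a slice of the view; no bytes are copied
same_buffer = payload.obj is data # the slice still refers to the original object
copied = bytes(payload)           # an explicit copy, made only on request

print(same_buffer, payload.readonly, copied)
```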
participants (6)
- "Martin v. Löwis"
- Antoine Pitrou
- Guido van Rossum
- Nick Coghlan
- Stefan Behnel
- Stefan Krah