<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 29 December 2016 at 19:20, Steven D'Aprano <span dir="ltr"><<a href="mailto:steve@pearwood.info" target="_blank">steve@pearwood.info</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><br>
</div></div>With respect Josh, I feel that this thread is based on premature<br>
optimization. It seems to me that you're *assuming* that anything less<br>
than some theoretically ideal O(1) space O(N) time hash function is<br>
clearly and obviously unsuitable.<br>
<br>
Of course I might be completely wrong. Perhaps you have implemented your<br>
own __hash__ methods as suggested by the docs, as well as Ned's version,<br>
and profiled your code and discovered that __hash__ is a significant<br>
bottleneck. In which case, I'll apologise for doubting you, but in my<br>
defence I'll say that the language you have used in this thread so far<br>
gives no hint that you've actually profiled your code and found the<br>
bottleneck.<br></blockquote><div><br></div><div>In Josh's defence, the initial use case he put forward is exactly the kind of scenario where it's worthwhile optimising ahead of time. Quite often a poorly implemented hash function doesn't manifest as a problem until you scale up massively - and a developer may not have the capability to scale up to a suitable level in-house, resulting in performance issues at customer sites.</div><div><br></div><div>I had one particular case (fortunately discovered before going to customers) where a field was included in the equality check, but wasn't part of the hash. Unfortunately, the lack of this one field resulted in objects only being allocated to a few buckets (in a Java HashMap), resulting in every access having to walk a potentially very long chain doing equality comparisons - O(N) behaviour from an amortised O(1) data structure.</div><div><br></div><div>Unit tests - no worries. Small-scale tests - everything looked fine. Once we started our load tests though everything slowed to a crawl. 100% CPU, throughput at a standstill ... it didn't look good.</div><div><br></div><div>Adding that one field to the hash resulted in the ability to scale up to hundreds of thousands of objects with minimal CPU. I can't remember if it was millions we tested to (it was around 10 years ago ...).</div><div><br></div><div>Tim Delaney </div></div></div></div>