<div class="gmail_quote">On 11 September 2012 10:51, Duncan Booth <span dir="ltr"><<a href="mailto:duncan.booth@invalid.invalid" target="_blank">duncan.booth@invalid.invalid</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">Oscar Benjamin <<a href="mailto:oscar.j.benjamin@gmail.com">oscar.j.benjamin@gmail.com</a>> wrote:<br>
<br>
>> What interning buys you is that "s == t" is an O(1) pointer compare<br>
>> if they are equal. But if s and t differ in the last character,<br>
>> __eq__ will still inspect every character. There is no way to tell<br>
>> Python "all strings are interned, if s is not t then s != t as well".<br>
>><br>
><br>
</div><div class="im">> I thought that if *both* strings were interned then a pointer<br>
> comparison could decide if they were unequal without needing to check<br>
> the characters.<br>
><br>
> Have I misunderstood how intern() works?<br>
><br>
<br>
</div>I don't think you've misunderstood how it work, but so far as I can see the<br>
code doesn't attempt to short circuit the "not equal but interned" case.<br>
The comparison code doesn't look at interning at all, it only looks for<br>
identity as a shortcut.</blockquote><div><br></div><div>It also doesn't seem to check if the hash values have been set. I guess the cached hash value is only used in contexts where the hash is explicitly desired.</div>
<div><br></div><div>That makes two optimisations that can bring worst case string comparison down to O(1) in many contexts that are available to cpython but unused. But then if full string comparison is already on average O(1) then the cost of checking the interned and hash flags for every string comparison would outweigh the benefits of avoiding the rare worst case O(N) comparisons.</div>
<div><br></div><div>Oscar</div></div>