[IronPython] Question about dictionaries in general

Thu Feb 12 19:30:43 CET 2009

Thanks, Dino that was an interesting read. What amazed me was by how
much the new implementation beat the old locking one. The percentages
given are maybe not the most telling, -47% doesn't have the same
impact as "twice as fast" :) I totally agree with you on the COW, it
generally only works well for small collections or collections that
are rarely mutated. And your take on the ReaderWriterLock confirms my
experience with it, it just adds too much overhead (about 5 times as
much as Monitor) to be worth it for cheap operations. Probably it was
designed for more expensive IO related things. I wasn't aware of the
new ReaderWriterLockSlim, but the fact that it requires full trust
doesn't make it very appealing.

Thanks again for sharing your experiences and data on the subject.

-Dan

On Wed, Feb 11, 2009 at 5:39 PM, Dino Viehland <dinov at microsoft.com> wrote:
> Here are the original PyBench results from when we switched to the new dictionary implementation back in February of last year:
>
> Test                             minimum run-time        average  run-time
>                                 this    other   diff    this    other   diff
> -------------------------------------------------------------------------------
>            DictWithFloatKeys:   156ms   217ms  -28.1%   159ms   223ms  -28.9%
>          DictWithIntegerKeys:   105ms   198ms  -47.0%   107ms   200ms  -46.6%
>           DictWithStringKeys:   159ms   253ms  -37.2%   161ms   256ms  -37.1%
>       SimpleDictManipulation:   108ms   176ms  -38.7%   110ms   180ms  -38.7%
>
>
> (great timing, in 2 weeks the e-mail that had this data would have self destructed).
>
> "this" is the new implementation, and "other" was the old implementation based upon System.Collections.Generic.Dictionary<object, object> where we just locked on all operations.   The implementation has undergone frequent tweaks to get it tuned to a more broad range of scenarios since then but I believe the PyBench numbers have held within 10% or so.  The one possible exception is float keys which will be much faster in 2.0.1 and in the current sources - although I don't know why you'd ever hash on floats :).
>
> The big problem with the BCL dictionary is of course that it's not thread safe.  But we also solved another problem at the same time in that we used to have many different types that would pretend to by Python's dictionary type but weren't really.  So type({}) is type(someOtherObjectYouExpectedToReallyBeADict) was frequently false - but type({}) == type(...) would still be true due to this yucky type impersonation feature we had going on.  And we've gained a few other optimizations from having our own dictionary type - for example we can now quickly check whether or not a dictionary contains string keys for doing keyword splatting (an idea which even recently came up on python-dev).  So this wasn't entirely just about speeding it up via thread safety lock-free reads.
>
> As for the individual options I'll add a 4th one to your list - a copy on write dictionary.  But I didn't investigate either COW or a reader-writer lock implementation.  COW I didn't look at just because it seems likely that we'll be dealing with big dictionaries frequently enough that it is likely to lose.  Maybe a hybrid COW/lock free reads will eventually make sense but will also be much more complex.  I didn't look at reader/writer lock simply because I've looked at using reader/writer locks in the past (I believe I was looking at it for our list implementation).  .NET's basic ReaderWriterLock class has a lot of overhead when compared to just doing simple operations like a dictionary read/write.  It's really meant to be used when there's a lot of work being done under the lock.  In 3.0 they added ReaderWriterLockSlim which performs better but requires full trust!  Not that we can depend on System.Core yet anyway...  But I'd suspect that needing to take the lock on every r
>  ead will likely lose - and is only going to become a worse proposition over time as we get into a more and more multi-core world.
>
> My biggest take away is that I almost wish we made dict/list not thread safe in IronPython and said that you had to lock if you planned on using them concurrently.  But that's probably impractical as people probably want modules to not being corrupted under multiple readers/writers :)  Ahh, if we only implemented from __future__ import GIL :)
>
> -----Original Message-----
> From: users-bounces at lists.ironpython.com [mailto:users-bounces at lists.ironpython.com] On Behalf Of Dan Eloff
> Sent: Wednesday, February 11, 2009 1:56 PM
> To: Discussion of IronPython
> Subject: [IronPython] Question about dictionaries in general
>
> Hi,
>
> This is not an IronPython question really, but I notice the dictionary
> architecture used by IronPython allows lock free read access. With
> hashtables being so important to dynamic languages this has always
> been a subject of interest to me. Did you do any comparisons with the
> System.Collections.Generic dictionary? There seems to be three basic
> strategies for making a thread-safe dictionary, dictionary with simple
> locking on reads & writes, reader-writer lock, and lock free reads. I
> would expect that the latter approach performs best, but it can depend
> on the ratio of reads to writes and how much contention there is. I'd
> be most interested to hear about the experiences of the IronPython
> developers on this subject.
>
> Thanks,
> -Dan
> _______________________________________________
> Users mailing list
> Users at lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> _______________________________________________
> Users mailing list
> Users at lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>