Huge shared read-only data with parallel access -- How? multithreading? multiprocessing?
Antoine Pitrou
solipsis at pitrou.net
Fri Dec 11 09:00:46 EST 2009
On Wed, 09 Dec 2009 06:58:11 -0800, Valery wrote:
>
> I have a huge data structure that takes >50% of RAM. My goal is to have
> many computational threads (or processes) that can have an efficient
> read-access to the huge and complex data structure.
>
> "Efficient" in particular means "without serialization" and "without
> unneeded lockings on read-only data"
I was going to suggest memcached, but it probably serializes non-atomic
types. That doesn't mean it will be slow, though: serialization
implemented in C may well be faster than any "smart" non-serializing
scheme implemented in Python.
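For instance, here is a rough sketch of how you could measure that cost
before assuming serialization is the bottleneck (Python 2.6-era code; the
dict is just a stand-in for your actual structure):

    # Time a round-trip of a largish dict through the C pickle
    # implementation.  Numbers are illustrative; measure your own data.
    import cPickle
    import time

    data = dict((i, str(i) * 10) for i in xrange(10 ** 6))

    t0 = time.time()
    blob = cPickle.dumps(data, 2)        # protocol 2, binary
    t1 = time.time()
    cPickle.loads(blob)
    t2 = time.time()

    print "dumps: %.2fs  loads: %.2fs  size: %.1f MB" % (
        t1 - t0, t2 - t1, len(blob) / 1e6)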
> 2. multi-threading
> => d. CPython is said to have problems here because of the GIL -- any
> comments?
What do you mean by "problems because of the GIL"? That is quite a vague
statement, and the answer depends on your OS, the number of threads you
intend to run, and whether you want to extract throughput from multiple
threads or are merely concerned about latency.
In any case, you have to do some homework: compare the various
approaches on your own data, and decide whether the numbers are
satisfactory to you.
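As a starting point for that homework, something like the following
sketch could time the same read-only workload under both models ('shared'
and 'task' are placeholders for your real data and computation;
multiprocessing.dummy provides the same Pool API backed by threads):

    import time
    from multiprocessing import Pool
    from multiprocessing.dummy import Pool as ThreadPool

    shared = range(10 ** 6)   # stands in for the big read-only structure

    def task(i):
        # Purely read-only access to the shared data.
        return shared[i % len(shared)] * 2

    def bench(factory, label):
        pool = factory(4)
        t0 = time.time()
        pool.map(task, xrange(100000), 1000)   # chunksize=1000
        pool.close()
        pool.join()
        print "%s %.2fs" % (label, time.time() - t0)

    if __name__ == '__main__':
        bench(ThreadPool, "threads:  ")
        bench(Pool,       "processes:")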
> I am a big fan of the parallel map() approach
I don't see what map() has to do with accessing data: map() is for
*processing* data. In other words, whether or not you use a map()-like
primitive says nothing about how the underlying data should be accessed.
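To illustrate, here is a sketch (assuming a Unix-like OS, where
multiprocessing forks its workers) in which the same map() call runs on
top of data the children simply inherit from the parent -- the big
structure itself is never pickled, only the small arguments and results
are:

    from multiprocessing import Pool

    # Built before the Pool is created, so forked workers inherit it.
    big_table = dict((i, i * i) for i in xrange(10 ** 6))

    def lookup(key):
        # Read-only access, no locking, no serialization of big_table.
        return big_table[key]

    if __name__ == '__main__':
        pool = Pool(4)
        print pool.map(lookup, [1, 2, 3, 42])
        pool.close()
        pool.join()

Bear in mind that CPython's reference counting still writes to the
inherited pages over time, so copy-on-write sharing is not perfectly
free in terms of memory.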