a huge shared read-only data in parallel accesses -- How? multithreading? multiprocessing?

Valery khamenya at gmail.com
Sat Dec 26 08:07:28 EST 2009


Hi Antoine

On Dec 11, 3:00 pm, Antoine Pitrou <solip... at pitrou.net> wrote:
> I was going to suggest memcached but it probably serializes non-atomic
> types. It doesn't mean it will be slow, though. Serialization implemented
> in C may well be faster than any "smart" non-serializing scheme
> implemented in Python.

No serialization could be faster than NO serialization at all :)

If a child process could directly read the parent's RAM -- what could
be better?
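
For what it's worth, on a fork()-based platform that is roughly what
you get already if the data is built before the workers are started.
A minimal sketch (the names BIG_TABLE, worker and the chunk sizes are
mine, not from the thread):

    import multiprocessing as mp

    # On Linux the children see the parent's pages copy-on-write, so the
    # table is never pickled or copied explicitly as long as nobody writes
    # to it.  (CPython's reference counting will still dirty some of those
    # pages over time, but nothing gets serialized.)
    BIG_TABLE = list(range(1000000))   # stands in for the huge read-only data

    def worker(index_range):
        # The child only reads BIG_TABLE from the inherited address space.
        return sum(BIG_TABLE[i] for i in index_range)

    if __name__ == "__main__":
        chunks = [range(i, i + 250000) for i in range(0, 1000000, 250000)]
        pool = mp.Pool(4)
        print(pool.map(worker, chunks))
        pool.close()
        pool.join()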

> What do you call "problems because of the GIL"? It is quite a vague
> statement, and an answer would depend on your OS, the number of threads
> you're willing to run, and whether you want to extract throughput from
> multiple threads or are just concerned about latency.

It seems to be a known fact that only one thread can execute Python
bytecode at a time in CPython, because the running thread holds the
GIL during execution and the other threads within the same process
just wait for the GIL to be released.
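
A quick, purely illustrative way to see that effect (nothing here is
from the thread; the timing just shows that two CPU-bound threads do
not overlap under CPython):

    import time
    from threading import Thread

    def burn():
        # Pure-Python CPU-bound loop; it holds the GIL while it runs.
        n = 0
        for i in range(10000000):
            n += i
        return n

    # Two threads take roughly as long as running burn() twice
    # sequentially, because only the thread holding the GIL executes
    # Python bytecode at any given moment.
    start = time.time()
    threads = [Thread(target=burn) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("2 CPU-bound threads: %.2fs" % (time.time() - start))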


> In any case, you have to do some homework and compare the various
> approaches on your own data, and decide whether the numbers are
> satisfying to you.

Well, to me the least evil is to pack and unpack things into
array.array and/or, similarly, into NumPy arrays.
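
A hedged sketch of that pack-once / read-many pattern, with
multiprocessing.RawArray standing in for whatever buffer ends up being
used (N, shared and worker are illustrative names of mine):

    import multiprocessing as mp
    import numpy as np

    # Pack the read-only payload once into a shared, lock-free buffer;
    # each worker wraps it in a NumPy view without copying.  This relies
    # on fork() inheritance of the module-level buffer (Linux); on Windows
    # you would pass it through a Pool initializer instead.
    N = 1000000
    shared = mp.RawArray('d', N)                         # flat buffer of doubles
    np.frombuffer(shared, dtype=np.float64)[:] = np.random.rand(N)   # pack once

    def worker(args):
        lo, hi = args
        view = np.frombuffer(shared, dtype=np.float64)   # zero-copy view in the child
        return float(view[lo:hi].sum())

    if __name__ == "__main__":
        pool = mp.Pool(4)
        print(pool.map(worker, [(0, N // 2), (N // 2, N)]))
        pool.close()
        pool.join()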

I do hope that Klauss' patch will be accepted, because it will let me
forget about a lot of that unneeded packing and unpacking.


> > I am a big fan of parallel map() approach
>
> I don't see what map() has to do with accessing data. map() is for
> *processing* of data. In other words, whether or not you use a map()-like
> primitive does not say anything about how the underlying data should be
> accessed.

Right. However, saying "a big fan" had a different focus here: if you
write your code around map(), then it takes only a tiny effort to
convert it into a MULTIprocessing one :)

just that.
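
For the record, this is the kind of conversion meant here; a minimal
sketch with placeholder names (process_record, records):

    import multiprocessing as mp

    def process_record(x):
        # placeholder for the real per-item work
        return x * x

    records = range(100)

    if __name__ == "__main__":
        # sequential version
        sequential = list(map(process_record, records))

        # parallel version: only the call site changes
        pool = mp.Pool()
        parallel = pool.map(process_record, records)
        pool.close()
        pool.join()

        assert sequential == parallel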

Kind regards.
Valery


