It's best to avoid those synchronization barriers if possible. If you have all of the data in SHM (RAM) on one node, and you need to notify processes or wait for other workers to be available to perform a task that requires that data, you need a method for IPC: a queue, channel subscriptions, a source/sink, or over-frequent polling that's more resilient against dropped messages. (But you only need to scale to one node.)

There needs to be a shared structure that tracks allocations, right? What does it need to do lookups by?

[ [obj_id_or_shm_pointer, [subscribers]] ]

Does the existing memory pool solve for that? There also needs to be an instruction pipeline: a queue/channel/source of messages for each worker, or only some workers, to process.

...
https://distributed.dask.org/en/latest/journey.html
https://distributed.dask.org/en/latest/work-stealing.html

"Accelerate intra-node IPC with shared memory"
https://github.com/dask/dask/issues/6267

On Sun, Aug 2, 2020, 3:21 AM Vinay Sharma <vinay04sharma@icloud.com> wrote:
I understand that I won’t need locks with immutable objects at some level, but I don’t understand how they can be used to synchronise shared memory segments.
For every change in an immutable object, a copy is created which will have a different address. Now, for processes to use this updated object they will have to remap a new address in their address space for them to see any changes, and this remap will have to occur whenever a change takes place, which is obviously not feasible.
So, changes in the shared memory segment should be done in the shared memory segment itself, therefore shared memory segments should be mutable.
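A minimal sketch of that point, assuming Python 3.8+ and the standard library's `multiprocessing.shared_memory`. The names `writer` and `reader` are illustrative, and both handles live in one process only to keep the example short. A write made through one mapping is visible through a second mapping of the same segment, without any remapping:

```python
from multiprocessing import shared_memory

# Create a small segment; "writer" and "reader" are illustrative names.
writer = shared_memory.SharedMemory(create=True, size=16)
try:
    # Attach a second handle by name, the way another process would.
    reader = shared_memory.SharedMemory(name=writer.name)
    try:
        writer.buf[:5] = b"hello"            # mutate the segment in place
        seen_first = bytes(reader.buf[:5])   # copy out what the reader sees
        writer.buf[:5] = b"world"            # mutate again, same mapping
        seen_second = bytes(reader.buf[:5])
    finally:
        reader.close()
finally:
    writer.close()
    writer.unlink()  # free the segment once every handle is closed
```

Nothing here coordinates concurrent access; with real worker processes the writes would still need a lock or a similar mechanism.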
On 02-Aug-2020, at 5:11 AM, Wes Turner <wes.turner@gmail.com> wrote:
https://docs.dask.org/en/latest/shared.html#known-limitations :
Known Limitations

The shared memory scheduler has some notable limitations:

- It works on a single machine
- The threaded scheduler is limited by the GIL on Python code, so if your operations are pure python functions, you should not expect a multi-core speedup
- The multiprocessing scheduler must serialize functions between workers, which can fail
- The multiprocessing scheduler must serialize data between workers and the central process, which can be expensive
- The multiprocessing scheduler cannot transfer data directly between worker processes; all data routes through the master process.
... https://distributed.dask.org/en/latest/memory.html#difference-with-dask-comp...
(... https://github.com/dask/dask-labextension )
On Sat, Aug 1, 2020 at 7:34 PM Wes Turner <wes.turner@gmail.com> wrote:
PyArrow Plasma object ids, "sealing" makes an object immutable, pyrsistent
https://arrow.apache.org/docs/python/plasma.html#object-ids https://arrow.apache.org/docs/python/plasma.html#creating-an-object-buffer
Objects are created in Plasma in two stages. First, they are created, which allocates a buffer for the object. At this point, the client can write to the buffer and construct the object within the allocated buffer.
To create an object for Plasma, you need to create an object ID, as well as give the object’s maximum size in bytes.

```python
# Create an object buffer.
object_id = plasma.ObjectID(20 * b"a")
object_size = 1000
buffer = memoryview(client.create(object_id, object_size))

# Write to the buffer.
for i in range(1000):
    buffer[i] = i % 128
```
When the client is done, the client seals the buffer, making the object immutable, and making it available to other Plasma clients.
```python
# Seal the object. This makes the object immutable and available to other clients.
client.seal(object_id)
```
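For comparison, here is a rough standard-library analogue of the create-then-seal pattern. It is only a sketch and not Plasma's API: `create_and_seal` is a hypothetical helper, and it assumes Python 3.8+ for `multiprocessing.shared_memory` and `memoryview.toreadonly()`.

```python
from multiprocessing import shared_memory


def create_and_seal(payload: bytes):
    """Hypothetical helper: fill a new segment, then return it with a read-only view."""
    size = len(payload)
    segment = shared_memory.SharedMemory(create=True, size=size)
    segment.buf[:size] = payload                # stage 1: buffer is writable
    sealed = segment.buf[:size].toreadonly()    # stage 2: hand out a read-only view
    return segment, sealed


segment, sealed = create_and_seal(b"abc")
try:
    contents = bytes(sealed)
    try:
        sealed[0] = 0            # writing through the read-only view
        write_rejected = False
    except TypeError:
        write_rejected = True    # read-only memoryviews refuse item assignment
finally:
    sealed.release()             # release views before closing the segment
    segment.close()
    segment.unlink()
```

The read-only view is only a local convention: any other handle attached to the same segment can still write to it, whereas the quoted docs describe the seal as making the object itself immutable.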
https://pypi.org/project/pyrsistent/ also supports immutable structures
On Sat, Aug 1, 2020 at 4:44 PM Eric V. Smith <eric@trueblade.com> wrote:
On 8/1/2020 1:25 PM, Marco Sulla wrote:
You don't need locks with immutable objects. Since they're immutable, any operation that would usually mutate the object generates another immutable object instead. The most common example is str: the sum of two strings in Python (and in many other languages) produces a new string.
While they're immutable at the Python level, strings (and all other objects) are mutated at the C level, due to reference count updates. You need to consider this if you're sharing objects without locking or other synchronization.
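A small sketch of that effect, assuming CPython (other implementations may not use reference counts at all). The string value and the names `text` and `holders` are illustrative. Taking new references changes the count stored in the object's header even though the string's value never changes:

```python
import sys

# Build the string at run time so it is an ordinary heap object.
text = "".join(["shared", "-", "string"])

before = sys.getrefcount(text)
holders = [text, text, text]   # three new references; the value is untouched
after = sys.getrefcount(text)

delta = after - before
print(delta)   # expected to be 3 on CPython
```

The count lives in the object's own memory, so an object placed in a shared segment would have that memory written to by every process that merely references it.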
Eric
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/FEJEHF... Code of Conduct: http://python.org/psf/codeofconduct/