How to prevent shared memory from being corrupted?
Problem: Currently, let’s say I create a shared memory segment using multiprocessing.shared_memory.SharedMemory <https://docs.python.org/3.10/library/multiprocessing.shared_memory.html> in Process 1 and open the same in Process 2. Then I try to write some data to the shared memory segment from both processes. To prevent any race condition (data corruption), either these write operations must be atomic, or I should be able to lock/unlock the shared memory segment, which I cannot at the moment.

I earlier posted a solution <https://mail.python.org/archives/list/python-ideas@python.org/thread/X4AKFFMYEKW6GFOUMXMOJ2OBINNY2Q6L/> to this problem, which received a positive response, but there weren’t many replies, despite the fact that this problem makes shared_memory practically unusable when there are simultaneous writes. So, the purpose of this post is to discuss solutions to it.

Some solutions:

1. Support for shared semaphores across unrelated processes, used to lock/unlock the shared memory segment. --> More details <https://mail.python.org/archives/list/python-ideas@python.org/thread/X4AKFFMYEKW6GFOUMXMOJ2OBINNY2Q6L/>

2. Let the first bit in the shared memory segment be a synchronisation bit used for locking/unlocking. A process can only lock the shared memory segment if this bit is unset; it sets the bit after acquiring the lock, and unsets it again when unlocking. The set/unset operations must be atomic, so the following GCC builtins could be used:

type __sync_add_and_fetch (type *ptr, type value, ...)
type __sync_sub_and_fetch (type *ptr, type value, ...)
type __sync_or_and_fetch (type *ptr, type value, ...)

Documentation of these can be found at:
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html#g_t_005f_005fsync-Builtins
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html#g_t_005f_005fatomic-Builtins

Any other ideas/solutions are very welcome.
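To make the failure mode concrete, here is a minimal sketch of the lost-update race. The segment layout (a single 8-byte counter at offset 0) and the iteration counts are arbitrary choices for illustration:

```python
import struct
from multiprocessing import Process, shared_memory

def increment(name: str, times: int) -> None:
    # Attach to an existing segment and perform unsynchronised
    # read-modify-write cycles on the first 8 bytes.
    shm = shared_memory.SharedMemory(name=name)
    for _ in range(times):
        (value,) = struct.unpack_from("q", shm.buf, 0)
        struct.pack_into("q", shm.buf, 0, value + 1)
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=8)
    struct.pack_into("q", shm.buf, 0, 0)
    workers = [Process(target=increment, args=(shm.name, 100_000))
               for _ in range(2)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    (total,) = struct.unpack_from("q", shm.buf, 0)
    # Without a lock, updates from one process can overwrite the
    # other's in-flight read-modify-write, so total is usually
    # well below the 200000 increments actually performed.
    print("total =", total)
    shm.close()
    shm.unlink()
```

Each process performs 100,000 increments, yet the final counter typically falls short of 200,000: exactly the corruption the proposals above are trying to prevent.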
On Sun, 26 Jul 2020 at 19:11, Vinay Sharma via Python-ideas <python-ideas@python.org> wrote:
Problem: Currently, let’s say I create a shared_memory segment using multiprocessing.shared_memory.SharedMemory in Process 1 and open the same in Process 2. Then, I try to write some data to the shared memory segment using both the processes, so for me to prevent any race condition (data corruption), either these operations must be atomic, or I should be able to lock / unlock shared memory segment, which I cannot at the moment.
I earlier posted a solution to this problem, which received positive response, but there weren’t many responses to it, despite the fact this problem makes shared_memory practically unusable if there are simultaneous writes. So, the purpose of this post is to have discussion about the solution of the same.
One thing that is worth thinking about is the safety of the API that is put together. A memory segment plus a separate detached semaphore or mutex can be used to build a safe API, but is not itself a safe API. A safe API shouldn't allow writes to the memory segment while the mutex is unlocked, rather than allowing one to build a safe API from the various pieces. (There may / will be lower level primitives that are unsafe). We can look at a lot of the APIs in the Rust community for examples of this sort of thing. Python doesn't have the borrow checker to enforce usage, but we could still work from the same basic principle - given there are multiple processes involved that make it easier to have safe outcomes. For instance, we could have an object representing a memory range that doesn't offer read/write at all, but allows: - either one process write access over the range - or any number of readers read access over the range - allows subdividing the range (so that you can e.g. end one write lock and keep another) For instance, https://doc.rust-lang.org/std/vec/struct.Vec.html#method.split_at_mut is an in-process API that is very similar. -Rob
Hi, Thanks for replying.
One thing that is worth thinking about is the safety of the API that is put together. A memory segment plus a separate detached semaphore or mutex can be used to build a safe API, but is not itself a safe API.
Agreed. That’s why I am more inclined to the second solution that I mentioned.
For instance, we could have an object representing a memory range that doesn't offer read/write at all, but allows: - either one process write access over the range - or any number of readers read access over the range - allows subdividing the range (so that you can e.g. end one write lock and keep another)
Where will this memory object be stored? Locking a particular range instead of the whole memory segment will be relatively efficient, because processes using different ranges can write simultaneously. Since this object will also be shared across multiple processes, there must be a safe way to update it. Any thoughts on that?
On 27-Jul-2020, at 3:50 PM, Robert Collins <robertc@robertcollins.net> wrote:
On Sun, 26 Jul 2020 at 19:11, Vinay Sharma via Python-ideas <python-ideas@python.org> wrote:
Problem: Currently, let’s say I create a shared_memory segment using multiprocessing.shared_memory.SharedMemory in Process 1 and open the same in Process 2. Then, I try to write some data to the shared memory segment using both the processes, so for me to prevent any race condition (data corruption), either these operations must be atomic, or I should be able to lock / unlock shared memory segment, which I cannot at the moment.
I earlier posted a solution to this problem, which received positive response, but there weren’t many responses to it, despite the fact this problem makes shared_memory practically unusable if there are simultaneous writes. So, the purpose of this post is to have discussion about the solution of the same.
One thing that is worth thinking about is the safety of the API that is put together. A memory segment plus a separate detached semaphore or mutex can be used to build a safe API, but is not itself a safe API.
A safe API shouldn't allow writes to the memory segment while the mutex is unlocked, rather than allowing one to build a safe API from the various pieces. (There may / will be lower level primitives that are unsafe).
We can look at a lot of the APIs in the Rust community for examples of this sort of thing.
Python doesn't have the borrow checker to enforce usage, but we could still work from the same basic principle - given there are multiple processes involved that make it easier to have safe outcomes.
For instance, we could have an object representing a memory range that doesn't offer read/write at all, but allows: - either one process write access over the range - or any number of readers read access over the range - allows subdividing the range (so that you can e.g. end one write lock and keep another)
For instance, https://doc.rust-lang.org/std/vec/struct.Vec.html#method.split_at_mut is an in-process API that is very similar.
-Rob
On Mon, 27 Jul 2020 at 23:24, Vinay Sharma <vinay04sharma@icloud.com> wrote:
Hi, Thanks for replying.
One thing that is worth thinking about is the safety of the API that is put together. A memory segment plus a separate detached semaphore or mutex can be used to build a safe API, but is not itself a safe API.
Agreed. That’s why I am more inclined to the second solution that I mentioned.
The second approach isn't clearly specified yet: is 'sync' in the name implying a mutex, an RW lock, or dependence on pointers to atomic types (which then becomes a portability challenge in some cases)? The C++ atomics documentation you linked to documents a similar but differently named set of methods, so you'll need to clarify the difference you intend.

For instance, we could have an object representing a memory range that doesn't offer read/write at all, but allows: - either one process write access over the range - or any number of readers read access over the range - allows subdividing the range (so that you can e.g. end one write lock and keep another)
Where will this memory object be stored ?
There are a few options. The most obvious one given that bookkeeping data is required, is to build a separate layer offering this functionality, which uses the now batteries-included SHM facilities as part of its implementation, but doesn't directly surface it.
Locking a particular range instead of the whole memory segment will be relatively efficient because processes using different ranges can write simultaneously.
Since, this object will also be shared across multiple processes, there must be a safe way to update it.
There's a lot of prior art on named locks of various sorts, I'd personally be inclined to give the things a name that can be used across different processes in some form and bootstrap from there.
Any thoughts on that ?
On 28-Jul-2020, at 5:19 AM, Robert Collins <robertc@robertcollins.net> wrote:
On Mon, 27 Jul 2020 at 23:24, Vinay Sharma <vinay04sharma@icloud.com> wrote:
Hi, Thanks for replying.
One thing that is worth thinking about is the safety of the API that is put together. A memory segment plus a separate detached semaphore or mutex can be used to build a safe API, but is not itself a safe API.
Agreed. That’s why I am more inclined to the second solution that I mentioned.
The second approach isn't clearly specified yet: is 'sync' in the name implying a mutex, an RW lock, or dependent on pointers to atomic types (which then becomes a portability challenge in some cases). The C++ atomics documentation you linked to documents a similar but differently named set of methods, so you'll need to clarify the difference you intend.
Python does have internal support for atomic types: Atomic int: https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic... Atomic store: https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic... <https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic.h#L94> These methods don’t use any locks; they are plain atomic operations. So my approach was to lock the whole shared memory segment at once. To do that, we can store an integer at the beginning of every shared memory segment, denoting whether the segment is locked (1) or unlocked (0), and atomic operations can be used to update this integer (0 -> 1 to lock, 1 -> 0 to unlock). A `wait` function will also have to be implemented, like in semaphores, which waits until the segment becomes free (0).
For instance, we could have an object representing a memory range that doesn't offer read/write at all, but allows: - either one process write access over the range - or any number of readers read access over the range - allows subdividing the range (so that you can e.g. end one write lock and keep another)
Where will this memory object be stored ?
There are a few options. The most obvious one given that bookkeeping data is required, is to build a separate layer offering this functionality, which uses the now batteries-included SHM facilities as part of its implementation, but doesn't directly surface it.
Can you please elaborate more on this? I understand that shared memory will be used to store the ranges and whether they are locked/unlocked, etc. But if multiple processes can update this data, then we will also have to think about synchronising this book-keeping data.

So, I guess you mean that all processes will be allotted shared memory through a separate API/layer, which will take care of the book-keeping; and since only this layer is responsible for the book-keeping, there will be no need to synchronise the book-keeping data itself.

But then the question arises: how will unrelated processes communicate with this layer/API to request shared memory? One way could be to create a separate process managing the book-keeping, with other processes requesting access/lock/unlock through it. The communication between this layer (the separate process) and the other processes (using shared memory) would then use some form of IPC.
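One hedged sketch of this broker-process idea: the stdlib's multiprocessing.managers.BaseManager already lets unrelated processes connect to a well-known address with an authkey and obtain proxies to server-side objects. A minimal lock broker could then hand out one lock per segment name. The address, authkey, and `get_lock` name below are illustrative, not an established API:

```python
import threading
from multiprocessing.managers import BaseManager

_locks = {}
_registry_guard = threading.Lock()

def get_lock(name: str) -> threading.Lock:
    # One lock per shared-memory segment name, created on demand.
    # The lock lives in the broker; clients only ever see proxies.
    with _registry_guard:
        if name not in _locks:
            _locks[name] = threading.Lock()
        return _locks[name]

class LockBroker(BaseManager):
    pass

LockBroker.register("get_lock", callable=get_lock)

def serve(address=("127.0.0.1", 50123), authkey=b"demo-key"):
    # Run this in a single long-lived broker process.
    manager = LockBroker(address=address, authkey=authkey)
    server = manager.get_server()
    server.serve_forever()

def connect(address=("127.0.0.1", 50123), authkey=b"demo-key"):
    # Any unrelated process that knows the address/authkey can join.
    manager = LockBroker(address=address, authkey=authkey)
    manager.connect()
    return manager
```

A client would call `lock = manager.get_lock("my-segment")`, then `lock.acquire()` / `lock.release()` around its writes; the IPC between processes and the broker is exactly the manager protocol, so no book-keeping data lives in the shared segment itself.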
Locking a particular range instead of the whole memory segment will be relatively efficient because processes using different ranges can write simultaneously.
Since, this object will also be shared across multiple processes, there must be a safe way to update it.
There's a lot of prior art on named locks of various sorts, I'd personally be inclined to give the things a name that can be used across different processes in some form and bootstrap from there.
Any thoughts on that ?
Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/7BDCJY... Code of Conduct: http://python.org/psf/codeofconduct/
On 30 Jul 2020, at 11:55, Vinay Sharma via Python-ideas <python-ideas@python.org> wrote:
On 28-Jul-2020, at 5:19 AM, Robert Collins <robertc@robertcollins.net <mailto:robertc@robertcollins.net>> wrote:
On Mon, 27 Jul 2020 at 23:24, Vinay Sharma <vinay04sharma@icloud.com <mailto:vinay04sharma@icloud.com>> wrote:
Hi, Thanks for replying.
One thing that is worth thinking about is the safety of the API that is put together. A memory segment plus a separate detached semaphore or mutex can be used to build a safe API, but is not itself a safe API.
Agreed. That’s why I am more inclined to the second solution that I mentioned.
The second approach isn't clearly specified yet: is 'sync' in the name implying a mutex, an RW lock, or dependent on pointers to atomic types (which then becomes a portability challenge in some cases). The C++ atomics documentation you linked to documents a similar but differently named set of methods, so you'll need to clarify the difference you intend.
Python has support for atomic types, I guess: Atomic Int: https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic... <https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic.h#L80> Atomic Store: https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic... <https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic.h#L94>
And, these methods don’t use any locks, they are just atomic operations. So, my approach was to lock the whole shared memory segment at once, and to do that we can store an integer at the beginning of every shared memory segment, which will denote whether this segment is locked (1), or unlocked (0), and atomic operations can be used to update this integer ( 0 -> 1) lock, (1 -> 0) unlock. Although, `wait` function will have to be implemented like in semaphores, which will wait until the segment is free (becomes 0).
Surely you need locks and semaphores that work between processes? Both Unix and Windows have these primitives. The atomics are great for lockless changing of single ints, but anything more complex needs locks or semaphores. Surely you do not want to be implementing your own locks when there is OS support that works well with OS scheduling? Barry
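For processes that are related through multiprocessing (so a Lock can be passed to the children), the OS-level primitive Barry mentions is already available in the stdlib and fixes the race from the original post; the remaining gap is unrelated processes. A minimal sketch (the segment layout and counts are arbitrary):

```python
import struct
from multiprocessing import Lock, Process, shared_memory

def locked_increment(name, lock, times):
    # Same read-modify-write as before, but serialised by an
    # OS-level lock shared with the parent process.
    shm = shared_memory.SharedMemory(name=name)
    for _ in range(times):
        with lock:
            (value,) = struct.unpack_from("q", shm.buf, 0)
            struct.pack_into("q", shm.buf, 0, value + 1)
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=8)
    struct.pack_into("q", shm.buf, 0, 0)
    lock = Lock()
    workers = [Process(target=locked_increment, args=(shm.name, lock, 50_000))
               for _ in range(2)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    (total,) = struct.unpack_from("q", shm.buf, 0)
    print("total =", total)  # 100000: no lost updates
    shm.close()
    shm.unlink()
```

Because multiprocessing.Lock is backed by an OS semaphore, blocked processes sleep in the kernel instead of spinning, which is the scheduling benefit Barry points at.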
For instance, we could have an object representing a memory range that doesn't offer read/write at all, but allows: - either one process write access over the range - or any number of readers read access over the range - allows subdividing the range (so that you can e.g. end one write lock and keep another)
Where will this memory object be stored ?
There are a few options. The most obvious one given that bookkeeping data is required, is to build a separate layer offering this functionality, which uses the now batteries-included SHM facilities as part of its implementation, but doesn't directly surface it.
Can you please elaborate more on this? I understand that shared memory will be used to store ranges and whether they are being locked/unlocked, etc. But if multiple processes can update this data, then we will also have to think about the synchronisation of this book-keeping data.
So, I guess you mean to say that all processes will be allotted shared memory using a separate API/layer, which will take care of book-keeping, and since this separate API/layer will be only responsible for book-keeping, there will be no need to synchronise book-keeping data.
But, then the question arises how will unrelated processes communicate with this layer/API to request shared memory.
One way could be that a separate process managing this book-keeping could be created, and other process will request access/lock/unlock using this separate process.
And the communication between this layer (separate process) and the other processes (using shared memory) will be using some form of IPC.
Locking a particular range instead of the whole memory segment will be relatively efficient because processes using different ranges can write simultaneously.
Since, this object will also be shared across multiple processes, there must be a safe way to update it.
There's a lot of prior art on named locks of various sorts, I'd personally be inclined to give the things a name that can be used across different processes in some form and bootstrap from there.
Any thoughts on that ?
I think you are talking about shared/named semaphores. The problem with them is that Python doesn’t have support for shared semaphores, so the first step would be to build an API providing access to shared semaphores, and then that API could be used to synchronise shared memory.
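As an illustration of what such an API might wrap, here is a hedged ctypes sketch of POSIX named semaphores (sem_open/sem_wait/sem_post), which are visible to unrelated processes by name. It assumes Linux (library names and the O_CREAT value differ on other platforms), and all Python-level names are hypothetical:

```python
import ctypes

def _load_sem_lib():
    # sem_open lives in librt/libpthread on older glibc, libc on newer.
    for candidate in ("librt.so.1", "libpthread.so.0", "libc.so.6"):
        try:
            lib = ctypes.CDLL(candidate, use_errno=True)
        except OSError:
            continue
        if hasattr(lib, "sem_open"):
            return lib
    raise OSError("sem_open not found")

_lib = _load_sem_lib()
_lib.sem_open.restype = ctypes.c_void_p
_lib.sem_open.argtypes = [ctypes.c_char_p, ctypes.c_int,
                          ctypes.c_uint, ctypes.c_uint]
for _fn in (_lib.sem_wait, _lib.sem_post, _lib.sem_close):
    _fn.argtypes = [ctypes.c_void_p]
_lib.sem_unlink.argtypes = [ctypes.c_char_p]

O_CREAT = 0o100  # Linux value; differs on other platforms

class NamedSemaphore:
    """Minimal cross-process mutex over sem_open/sem_wait/sem_post."""

    def __init__(self, name: bytes, initial: int = 1):
        self._name = name
        # Any process passing the same name gets the same semaphore.
        self._sem = _lib.sem_open(name, O_CREAT, 0o600, initial)
        if not self._sem:
            raise OSError(ctypes.get_errno(), "sem_open failed")

    def __enter__(self):
        _lib.sem_wait(self._sem)
        return self

    def __exit__(self, *exc):
        _lib.sem_post(self._sem)

    def close(self):
        _lib.sem_close(self._sem)

    def unlink(self):
        _lib.sem_unlink(self._name)
```

Two unrelated processes that both construct `NamedSemaphore(b"/my-segment-lock")` would then contend on the same kernel object, which is essentially what a stdlib shared-semaphore API would have to expose portably.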
On 30-Jul-2020, at 10:10 PM, Barry Scott <barry@barrys-emacs.org> wrote:
On 30 Jul 2020, at 11:55, Vinay Sharma via Python-ideas <python-ideas@python.org <mailto:python-ideas@python.org>> wrote:
On 28-Jul-2020, at 5:19 AM, Robert Collins <robertc@robertcollins.net <mailto:robertc@robertcollins.net>> wrote:
On Mon, 27 Jul 2020 at 23:24, Vinay Sharma <vinay04sharma@icloud.com <mailto:vinay04sharma@icloud.com>> wrote:
Hi, Thanks for replying.
One thing that is worth thinking about is the safety of the API that is put together. A memory segment plus a separate detached semaphore or mutex can be used to build a safe API, but is not itself a safe API.
Agreed. That’s why I am more inclined to the second solution that I mentioned.
The second approach isn't clearly specified yet: is 'sync' in the name implying a mutex, an RW lock, or dependent on pointers to atomic types (which then becomes a portability challenge in some cases). The C++ atomics documentation you linked to documents a similar but differently named set of methods, so you'll need to clarify the difference you intend.
Python has support for atomic types, I guess: Atomic Int: https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic... <https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic.h#L80> Atomic Store: https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic... <https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic.h#L94>
And, these methods don’t use any locks, they are just atomic operations. So, my approach was to lock the whole shared memory segment at once, and to do that we can store an integer at the beginning of every shared memory segment, which will denote whether this segment is locked (1), or unlocked (0), and atomic operations can be used to update this integer ( 0 -> 1) lock, (1 -> 0) unlock. Although, `wait` function will have to be implemented like in semaphores, which will wait until the segment is free (becomes 0).
Surely you need locks and semaphores that work between processes? Both unix and Windows have these primitives.
The atomics are great for lockless changing of single ints, but anything more complex needs locks and semaphores.
Surely you do not want to be implementing your own locks with the OS support that works well with OS scheduling?
Barry
For instance, we could have an object representing a memory range that doesn't offer read/write at all, but allows: - either one process write access over the range - or any number of readers read access over the range - allows subdividing the range (so that you can e.g. end one write lock and keep another)
Where will this memory object be stored ?
There are a few options. The most obvious one given that bookkeeping data is required, is to build a separate layer offering this functionality, which uses the now batteries-included SHM facilities as part of its implementation, but doesn't directly surface it.
Can you please elaborate more on this ? I understand that shared memory will be used to store ranges and whether they are being locked/unlocked, etc. But if multiple process can update this data, then we will also have to think about the synchronisation of this book-keeping data.
So, I guess you mean to say that all processes will be allotted shared memory using a separate API/layer, which will take care of book-keeping, and since this separate API/layer will be only responsible for book-keeping, there will be no need to synchronise book-keeping data.
But, then the question arises how will unrelated processes communicate with this layer/API to request shared memory.
One way could be that a separate process managing this book-keeping could be created, and other process will request access/lock/unlock using this separate process.
And the communication between between this layer (separate process) and the other processes (using shared memory) will be using some form of IPC.
Locking a particular range instead of the whole memory segment will be relatively efficient because processes using different ranges can write simultaneously.
Since, this object will also be shared across multiple processes, there must be a safe way to update it.
There's a lot of prior art on named locks of various sorts, I'd personally be inclined to give the things a name that can be used across different processes in some form and bootstrap from there.
Any thoughts on that ?
On 27-Jul-2020, at 3:50 PM, Robert Collins <robertc@robertcollins.net <mailto:robertc@robertcollins.net>> wrote:
On Sun, 26 Jul 2020 at 19:11, Vinay Sharma via Python-ideas <python-ideas@python.org <mailto:python-ideas@python.org>> wrote:
Problem: Currently, let’s say I create a shared_memory segment using mulitprocessing.shared_memory.SharedMemory in Process 1 and open the same in Process 2. Then, I try to write some data to the shared memory segment using both the processes, so for me to prevent any race condition (data corruption), either these operations must be atomic, or I should be able to lock / unlock shared memory segment, which I cannot at the moment.
I earlier posted a solution to this problem, which received positive response, but there weren’t many responses to it, despite the fact this problem makes shared_memory practically unusable if there are simultaneous writes. So, the purpose of this post is to have discussion about the solution of the same.
One thing that is worth thinking about is the safety of the API that is put together. A memory segment plus a separate detached semaphore or mutex can be used to build a safe API, but is not itself a safe API.
A safe API shouldn't allow writes to the memory segment while the mutex is unlocked, rather than allowing one to build a safe API from the various pieces. (There may / will be lower level primitives that are unsafe).
We can look at a lot of the APIs in the Rust community for examples of this sort of thing.
Python doesn't have the borrow checker to enforce usage, but we could still work from the same basic principle - given there are multiple processes involved that make it easier to have safe outcomes.
For instance, we could have an object representing a memory range that doesn't offer read/write at all, but allows: - either one process write access over the range - or any number of readers read access over the range - allows subdividing the range (so that you can e.g. end one write lock and keep another)
For instance, https://doc.rust-lang.org/std/vec/struct.Vec.html#method.split_at_mut <https://doc.rust-lang.org/std/vec/struct.Vec.html#method.split_at_mut> is an in-process API that is very similar.
-Rob
Python-ideas mailing list -- python-ideas@python.org <mailto:python-ideas@python.org> To unsubscribe send an email to python-ideas-leave@python.org <mailto:python-ideas-leave@python.org> https://mail.python.org/mailman3/lists/python-ideas.python.org/ <https://mail.python.org/mailman3/lists/python-ideas.python.org/> Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/7BDCJY... <https://mail.python.org/archives/list/python-ideas@python.org/message/7BDCJYNXUJY6S3H3B3EDZZV5ZIUJOWD5/> Code of Conduct: http://python.org/psf/codeofconduct/ <http://python.org/psf/codeofconduct/>
On Thu, 30 Jul 2020 at 12:57, Vinay Sharma via Python-ideas <python-ideas@python.org> wrote:
Python has support for atomic types, I guess:
Atomic Int: https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic.h#L80
Atomic Store: https://github.com/python/cpython/blob/master/Include/internal/pycore_atomic.h#L94
You could also use immutables: https://nextjournal.com/schmudde/adventures-in-immutable-python
On 01-Aug-2020, at 1:31 AM, Marco Sulla <Marco.Sulla.Python@gmail.com> wrote:

You could also use immutables: https://nextjournal.com/schmudde/adventures-in-immutable-python

Could you please elaborate a bit more on this? I think your idea is to store data in the Plasma store, but what exactly are you suggesting I store? As far as I understand, the Plasma store holds immutable objects, but Python's shared_memory API does not store immutable objects, nor would the locking mechanism discussed store immutable locks.
You don't need locks with immutable objects. Since they're immutable, any operation that would usually mutate the object generates another immutable object instead. The most common example is str: the sum of two strings in Python (and in many other languages) produces a new string. This is usually slower than modifying a mutable object (as with atomic types), but it removes the bottleneck of a lock. See also immutables.Map: https://github.com/MagicStack/immutables
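The point above can be shown with plain built-ins (this small demo is the editor's illustration, not part of the original message; `immutables.Map` and pyrsistent structures behave analogously for mappings):

```python
# "Mutating" an immutable object produces a new object; any other
# reference keeps seeing the old, unchanged value.
s = "abc"
t = s
s += "def"          # builds a new str; t still refers to the old one
print(s, t)         # abcdef abc

# The same idea with sets: the union builds a new frozenset and the
# original is untouched, so concurrent readers never see a torn update.
frozen = frozenset({1, 2})
frozen2 = frozen | {3}
print(sorted(frozen), sorted(frozen2))
```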
On 8/1/2020 1:25 PM, Marco Sulla wrote:
While they're immutable at the Python level, strings (and all other objects) are mutated at the C level, due to reference count updates. You need to consider this if you're sharing objects without locking or other synchronization. Eric
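Eric's point can be made visible from Python itself; the sketch below (an editor's illustration, assuming CPython's reference-counting behavior) shows that merely binding another name writes to the object's header, even though nothing changes at the Python level:

```python
# Binding an alias to an "immutable" string changes the object's
# C-level reference count (ob_refcnt), which is a write to shared
# state if the object lived in memory shared between processes.
import sys

s = "shared-string-payload"
before = sys.getrefcount(s)
alias = s                    # no Python-level mutation...
after = sys.getrefcount(s)   # ...but the header was written
print(before, after)         # refcount went up by one
```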
PyArrow Plasma object IDs; "sealing" makes an object immutable; pyrsistent:
https://arrow.apache.org/docs/python/plasma.html#object-ids
https://arrow.apache.org/docs/python/plasma.html#creating-an-object-buffer
Objects are created in Plasma in two stages. First, they are created, which allocates a buffer for the object. At this point, the client can write to the buffer and construct the object within the allocated buffer.
To create an object for Plasma, you need to create an object ID, as well as give the object's maximum size in bytes.

```python
# Create an object buffer.
object_id = plasma.ObjectID(20 * b"a")
object_size = 1000
buffer = memoryview(client.create(object_id, object_size))

# Write to the buffer.
for i in range(1000):
    buffer[i] = i % 128
```
When the client is done, the client seals the buffer, making the object immutable, and making it available to other Plasma clients.
```python
# Seal the object. This makes the object immutable and available to
# other clients.
client.seal(object_id)
```
https://pypi.org/project/pyrsistent/ also supports immutable structures
https://docs.dask.org/en/latest/shared.html#known-limitations :

Known Limitations

The shared memory scheduler has some notable limitations:

- It works on a single machine
- The threaded scheduler is limited by the GIL on Python code, so if your operations are pure Python functions, you should not expect a multi-core speedup
- The multiprocessing scheduler must serialize functions between workers, which can fail
- The multiprocessing scheduler must serialize data between workers and the central process, which can be expensive
- The multiprocessing scheduler cannot transfer data directly between worker processes; all data routes through the master process.

... https://distributed.dask.org/en/latest/memory.html#difference-with-dask-comp...

(... https://github.com/dask/dask-labextension )
I understand that I won't need locks with immutable objects at some level, but I don't understand how they can be used to synchronise shared memory segments. For every change to an immutable object, a copy is created, which will have a different address. For processes to see this updated object, they would have to map the new address into their address space, and this remapping would have to occur on every change, which is obviously not feasible. So, changes to a shared memory segment should be made in the segment itself, and therefore shared memory segments should be mutable.
It's best to avoid those synchronization barriers if possible. If you have all of the data in SHM (RAM) on one node, and you need to notify processes / wait for other workers to be available to perform a task that requires that data, you need a method for IPC: a queue, channel subscriptions, a source/sink, or over-frequent polling that's more resilient against dropped messages. (But you only need to scale to one node.)

There needs to be a shared structure that tracks allocations, right? What does it need to do lookups by?

[ [obj_id_or_shm_pointer, [subscribers]] ]

Does the existing memory pool solve for that? And there also needs to be an instruction pipeline: a queue/channel/source of messages for each worker (or only some workers) to process.

...
https://distributed.dask.org/en/latest/journey.html
https://distributed.dask.org/en/latest/work-stealing.html

"Accelerate intra-node IPC with shared memory" https://github.com/dask/dask/issues/6267
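The tracking structure sketched above (`[ [obj_id_or_shm_pointer, [subscribers]] ]`) could be prototyped around the existing `SharedMemory` API. The class below is a toy, in-process stand-in: the name `SegmentRegistry`, its methods, and its dict-of-sets layout are all hypothetical, and a real version would itself need to live in shared state; it also assumes POSIX semantics, where a segment persists after `close()` until `unlink()`.

```python
# Toy registry mapping segment names to subscriber sets, so a segment
# can be unlinked automatically when its last subscriber detaches.
from multiprocessing import shared_memory

class SegmentRegistry:
    def __init__(self):
        self._subs = {}  # segment name -> set of subscriber ids

    def create(self, size, owner):
        shm = shared_memory.SharedMemory(create=True, size=size)
        self._subs[shm.name] = {owner}
        return shm

    def attach(self, name, subscriber):
        self._subs[name].add(subscriber)
        return shared_memory.SharedMemory(name=name)

    def detach(self, name, subscriber):
        subs = self._subs[name]
        subs.discard(subscriber)
        if not subs:  # last subscriber gone: free the OS resource
            del self._subs[name]
            leftover = shared_memory.SharedMemory(name=name)
            leftover.close()
            leftover.unlink()

reg = SegmentRegistry()
seg = reg.create(16, owner="p1")
seg.buf[:5] = b"hello"
other = reg.attach(seg.name, "p2")
data = bytes(other.buf[:5])       # second subscriber sees the write
other.close()
reg.detach(seg.name, "p2")        # still one subscriber: kept alive
name = seg.name
seg.close()
reg.detach(name, "p1")            # last subscriber: segment unlinked
print(data, name in reg._subs)
```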
There's also the possibility to use shared ctypes: https://docs.python.org/3/library/multiprocessing.html#shared-ctypes-objects

From the docs: "Operations like += which involve a read and write are not atomic. So if, for instance, you want to atomically increment a shared value it is insufficient to just do"

```python
counter.value += 1
```

"Assuming the associated lock is recursive (which it is by default) you can instead do"

```python
with counter.get_lock():
    counter.value += 1
```

Notice that they use a lock anyway. Maybe the solution of Wes Turner is better. See also RLock: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.RLock
On Sat, 1 Aug 2020 at 22:42, Eric V. Smith <eric@trueblade.com> wrote:
While they're immutable at the Python level, strings (and all other objects) are mutated at the C level, due to reference count updates. You need to consider this if you're sharing objects without locking or other synchronization.
This is interesting. What if you want to have a language that uses only immutable objects and garbage collection? Could smart pointers address this problem?
sharedctypes can only be used with related processes. There is no way to pass a sharedctype to an unrelated process. multiprocessing.shared_memory was created to handle this, i.e. to allow shared-memory IPC across unrelated processes.
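The mechanism that makes unrelated processes possible is the segment's OS-level name. The sketch below (an editor's illustration; both "processes" are simulated in one script for brevity) shows that attaching requires only that name string, which could just as well be passed to a completely unrelated process on the command line or through a file:

```python
# A SharedMemory segment is addressable by name alone: no fork or
# handle inheritance is needed to attach to it.
from multiprocessing import shared_memory

# "Process 1": create and fill a named segment.
producer = shared_memory.SharedMemory(create=True, size=32)
producer.buf[:11] = b"hello world"
name = producer.name  # e.g. 'psm_...'; any process given this can attach

# "Process 2": attach using only the name.
consumer = shared_memory.SharedMemory(name=name)
data = bytes(consumer.buf[:11])
print(data)  # b'hello world'

consumer.close()
producer.close()
producer.unlink()
```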
This is interesting. What if you want to have a language that uses only immutable objects and garbage collection? Could smart pointers address this problem?
Yes, garbage collection changes the picture entirely, with or without immutable objects. But the original topic was cross-process shared memory, and I don't know of any cross-process aware garbage collectors that support shared memory. Although such a thing could easily exist without my knowledge.

Eric
Note that I'm talking about putting Python objects into this shared memory. If that's not what people are contemplating, then my observations don't apply. Eric
How is this a different problem than the cache coherency problem? https://en.wikipedia.org/wiki/Cache_coherence

Perhaps that's an unhelpful abstraction? This hasn't gone anywhere: https://en.wikipedia.org/wiki/Distributed_shared_memory#Directory_memory_coh...

Here's a great comparison chart for message passing vs distributed shared memory: https://en.wikipedia.org/wiki/Distributed_shared_memory#Message_Passing_vs._...

Could there be a multiprocessing.MemoryPool that tracks allocations, refcounts, and also locks? A combined approach might have an IPC channel/stream/source/sinks for messages that instruct workers to invalidate/re-fetch object_id/references, but consistency and ordering result in the same issues encountered with the cache coherence problem.

Then, what is the best way to enqueue changes to shared global state (in shared memory on one node, in this instance)? (... "Ask HN: Learning about distributed systems?" https://news.ycombinator.com/item?id=23931730 )

A solution for this could help accelerate dask and dask.distributed (which already address many parallel issues in multiprocess and distributed systems in pure Python): "Accelerate intra-node IPC with shared memory" https://github.com/dask/dask/issues/6267
I forgot that there's also Ray: https://github.com/ray-project/ray. Ray uses Apache Arrow (and Plasma) under the hood. It seems Plasma was originally developed by the Ray team. I don't know how they solve the GC problem. Maybe they disable it.
participants (6)
- Barry Scott
- Eric V. Smith
- Marco Sulla
- Robert Collins
- Vinay Sharma
- Wes Turner