[Python-ideas] easy thread-safety [was: fork]

Tue Aug 18 22:32:03 CEST 2015

On 18.08.2015 19:27, Chris Angelico wrote:
> On Wed, Aug 19, 2015 at 3:17 AM, Sven R. Kunze <srkunze at mail.de> wrote:
>> On 18.08.2015 18:55, Chris Angelico wrote:
>>> The notion of "a completely separate heap for each thread" is talking
>>> about part B - you need a completely separate pile of objects.
>>
>> Right. However, only if needed. As long as threads only read from a common
>> variable, there is no need to interfere.
> Sure, but as soon as you change something, you have to
> thread-local-ify it. So I suppose what you would have is three
> separate pools of objects:
>
> 1) Thread-specific objects, which are referenced only from one thread.
> These can be read and written easily and cheaply.
> 2) Global objects which have never been changed in any way since
> threading began. These can be read easily, but if written, must be
> transformed into...
> 3) Thread-local objects, which exist for all threads, but are
> different. The id() of such an object depends on which thread is
> asking.
>
> Conceptually, all three types behave the same way - changes are
> visible only within the thread that made them. But the implementation
> could have these magic "instanced" objects for only those ones which
> have actually been changed, and save a whole lot of memory for the
> others.

Indeed. I think that is a sensible approach here. As for an actual
implementation, though, I don't know where I would start when looking at
CPython.

Thinking more about id(): consider a complex object like an instance of
a class. Is it really necessary to deep copy it? It seems to me that we
actually just need to hide the atomic/immutable values (e.g. strings,
integers etc.) held by that object from other threads. The object
itself, and therefore its id(), can remain the same.

# first thread
class X:
    a = 0
class Y:
    x = X

# thread spawned by first thread
Y.x.a = 3  # should leave id(X) and id(Y) alone

Maybe that example is too simple, but I cannot think of an issue here.
As long as the current thread is the only one able to change the
values of its variables, all is fine.
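To make the idea a bit more concrete, here is a rough sketch of how it
could be emulated in today's Python. It is only an illustration under
assumptions: the Overlay class is hypothetical, it uses instances rather
than the classes from my example, and it does nothing about values that
are mutated in place (e.g. appending to a shared list). Reads fall back
to the shared values; writes go to a per-thread store, so id() of the
object never changes.

```python
import threading

_MISSING = object()


class Overlay:
    """Hypothetical wrapper: one shared object, per-thread attribute writes."""

    def __init__(self, **shared):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, "_shared", dict(shared))
        object.__setattr__(self, "_local", threading.local())

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for emulated attributes.
        value = getattr(self._local, name, _MISSING)
        if value is not _MISSING:
            return value  # this thread has written the attribute before
        try:
            return self._shared[name]  # otherwise read the common value
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Writes never touch the shared values; only this thread sees them.
        setattr(self._local, name, value)


x = Overlay(a=0)
y = Overlay(x=x)
seen = {}


def worker():
    y.x.a = 3  # visible only inside this thread
    seen["worker"] = (y.x.a, id(y.x))


t = threading.Thread(target=worker)
t.start()
t.join()
seen["main"] = (y.x.a, id(y.x))
```

The spawned thread sees its own value for a, the first thread still sees
the original one, and both see the same id() for the object.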

>> I agree, behavior-wise, processes behave almost as desired (relevant data is
>> copied over and there are no shared variables).
>>
>> However, the CPU/memory/communication footprint of a new process
>> (using spawn) is enormous compared to a thread. So threading still has its
>> merits (IMHO).
> So really, you're asking for process semantics, with some
> optimizations to take advantage of the fact that most of the processes
> are going to be just reading and not writing. That may well be
> possible, using something like the above three-way-split, but I'd want
> the opinion of someone who's actually implemented something like this
> - from me, it's just "hey, wouldn't this be cool".

If you put it this way, maybe yes. I also look forward to more feedback 
on this.

To me, a process/thread, or any other concurrency solution, is basically
a function that I can call and that then runs in the background. Later,
when I am ready, I can collect its result. In the meantime, the main
thread continues. (Again) to me, that is the only sensible way to
approach concurrency. When I recall the details of locks, semaphores
etc. and compare them to what real-world applications really need... You
can create huge tables of all the possible cases that might happen, only
to find out that you missed an important one.

Even worse, as soon as you change something about your program, you are 
doomed to redo the complete case analysis, find a dead/live-lock-free 
solution and so forth. It's a time sink; costly and dangerous from a 
company's point of view.
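
For reference, the "call it, let it run in the background, collect the
result later" model looks roughly like this with the standard library's
concurrent.futures. This is only a sketch: slow_square is a made-up
stand-in for real work, and nothing here isolates shared variables,
which is the part I am actually asking for.

```python
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    # Hypothetical stand-in for any long-running function.
    return n * n


with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(slow_square, 7)  # starts running in the background
    main_work = sum(range(10))            # the main thread continues meanwhile
    result = future.result()              # collect when ready (blocks if needed)
```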

Best,
Sven