[pypy-dev] FW: Would the following shared memory model be possible?

Michael Sparks sparks.m at gmail.com
Sat Jul 31 19:43:32 CEST 2010


[ cc'ing the list in case anyone else took my words the same way as Kevin :-( ]

On Sat, Jul 31, 2010 at 5:26 PM, Kevin Ar18 <kevinar18 at hotmail.com> wrote:
> I have no idea what I did to warrant your hateful replies towards me, but
> they really are not appropriate (in public or private email).

I had absolutely no intention of offending you, and am deeply sorry
for any offense that I may have caused you.

In my reply I merely wanted to flag that I don't have time to go into
everything (like most people), that asking questions in a public realm
is better because you may then get answers from multiple people, and
that people who appear to do some research first tend to get better
answers. I also tried to give an example, but that doesn't appear to
have been helpful. (I'm fallible like everyone else)

My intention there was to be helpful and to explain why I have that
view of only replying on list, and it appears to have offended you
instead, and I apologise. (one person's direct and helpful speech in
one place can be a mortal insult somewhere else)

After those couple of paragraphs, I tried to add to your discussion by
replying to your specific points which you asked about parallel
execution, noting places and examples where it is possible today. (to
varying degrees of satisfaction) I then also tried to answer your
point of "if something extra could be done, what would probably be
generally useful". To that I noted that *my* talk there was cheap, and
that execution was hard.

Somehow along the way, my intent to try to be helpful to you has
resulted in offending and upsetting you, and for that I am truly sorry
- life is simply too short for people to upset each other, and in no
way was my post intended as "hateful", and once again, my apologies.
In future please assume good intentions - I assumed good intentions on
your part.

I'll bow out at this point.

Best Regards,


Michael.

>
>> Date: Sat, 31 Jul 2010 02:08:49 +0100
>> Subject: Re: [pypy-dev] FW: Would the following shared memory model be
>> possible?
>> From: sparks.m at gmail.com
>> To: kevinar18 at hotmail.com
>> CC: pypy-dev at codespeak.net
>>
>> On Thu, Jul 29, 2010 at 6:44 PM, Kevin Ar18 <kevinar18 at hotmail.com> wrote:
>> > You brought up a lot of topics. I went ahead and sent you a private
>> > email.
>> > There's always lots of interesting things I can add to my list of things
>> > to
>> > learn about. :)
>>
>> Yes, there are lots of interesting things. I have a limited amount of
>> time however (I should be in bed, it's very late here, but I do /try/
>> to reply to on-list mails), so I cannot spoon-feed you. Mailing me
>> directly rather than a (relevant) list precludes you getting answers
>> from someone other than me. Not being on lists also precludes you
>> getting answers to questions by chance. Changing emails and names in
>> email headers also makes keeping track of people hard...
>>
>> (For example you asked off list last year about Kamaelia's license
>> from a different email address. Since it wasn't searchable I
>> completely forgot. You also asked all sorts of questions but didn't
>> want the answers public, so I didn't reply. If instead you'd
>> subscribed to the list, and asked there, you'd've found out that
>> Kamaelia's license changed - to the Apache Software License v2 ...)
>>
>> If I mention something you find interesting, please Google first and
>> then ask publicly somewhere relevant. (the answer and question are
>> then googleable, and you're doing the community a service IMO if you
>> ask q's that way - if your question is somewhere relevant and shows
>> you've already googled prior work as far as you can... People are
>> always willing to help people who show willing to help themselves in
>> my experience.)
>>
>> >> just looks to me that you're tieing yourself up in knots over things
>> >> that aren't problems, when there are some things which could be useful
>> >> (in practice) & interesting in this space.
>> > The particular issue in this situation is that there is no way to make
>> > Kamaelia, FBP, or other concurrency concepts run in parallel (unless you
>> > are
>> > willing to accept lots of overhead like with the multiprocessing
>> > queues).
>> >
>> > Since you have worked with Kamaelia code a lot... you understand a lot
>> > more
>> > about implementation details. Do you think the previous shared memory
>> > concept or something like it would let you make Kamaelia parallel?
>> > If not, can you think of any method that would let you make Kamaelia
>> > parallel?
>>
>> Kamaelia already CAN run components in parallel in different processes
>> (has been able to do so for quite some time) or on different
>> processors. Indeed, all you do is use a ProcessPipeline or
>> ProcessGraphline rather than Pipeline or Graphline, and the components
>> in the top level are spread across processes. I still view the code as
>> experimental, but it does work, and when needed is very useful.
>>
>> Kamaelia running on IronPython can run on separate processors sharing
>> data efficiently (due to lack of GIL there) happily too. Threaded
>> components there do that naturally - I don't use IronPython, but it
>> does run on Iron Python. On windows this is easiest, though Mono works
>> just as well.
>>
>> I believe Jython also is GIL free, and Kamaelia's Axon runs there
>> cleanly too. Because Kamaelia is pure Python, it runs
>> truly in parallel there too (based on hearing from people using
>> kamaelia on jython). Cpython is the exception (and a rather big one at
>> that). (Pypy has a choice IIUC)
>>
>> Personally, I think if PyPy worked with generators better (which is
>> why I keep an eye on PyPy) and cpyext was improved, it'd provide a
>> really compelling platform for me. (I was rather gutted at Europython
>> to hear that PyPy's generator support was still ... problematic)
>>
>> Regarding the *efficiency* and *enforcement* of the approach taken, I
>> feel you're barking up the wrong tree, but let's go there.
>>
>> What approach does baseline (non-Iron Python running) kamaelia take
>> for multi-process work?
>>
>> For historical reasons, it builds on top of pprocess rather than the
>> multiprocessing module. This means that for interprocess
>> communications objects are pickled before being sent over operating
>> system pipes.
>>
>> This provides an obvious communications overhead - and this isn't
>> really kamaelia specific at this point.
>>
>> However, shifting data from one CPU to another is expensive, and only
>> worth doing in some circumstances. (Consider a machine with several
>> physical CPUs - each has a local CPU cache, and the data needs to be
>> transferred from one to another, which is why partly people worry
>> about thread/CPU affinity etc)
>>
>> Basically, if you can manage it, you don't want to shift data between
>> CPUs, you want to partition the processing.
>>
>> ie you may want to start caring about the size of messages and number
>> of messages going between processes. Sending small and few between
>> processes is going to be preferable to sending large and many for
>> throughput purposes.
>>
>> In the case of small and few, the approach of pickling and sending
>> across OS pipes isn't such a bad idea. It works.
>>
>> If you do want to share data between CPUs, and it sounds like you do,
>> then most OSs already provide a means of doing that - threads. The
>> conventions people use for using threads are where they become
>> unpicked, but as a mechanism, threads do generally work, and work
>> well.
>>
>> As well as channels/boxes, you can use an STM approach, such as that
>> in Axon.STM ...
>> * http://www.kamaelia.org/STM.html
>> *
>> http://code.google.com/p/kamaelia/source/browse/trunk/Code/Python/Bindings/STM/
>>
>> ...which is logically very similar to version control for variables. A
>> downside of STM (at least with this approach) however, is that for it
>> to work, you need either copy on write semantics for objects, or full
>> copying of objects or similar. Personally I use a biological metaphor
>> here, in that channels/boxes and components, and similar perform a
>> similar function to axons and neurons in the body, and that STM is
>> akin to the hormonal system for maintaining and controlling system
>> state. (I modelled biological tree growth many moons ago)
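>> As a toy illustration of that "version control for variables" idea
>> (this is not Axon.STM's actual API - the names here are made up),
>> a commit is refused when someone else committed a newer version first:

```python
class ConcurrentUpdate(Exception):
    pass

class Store:
    """Toy versioned store: values are checked out with a version number
    and a commit is refused if anyone committed in the meantime."""
    def __init__(self):
        self.values = {}                       # name -> (version, value)

    def checkout(self, name, default=None):
        version, value = self.values.get(name, (0, default))
        return name, version, value

    def commit(self, name, version, value):
        current, _ = self.values.get(name, (0, None))
        if version != current:
            raise ConcurrentUpdate(name)       # stale: caller must retry
        self.values[name] = (version + 1, value)

store = Store()
name, ver, val = store.checkout("counter", 0)
store.commit(name, ver, val + 1)               # first commit: accepted
try:
    store.commit(name, ver, val + 2)           # same old version: rejected
except ConcurrentUpdate:
    print("stale commit rejected")
print(store.checkout("counter"))               # ('counter', 1, 1)
```

>> Note the copy-on-write point above: this only stays safe if a
>> checked-out value is a copy, not a shared mutable reference.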
>>
>> Anyhow, coming back to threads, that brings us back to python, and
>> implementations with a GIL, and those without.
>>
>> For implementations with a GIL, you then have a choice: do I choose to
>> try and implement a memory model that _enforces_ data locality? That
>> is, if a piece of data is in use inside a single "process" or "thread"
>> (from here on I'll use "task" as a generic term), then trying to use
>> it inside another causes a problem for the task attempting to breach
>> the model.
>>
>> In order to enforce this, I personally believe you'd need to use
>> multiple processes, and only share data through dedicated code
>> managing shared memory. You could of course do this outside user code.
>> To do this you'd need an abstraction that made sense, and something
>> like stackless' channels or kamaelia's (in/out) box model makes sense
>> there. (The CELL API uses a mailbox metaphor as well for reference)
>>
>> In that case, you have a choice. You either copy the data into shared
>> memory, or you share the data in situ. The former gives you back
>> precisely the same overhead previously described, while the latter
>> fragments your memory (since you can no longer access it). You could
>> also have compaction.
>>
>> However, personally, I think any possible benefits here are outweighed
>> by the costs and complexity.
>>
>> The alternative is to _encourage_ data locality. That is encourage the
>> usage and sharing of data such that, whilst you could share data
>> between tasks and cause corruption, the common way of using the
>> system discourages such actions. In essence that's what I try to do in
>> Kamaelia, and it seems to work. Specifically, the model says:
>>
>> * If I take a piece of data from an inbox, I own it and can do anything
>> with it that I like. If you think of a physical piece of paper and
>> I take it from an intray, then that really is the case.
>>
>> * If I put a piece of data in an outbox, I no longer own it and should
>> not attempt to do anything more with it. Again, using a physical
>> metaphor, and naming scheme helps here. In particular, if I put a
>> piece of paper in the post, I can no longer modify it. How it gets
>> to its recipient is not my concern either.
>>
>> In practice this does actually work. If you add in immutable tuples,
>> and immutable strings then it becomes a lot clearer how this can work.
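>> A minimal sketch of that convention (not Kamaelia's real Axon API -
>> the names here are invented for illustration): an outbox is just a
>> link to some other component's inbox, and ownership travels with the
>> data.

```python
from queue import Queue

class Component:
    """Each component has named inboxes; an outbox is simply a link to
    some other component's inbox."""
    def __init__(self):
        self.inboxes = {"inbox": Queue()}
        self.outboxes = {}

    def link(self, outbox, other, inbox="inbox"):
        self.outboxes[outbox] = other.inboxes[inbox]

    def send(self, data, outbox="outbox"):
        self.outboxes[outbox].put(data)   # ownership leaves with the data

    def recv(self, inbox="inbox"):
        return self.inboxes[inbox].get()  # the receiver now owns the data

producer = Component()
consumer = Component()
producer.link("outbox", consumer)

msg = ["hello"]
producer.send(msg)       # by convention, producer must not touch msg again
taken = consumer.recv()  # consumer owns it and may mutate freely
taken.append("world")
print(taken)             # ['hello', 'world']
```

>> Nothing here *enforces* that the producer stops using msg - that is
>> exactly the encourage-rather-than-enforce trade-off described above.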
>>
>> Is there a risk here of accidental modification? Yes. However, the
>> size and general simplicity of components tends to lead to such
>> problems being picked up early. It also enables component level
>> acceptance tests. (We tend to build small examples of usage, which in
>> turn effectively form acceptance tests)
>>
>> [ An alternative is to make the "send" primitive make a copy on send.
>> That would be quite an overhead, and also limit the types of data you
>> can send. ]
>>
>> In practical terms, it works. (Stackless proves this as well IMO,
>> since despite some differences, there's also lots of similarities)
>>
>> The other question that arises, is "isn't the GIL a problem with
>> threads?". Well, the answer to that really depends on what you're
>> doing. David Beazley's talk on what happens when mixing different sorts
>> of threads shows that it isn't ideal, and if you're hitting that
>> behaviour, then actually switching to real processes makes sense.
>> However if you're doing CPU intensive work inside a C extension which
>> releases the GIL (eg numpy), then it's less of an issue in practice.
>> Custom extensions can do the same.
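>> A stdlib stand-in for that pattern: zlib.compress drops the GIL while
>> it crunches a large buffer, so plain threads genuinely overlap, much
>> as numpy or a custom C extension would:

```python
import threading
import zlib

# zlib.compress releases the GIL while working on a large buffer, so
# these threads can genuinely run in parallel on a multi-core machine.
data = b"frame-data " * 200000
results = {}

def compress(idx):
    results[idx] = zlib.compress(data)

threads = [threading.Thread(target=compress, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread produced a valid, independently decompressible result.
assert all(zlib.decompress(blob) == data for blob in results.values())
print("compressed", len(data), "bytes in each of", len(results), "threads")
```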
>>
>> So, for example, picking something which I know colleagues [1] at work
>> do, you can use a DVS broadcast capture card to capture video frames,
>> pass those between threads which are doing processing on them, and
>> inside those threads use c extensions to process the data efficiently
>> (since image processing does take time...), and those release the GIL
>> boosting throughput.
>>
>> [1] On this project :
>> http://www.bbc.co.uk/rd/projects/2009/10/i3dlive.shtml
>>
>> So, that makes it all sound great - ie things can, after various
>> fashions, run in parallel on various versions of python, to practical
>> benefit. But obviously it could be improved.
>>
>> Personally, I think the project most likely to make a difference here
>> is actually pypy. Now, talk is very cheap, and easy, and I'm not
>> likely to implement this, so I'll aim to be brief. Execution is hard.
>>
>> In particular, what I think is most likely to be beneficial is
>> something _like_ this:
>>
>> Assume pypy runs without a GIL. Then allow the creation of a green
>> process. A green process is implemented using threads, but with data
>> created on the heap such that it defaults to being marked private to
>> the thread (ie ala thread local storage, but perhaps implemented
>> slightly differently - via references from the thread local storage
>> into the heap) rather than shared. Sharing between green processes
>> (for channels or boxes) would "simply" mean data being detagged as
>> owned by one thread, and passed to another.
>>
>> In particular this would mean that you need a mechanism for doing
>> this. Simply attempting to call another green process (or thread) from
>> another with mutable data types would be sufficient to raise the
>> equivalent of a segmentation fault.
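>> A toy simulation of that enforcement, with ordinary threads standing
>> in for green processes (all names here are hypothetical, and a real
>> implementation would do the tagging in the runtime, not in user code):

```python
import threading

class OwnershipError(Exception):
    pass

class Owned:
    """Data tagged with its owning thread; any other thread touching it
    gets the moral equivalent of a segmentation fault."""
    def __init__(self, value):
        self.owner = threading.get_ident()
        self._value = value

    def get(self):
        if threading.get_ident() != self.owner:
            raise OwnershipError("data belongs to another task")
        return self._value

    def handoff(self, to_ident):
        # The one sanctioned sharing mechanism: detag from this task,
        # retag as belonging to another (a channel/box would do this).
        self.owner = to_ident

box = Owned("payload")
errors = []
tried = threading.Event()
retagged = threading.Event()

def other_task():
    try:
        box.get()                   # not ours yet: refused
    except OwnershipError:
        errors.append("refused")
    tried.set()
    retagged.wait()
    errors.append(box.get())        # after the handoff: allowed

t = threading.Thread(target=other_task)
t.start()
tried.wait()
box.handoff(t.ident)                # pass ownership across the "channel"
retagged.set()
t.join()
print(errors)                       # ['refused', 'payload']
```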
>>
>> Secondly, improve cpyext to the extent that each cpython extension
>> gets its own version of the GIL (ie each extension runs with its own
>> logical runtime, and thinks that it has its own GIL which it can lock
>> and release; in practice it's faked by the PyPy runtime). This is
>> conceptually similar to creating green processes.
>>
>> It's worth considering that the Linux kernel went through similar
>> changes, in that in the 2.0 days there was a large single big lock,
>> which was replaced by ever more granular locks. I personally think that
>> since there are so many extensions that rely on the existence of the
>> GIL simply waving a wand to get rid of it isn't likely. However
>> logically providing a GIL per C-Extension may be plausible, and _may_
>> be sufficient.
>>
>> However, I don't know - it might well not - I've not looked at the
>> code, and talk is cheap - execution is hard.
>>
>> Hopefully the above (cheap :) comments are in some small way useful.
>>
>> Regards,
>>
>>
>> Michael.
>


