I don't mind replying to the mailing list, unless it annoys someone? Maybe some people could be interested in this discussion. You have a lot of questions! :) My answers are inline.

2010/8/5 Kevin Ar18 <kevinar18@hotmail.com>:
Note: Gabriel, do you think we should discuss this on another mailing list (or in private)? I'm not sure this is related to PyPy dev anymore.
Anyways, what are your future plans for the project? Is it just an experiment for school, maybe in the hopes that others would maintain it if it was found to be interesting? Or are you planning actual future development, maintenance, and promotion of it yourself?
Based on the interest and time that I and other people will have, I plan to debug this as much as possible. If people are interested in joining in after my thesis, I'll be more than open to welcoming them into the project. Right now, I'm writing my report and I'm also looking for a job. I won't have much time to touch the code again before next month, when I'll prepare it for my presentation along with a lot of examples and use cases.
-----------
On a personal note... the concept has a lot of similarities to what I am exploring. However, I would have to make so many additional modifications. Perhaps you can give some thoughts on whether it would take me a long time to add such things?
Alright, my plan was to provide all the needed lower-level constructs that can be used to build more complex things. For example, a mix of tasklets and sync channels could be wrapped in an API to create async channels. I know this is far from complete and I have a few ideas on how it could be improved in the future, but it's currently not needed for my project. For now, the idea was to stay as close as possible to standard Stackless Python and only add the APIs and functionality needed to support distributing tasklets between multiple interpreters.
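As a rough sketch of what I mean (untested, and using only the standard stackless channel API; the class and method names here are just for illustration), an async channel could be built like this:

    import stackless

    class AsyncChannel:
        # Sketch: a non-blocking send() built on top of a sync channel
        # plus a helper tasklet that drains an internal buffer.
        def __init__(self):
            self._chan = stackless.channel()    # sync rendezvous channel
            self._buf = []                      # pending messages
            self._wakeup = stackless.channel()  # signals the pump tasklet
            stackless.tasklet(self._pump)()

        def send(self, value):
            # Queue the value and wake the pump if it is sleeping;
            # the caller never blocks.
            self._buf.append(value)
            if self._wakeup.balance < 0:
                self._wakeup.send(None)

        def receive(self):
            # Blocks like a normal channel receive.
            return self._chan.receive()

        def _pump(self):
            while True:
                if not self._buf:
                    self._wakeup.receive()          # sleep until data arrives
                self._chan.send(self._buf.pop(0))   # rendezvous with a receiver

The pump tasklet only runs inside the normal scheduler (stackless.run() or explicit scheduling), so everything stays cooperative.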
Some examples:
* Two additional message passing styles (in addition to your own)

Queues - multiple tasklets can push onto a queue, but only one tasklet can pop. Multiple tasklets can access a property to find out if there is any data in the queue. Queues can be set to an infinite size or capped at a maximum number of entries.
This could easily be implemented using a standard channel and by starting multiple tasklets to send data. With some helper methods on a channel, it would be possible to know how many tasklets are waiting to send their data. A channel already has a built-in queue for send/receive requests; this queue contains a list of all tasklets waiting for a send/receive operation. Tasklets are supposed to be lightweight enough to support something like this.
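For example, a minimal illustration with the standard stackless API (the balance check at the end is the kind of helper I mean):

    import stackless

    ch = stackless.channel()

    def producer(name):
        for i in range(3):
            ch.send((name, i))  # blocks until the consumer pops

    def consumer(count):
        for _ in range(count):
            item = ch.receive()
            print(item)

    for name in ("A", "B", "C"):   # many pushers, one popper
        stackless.tasklet(producer)(name)
    stackless.tasklet(consumer)(9)
    stackless.run()

    # A positive ch.balance means that many tasklets are blocked
    # waiting to send, i.e. "there is data in the queue".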
Streams - I'm not sure of the exact name, but kind of like an infinite stream/buffer, useful for passing infinite amounts of data. Only one tasklet can write/add data; only one tasklet can read/extract data.
Like a UNIX pipe()? Async? Again, some code wrapping standard channels could be used for this.
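Something in this direction, maybe (again just a sketch on top of a standard channel, with made-up names; buffering could be layered on the same way as in the async channel sketch above):

    import stackless

    class Stream:
        # Sketch: one writer, one reader, a sentinel marks end-of-stream.
        _EOF = object()

        def __init__(self):
            self._chan = stackless.channel()

        def write(self, data):      # writer side, like a pipe's write end
            self._chan.send(data)

        def close(self):
            self._chan.send(self._EOF)

        def read(self):             # reader side; None signals EOF
            data = self._chan.receive()
            return None if data is self._EOF else data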
* Message passing

When you create a tasklet, you assign a set number of queues or streams to it (it can have many) and specify whether it extracts data from them or writes data to them (it can only either extract or write, as noted above). The tasklet's global namespace has access to these queues or streams and can extract data from or add data to them.
In my case, I look at message passing from the perspective of the tasklet. A tasklet can be assigned a certain number of "in ports" and a certain number of "out ports." In this case, the "in ports" are the .read() end of a queue or stream and the "out ports" are the .send() end of a queue or stream.
Sorry, I don't really understand what you're trying to explain here. Maybe an example could be helpful? :)
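For instance, if by "ports" you mean channel ends handed to a tasklet when it is created, is it something like this? (A guess only; all the names here are made up.)

    import stackless

    def spawn_with_ports(func, n_in, n_out):
        # Made-up helper: create a tasklet with fixed "in" and "out"
        # ports (plain channels) assigned at creation time.
        in_ports = [stackless.channel() for _ in range(n_in)]
        out_ports = [stackless.channel() for _ in range(n_out)]
        stackless.tasklet(func)(in_ports, out_ports)
        return in_ports, out_ports

    def doubler(in_ports, out_ports):
        while True:
            value = in_ports[0].receive()  # read end of a "queue/stream"
            out_ports[0].send(value * 2)   # write end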
* Scheduler

For the scheduler, I would need to control when a tasklet runs. Currently, I am thinking that I would look at all the "in ports" that a tasklet has and make sure each one has some data. Only then would the tasklet be scheduled to run by the scheduler.
Couldn't all those ports (channels) be read one at a time, and the processing done afterwards? I don't really see the need to play with the scheduler: channels are blocking, so a tasklet will be unscheduled anyway when it tries to read from a channel in which no data is available.
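As a sketch (process() here is only a placeholder for whatever the tasklet actually computes):

    def process(inputs):
        return sum(inputs)  # placeholder for the real computation

    def worker(in_ports, out_port):
        while True:
            # Each receive() blocks until that port has data, so the body
            # only runs once every in-port has delivered a value -- no
            # custom scheduler hook is needed.
            inputs = [port.receive() for port in in_ports]
            out_port.send(process(inputs))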
------------

On another note, I am curious how you handled the issue of "nested" objects. Consider the send() and receive() that you use to pass objects around in your project. Am I correct that these objects cannot contain references outside of themselves? Also, how do you handle extracting an object out of the tree and making sure there are no references outside the object?
Right now, I did not really dig too far into this problem. With a local communication, a reference to the object is sent through a channel. The receiver tasklet will have the same access to the object and all its sub-objects as the sender tasklet.

For remote communications, pickling is involved. The object to send must be picklable; this excludes any I/O object unless the programmer creates a custom pickling protocol for it. A copy of the whole object tree will then be made. Sometimes that's good (small objects), sometimes it's bad (really complex or big objects, I/O objects, etc.).

This is why I added the concept of ref_object(), using PyPy's proxy object space. For such objects, a proxy can be made and only a reference object will be sent to the remote side. This object will have the same type as the original object, but all operations will be forwarded to the host node. All replies will also be wrapped by proxies when sent back to the remote reference object. The only case where a proxy object is not created is with atomic types (string, int, float, etc.); it's useless for those because they are immutable anyway, and remote access to them would only introduce useless latency. With ref_object(), the object tree always stays on the initial node. A move() operation will also be added to ref_object()s so they can be moved between interpreters if needed.
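To illustrate the idea, here is a usage sketch only (the exact ref_object()/move() API in my code may differ, and BigMutableTree and some_channel are stand-ins):

    # Node A owns a big mutable object and shares it by reference:
    tree = BigMutableTree()        # stand-in for any large mutable object
    ref = ref_object(tree)         # wrap via PyPy's proxy object space
    some_channel.send(ref)         # only the reference crosses to node B

    # On node B, the received object has the same type as `tree`, but
    # every operation is forwarded back to node A, where the tree lives:
    remote = some_channel.receive()
    remote.insert(42)              # executed on node A; reply proxied back

    # Atomic immutables (str, int, float, ...) are sent by value instead,
    # since proxying them would only add latency.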
For example, consider the following object, where "->" means it has a reference to that object
Object 1 -> Object 2
Object 2 -> Object 3
Object 2 -> Object 4
Object 4 -> Object 2
Now, let's say I have a tasklet like the following:
.... -> incoming data = pointer/reference to Object 1
1. read incoming data (get Object 1 reference)
2. remove Object 3
3. send Object 3 to tasklet B
4. send Object 1 to tasklet C
Result: tasklet B now has this object: pointer/reference to Object 1, which contains the following tree:

Object 1 -> Object 2
Object 2 -> Object 4
Object 4 -> Object 2

tasklet C now has this object: pointer/reference to Object 3, which contains the following tree:

Object 3
I think you swapped tasklet B and tasklet C for the end result! ;)
On the other hand, consider the following scenario:
1. read incoming data (get Object 1 reference)
2. remove Object 4

ERROR: this would not be possible, as it refers to Object 2
Why isn't it possible? By removing "Object 4" I guess you mean removing this link: Object 2 -> Object 4? This is the only way Object 4 could be removed.
Sorry for the late answer, I was unavailable in the last few days.
About send() and receive(), it depends on whether the communication is local or not. For a local communication, anything can be passed, since only the reference is sent; this is the base model for Stackless channels. For a remote communication (between two interpreters), any picklable object can be passed (a copy will then be made), including channels and tasklets (for which a reference will automatically be created).
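A quick way to see the local reference semantics (plain Stackless, runnable as-is):

    import stackless

    ch = stackless.channel()

    def sender():
        data = {"items": []}
        ch.send(data)             # local send: only the reference moves
        data["items"].append(1)   # this mutation is visible to the receiver

    def receiver():
        obj = ch.receive()
        stackless.schedule()      # let the sender run again
        print(obj["items"])       # [1] -- same object, no copy was made

    stackless.tasklet(receiver)()
    stackless.tasklet(sender)()
    stackless.run()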
The use of the PyPy proxy object space is to make remote communication more Stackless-like by passing objects by reference. If a ref_object is made, only a reference will be passed when a tasklet is moved or when the object is sent on a channel; the object always resides where it was created. A move() operation will also be implemented on those objects so they can be moved around like tasklets.
I hope it helps,
Gabriel
2010/7/29 Kevin Ar18 <kevinar18@hotmail.com>:
Hello Kevin,

I don't know if it can be a solution to your problem, but for my Master's thesis I'm working on making Stackless Python distributed. What I did is working but not complete, and I'm right now in the process of writing the thesis (in French, unfortunately). My code currently works with PyPy's "stackless" module only and uses some PyPy-specific things. Here's what I added to Stackless:
- Possibility to move tasklets easily (ref_tasklet.move(node_id)). A node is an instance of an interpreter.
- Each tasklet has its own global namespace (to avoid sharing of data). The state is also easier to move to another interpreter this way.
- Distributed channels: all requests are known by all nodes using the channel.
- Distributed objects: when a reference is sent to a remote node, the object is not copied; a reference is created using PyPy's proxy object space.
- Automated dependency recovery when an object or a tasklet is loaded on another interpreter.
With a proper scheduler, many tasklets could automatically be spread across multiple interpreters to use multiple cores, or across multiple computers. A bit like the N:M threading model, where N lightweight threads/coroutines are executed on M threads.
Was able to have a look at the API... If others don't mind my asking this on the mailing list:
* .send() and .receive()

What type of data can you send and receive between the tasklets? Can you pass entire Python objects?
* .send() and .receive() memory model

When you send data between tasklets (pass messages, or whatever you want to call it), how is this implemented under the hood? Does it use shared memory, or does it involve a more costly copying of the data? I realize that if it is on another machine you have to copy the data, but what about between two threads? You mentioned PyPy's proxy object... guess I'll need to read up on that.
--
Gabriel Lavoie
glavoie@gmail.com
By the way, if you come to #pypy on FreeNode, I'm WildChild! I'm always there, though not always available. I'm in the EST timezone (UTC-5).

See ya,

Gabriel

--
Gabriel Lavoie
glavoie@gmail.com