[pypy-dev] Stackless vs Erlang benchmarks

Fri Aug 10 03:21:21 CEST 2007

I made a library not long ago for stackless cpython meant to partially
emulate that aspect of Erlang (called the library Pylang aptly enough,
no I have no website, mostly personal use, I have never seen anyone
in the community actively interested in such things from my Python
experience, although if anyone voices opinions here then I could be
convinced to release what I have).  But you write your actors as if they
are mini programs with their own even loops and so forth.  They should
not communicate by direct interaction of objects (unless you know
beyond any doubt they will be in the same process, still better to just
purely use message passing as I do with it) but rather strictly by
message passing.  Here is one of my tests that show the event loop
and so forth (this test tests the message loop, does not show PID
usage, network interaction, or anything else):
_______________________________________________________
from ..core import getMessage # Yes, yes, I know, castrate me now
# for using relative imports, this file is pylang/tests/test0.py and I do
# need to do some file renaming and class shuffling if I release this
from ..nameserver import NameServer # If I release, I need to rename
# this as well since it does far more then the original design called for

 # This vars and this class are just to make ease of message passing
 # in this first test more simple, it is not necessary though
TEST0_CONFIRMATION=0
TEST0_TESTSUBROUTINE=1
TEST0_ONLYSUBROUTINE=2
TEST0_EXIT=-1
class test0Message(object):
    def __init__(self, type):
        self.type=type
    def __str__(self):
        return "Message type: %i" % (self.type)

def test0SubRoutine():
    print "\t 0:Subroutine entered, waiting for signal only for me"
    e=getMessage(lambda e: isinstance(e, test0Message) and
e.type==TEST0_ONLYSUBROUTINE)
    print "\t 0:Subroutine message receieved (should be
%i):"%(TEST0_ONLYSUBROUTINE), e
    print "\t 0:Exiting subroutine"

def test0Main():
    while True:
        e=getMessage(timeout=1)
        if e is None:
            print "\t0:Timeout receieved"
        elif not isinstance(e, test0Message):
            print "\t0:Unknown message receieved, not of this test type"
        elif e.type==TEST0_CONFIRMATION:
            print "\t0:Confirmation received"
        elif e.type==TEST0_TESTSUBROUTINE:
            print "\t0:Testing subroutine"
            test0SubRoutine()
        elif e.type==TEST0_ONLYSUBROUTINE:
            print "0:ERROR: Only Subroutine message received when not
in a callback"
        elif e.type==TEST0_EXIT:
            print "\t0:Exit request received, exiting process"
            return
        else:
            print "\t0:Unknown message type received, either need to
set a condition to only get what you want, or dump this:", e

def test0Generator(p):
    global exitScheduler
    print "Generator started, sending confirmation"
    p.sendMessage(test0Message(TEST0_CONFIRMATION))
    Sleep(0.01)
    print "Generator sleeping for 1.5 seconds"
    Sleep(1.5)
    print "Sending test subroutine"
    p.sendMessage(test0Message(TEST0_TESTSUBROUTINE))
    Sleep(0.01)
    print "Sending confirmation"
    p.sendMessage(test0Message(TEST0_CONFIRMATION))
    Sleep(0.01)
    print "Sending only subroutine"
    p.sendMessage(test0Message(TEST0_ONLYSUBROUTINE))
    Sleep(0.01)
    print "Sending confirmation"
    p.sendMessage(test0Message(TEST0_CONFIRMATION))
    Sleep(0.01)
    print "Sending untyped test0Message"
    p.sendMessage(test0Message(10))
    Sleep(0.01)
    print "Sending exit message"
    p.sendMessage(test0Message(TEST0_EXIT))
    Sleep(0.01)
    print "Generator exiting"
    NameServer.__nameserver__.shutdown() # This sends a close message
to all open
    # Processes and continues for a bit, if they have not all died by
the timeout then they
    # are sent the taskletkill exception, which will kill them
regardless.  Also shuts down
    # the server and other such things

def runTest0():
    print "\t0:Testing local process and local message communication"
    NameServer(serverNode=NullServer()) # NullServer does not host
squat.  You can
    # create servernodes to handle tcp, udp, some other communication
type, equiv to
    # a driver in Erlang, everything on the mesh network needs to use
the same thing
    # without using a gateway inside the mesh of some sort to connect
different ones
    p=Process(test0Main)()
    Process(test0Generator)(p) # Not the proper way to spawn processes now,
    # should call spawn(), but this was an early testcase that is
still useful as-is
    NameServer.__nameserver__.run()
_______________________________________________________

And it works as expected.  The testcase that just tests network setup
and destruction (no message passing or anything, just creation,
verification, and destruction) is this:
_______________________________________________________

from ..nameserver import NameServer
from ..node import TCPServer
from ..core import Process, getMessage

def tempKillAll(waitTime=50, ns=None):
	if not ns:
		ns = NameServer.__nameserver__
	Sleep(waitTime%10)
	for i in xrange(int(waitTime)-(waitTime%10), 0, -10):
		print "%i seconds until death" %(i)
		Sleep(10)
	print "dieing"
	ns.shutdown()

def testServer(timeout=30, cookie='nocookie', localname='myhostA'):
	NameServer(serverNode=TCPServer(localname=localname, cookie=cookie,
host='', port=42586, backlog=5))
	Process(tempKillAll)(timeout)
	NameServer.__nameserver__.run()

def testClient(timeout=3, cookie='nocookie', localname='myhostB',
remoteURL='pylang://myhostA:42586/'): # no the port in the url is not
        # necessary as it is the default port used by this anyway
	NameServer(serverNode=TCPServer(localname=localname, cookie=cookie,
host='', port=42587, backlog=5))
	Process(tempKillAll)(timeout)
	Sleep(0.1)
	NameServer.__nameserver__.createNewNode(remoteURL)
	NameServer.__nameserver__.run()
_______________________________________________________

The URL in full would actually be
"pylang://myhostB:nocookie@myhostA:42586/".  But if any are omitted
then it uses the local hostname, the cookie, the remote host, the
default port, etc...   To connect to a remote named process (not an
anonymous process) then you could get its PID with:
PID("pylang://myhostB:nocookie@myhostA:42586/someNamedProcess") and if
the connection to the remote node is not made then it will be made.
If you have a Node pointer to an already connected node then you can
just call a method on it to get a PID to a remote named process.  If
you send a message through a PID (the main way of sending messages)
then they may or may not arrive because, although you can get a PID to
a named process for example, the process may not actually exist.  You
can query a process to see if it exists which involves calling a ping
method on the PID which will actually perform a test on the remote
node (which may be on the same process/node, or a networked computer
or what-not) and send back a specially crafted message saying pong or
pang for success or fail with the PID as the param.  A PID can always
be converted to a pylang URL and vice-versa (as even anonymous
processes have a guid generated for them).  There can also be a type
param of the url (example:
"pylang://myhostB:nocookie@myhostA:42586;tcp/", then 'tcp' would be
the type, standard url syntax) and if so then it will try to create a
connection to the remote node using that connection (tcp, udp, ssl,
pipe, raknet, whateverisMade) instead of the default registered to the
nameserver by the servernode param, useful for connecting to a mesh
that uses a different server type..

Processes can choose to save themselves for transmission across a
network, save to a db or file, etc..., and they will retain their
GUID.  When they are serialized they are not serialized as a whole but
they send a structure with the method call which is returned as an
initial message to a process when it is restarted so it can
reconstruct itself from it, the guid is always included and handled
transparently.  Thus, a process can be 'pickled' quite easilly, but it
must be built for it (not hard to do, just needs to be done), reason
being is that although stackless allows full tasklets to be pickled,
some process may have some rather... adverse effects to being pickled
(like one that handled any form of resources, db, file, etc...) hence
why I have it do it this other, more 'erlangish' method.

The message loop function (I do apologize for jumping around in
thoughts...) is not just a standard pull the first off the queue like
in a normal OS message loop, but has a signature of: "message
getMessage(conditionFn=lambda e:True, timeout=-1)".  When getMessage
is called then it starts testing the messages in the queue for the
process it was called from in order they were received, testing each
with the conditionFn (allows for nice complex testing for something
like "lambda e: isinstance(e, test0Message) and
e.type==TEST0_ONLYSUBROUTINE" or "lambda e: isinstance(e, tuple) and
len(e)==2 and e[0]==14" or whatever to simple things like just
returning true to get the next top thing as is the default to passing
in another function.  If it good to get and empty out the queue on
occasion to make sure it does not fill up with messages you do not
want, slowing it down slightly.  The other param is timeout (in
seconds).  If timeout equals -1 then it will wait forever for a
matching message.  If timeout equals 0 then it just test and returns a
message if one matches that it already has, or it returns None (due to
zero timeout).  If greater then zero then it tests for a matching
message, if none, then it waits up to the timeout for a message, if
one arrives by that time then it returns it immediately, if not then
it returns None when it reaches the timeout.  It is not a spin-wait
nor does it delay receiving the message or anything of so forth.  The
nameserver runs a pure tasklet (no message queue, no nothing like
that) that just checks a list, if the top-most item on it is past the
time delay then it sends a message to it and removes it then tests the
next and repeats, else it inserts itself back into the stackless queue
and waits for the next cycle to come by.  A Process, when it has a
timeout but no messages match, then it adds itself to the
afore-mentioned list as a tuple of (timeTheTimeoutOccurs,
processWaitChannel) then sorts it then blocks on its wait channel.  If
a message arrives at the process it tests the wait channel to see if
it is blocked, if so it returns the message on that channel so it can
then be tested, if it does not match then the process re-blocks, if it
does match then it removes itself from the wait list and returns the
message.  If an exception occurs (taskletkill for example), it still
properly removes itself from the list before it re-raises.

Many other things done, but those are the major parts.  All listed
above is either working (near all of it) but may still need to be
refined in its interface and so forth, or is partially done (pickling
is more... hands-on currently, the interface described above is the
new one I am partially moved to).  The purpose of this was just to
describe that erlangs strength can also be a strength for python as
well.  I really would like to move this library to PyPy when PyPy
becomes usable.  I have written that linked test above as well (not in
this library, but pure stackless, a vastly different way then the
article did though) and erlang still beat stackless cpython, hopefully
pypy will fare better.

On 8/5/07, Carl Friedrich Bolz <cfbolz at gmx.de> wrote:
> Hi Maciek
>
> Maciek Fijalkowski wrote:
> > http://muharem.wordpress.com/2007/07/31/erlang-vs-stackless-python-a-first-benchmark/
> >
> > Christian: with a dedication for you :)
> >
> > We should try pypy on this btw.
>
> seems a bit meaningless, given that one of erlang's most important
> strengths is the possibility of using it to transparently across
> multiple processes and especially multiple machines.
>
> Cheers,
>
> Carl Friedrich
> _______________________________________________
> pypy-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/pypy-dev
>