[Numpy-discussion] Numpy/Scipy for EC2

Thu Nov 19 20:45:21 EST 2009

Hi all:

I'm just writing to report on my experience using Starcluster, which enables
the use of NumPy and Scipy in the Amazon EC2 cloud computing environment.
The purpose of my email is to extol Starcluster's qualities, and suggest
that the NumPy community be aware of its development.    I suspect there are
others in the community who find cloud computing an attractive idea but a
little daunting to get into, and would be pleasantly surprised at how easy
Starcluster makes it to get started using NumPy on Amazon EC2.

For those of you who aren't familiar with AMIs and the Amazon EC2 service,
see e.g. http://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud.   Three
of the basic concepts are "Amazon Machine Images" (AMIs), "machine
instances" of AMIs, and the Elastic Block Store (EBS) service.   AMIs are
disk images containing a virtual machine, including an operating system and
other software you add on.  Instances are temporarily allocated computers,
booted with your chosen virtual machine, that you start up on demand, use
for computations with software from the AMI, and then terminate.   EBS is a
persistent storage service, also from Amazon, that provides permanent
file systems in the cloud.   You allocate an EBS volume of a given size,
attach the volume(s) to a running machine instance just like any other
hard drive, and use it to store the files you use or create during a
computation.  The files persist after the instance terminates, so they are
available whenever you start up a new instance.

A couple of weeks ago I wrote to this list asking for advice on finding a
good Amazon Machine Image (AMI) for using NumPy and Scipy on the Amazon
cloud.   I didn't want to have to build a Linux machine image with optimized
blas and lapack myself, and I figured that there might be good existing
publicly-available AMIs that I could use as a base.   Robert Kern suggested
that I look into the Starcluster project (
http://web.mit.edu/stardev/cluster/).

I have found Starcluster extremely useful.  It made it possible for me to,
in the course of one day, go from knowing essentially nothing about cloud
computing, to being able to run large-scale parallel clusters with my
favorite NumPy/SciPy scripts.

The basis of what Starcluster offers is two solidly built AMIs.  The
operating system is Ubuntu Jaunty, and the images come with prebuilt
optimized blas and lapack, numpy, Scipy, matplotlib, ipython, and several
other useful packages for scientific computing in python.   They use Python
2.6, and come in both 32-bit and 64-bit flavors.  The AMIs are based on AMIs
from Alestic
(http://alestic.com/), and are built with best-practices for ensuring
stability and good interaction with Amazon's system.    They have proved
very stable and extensible.

In addition to these AMIs, Starcluster has three extremely useful features:

    -- Built-in support for mounting EBS drives as NFS filesystems, and
then administering the shared drive across multiple machine instances.
    -- The Sun Grid Engine (SGE), a queuing system for scheduling jobs to be
run in parallel across instances.
    -- A python module with a few commands that give you an incredibly
simple interface for automating the process of starting/terminating a
cluster of instances, mounting the shared drive, starting the grid engine,
etc., and for configuring your cluster needs (e.g. how many nodes it will
contain, which AMIs to use, which EBS volumes to mount).

As a result, all you have to do to have a NumPy-enabled cluster-on-demand
is:
    1) Get an Amazon EC2 account, and the accompanying security credentials
(X.509 certificate and SSH keypair) for your account.
    2) Install starcluster ("easy_install starcluster")
    3) Follow the installation procedure on the starcluster website for
getting, attaching, and formatting an EBS volume as an NFS drive.
    4) Set up your starcluster configuration file.
    5) Start a 1-node cluster, modify the installation as you see fit, and
re-bundle the result into a new AMI as described on the Amazon website
http://docs.amazonwebservices.com/AWSEC2/latest/GettingStartedGuide/.
(Don't forget to edit your starcluster configuration file to reflect your
new AMI.)   This step is optional -- If you don't need anything else
special, you can just use Starcluster's base images.

After that, starting a cluster is as easy as typing a single command
("starcluster -s").  To submit parallel jobs on your cluster, you can learn
to use the Sun Grid Engine "qsub" command (
http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/htmlman/htmlman1/qsub.html)
or use the python bindings to the SGE interface (
http://code.google.com/p/drmaa-python/).     Or, if you like Parallel
Python, that works perfectly well on these clusters too.
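
To give a flavor of the "qsub" route, here is a minimal sketch of how one
might submit a script as an SGE array job and have each task pick its own
share of the work.  This is only an illustration, not something taken from
the Starcluster documentation: the helper names (build_qsub_command,
current_task_id, task_slice, submit) and the script name "work.py" are made
up for the example.  The qsub options used (-N, -cwd, -V, -b y, -t) and the
SGE_TASK_ID environment variable are standard Sun Grid Engine features, and
-cwd assumes the script sits on the shared NFS drive so every node can see
it.

```python
import os
import subprocess


def build_qsub_command(script, n_tasks, job_name="numpy_job"):
    """Return the argument list that submits `script` as an SGE array job."""
    return [
        "qsub",
        "-N", job_name,          # job name shown by qstat
        "-cwd",                  # run in the current (shared) directory
        "-V",                    # export the current environment to the job
        "-b", "y",               # the command is a binary, not a job script
        "-t", "1-%d" % n_tasks,  # array job with tasks numbered 1..n_tasks
        "python", script,
    ]


def current_task_id(environ=os.environ):
    """Return the 1-based array task id, or 1 outside an array job."""
    raw = environ.get("SGE_TASK_ID", "undefined")
    return int(raw) if raw.isdigit() else 1


def task_slice(n_items, n_tasks, task_id):
    """Half-open index range [start, stop) handled by task `task_id`."""
    start = (task_id - 1) * n_items // n_tasks
    stop = task_id * n_items // n_tasks
    return start, stop


def submit(script, n_tasks):
    """Hand the array job to the grid engine (run this on the master node)."""
    subprocess.check_call(build_qsub_command(script, n_tasks))
```

Inside "work.py" you would call current_task_id() and task_slice() to decide
which rows or parameter settings that task should process with NumPy, and
write the results to the shared drive.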

Overall, in my experience, Starcluster has been easy, stable and powerful,
and I encourage anyone who is curious about cloud computing with Numpy to
look into it.

Starcluster is by no means a finished project.  At the moment, you can only
administer one cluster at a time from a given local machine, since
starcluster has no notion of a "session" and can't distinguish between
different clusters you've started up.  (You can *start* multiple clusters,
but then starcluster commands typed in your local terminal may be confused
about which Amazon machine instances you mean, so it has trouble
administering them.)  Also, there's no dynamic load balancing, so
once you've started a cluster with a certain number of nodes, you're stuck
with that number of computers while the cluster is running, even if you're
only using a few of them or suddenly need more.

The developer of the project (Justin Riley) says on his website that he's
planning to add these features in the next release.    Now, I'm not the
creator or developer or maintainer of Starcluster, and I have no affiliation
with Justin Riley or the project whatsoever, so I want to make it clear I
don't speak for them in any way except as a satisfied user.  I don't know
what his commitment to his development plans is, either -- however, I hope
he sticks to his timeline, as I think continuing the vigorous development of
his project would be a real plus for the NumPy community.  I'm hoping that
if others in the NumPy community like his project and start using it, that
will add to the likelihood of continued development. (If anyone from
the NumPy community is interested in helping the developer out, perhaps you
should consider shooting him an email.)

Anyhow, I apologize for this long email, and hope it may be of use to
somebody!

Dan