[Baypiggies] clustering

Wed Aug 30 23:42:36 CEST 2006

Hey Guys,

I need to do some data processing, and I'd like to use a cluster so
that I don't have to grow old waiting for my computer to finish.  I'm
thinking about using the servers I have locally.  I'm completely new
to clustering.  I understand how to break a problem up into
parallelizable pieces, but I don't understand the admin side of it.  My
current data set is about 16 gigs, and I need to do things like run
filters over strings, make sure strings are unique, etc.  I'll be
using Python wherever possible.
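
For concreteness, here's a rough sketch of the kind of split I have in
mind, not a finished design.  The function names (bucket_for, partition,
filter_and_dedupe) and the choice of MD5 as the hash are made up for
illustration.  The idea is that identical strings always hash to the same
bucket, so each machine can filter and de-duplicate its own bucket without
talking to the others.

```python
import hashlib


def bucket_for(s, num_buckets):
    """Map a string to a bucket index using a hash that is stable across machines."""
    digest = hashlib.md5(s.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def partition(strings, num_buckets):
    """Split strings into num_buckets lists; duplicates land in the same list."""
    buckets = [[] for _ in range(num_buckets)]
    for s in strings:
        buckets[bucket_for(s, num_buckets)].append(s)
    return buckets


def filter_and_dedupe(bucket, keep):
    """Per-machine work: apply the filter, then drop repeats, keeping first-seen order."""
    seen = set()
    result = []
    for s in bucket:
        if keep(s) and s not in seen:
            seen.add(s)
            result.append(s)
    return result


if __name__ == "__main__":
    data = ["spam", "eggs", "spam", "", "ham", "eggs"]
    # Hypothetical filter: drop empty strings.
    for bucket in partition(data, num_buckets=4):
        print(filter_and_dedupe(bucket, keep=lambda s: s != ""))
```

With 16 gigs of data the buckets would be files rather than in-memory
lists, but the shape of the work is the same.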

* Do I have to run a particular Linux distro?  Do all the machines have
to run the same one, or can I just set up a daemon on each machine?

* What does "Beowulf" do for me?

* How do I admin all the boxes without having to enter the same command n times?

* I've heard that MPI is good and standard.  Should I use it?  Can I
use it with Python programs?

* Is there anything better than NFS that I could use to access the data?

* What's hip, slick, and cool these days?

I just need you to point me in the right direction and tell me what's
good and what's a waste of time.

Thanks,
-jj