Shannon -jj Behrens
jjinux at gmail.com
Wed Aug 30 23:42:36 CEST 2006
I need to do some data processing, and I'd like to use a cluster so
that I don't have to grow old waiting for my computer to finish. I'm
thinking about using the servers I have locally. I'm completely new
to clustering. I understand how to break a problem up into
parallelizable pieces, but I don't understand the admin side of it. My
current data set is about 16 gigs, and I need to do things like run
filters over strings, make sure strings are unique, etc. I'll be
using Python wherever possible.
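
To make the workload concrete, here is a minimal sketch of how the
filter-and-unique step might be split into independent pieces. It is an
illustration under assumptions, not code I have running: the function
names (partition_of, split_into_partitions, filter_and_dedupe, run) and
the choice of an MD5-based partition key are hypothetical. The idea is
that identical strings always land in the same partition, so each
machine could check uniqueness on its own partition without talking to
the others.

```python
import hashlib
from collections import defaultdict


def partition_of(s, n_parts):
    """Map a string to a partition index that is stable across machines.

    The built-in hash() is randomized per process for strings, so a
    digest is used instead (an assumption made for this sketch).
    """
    digest = hashlib.md5(s.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_parts


def split_into_partitions(strings, n_parts):
    """Group strings so that equal strings share a partition."""
    parts = defaultdict(list)
    for s in strings:
        parts[partition_of(s, n_parts)].append(s)
    return parts


def filter_and_dedupe(strings, keep):
    """Apply a filter, then drop duplicates, preserving first-seen order."""
    seen = set()
    out = []
    for s in strings:
        if keep(s) and s not in seen:
            seen.add(s)
            out.append(s)
    return out


def run(strings, n_parts, keep):
    """Process every partition in turn.

    On a cluster each partition would go to a different machine; here
    they are processed in a plain loop to keep the sketch self-contained.
    """
    parts = split_into_partitions(strings, n_parts)
    result = []
    for idx in sorted(parts):
        result.extend(filter_and_dedupe(parts[idx], keep))
    return result
```

For example, run(["a", "b", "a", "", "c"], 4, bool) would drop the empty
string and the repeated "a". The order of the output depends on which
partition each string falls into.
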
* Do I have to run a particular Linux distro? Do they all have to be
the same, or can I just set up a daemon on each machine?
* What does "Beowulf" do for me?
* How do I admin all the boxes without having to enter the same command n times?
* I've heard that MPI is good and standard. Should I use it? Can I
use it with Python programs?
* Is there anything better than NFS that I could use to access the data?
* What's hip, slick, and cool these days?
I just need you to point me in the right direction and tell me what's
good and what's a waste of time.