[pydotorg-www] [Tracker-discuss] offline 10-10-18

Izak Burger izak at upfrontsystems.co.za
Tue Oct 19 09:30:11 CEST 2010


On 18/10/2010 23:29, anatoly techtonik wrote:
> Thanks.
> But I am still curious about what happened and how many hours tracker was down.
> Is there a monitor to mail here if anything went offline?

We have nagios monitoring on that machine. The only problem is that
nagios runs on nina, and nina is also the host for the psf virtual
host, so sometimes they go down at the same time and no alert is sent.
That said, the machine usually runs for months without any problems at
all.

Also, it sometimes goes down at night, and I will only notice the next 
morning.
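
One way around the shared point of failure described above would be a
small check run from a machine other than nina. What follows is only a
minimal sketch, not what is deployed: the function names are made up
for illustration, and the host and port you pass in are assumptions
(none are given in this thread). It only tests whether a TCP connection
can be opened; sending the alert mail is left to whatever runs it, for
example cron.

```python
import socket


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def down_services(targets, timeout=5.0):
    """Return the (host, port) pairs from targets that could not be reached."""
    return [t for t in targets if not is_reachable(t[0], t[1], timeout)]
```

Called as down_services([("tracker.example.org", 80)]) with a
placeholder host name, it returns an empty list when everything
answers. The point of the design is only that the check lives on a
different host from the thing it is checking.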

But... we need to do something about that machine anyway. We need to
upgrade it, which basically means we need to get a new one, migrate to
it, change the ip address, etc. It's on my todo list, but I need to do
the costing again and discuss it with Roché.

I can answer the first of your questions immediately. The instability is
not repeatable. The only real thing I did find is that there appear to
be issues with the ext3 file system driver, which occasionally corrupts
the journal and then either causes segfaults or reports a full file
system when it isn't full. I therefore dropped the journal, as the
machine is in fact more stable on ext2. To make matters worse, the same
thing happened on my laptop just a few days ago, with a fairly new
kernel, so it seems there is an as yet unreported bug in ext3. But I'm
just speculating.

regards,
Izak

