[Baypiggies] Silent data corruption paper...

Stephen McInerney spmcinerney at hotmail.com
Tue Mar 17 09:22:33 CET 2009



As a side note, you see the same concept in processors used in satellites and
other rad-hard applications, anywhere that alpha particles are likely to
flip a register in the hardware.
The common design technique is Triple Modular Redundancy (TMR), i.e.
replicating the circuit three times and taking a majority vote on the outputs.
(Read the legal disclaimers on which applications Xilinx FPGA chips
(which use CMOS SRAM for configuration) cannot be certified for if you
want a good laugh... that's why those applications use antifuse parts instead.)
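
For anyone who wants to see the TMR voting idea in Python terms, here is a
minimal software sketch. Real TMR is done in hardware with three copies of
the logic feeding a voter circuit; the function names and the example value
below are made up for illustration.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three redundant copies of a value.

    Each output bit is 1 only if at least two of the three inputs
    have that bit set, so a flip in any single copy is outvoted.
    """
    return (a & b) | (a & c) | (b & c)


def flip_bit(value: int, bit: int) -> int:
    """Simulate a single-event upset by toggling one bit (hypothetical helper)."""
    return value ^ (1 << bit)


good = 0b10110010            # the value all three copies should hold
upset = flip_bit(good, 5)    # one copy takes a hit
voted = tmr_vote(good, upset, good)
print(voted == good)
```

The vote only masks a fault in one copy at a time. If two copies flip the
same bit before the bad state is scrubbed, the wrong value wins.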

As for memory in enterprise computing, you always use ECC protection on it.
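
To give a feel for how ECC can correct a flipped bit, here is a toy
Hamming(7,4) encoder/decoder. This is only a sketch of the principle: real
ECC memory typically uses wider codes over whole memory words, implemented
in hardware, and the function names below are my own.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword (list of 0/1).

    Codeword positions are numbered 1..7; positions 1, 2 and 4 hold
    parity bits, positions 3, 5, 6 and 7 hold the data bits.
    """
    d = [(nibble >> i) & 1 for i in range(4)]   # d[0] is the least significant bit
    code = [0] * 8                              # index 0 unused
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]       # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]       # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]       # covers positions 4, 5, 6, 7
    return code[1:]


def hamming74_decode(bits):
    """Return (nibble, syndrome); corrects at most one flipped bit.

    The syndrome is the XOR of the positions of all set bits. It is 0
    for a valid codeword, otherwise it is the position of the bad bit.
    """
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    if syndrome:
        code[syndrome] ^= 1                     # flip the bad bit back
    d = [code[3], code[5], code[6], code[7]]
    nibble = sum(bit << i for i, bit in enumerate(d))
    return nibble, syndrome


word = hamming74_encode(0b1011)
word[4] ^= 1                                    # corrupt position 5
print(hamming74_decode(word))
```

A plain Hamming(7,4) code cannot tell a double-bit error from a single-bit
one; it takes an extra overall parity bit to detect doubles as well.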

Another amusing story from hardware certification: occasionally
the semiconductor package of a chip itself (such as packages containing lead)
contained trace heavy metals that emitted a non-negligible level
of alpha-particle radiation (possibly higher than background radiation, due to
proximity), and that was hard to certify. (But sometimes you're stuck with
using lead for its superior thermal conductivity.)

Stephen

> Date: Mon, 16 Mar 2009 23:59:14 -0700
> From: jjinux at gmail.com
> To: charles.merriam at gmail.com
> CC: Baypiggies at python.org
> Subject: Re: [Baypiggies] Silent data corruption paper...
> 
> On Mon, Mar 16, 2009 at 11:09 PM, Charles Merriam
> <charles.merriam at gmail.com> wrote:
> > I owed this to someone at the last meeting:
> > http://fuji.web.cern.ch/fuji/talk/2007/kelemen-2007-C5-Silent_Corruptions.pdf
> >
> > Highlights:   Desktop disks get a random silent corruption 10x more
> > than enterprise hard drives.  Everyone gets them occasionally.  Figure
> > a bit gets switched every couple of months in a data center, even with
> > all the backups, checksums, etc.
> 
> I remember hearing that Google operated at such a large scale that
> these sorts of things tended to catch up with them.  Their approach
> was to use more redundancy.
> 
> I'm regurgitating things I've heard.
> 
> -jj
