[Tutor] Corrupt file(s) likelihood and prevention?

Alan Gauld alan.gauld at yahoo.co.uk
Fri Jan 31 07:15:40 EST 2020


On 31/01/2020 03:56, boB Stepp wrote:
> I just finished reading a thread on the main list that got a little
> testy.  The part that is piquing my interest was some discussion of
> data files perhaps getting corrupted.  I have wondered about this off
> and on.  First what is the likelihood nowadays of a file becoming
> corrupted, 

The likelihood is 100% - if you wait long enough!
Corruption is caused by several things, including physical
deterioration of the media on which the file is stored. It can be
due to sunspot activity, local radiation sources, high
magnetic fields, and heat-related stress on the platters.

In addition there is corruption due to software misbehaviour:
buffers filling or overflowing, or power outages at the wrong
moment (either to the computer itself or to a card due to
vibration).

The bottom line is that corruption is inevitable eventually.

But the average life of a file on a single device is not infinite,
and most files get moved around (another potential cause of
corruption, of course!). Every time you copy or move a file to
a new location you effectively start the clock from zero.

And the value of most files times out too, so eventually you
get to a point where nobody cares.

But if you do care, take backups. Corruption is real and not
going to go away any time soon. Modern storage devices and
technologies are vastly more reliable than they were 50 years
ago, or even 20 years ago. But nothing is foolproof.

> and, second, what steps should one take to prevent/minimize
> this from happening?  

Use reliable disks, configure them in RAID, and use checksums
to detect corruption. And take backups, then back up your
backups. And test that your retrieval strategy works. (It is
astonishing how many times I've found sites where they took
regular backups but never actually tried restoring anything
until it was too late, then discovered their retrieval
mechanism was broken...)
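
For example, in Python you can compute a file's checksum with
the standard hashlib module. (A minimal sketch - the file names
here are just made up for illustration.)

import hashlib

def file_checksum(path, algorithm="sha256"):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Record the checksum when you take the backup...
original = file_checksum("data.db")

# ...and compare against it when you test a restore.
restored = file_checksum("restored/data.db")
if restored != original:
    print("Corruption detected - restored file differs!")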

> I notice that one of the applications I work
> with at my job apparently uses checksums to detect when one of its
> files gets modified from outside of the application.

A wise precaution for anything that needs to be checked, either for
modification by external systems or where any corruption is critical.
Remember that corruption is not always (indeed, not often!) fatal
to a file. If it's just a single byte getting twiddled then it
might just change a value - your bank balance switches from $1000
to $9000, or vice versa, say! Checksums are about the only reliable
way to detect those kinds of corruption.
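
You can see why with a quick Python experiment (a throwaway
sketch with invented data):

import hashlib

good = b"Balance: $1000"
bad  = b"Balance: $9000"   # a single byte "twiddled"

print(hashlib.sha256(good).hexdigest())
print(hashlib.sha256(bad).hexdigest())
# The two digests are completely different, so the corruption
# is caught even though only one byte changed.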

But in the real world don't stress too much. It is a rare
occurrence, and I've only had it cause real problems a dozen
or so times in 40 years. But when computers hold millions of
files, many of which are millions or even billions of bytes,
some corruption is bound to occur eventually.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



