comparing multiple copies of terabytes of data?

Dan Stromberg strombrg at dcs.nac.uci.edu
Mon Oct 25 14:08:29 EDT 2004


We will soon have 3 copies, for testing purposes, of what should be about
4.5 terabytes of data.

Rather than running cmp twice to verify data integrity, I was thinking we
could speed up the comparison a bit by using a Python script that does 3
reads per disk block instead of 4 - with a sufficiently large blocksize,
of course.

My question, then, is: does Python have a high-level API that would
facilitate this sort of thing, or should I just code something up based on
open and read?

Thanks!




More information about the Python-list mailing list