[Tutor] a par2 creator and verification program
Adam Gold
awg1 at gmx.com
Tue Jul 23 17:17:00 EDT 2019
Hello everyone. I'm thinking through a short program I want to write
that will run par2 to generate ECC (error-correcting code) files for all
of my work files, which branch out from a single directory and number
approximately 15,000. Specifically:
1) day one:
- create a mirror copy of the directory tree containing no files (there
are several ways of doing this in bash).
- recurse down the directory tree that holds the files and run par2
create on each file, which generates approximately 10 *.par2
fileblocks per file. I will then copy the *.par2 fileblocks into the
mirror directory tree, at the same position as their 'principal' file.
Assuming 10 *.par2 fileblocks for every actual file, the mirror tree
will therefore hold around 150,000 *.par2 fileblocks (disk space and
CPU time are a non-issue).
2) day two:
- for each file in the primary directory, par2 verify it against its
corresponding *.par2 fileblocks in the mirror tree. If it's OK, move on
to the next file; if not, repair it, generate a new set of *.par2
fileblocks and copy them over to the mirror.
3) day three:
- same as day two, ongoing.
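Day one of the plan above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions: the helper names (`mirror_tree`, `mirror_dir_for`, `par2_create_cmd`, `move_par2_set`) are hypothetical, the command assumes the basic par2cmdline form `par2 create <PAR2 file> <files>`, and the recovery set is assumed to be written next to the data file and then moved (rather than copied) into the mirror, so the primary tree does not fill up with *.par2 fileblocks that would themselves get par2'd on the next run.

```python
import glob
import os
import shutil
from pathlib import Path


def mirror_tree(src_root, mirror_root):
    """Recreate every directory under src_root beneath mirror_root, copying no files."""
    src_root, mirror_root = Path(src_root), Path(mirror_root)
    for dirpath, _dirnames, _filenames in os.walk(src_root):
        rel = Path(dirpath).relative_to(src_root)
        (mirror_root / rel).mkdir(parents=True, exist_ok=True)


def mirror_dir_for(data_file, src_root, mirror_root):
    """Return the mirror directory that corresponds to data_file's directory."""
    rel = Path(data_file).parent.relative_to(src_root)
    return Path(mirror_root) / rel


def par2_create_cmd(data_file):
    """Build (but do not run) a basic 'par2 create' command for one file.

    Assumption: par2cmdline syntax 'par2 create <PAR2 file> <files>', with
    the recovery set written beside the data file as '<name>.par2' etc.
    """
    data_file = Path(data_file)
    par2_file = data_file.with_name(data_file.name + ".par2")
    return ["par2", "create", str(par2_file), str(data_file)]


def move_par2_set(data_file, src_root, mirror_root):
    """Move every '<name>*.par2' file beside data_file into the mirror tree.

    Note: this pattern would also catch the set of another file whose name
    merely starts with the same text (e.g. 'a.txt' vs 'a.txt.bak').
    """
    data_file = Path(data_file)
    dest = mirror_dir_for(data_file, src_root, mirror_root)
    pattern = glob.escape(data_file.name) + "*.par2"
    moved = []
    # Materialise the match list before moving anything out of the directory.
    for p in sorted(data_file.parent.glob(pattern)):
        target = dest / p.name
        shutil.move(str(p), str(target))
        moved.append(target)
    return moved
```

In use, each command list would be run with `subprocess.run(cmd, check=True)` before calling `move_par2_set` for that file.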
I'm aware that most par2 programs need the file and its *.par2
fileblocks to be in the same location, but let's assume I find a way
around this. I also believe it would be possible to par2 the top
directory as a whole (which would give me work1.par2 - work10.par2), but
the problem with doing it this way is that the blocks treat all files as
a single whole, so if I detect corruption I have no way of locating
which file is affected.
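If the co-location requirement can't be switched off, one possible workaround (an assumption, not something settled here) is to copy the file and its *.par2 fileblocks from the mirror into a scratch directory, run par2 there, and copy a repaired file back. The helpers `stage_for_par2` and `copy_back` are hypothetical, and the index file is assumed to be named `<file name>.par2`.

```python
import shutil
from pathlib import Path


def stage_for_par2(data_file, par2_files, scratch_dir):
    """Copy data_file and its .par2 set into scratch_dir so they sit side by side.

    Returns the path of the main (index) .par2 file inside scratch_dir,
    assumed to be named '<data file name>.par2'.
    """
    scratch_dir = Path(scratch_dir)
    shutil.copy2(data_file, scratch_dir)
    for p in par2_files:
        shutil.copy2(p, scratch_dir)
    return scratch_dir / (Path(data_file).name + ".par2")


def copy_back(scratch_dir, data_file):
    """After a successful repair in scratch_dir, overwrite the original file."""
    repaired = Path(scratch_dir) / Path(data_file).name
    shutil.copy2(repaired, data_file)
```

With `tempfile.TemporaryDirectory()` as the scratch directory, the staged copies are deleted automatically when the `with` block ends. The cost is one extra copy of each file per check, which the post says is acceptable.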
I'm considering two ways of doing this:
Option A:
- This seems the most obvious, if somewhat inelegant, approach: define
a few functions and call them from a for loop that is applied to each
file as described in 1) - 3) above.
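Option A might look something like this as the daily check (days two and three). It is only a sketch: `index_for` (mapping a data file to its index .par2 file) and `repair` (which would repair the file and regenerate its set) are hypothetical callables, and it assumes par2cmdline exits with status 0 when verification succeeds, treating any other status as damage. The `run` parameter defaults to `subprocess.run` and exists so the loop can be exercised without par2 installed.

```python
import os
import subprocess
from pathlib import Path


def iter_files(root):
    """Yield every regular file under root, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield Path(dirpath) / name


def verify_ok(index_par2, run=subprocess.run):
    """Run 'par2 verify' on one recovery set and report success.

    Assumption: exit status 0 means every file verified; anything else
    is treated as 'needs repair'.
    """
    result = run(["par2", "verify", str(index_par2)], capture_output=True)
    return result.returncode == 0


def daily_check(root, index_for, repair, run=subprocess.run):
    """Option A: one for loop over all files, calling small helper functions.

    Returns the list of files that failed verification.
    """
    damaged = []
    for data_file in iter_files(root):
        if not verify_ok(index_for(data_file), run=run):
            damaged.append(data_file)
            repair(data_file)  # e.g. par2 repair, then regenerate the set
    return damaged
```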
Option B:
- I'm afraid my thinking is not entirely clear regarding this option,
but somehow I would import metadata for every (primary) file into a list
(I think all that's needed is the file name and location), perhaps even
a nested list, although I'm not sure that provides an advantage. Then I
would apply the operations for 1) - 3) above to each list item in turn,
the assumption being that the list data and my home-made functions will
be sufficient.
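For Option B, a flat list of small records is probably enough: a nested list would mirror the tree, but each record's relative directory already encodes where the file sits. A minimal sketch, with `FileEntry` and `collect_entries` as hypothetical names:

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileEntry:
    """Metadata for one primary file: its location and name."""
    rel_dir: Path  # directory relative to the primary root
    name: str      # file name

    def primary_path(self, src_root):
        return Path(src_root) / self.rel_dir / self.name

    def mirror_dir(self, mirror_root):
        return Path(mirror_root) / self.rel_dir


def collect_entries(src_root):
    """Option B: build a flat list of FileEntry records once, up front."""
    src_root = Path(src_root)
    entries = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = Path(dirpath).relative_to(src_root)
        for name in sorted(filenames):
            entries.append(FileEntry(rel, name))
    return entries
```

The operations for 1) - 3) then become a loop over `collect_entries(src_root)` rather than over `os.walk`. One caveat: files added after the list is built won't be in it, so it would need rebuilding each day.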
I've found various par2 programs on PyPI, and possibly pyFileFixity
could be used, but in this instance I'd rather give it a go myself. For
various reasons I can't use ZFS, which would, of course, negate the need
for doing any of this. It seems this would be my consolation prize :)