[Tutor] a par2 creator and verification program

Adam Gold awg1 at gmx.com
Tue Jul 23 17:17:00 EDT 2019


Hello everyone.  I'm thinking through a short program I want to write 
that will 'par2'/generate ECCs for all of my work files which branch out 
from a single directory and number approximately 15,000.  Specifically:
1) day one:
  - create a mirror copy of the directory tree empty of all files (there 
are a bunch of ways in bash of doing this).
  - recurse down the directory tree which has the files and run a par2 
create calculation on each file which generates approximately 10 *.par2 
fileblocks.  I will then copy the *.par2 fileblocks to the mirror 
directory tree into the same position as the 'principal' file.  Therefore 
assuming 10 *.par2 fileblocks for every actual file, the mirror tree 
will have around 150,000 *.par2 fileblocks (space and CPU time are a 
non-issue).
2) day two:
  - for each file in the primary directory, par2 verify it with respect 
to its corresponding *.par2 fileblocks in the mirror tree.  If it's OK, 
move on to the next file; if not, repair it, generate a new set of 
*.par2 fileblocks and copy them over to the mirror.
3) day three:
  - same as day two, ongoing.
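
A minimal sketch of the first day-one step (the empty mirror tree), using 
only the standard library.  The names mirror_tree and mirror_dir_for are 
made up for illustration, and the sketch assumes the mirror lives outside 
the primary tree so that os.walk never descends into it.

```python
import os
from pathlib import Path


def mirror_tree(primary: Path, mirror: Path) -> int:
    """Recreate every directory under `primary` inside `mirror`, copying no files.

    Returns the number of directories visited, including the top one.
    """
    count = 0
    for dirpath, _dirnames, _filenames in os.walk(primary):
        # Path of this directory relative to the top of the primary tree.
        relative = Path(dirpath).relative_to(primary)
        (mirror / relative).mkdir(parents=True, exist_ok=True)
        count += 1
    return count


def mirror_dir_for(path: Path, primary: Path, mirror: Path) -> Path:
    """Directory in the mirror that corresponds to a file in the primary tree."""
    return mirror / path.parent.relative_to(primary)
```

mirror_dir_for gives the "same position" in the mirror for any primary 
file, which is where that file's *.par2 fileblocks would be stored.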

I'm aware that most par2 programs need the file and its *.par2 blocks to 
be in the same location but let's assume I find a way around this.  Also, 
I believe it would be possible to par2 the top directory (which would 
give me work1.par2 - work10.par2) but the problem is that, done this way, 
the blocks treat all the files as a single whole, so if I detect 
corruption I have no way of telling which file is affected.
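
One possible way around the same-location requirement is to let par2 work 
next to the file and then move the resulting blocks into the mirror (and 
back again before a verify).  The sketch below only does the moving.  It 
assumes the blocks for "notes.txt" are named "notes.txt.par2" and 
"notes.txt.vol<something>.par2"; that naming is an assumption to check 
against the par2 program actually used, and par2_set and move_par2_set 
are hypothetical helper names.

```python
import shutil
from pathlib import Path


def par2_set(directory: Path, name: str) -> list:
    """Return the *.par2 fileblocks in `directory` that belong to file `name`.

    Assumed naming: "<name>.par2" plus "<name>.vol....par2" volumes.
    """
    found = []
    for candidate in directory.iterdir():
        n = candidate.name
        is_index = n == name + ".par2"
        is_volume = n.startswith(name + ".vol") and n.endswith(".par2")
        if is_index or is_volume:
            found.append(candidate)
    return sorted(found)


def move_par2_set(name: str, src_dir: Path, dst_dir: Path) -> list:
    """Move the fileblocks for `name` from `src_dir` to `dst_dir`."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for block in par2_set(src_dir, name):
        target = dst_dir / block.name
        shutil.move(str(block), str(target))
        moved.append(target)
    return moved
```

Calling move_par2_set with the two directories swapped brings the blocks 
back beside the file for a verify or repair.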

I'm considering two ways of doing this:

Option A:
- This seems the most obvious if somewhat inelegant: define a few 
functions, and incorporate them into a for loop which will be applied to 
each file as described in 1) - 3) above.
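
A rough sketch of what Option A could look like.  It assumes the par2 
command-line program (par2cmdline, with its create / verify / repair 
subcommands) is on the PATH and that an exit status of 0 means success.  
To keep it short the blocks are left beside each file rather than in the 
mirror, and the old blocks are found by an assumed naming pattern 
("<name>.par2" and "<name>.vol....par2").  The function names and the 
`run` parameter (there so the loop can be tried without par2 installed) 
are made up for illustration.

```python
import subprocess
from pathlib import Path


def par2(action, target, run=subprocess.run):
    """Run `par2 <action>` for `target` inside the file's own directory.

    Returns True when the command exits with status 0.
    """
    index = target.name + ".par2"
    command = ["par2", action, index]
    if action == "create":
        command.append(target.name)  # the file to protect
    return run(command, cwd=target.parent).returncode == 0


def check_file(target, run=subprocess.run):
    """Create blocks if missing, otherwise verify and repair if needed."""
    index = target.with_name(target.name + ".par2")
    if not index.exists():
        return "created" if par2("create", target, run) else "create failed"
    if par2("verify", target, run):
        return "ok"
    if not par2("repair", target, run):
        return "unrepairable"
    # After a repair, drop the old blocks and generate a fresh set.
    for old in target.parent.iterdir():
        n = old.name
        if n == index.name or (n.startswith(target.name + ".vol") and n.endswith(".par2")):
            old.unlink()
    par2("create", target, run)
    return "repaired"


def check_tree(top, run=subprocess.run):
    """Apply check_file to every non-par2 file below `top`."""
    results = {}
    for path in sorted(Path(top).rglob("*")):
        if path.is_file() and path.suffix != ".par2":
            results[path] = check_file(path, run)
    return results
```

The same check_tree call covers day one (everything comes back as 
"created") and every later day (mostly "ok", occasionally "repaired").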

Option B:
- I'm afraid my thinking is not entirely clear regarding this option, but 
somehow I import metadata for every (primary) file into a list (I think 
all that's needed is file name and location), perhaps even a nested list 
although I'm not sure if that provides an advantage.  Then I apply the 
operations for 1) - 3) above sequentially per list item, the assumption 
being that the list data and my home-made functions will be sufficient.
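
For comparison, a small sketch of the list in Option B: one flat list 
holding the name and location of every primary file.  FileEntry and 
collect_entries are illustrative names, and skipping anything ending in 
".par2" is an assumption about how fileblocks would be told apart from 
work files.

```python
import os
from typing import NamedTuple


class FileEntry(NamedTuple):
    name: str      # file name, e.g. "report.txt"
    location: str  # directory that contains the file


def collect_entries(top):
    """Walk `top` and return a flat list of FileEntry items."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for filename in sorted(filenames):
            if not filename.endswith(".par2"):
                entries.append(FileEntry(filename, dirpath))
    return entries
```

Each item can then be turned back into a path with 
os.path.join(entry.location, entry.name) and handed to the per-file 
functions.  A flat list looks sufficient here, since the location field 
already records where each file sits in the tree; the practical 
difference from Option A is that the list exists up front, so it can be 
counted, sorted or saved before any par2 work starts.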

I've found various par2 programs on PyPI and possibly pyFileFixity could 
be used but in this instance I'd rather give it a go myself.  For 
various reasons I can't use ZFS which would, of course, negate the need 
for doing any of this.  It seems this would be my consolation prize :)

