[borgbackup] Chunker params for very large files
Thomas Waldmann
tw at waldmann-edv.de
Fri Aug 21 07:48:12 EDT 2015
Hi Alex,
> Hello, what would be a good chunker setting for a handful (15) files
> with sizes from 20 GB to 150 GB to a total of 2.3 TB per day? These
> are database backups that cannot be made incremental.
That depends a bit on your goals.
If you have enough space and you care more about good speed and little
management overhead (but not so much about deduplicating with very fine
grained blocks), use a higher value for HASH_MASK_BITS, like 20 or 21,
so it creates larger chunks on statistical average. It sounds like
this matches your case.
If you care about very fine grained deduplication, maybe don't have
that much data, and can live with the management overhead, use a
small chunk size (a small HASH_MASK_BITS, like the default of 16).
> An existing recommendation of 19,23,21,4095 for huge files from
> https://borgbackup.github.io/borgbackup/usage.html appears to
> translate into:
>
> minimum chunk of 512 KiB
> maximum chunk of 8 MiB
> medium chunk of 2 MiB
>
> In a 100GB file we are looking at 51200 chunks.
You need to take the total amount of your data (~2 TB) and compute the
chunk count (about 1,000,000 at a ~2 MiB average chunk size). Then use the
resource formula from the docs to compute the sizes of the index files
(and the RAM needs).
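As a back-of-the-envelope sketch (plain Python, not borg code; the per-entry
byte counts below are assumptions for illustration -- take the exact
constants from the resource formula in the docs you run against):

  def estimate(total_bytes, hash_mask_bits,
               repo_index_entry=40, chunks_cache_entry=44):
      # per-entry byte counts are assumed; use the constants from the docs
      chunk_count = total_bytes // 2 ** hash_mask_bits
      return {
          "chunk_count": chunk_count,
          "repo_index_bytes": chunk_count * repo_index_entry,
          "chunks_cache_bytes": chunk_count * chunks_cache_entry,
      }

  print(estimate(int(2.3e12), 21))
  # ~1.1 million chunks -> index / cache files in the tens-of-MB range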
In your case this looks quite reasonable; you could also use 1 MiB chunks,
but you'd better not use 64 KiB chunks, as the comparison below shows.
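For comparison, a quick count for ~2.3 TB of data (the average chunk sizes
are only the statistical targets, not guarantees):

  total = int(2.3e12)  # ~2.3 TB of database dumps per day
  for name, avg in (("64 KiB", 2**16), ("1 MiB", 2**20), ("2 MiB", 2**21)):
      print(name, total // avg, "chunks")
  # 64 KiB -> ~35 million chunks, 1 MiB -> ~2.2 million, 2 MiB -> ~1.1 million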
> beneficial to raise these further? The machine I have doing this has
> plenty of RAM (32 GB) and 8 CPU cores at 2.3 GHz, so RAM/compute is
> not a problem.
Right. But if your index is rather big, it will need to copy around a lot
of data (for transactions, and for resyncing the cache in case you back up
multiple machines to the same repo).
Cheers, Thomas
----
GPG Fingerprint: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393
Encrypted E-Mail is preferred / Verschluesselte E-Mail wird bevorzugt.