[Borgbackup] Compression and De-duplication

Carl Beech carl.beech at gmail.com
Mon Aug 14 14:13:47 EDT 2017


Hi

Apologies if this is answered elsewhere, but I've not found an answer in the
docs, nor in a scan of the mailing list archives...

I'm new to borg, having moved across from obnam now that its maintenance is
being discontinued - so I'm in the process of setting up a new backup
regime.

I'm interested to know at what point compression occurs within the backup
process - does borg compress a file first and then split it into chunks to
check for de-duplication, or does it split the file into chunks for
de-duplication first and then compress those chunks when saving them to disk?

The reason I ask is that I use hyperconverged storage at work (i.e. storage
within our VMware environment is de-duplicated in real time) - and one of the
things the vendor pointed out is that de-duplication is severely affected by
compression: if you compress a file and then try to de-dup it, the
de-duplicator can't locate matching blocks. For example, if you have a
large file and modify its start, then after compression the whole file
differs from the original (compressed) version. Whereas if you leave the
file uncompressed, only the blocks that changed need to be stored, so you
end up with a greater overall space saving, even though this appears
counter-intuitive. A classic case is a backup of a database file.
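For what it's worth, the effect is easy to demonstrate with a small Python
sketch. This is not borg (borg actually uses content-defined chunking, not
the fixed 64 KiB chunks below, and pluggable compressors rather than zlib) -
it's just an illustration of why compress-then-chunk defeats de-duplication
while chunk-then-compress does not:

```python
import hashlib
import zlib

# Two 1 MiB "files" of distinct, compressible records,
# differing only in the first 16 bytes.
original = b"".join(b"record %08d\n" % i for i in range(65536))
modified = b"X" * 16 + original[16:]

def chunk_ids(data, size=64 * 1024):
    """Split into fixed 64 KiB chunks and identify each by its SHA-256."""
    return {hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)}

# Chunk the raw data: only the first chunk differs, 15 of 16 are shared.
raw_shared = chunk_ids(original) & chunk_ids(modified)

# Compress first, then chunk: the 16-byte change near the start alters
# the entire compressed stream, so (almost) no chunks line up any more.
comp_shared = chunk_ids(zlib.compress(original)) & chunk_ids(zlib.compress(modified))

print("raw chunks shared:", len(raw_shared), "of", len(chunk_ids(original)))
print("compressed chunks shared:", len(comp_shared))
```

Chunking the uncompressed data keeps the de-dup; compressing the whole file
first destroys it.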

Hopefully the above makes sense.

Many thanks

Carl.

