[Borgbackup] Compression and De-duplication
Carl Beech
carl.beech at gmail.com
Tue Aug 15 15:18:30 EDT 2017
Many thanks - much appreciated!
Carl.
> I'm interested to know at what point compression occurs within the
> backup process - does borg compress a file and then split it into
> chunks to check for de-duplication, or does it split the file into chunks
> for de-duplication and then compress those chunks when saving them to disk?
borg create:  read-from-file and chunk, id-hash, compress, encrypt, add-auth, store-to-repo
borg extract: fetch-from-repo, check-auth, decrypt, decompress, write-to-file

So the answer is: the file is split into chunks (and each chunk gets its dedup
id-hash) first; compression happens per chunk, just before encryption and storage.
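A rough sketch of that per-chunk order, if it helps (hypothetical helper
names, fixed-size chunks and plain sha256 for brevity - borg actually uses
content-defined chunking and a keyed id-hash, so this is not borg's real API):

    import hashlib
    import zlib

    CHUNK_SIZE = 1 << 20  # 1 MiB fixed-size chunks, purely for illustration

    def backup_file(path, repo, encrypt):
        # repo: dict-like mapping chunk id -> stored blob
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                chunk_id = hashlib.sha256(chunk).hexdigest()  # id-hash of the *plain* chunk
                if chunk_id not in repo:                      # dedup decision made here,
                    compressed = zlib.compress(chunk)         # ...before compression
                    repo[chunk_id] = encrypt(compressed)      # encrypt/auth, store-to-repo

Because the dedup id is computed on the uncompressed chunk, the compression
(and encryption) settings don't change which chunks count as duplicates.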
> The reason I ask is that I use hyperconverged storage at work (i.e.
> storage within our VMWare environment is de-duplicated in real time) - and
> one of the things pointed out by the vendor is that de-duplication is
> severely affected by compression - i.e. if you compress a file and then try
> to de-dup, the de-duplication can't locate similar blocks.
Yes, that's true for stream compression, like tar.gz,
but not so much for "rsyncable" compression.
I'm not sure there is a problem with your storage and borg, though:
If your source files are on that storage, it won't matter for borg.
If your repo is on that storage, your hyperconverged-dedup won't work
due to encryption. But it doesn't need to work because a borg repo
internally is already deduped (by borg).
So, no problem.
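To illustrate why chunk-then-compress keeps dedup working while stream
compression defeats it, here's a toy comparison (plain zlib and fixed 4 KiB
blocks as stand-ins; nothing borg-specific, and the exact counts depend on
the data):

    import zlib

    original = b"".join(b"the quick brown fox %d " % i for i in range(5000))
    modified = b"X" * 64 + original[64:]   # overwrite 64 bytes near the start

    def blocks(data, size=4096):
        return [data[i:i + size] for i in range(0, len(data), size)]

    # Stream compression (tar.gz style): the early change shifts everything
    # that follows in the compressed output, so block-level dedup typically
    # finds nothing left to share.
    shared_stream = sum(a == b for a, b in
                        zip(blocks(zlib.compress(original)),
                            blocks(zlib.compress(modified))))

    # Chunk first, then compress (the borg order): unchanged chunks still
    # compress to byte-identical blobs, so they dedup exactly as before.
    shared_chunks = sum(zlib.compress(a) == zlib.compress(b)
                        for a, b in zip(blocks(original), blocks(modified)))

    print(shared_stream, shared_chunks)    # typically 0 vs. all-but-the-first block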