[Borgbackup] Storage, CPU, RAM comparisons

MRob mrobti at insiberia.net
Mon May 4 13:55:18 EDT 2020


Dmitry, thank you for your perspective.

>> Maybe borg improved on attic, but that's not the point. Borg, attic,
>> and duplicacy are based on deduplication yet use massively more
>> storage space than duplicity (and rdiff-backup?). I don't understand
>> why file-based deltas are more storage efficient than deduplication,
>> which can consolidate chunks from all files in the repo. I expected
>> the opposite storage use ratio in that comparison.
> 
> In real-life scenarios where taking full backups could be prohibitively
> expensive (especially scenarios like massive amounts of data over
> high-latency, low-speed links), the rdiff-backup/duplicity approach
> becomes simply unviable, disk space savings or not, because you either
> have to keep all backups infinitely long (and eventually run out of
> storage space) or you need to pay the price of a full backup once in a
> while (which will likely overshadow all the time/disk space savings you
> have made previously).

I understand. In my case latency is not a concern, and most files are 
text (this is a server backup, not personal media). I use rsnapshot, so 
I can keep retention schedules of 1 year, 6 months, daily, etc. It is 
hard-link based, so it is not too bad for disk space, but I am exploring 
better choices, and so far Borg looks best.

Yet I still want to understand whether it is true that deduplication 
reduces the disk space requirement. Isn't that the purpose of 
deduplication? Even if the compression choice was not fairly evaluated 
in that comparison, why doesn't deduplication plus (worse) compression 
come closer to duplicity?

My first test shows that with text data the default compression works 
nicely (nearly a 40% reduction), but chunk sharing is not very good 
(that is "total chunks" minus "unique chunks", right?), giving only a 
very small reduction (1%) from deduplication. The first transfer was 
also slow (even on a fast local link), but if it is a one-time operation 
that is okay.
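
For reference, this is roughly how I read those numbers, from the JSON 
output of "borg info". It is only a sketch: I am assuming the 
cache/stats field names of a recent 1.1.x release (they may differ in 
other versions), and the repo path is a placeholder.

    import json
    import subprocess

    repo = "/path/to/repo"   # placeholder, not my real repo

    out = subprocess.run(["borg", "info", "--json", repo],
                         check=True, capture_output=True, text=True).stdout
    stats = json.loads(out)["cache"]["stats"]

    # chunks shared between files/archives = total minus unique
    duplicate_chunks = stats["total_chunks"] - stats["total_unique_chunks"]
    # fraction of the original data removed by dedup + compression
    saved = 1.0 - stats["unique_csize"] / stats["total_size"]

    print("duplicate chunks:", duplicate_chunks)
    print("saved by dedup + compression: %.1f%%" % (100 * saved))

On my text data nearly all of that saving comes from compression, not 
from duplicate chunks.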

> So some fraction of the extra space taken by borg will be manifests 
> that tie blocks in the storage to the files that use them.

I don't mind overhead, but I want a clear understanding of the costs 
and benefits. Your point is about the big picture of keeping a variety 
of backup snapshots. I agree the big picture matters, but can you also 
help me understand the detailed picture, namely whether deduplication 
really does better on storage than the alternatives?

Compared to a hardlink setup like rsnapshot, where a changed file 
causes the entire new file to be kept, I expect the storage reduction 
to be massive because common blocks are stored only once (is that 
correct?). But something similar should be true for rdiff-style tools 
that only store deltas. Evaluating features, borg is the clear winner 
over rdiff-backup/duplicity, but the large difference in storage cost 
I saw in that comparison is too big to ignore.
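
To make my expectation concrete, this is the back-of-envelope model in 
my head (all numbers are assumptions picked for illustration, not 
measurements):

    # 10 GiB of mostly-text data, ~2% of the bytes change per day,
    # 30 daily snapshots kept (assumed numbers, not measurements)
    data_mib = 10 * 1024
    change = 0.02
    days = 30

    # rsnapshot/hardlinks: unchanged files are hardlinked for free, but
    # any touched file is stored again in full; assume the changed bytes
    # are spread over files totalling ~3x their size.
    hardlink_total = data_mib + days * (3 * change) * data_mib

    # chunk deduplication (borg-style): only the changed chunks are new;
    # compression and manifest overhead ignored here.
    dedup_total = data_mib + days * change * data_mib

    print("hardlink snapshots: ~%d MiB" % hardlink_total)
    print("chunk dedup:        ~%d MiB" % dedup_total)

Under assumptions like these deduplication should win clearly, which is 
why the storage numbers in that comparison surprised me.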

Another question: after, say, a year of backups, does it take a lot 
more CPU/RAM to compute the chunk deltas? What are those costs?

