[Borgbackup] Storage, CPU, RAM comparisons
MRob
mrobti at insiberia.net
Mon May 4 13:55:18 EDT 2020
Dmitry, thank you for your perspective.
>> Maybe borg improved on attic, but that's not the point. Borg, attic,
>> and duplicacy, which are based on deduplication, use massively more
>> storage space than duplicity (and rdiff-backup?). I don't understand
>> why a file-based delta is more storage efficient than deduplication,
>> which can consolidate chunks from all files in the repo. I expected
>> the opposite storage use ratio in that comparison.
>
> In real-life scenarios where taking full backups can be prohibitively
> expensive (especially scenarios like massive amounts of data over
> high-latency low-speed links), the rdiff-backup/duplicity approach
> becomes simply unviable, disk space savings or not, because you either
> have to keep all backups infinitely long (and eventually run out of
> storage space) or you need to pay the price of a full backup once in a
> while (which will likely overshadow all the time/disk space savings
> you have made previously).
I understand. In my case latency is not a concern. Also, most files are
text (this is a server backup, not personal media). I use rsnapshot, so
I can keep backup schedules for 1y, 6m, daily, etc. It is hard-link
based, so it is not so bad for disk space, but I am exploring better
choices; so far Borg looks best.
Yet I still want to understand whether or not it is true that
deduplication reduces the disk space requirement. Isn't that the whole
purpose of deduplication? Even if the compression choice was not fairly
evaluated in that comparison, why doesn't deduplication plus (worse)
compression come closer to duplicity?
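
To check my own understanding of what deduplication should buy, I wrote
a toy Python sketch. It is not Borg's real algorithm (I use fixed-size
chunks instead of Borg's content-defined chunking, and all sizes are
made up), just a model of "store each unique chunk once":

    import hashlib
    import os

    CHUNK_SIZE = 4096  # assumed fixed size; real Borg uses content-defined
                       # chunking so insertions don't shift every boundary

    def chunk_ids(data):
        # split into fixed-size chunks, identify each by its hash
        for i in range(0, len(data), CHUNK_SIZE):
            yield hashlib.sha256(data[i:i + CHUNK_SIZE]).digest()

    def dedup_stats(snapshots):
        # snapshots: list of backup runs, each a list of file contents
        total, repo = 0, set()  # the set plays the role of the repository
        for files in snapshots:
            for data in files:
                for cid in chunk_ids(data):
                    total += 1
                    repo.add(cid)  # a chunk already stored costs nothing
        return total, len(repo)

    day1 = os.urandom(1 << 20)                         # 1 MiB random file
    day2 = os.urandom(CHUNK_SIZE) + day1[CHUNK_SIZE:]  # first 4 KiB changed
    total, unique = dedup_stats([[day1], [day2]])
    print(f"total chunks: {total}, unique chunks: {unique}")
    # prints: total chunks: 512, unique chunks: 257

So in this model two full snapshots cost barely more than one, and I
expect that is exactly what the "total chunks" vs. "unique chunks"
numbers are supposed to show.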
My first test shows that with text data the default compression works
nicely (nearly a 40% reduction), but common chunks are not very good (I
think that is "total chunks" minus "unique chunks", right?): only a very
small reduction (1%) from deduplication. The first transfer was also
slow (even on a fast local link), but if it is a one-time operation
that's OK.
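
One thing I noticed while checking my numbers: counting chunks may
mislead, because chunks have different sizes. If I read `borg info`
correctly, the size columns are the better measure. Here is the
arithmetic I mean (the values are hypothetical placeholders, not my
real stats):

    # hypothetical values in the style of `borg info`, NOT real stats
    total_chunks, unique_chunks = 100_000, 99_000
    original_size     = 120 * (1 << 30)  # logical size of all archives
    deduplicated_size =  70 * (1 << 30)  # what the repo actually stores

    print(f"dedup by chunk count: {1 - unique_chunks / total_chunks:.1%}")
    print(f"dedup by bytes:       {1 - deduplicated_size / original_size:.1%}")
    # dedup by chunk count: 1.0%
    # dedup by bytes:       41.7%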
> So some fraction of the extra space taken by borg will be manifests
> that tie blocks in the storage to the files that use them.
I don't mind overhead, but I want a clear understanding of the costs
and benefits. Your opinion takes the big picture of keeping a variety
of backup snapshots. I agree with taking the big perspective, but can
you also help me understand the detailed picture: does deduplication do
better with storage than the others or not?
Compared to a hardlink setup like rsnapshot, where a changed file
causes an entire new copy to be kept, I expect the storage reduction to
be massive because common blocks are kept only once (is that correct?).
But something similar should be true for the rdiff tools that only
store deltas. Evaluating features, borg is the clear winner over
rdiff-backup/duplicity, but the large difference in storage cost I saw
in that comparison is too big to ignore.
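
Here is the back-of-envelope comparison I did (all sizes are made-up
assumptions): 30 daily snapshots of one 100 MiB file where roughly
1 MiB changes in place each day:

    MiB = 1 << 20
    file_size, daily_change, days = 100 * MiB, 1 * MiB, 30

    # rsnapshot: an unchanged file is a hardlink (free), but any change
    # stores the whole file again
    hardlink = days * file_size
    # rdiff-backup/duplicity: one full copy plus a ~1 MiB delta per day
    delta = file_size + (days - 1) * daily_change
    # borg: one full set of chunks plus the changed chunks per day; the
    # granularity is the chunk size, so it may store somewhat more than
    # a byte-level delta (compression and index overhead ignored here)
    dedup = file_size + (days - 1) * daily_change

    for name, size in [("hardlink", hardlink), ("delta", delta),
                       ("dedup", dedup)]:
        print(f"{name:8s} ~{size / MiB:6.0f} MiB")
    # hardlink ~  3000 MiB
    # delta    ~   129 MiB
    # dedup    ~   129 MiB

If this model is right, deduplication and file deltas should land in
the same ballpark, both far below hardlink snapshots, which is why that
comparison surprised me.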
Another question: after, say, a year of backups, does it require a lot
more CPU/RAM to compute the chunk deltas? What are those costs?
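
My rough understanding (please correct me if wrong) is that Borg does
not diff against old data at all: it chunks the new data and looks each
chunk id up in an index, so CPU scales with how much data is read, not
with the age of the repo, and RAM scales with the number of unique
chunks. A back-of-envelope sketch with made-up numbers (the per-entry
cost is a guess on my part; the Borg docs have the exact formulas):

    GiB, MiB = 1 << 30, 1 << 20
    unique_data = 500 * GiB  # unique data after a year (made up)
    avg_chunk = 2 * MiB      # assumed average chunk size (Borg's default
                             # target is in this ballpark)
    bytes_per_entry = 100    # guessed index cost per chunk, with overhead

    chunks = unique_data // avg_chunk
    ram = chunks * bytes_per_entry
    print(f"~{chunks:,} unique chunks -> ~{ram / MiB:.0f} MiB index RAM")
    # ~256,000 unique chunks -> ~24 MiB index RAM

Is that roughly the right mental model?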