[Borgbackup] Central backup and ~/.cache/borg/ piling up

Sun Aug 30 14:22:12 EDT 2020

On Sun, 30 Aug 2020 19:14:55 +0200
Marian Beermann <public at enkore.de> wrote:

Oops, my reply meant for the ML went to you only; resending to the list.

> >> Since --no-cache-sync is marked experimental, I want to ask, what
> >> other downsides one can expect, and what experiences others have had
> >> using these flags, or otherwise tackling the issue in question.
> >
> > IIUC the doc, each and every backup of yours is re-processed
> > (almost?) the same as if it was the first one - this doesn't look
> > optimal at all; so, using these options should be reduced to very
> > special cases (I guess Thomas could enlighten us about which they
> > could be).
> >
> > In a word, you lose (very much) time and bloat the network for a
> > minor saving - not to mention that if you have a lot of machines to
> > back up, the whole thing takes much too long to complete.
> >
> > So, IF my interpretation of the doc is correct, you lose many times
> > over whatever you hope to gain using --no-cache-sync, namely time and
> > network availability.
>
> I originally wrote the code for --no-cache-sync a few years ago, but I
> didn't remember what it does, so I re-read the code [1] to refresh my
> memory :)

Hehe, this is where clean, thoroughly commented code (possibly) pays
off ;-)

> --no-cache-sync uses the local cache *if* it is in sync. Otherwise it
> downloads a list of all chunks in the repository (that's about "Unique
> chunks" * 32 bytes in network traffic, in your case 31886048 * 32 bytes
> = 1 GB!) and uses that for deduplication. Because it only has the
> ID of chunks, but doesn't know their size or compressed size, the stats
> will be all wrong (all deduped chunks will show up with a size of zero
> bytes). This might result in some situations (archives being deleted
> etc.) in a cache sync (possibly on another host) to download actual
> data chunks to figure their (uncompressed) size out.

Thanks for these clarifications, Marian.
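
As a quick sanity check of the figure quoted above ("Unique chunks" * 32
bytes), here is a minimal Python sketch. The function name is purely
illustrative, and the 32-byte ID size is simply taken from Marian's
message, not checked against the borg code.

```python
# Rough estimate of the chunk-ID list that --no-cache-sync downloads
# when the local cache is out of sync (per the description quoted above).

ID_SIZE_BYTES = 32  # assumption: per-chunk ID size as stated in the quoted mail


def chunk_list_bytes(unique_chunks: int, id_size: int = ID_SIZE_BYTES) -> int:
    """Return the approximate size in bytes of the downloaded chunk-ID list."""
    return unique_chunks * id_size


total = chunk_list_bytes(31886048)  # "Unique chunks" figure from the thread
print(f"{total} bytes")
print(f"{total / 10**9:.2f} GB (decimal)")
print(f"{total / 2**30:.2f} GiB (binary)")
```

So the "1 GB" is about right in decimal units, and that is for the ID
list alone, before any actual backup data goes over the wire.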

The docs are relatively clear about that once you've read about "borg
create" and "why-it-is-not-such-a-good-idea-to-put-all-machines-in-the-same-repo"
(leaving the compression aspect aside).

> IIRC we added it to see if it would work in practice or not. I dunno if
> someone tried it in a bigger scenario and what the results there might
> have been.

Wild guess: terrible on small machines such as laptops, which struggle
to juggle (spinning rust) disk R/W and Ethernet I/O when both are busy.
And in any case (i.e. even when you use an SSD) it bloats the network,
which can be a hassle when you've got a lot of machines each backing up
to its own repo on the same backup server.
Add the data traffic to that and you end up with slow-motion backups.

Jean-Yves