[Borgbackup] borg internals narrative

Antoine Beaupré anarcat at debian.org
Wed Feb 3 13:23:34 EST 2016


I wrote this to a friend to clarify how borg's internals work. I think
it'd be a great addition to the existing internals docs, but I don't
know where to start to make a pull request.

Hence this mailing list dump. Please do turn this into a patch. At
least, I think the sections in the existing internals docs have to be
reordered.

Wording and language obviously need tweaking as well. Probably.

Not knowing what people know about backup systems, I start from the
simplest ones and move on to more complex examples, with a parallel to
version control systems. In short:

 * tar is like renaming files
 * rsync and rdiff-backup are like CVS
 * bup, attic and borg are like git

Tar
---

Glues files together, sends them in a stream, compresses. Uses timestamps
to decide whether to add a file to the tarball, but when a file has
changed, the whole file gets sent in. To restore, you need to read the
whole tape or you may miss some bits.

Tapes. We know those. dump & restore work the same way.

Those are "incremental" in that it can backup only the newest file
changes if you want. Takes up a lot of space and is terribly slow at
restoring, but reasonably fast at backing up.

Compared to a version control system, this is a little like shoving your
files on an FTP server and renaming them ".1, .2, .3" as you go.
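
To make the "incremental tar" idea concrete, here is a toy Python
sketch (the function name is made up, and this is not how tar's own
incremental mode is implemented): files newer than the last backup get
added, but always in full:

    import os
    import tarfile
    import time

    def incremental_tar(source_dir, archive_path, last_backup_time):
        """Toy 'incremental tar': only files modified since the last backup
        are added, but each one is added whole -- no sub-file deltas."""
        with tarfile.open(archive_path, "w:gz") as tar:
            for root, dirs, files in os.walk(source_dir):
                for name in files:
                    path = os.path.join(root, name)
                    if os.path.getmtime(path) > last_backup_time:
                        # the entire file goes in, even for a 1-byte change
                        tar.add(path)
        return time.time()  # feed this into the next run

    # first run ("level 0"): incremental_tar("/home", "full.tar.gz", 0)
    # the next run only packs what changed since then, file by file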

rsync & rdiff-backup
--------------------

Kind of like tar: glues files together and sends them in a stream (which
is usually a pipe to another rsync which unglues the files and so on). The
trick is: only send the parts that changed over the stream. It uses clever
tricks with timestamps and, optionally, whole-file checksums to decide
which files to send, and then tries to send only the changed parts of
each file too.
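
To make the file-selection part concrete, here is a rough Python sketch
(the helper name is made up, and it skips the delta-transfer part, which
is the rolling-checksum trick that comes back further down):

    import hashlib
    import os

    def needs_transfer(src, dst, whole_file_checksum=False):
        """Roughly rsync's file-selection step: cheap size + mtime check
        first, whole-file checksum only if asked for (like rsync -c)."""
        if not os.path.exists(dst):
            return True
        s, d = os.stat(src), os.stat(dst)
        if s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime):
            return True
        if whole_file_checksum:
            def digest(path):
                h = hashlib.md5()
                with open(path, "rb") as f:
                    for block in iter(lambda: f.read(1 << 20), b""):
                        h.update(block)
                return h.digest()
            return digest(src) != digest(dst)
        return False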

No history, unless you start doing crazy things with hardlinks, a trick
that was built into something we pretend is backup software, called
rsnapshot.

rdiff-backup is similar, except it stores the binary diffs on disk and,
for good measure, adds a bunch of metadata files to store ownership and
so on.

These are "incremental" in storage as well as in transfer: only the
changed files are sent, and the files are also stored as diffs against
another version. It's annoying that we use the same word for
"incremental rsync" and "incremental tar", because the difference is
important.

(In the case of rsnapshot, full copies of all versions are kept, which is
really wasteful.)

Restoring is slow, as you need to unwind the stack of diffs as deep as
you need to go. rdiff-backup stores the latest version in full and works
the diffs backwards from there, which means every backup is
computationally expensive: the mirror copy gets rewritten and new
reverse diffs have to be generated. Fail again.

To compare this with version control, this is the coming of RCS or CVS.
You have the file, and next to it you have a `,v` file that stores the
previous revisions as diffs. When you add a new version, you either add
the latest diff on top (CVS) or replace the full copy and store a
reverse diff (rdiff-backup).
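
To make that contrast concrete, here is a toy reverse-delta store in
Python (difflib opcodes stand in for real binary diffs, and all the
names are made up): the newest version is kept in full, every backup
rewrites it and prepends a delta, and restoring an old version walks
back down the stack:

    import difflib

    def reverse_delta(new: bytes, old: bytes):
        """Opcodes that rebuild `old` out of `new` (toy stand-in for real
        binary deltas)."""
        matcher = difflib.SequenceMatcher(a=new, b=old, autojunk=False)
        return [("copy", i1, i2) if tag == "equal" else ("data", old[j1:j2])
                for tag, i1, i2, j1, j2 in matcher.get_opcodes()]

    def apply_delta(new: bytes, ops) -> bytes:
        out = bytearray()
        for op in ops:
            out += new[op[1]:op[2]] if op[0] == "copy" else op[1]
        return bytes(out)

    class ReverseDeltaStore:
        def __init__(self, first_version: bytes):
            self.mirror = first_version  # always the latest version, in full
            self.deltas = []             # deltas[0] rebuilds the previous one

        def backup(self, new_version: bytes):
            # every backup rewrites the mirror and prepends a reverse delta
            self.deltas.insert(0, reverse_delta(new_version, self.mirror))
            self.mirror = new_version

        def restore(self, age: int) -> bytes:
            # age=0 is the latest; older versions walk down the delta stack,
            # so restores get slower the further back you go
            version = self.mirror
            for ops in self.deltas[:age]:
                version = apply_delta(version, ops)
            return version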

bup, attic and borg
-------------------

So sorry for the long introduction, but I needed something to hook
into.

I jumbled bup, attic and borg all together because that's more or
less how I discovered their algorithms. I first found out about bup, of
course, and was *amazed* at how fast it was compared to my previous
backup solution (just rsync, aka the "JWZ method"). It was comparable or
faster, which was puzzling to me.

If tar is renaming files, and rdiff-backup is CVS, then this is git. In
fact, bup is based on git, something I can't quite get my head around.

But we know that git has trouble storing large objects, right? Because
when you index a large file in git, it computes one giant checksum over
the giant file. If you change only one bit of that file, the checksum
changes and you need to rewrite the whole damn thing.

Well, instead of computing checksums on whole files, the above backup
programs compute checksums on blocks. attic, in particular, uses a clever
algorithm called buzhash:

https://en.wikipedia.org/wiki/Buzhash

Rolling checksums of this kind are also used in the rsync protocol, so
the idea is nothing new. What is new is that instead of treating the
backup source as a collection of files, we treat it as a collection of
blocks.
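
To show the idea (this is not borg's actual buzhash code or parameters,
just a crude sketch): roll a hash over the data and cut a chunk boundary
whenever its low bits are all zero, so cut points depend on the content
rather than on fixed offsets:

    import hashlib
    import os

    def chunkify(data: bytes, mask=(1 << 12) - 1, min_size=256):
        """Toy content-defined chunking.  The crude hash below depends only
        on the last 32 bytes seen, so cut points are decided by the content
        itself and tend to re-align after an insertion.  Borg/attic use
        buzhash over a configurable window; this is just the idea."""
        chunks, start, h = [], 0, 0
        for i, byte in enumerate(data):
            h = ((h << 1) & 0xFFFFFFFF) ^ byte  # crude stand-in for buzhash
            if i + 1 - start >= min_size and (h & mask) == 0:
                chunks.append(data[start:i + 1])  # ~4 KiB chunks on average
                start, h = i + 1, 0
        if start < len(data):
            chunks.append(data[start:])
        return chunks

    # identical chunks hash to the same key, so they are stored only once
    store = {hashlib.sha256(c).digest(): c
             for c in chunkify(os.urandom(1 << 20))}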

bup, for example, has a "bup split" command that can take arbitrary stdin
and split it into checksummed blocks. You can feed it block devices,
disk images, tarballs, and it will do the right thing.

I wrote the internals documentation of borg based on mailing list
discussions and a bit of code dissection:

http://borgbackup.readthedocs.org/en/stable/internals.html

.. I am not sure I understand it all, to be honest. The fact that
borg/attic uses msgpack should just be seen as an implementation detail,
so I think you can ignore that in the descriptions. But basically, in
borg, you have multiple archives (your different backup checkpoints),
each containing a list of chunks, which contain metadata, which in turn
points to other chunks (the actual file data).

So basically, your files are bundled up together, I guess a bit like tar
does, and then the stream is cut into chunks of a configurable, rolling
size, which turns the data into a list of chunk references.

The clever thing is that all this works across multiple archives: those
chunks are shared across files and backups, and you can even share them
across machines. And since one file can be many chunks, if you change a
small part of the file, you don't change all the chunks, and the changes
to the backup are minimal.

Here's a nasty ASCII art diagram for you:

+--------+ +--------+     +--------+
|archive0| |archive1| ... |archiveN|
+--------+ +--------+     +--+-----+
    |          |             |
    |          |             |
    |      +---+             |
    |      |                 |
    |      |                 |
    +------+-------+         |
    |      |       |         |
 /chunk\/chunk\/chunk\...   /maybe different chunks lists\ 
+-----------------------------------------------------------------+
|item list                                                        |
+-----------------------------------------------------------------+
    |                                                       
    +-------------------------------------+--------------+  
    |                                     |              |  
    |                                     |              |  
+-------------+                     +-------------+      |  
|item0        |                     |item1        |      |  
| - owner     |                     | - owner     |      |  
| - size      |                     | - size      |     ... 
| - ...       |                     | - ...       |         
| - chunks    |                     | - chunks    |         
+----+--------+                     +-----+-------+         
     |                                    |                 
     | +-----+----------------------------+-----------------+
     | |     |                                              |
     +-o-----o------------+                                 |
     | |     |            |                                 |
  /chunk0\/chunk1\ ... /chunkN\     /chunk0\/chunk1\ ... /chunkN'\
 +-----------------------------+   +------------------------------+
 |file0                        |   |file0'                        |
 +-----------------------------+   +------------------------------+

Notice how the metadata list itself is chunked and deduplicated.
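
Here is a very rough Python rendering of the same structure (all the
field names are made up; in the real thing everything is msgpack'd and
the item list is itself stored as chunks):

    from dataclasses import dataclass, field
    from typing import Dict, List

    ChunkId = bytes  # a checksum of the chunk's contents

    @dataclass
    class Item:      # one file (or directory, symlink, ...) in a backup
        path: str
        owner: str
        size: int
        chunks: List[ChunkId] = field(default_factory=list)  # the file data

    @dataclass
    class Archive:   # one backup checkpoint ("archive0", "archive1", ...)
        name: str
        # the item list is itself serialized and chunked, so two archives
        # with mostly-unchanged metadata share most of these chunks too
        item_list_chunks: List[ChunkId] = field(default_factory=list)

    # the repository is, more or less, one big key/value store of chunks
    repository: Dict[ChunkId, bytes] = {}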

Chunk storage is a simple key/value storage system (more or less). This
allows purging backups: when you remove an archive, any chunk that no
other archive refers to is dropped from the storage.
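
A toy sketch of that purge logic (borg keeps reference counts in its
index/cache; this only shows the idea, with made-up names):

    import hashlib

    class ChunkStore:
        """Key/value chunk store with reference counting, to show why
        deleting an archive only drops chunks nothing else points to."""
        def __init__(self):
            self.chunks = {}    # chunk id -> data
            self.refcount = {}  # chunk id -> references from archives

        def put(self, data: bytes) -> bytes:
            cid = hashlib.sha256(data).digest()
            if cid not in self.chunks:  # dedup: identical data stored once
                self.chunks[cid] = data
            self.refcount[cid] = self.refcount.get(cid, 0) + 1
            return cid

        def release(self, cid: bytes):
            """Called for every chunk referenced by a deleted archive."""
            self.refcount[cid] -= 1
            if self.refcount[cid] == 0:  # no archive uses it any more
                del self.chunks[cid]
                del self.refcount[cid]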

Then there's encryption shoved on top of that: basically, I *think*
encryption is done per chunk: each chunk is AES-256 encrypted and, on
the result, an HMAC-SHA256 is computed (instead of the plain SHA256
used to identify chunks in unencrypted backups).
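
A rough sketch of what per-chunk encrypt-then-MAC could look like, using
the third-party `cryptography` package; this is not borg's actual
on-disk format or key handling, just the general shape:

    import hashlib
    import hmac
    import os

    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def seal_chunk(chunk: bytes, enc_key: bytes, mac_key: bytes) -> bytes:
        """Encrypt one chunk with AES-256-CTR, then authenticate the result
        with HMAC-SHA256 (encrypt-then-MAC).  Both keys are 32 bytes."""
        nonce = os.urandom(16)
        encryptor = Cipher(algorithms.AES(enc_key), modes.CTR(nonce)).encryptor()
        ciphertext = encryptor.update(chunk) + encryptor.finalize()
        tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
        return tag + nonce + ciphertext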

Then there are all sorts of indexes and caches to make this really
fast. Oh, and chunks can be compressed, of course.

a.
-- 
That's one of the remarkable things about life: it's never so bad that
it can't get worse.
                        - Calvin
