[Borgbackup] A small compression test
William Kenworthy
billk at iinet.net.au
Wed Mar 8 00:03:59 EST 2023
From experience:
1. borg repos on a network file system (moosefs in my case) can be
very very slow
2. Borg has to read a complete VM image before it can calculate
checksums - and if you store the VMs on a network filesystem, it is
time-consuming just to read 500MB of data in one image, let alone
process it and then go on to do a number of other images.
3. Consider whether you can avoid large VMs and use the OS files
natively on a filesystem/partition, or back up the inside of the VM
rather than the image - the borg algorithms skip files whose metadata
shows no change (but do a safety recheck after a certain number of
runs - see docs). VM images by their nature have to be read in their
entirety every time to figure out what has changed, even if it's just
one byte of data. I have found that reading a VM image's contents is
a much faster operation after the first time. If (as in my case) both
the VMs and the repos are on a network filesystem, you will need to
carefully consider where the work (reading files and calculating
checksums) is to be done - reading multiple VM and storage images of
many hundreds of megabytes will take time and can't be avoided. The
good news is borg is still faster than most other backup systems even
in this scenario.
4. Consider parallelising as much as possible - running borgbackup on
multiple hosts pushing into individual repos at the same time takes only
a little longer than doing one backup, e.g. doing it serially is 1+1+1+1
etc., while in parallel it would be something like 1.5 in total. Note
that in my case, this is also leveraging the internal parallelisation of
moosefs running on a number of separate hosts.
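
To illustrate point 4, here is a minimal Python sketch of starting one
backup per host at the same time instead of one after another. It is
only an assumption about how this could be scripted, not how my own
setup is driven: the host names, repo paths, archive name and the use of
ssh are hypothetical, and the only borg options used are `create` and
`--compression zstd`.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def build_borg_command(host, repo, archive, source):
    """Build an ssh command that runs `borg create` on `host`.

    All four arguments are hypothetical examples supplied by the caller.
    """
    return ["ssh", host, "borg", "create", "--compression", "zstd",
            f"{repo}::{archive}", source]


def run_command(cmd):
    """Run one command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def run_parallel(jobs, runner=run_command, max_workers=4):
    """Run every job concurrently; return (results, elapsed_seconds)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the results in the same order as the jobs.
        results = list(pool.map(runner, jobs))
    return results, time.monotonic() - start


# Example use (hypothetical hosts, each pushing into its own repo):
# jobs = [build_borg_command(h, f"/mnt/mfs/borg/{h}", "nightly", "/home")
#         for h in ("host1", "host2", "host3", "host4")]
# exit_codes, seconds = run_parallel(jobs)
```

Each host writes to its own repo, so the jobs do not contend for a
single repository lock; the total wall time is then roughly that of the
slowest job plus whatever contention the shared storage adds.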
** I found I reached the limits of my moosefs filesystem storing decades
of email, hundreds of thousands of photos, borg repos and other files,
which it did quite well until I went too far for my hardware :( Moving
millions of smaller files into loopback-mounted images solved that
problem, at the expense of blowing out a 15 minute backup sequence to
many hours. Backing up the files by reading inside the mounted image,
rather than the image file itself, made quite a large time saving.
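
The reason reading inside the image helps can be sketched as follows.
This is an illustration of the general idea of a metadata-based skip,
not borg's actual code; the choice of size, ctime and inode as the
compared fields is an assumption here (see the borg docs on the files
cache for what it really compares).

```python
import os


def file_signature(path):
    """Return the metadata triple used in this sketch to detect change."""
    st = os.stat(path)
    return (st.st_size, st.st_ctime_ns, st.st_ino)


def needs_reading(path, cache):
    """Return True if `path` must be read and checksummed again.

    `cache` maps path -> signature from the previous run and is updated
    in place when the file looks new or changed.
    """
    sig = file_signature(path)
    if cache.get(path) == sig:
        return False  # metadata unchanged: skip without reading the data
    cache[path] = sig
    return True
```

A single VM or loopback image is one big file whose metadata changes
whenever anything inside it changes, so a check like this always says
"read it all". The individual files inside the image mostly keep the
same metadata between runs, so nearly all of them can be skipped.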
BillK
On 8/3/23 02:00, Bzzzz wrote:
> On Tue, 7 Mar 2023 15:18:47 +0100
> Thorsten Schöning <tschoening at am-soft.de> wrote:
>
>> Hello Bzzzz,
>> on Tuesday, 7 March 2023 at 14:42 you wrote:
>>
>>> Normal : it is single threaded _and_ you have a lot more files to
>>> scan, to compare to what's in the repo and, eventually, compress.
>> The only change I'm aware of was lz4 to zstd and that doesn't
>> influence scan performance for changed files, that should be like
>> before. It only influences CPU load and compression time of changed
>> data.
> It does, as you have more compressed files in a BB file, so checksums
> are read faster than with lz4 because they're more concentrated.
>
>>> I meant: think about only adding changed VM chunks to the repo[...]
>> The changes per day to the VM images are larger than the changes to
>> the individually backed up files. So if X GiB are pretty fast for
>> VM-images and database dumps, I'm wondering why (X-Y) GiB of data is
>> that slow when backing up individual files. That doesn't make too much
>> sense.
> Let me rephrase to check that I understand correctly:
> * VM images & DB dumps are many GB of changed data and back up fast,
> * regular smaller files are not changed that often but back up slower.
>
> If I had to guess, I'd say that with very few reads on either the
> client or the server you have everything needed for a VM/DB, whereas
> regular files might not sit in the same BB file and may be spread over
> different areas of the client's HDD, so there are many more head
> movements (hence latency). Plus, BB has to calculate many more
> checksums when files are small than when they are made of big chunks.
>
> Jean-Yves