[Borgbackup] Reasons for removing --compression-from
Marcin Zajączkowski
mszpak at wp.pl
Sun Mar 24 19:21:29 EDT 2019
Thanks for your reply.
On 2019-03-24 23:08, Thomas Waldmann wrote:
>> I wonder what the reason was and whether Borg can, by any chance, do it in some
>> other way as of 1.1/1.2?
>
> The reason was that "auto" is way easier from a user perspective and not
> expensive on the CPU.
>
> The previously planned feature would have required maintaining a file
> with a list of file formats and recommended compression algorithms, reliably
> detecting the file format (which is a non-trivial problem by itself), and
> would then still have had issues with file formats that may or may not be
> compressed internally.
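To see why format detection is a problem on its own, here is a minimal Python sketch that guesses a format from its leading "magic" bytes. The helper names (`sniff`, `looks_precompressed`) and the tiny signature table are my own illustration; Borg contains nothing like this. ZIP shows the "compressed or not internally" issue: its signature says nothing about whether the members are deflated or just stored.

```python
from typing import Optional

# A few well-known leading signatures ("magic bytes"); far from exhaustive.
MAGIC = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\x1f\x8b": "gzip",
    b"7z\xbc\xaf\x27\x1c": "7z",
    b"PK\x03\x04": "zip",
}


def sniff(header: bytes) -> Optional[str]:
    """Return the format whose signature starts *header*, or None."""
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return None


def looks_precompressed(header: bytes) -> Optional[bool]:
    """Guess whether the file's payload is already compressed.

    Returns None when the answer is unknown: either no signature matched, or
    the format is ZIP, whose members may be deflated or merely stored
    (compression method 0), so the signature alone cannot tell.
    """
    fmt = sniff(header)
    if fmt is None or fmt == "zip":
        return None
    return True  # JPEG, PNG, gzip and 7z payloads are (almost) always compressed
```

A real tool would need many more signatures and, for containers like ZIP, would have to parse the member headers, which is exactly the maintenance burden described in the quoted reply.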
Indeed, auto is much easier to use; however, it is much less flexible.
In particular, the ratio that decides whether a chunk should be compressed
seems to be hardcoded to 0.97:
https://github.com/borgbackup/borg/blob/2b16fc9039660abba0ce6f5a25ae9c0f31ad48f5/src/borg/compress.pyx#L317
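A simplified Python model of that decision, written from my reading of the linked code and not Borg's actual implementation: each chunk is first compressed with a cheap, fast compressor; only if the result is smaller than `threshold` times the input is the expensive compressor (lzma here) used. Borg probes with lz4; zlib at level 1 stands in because it ships with the standard library. The `threshold` parameter is hypothetical, i.e. the knob I am asking for below.

```python
import os
import zlib


def auto_decide(chunk: bytes, threshold: float = 0.97) -> str:
    """Return which compressor an auto-style heuristic would pick for a chunk.

    'threshold' mirrors the hardcoded 0.97; making it configurable is
    hypothetical. zlib level 1 stands in for Borg's lz4 probe.
    """
    if not chunk:
        return "none"                    # nothing to compress
    probe = zlib.compress(chunk, 1)      # cheap, fast trial compression
    ratio = len(probe) / len(chunk)
    if ratio < threshold:
        return "lzma"                    # compresses well: run the expensive one
    if ratio < 1:
        return "probe"                   # small gain: keep the cheap result
    return "none"                        # no gain: store uncompressed


if __name__ == "__main__":
    print(auto_decide(os.urandom(65536)))    # random bytes do not compress
    print(auto_decide(b"abc" * 20000))       # very repetitive data
    print(auto_decide(b"abc" * 20000, 0.0))  # threshold 0: lzma never chosen
```

Lowering `threshold` (say, to 0.9) would send fewer only-slightly-compressible chunks to lzma and keep the cheap probe result for them instead.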
It is not the best value in some situations. My test case: 760 MB of JPG
photos, backed up into two separate empty repositories on /tmp/ (in memory), both with AES encryption.
Without compression:
> $ time -p borg create --progress --compression none --list --filter=AME --stats ...
> ...
> Duration: 15.18 seconds
> Number of files: 128
> Utilization of max. archive size: 0%
> ------------------------------------------------------------------------------
>                        Original size      Compressed size    Deduplicated size
> This archive:              761.66 MB            761.68 MB            761.68 MB
> All archives:              761.66 MB            761.68 MB            761.68 MB
>
>                        Unique chunks         Total chunks
> Chunk index:                     411                  411
> ------------------------------------------------------------------------------
> real 16.20
> user 10.43
> sys 0.89
With auto,lzma:
> $ time -p borg create --progress --compression auto,lzma --list --filter=AME --stats ...
> ...
> Duration: 1 minutes 53.94 seconds
> Number of files: 128
> Utilization of max. archive size: 0%
> ------------------------------------------------------------------------------
>                        Original size      Compressed size    Deduplicated size
> This archive:              761.66 MB            757.23 MB            757.23 MB
> All archives:              761.66 MB            757.23 MB            757.23 MB
>
>                        Unique chunks         Total chunks
> Chunk index:                     421                  421
> ------------------------------------------------------------------------------
> real 115.02
> user 102.09
> sys 3.36
16 vs. 115 seconds is a noticeable difference for a space saving of only about 4.5 MB (roughly 0.6%), especially since I have gigabytes of photos.
Why lzma? In the long run, a smaller backup (so that I can keep more
snapshots) matters more to me than backup duration (backups can run "in
the background").
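To put the two runs in perspective for a larger collection, here is a back-of-the-envelope Python extrapolation. It assumes duration and stored size scale linearly with the amount of data, which real backups need not do, and the 100 GB figure is a made-up example, not my actual collection size.

```python
# Figures taken from the two borg runs above (Borg reports decimal MB).
MEASURED_MB = 761.66                               # original size of the test set
DURATION_S = {"none": 15.18, "auto,lzma": 113.94}  # "Duration" lines
STORED_MB = {"none": 761.68, "auto,lzma": 757.23}  # "Compressed size" lines


def extrapolate(total_gb: float) -> dict:
    """Estimate hours and stored GB for a hypothetical collection of total_gb.

    Assumes both time and size scale linearly with the input volume.
    """
    scale = total_gb * 1000 / MEASURED_MB
    return {
        mode: {
            "hours": DURATION_S[mode] * scale / 3600,
            "stored_gb": STORED_MB[mode] * scale / 1000,
        }
        for mode in DURATION_S
    }


for mode, est in extrapolate(100).items():
    print(f"{mode:>9}: {est['hours']:.2f} h, {est['stored_gb']:.1f} GB stored")
```

For a hypothetical 100 GB this works out to roughly half an hour versus a bit over four hours, for about 0.6 GB less stored data.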
As a workaround, I could try to split my data into more groups, but that
is much less convenient to manage, especially since, for example, 7z
archives can sit in the same directory structure as "normal" files.
I suspect the ratio is too high in my case. I would like to be able to
change it from the command line. Even better, though, would be the
ability to define at least a list of file extensions that should be
excluded from compression (as a lighter version of the removed mechanism).
What do you think about that?
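The extension list I have in mind could work roughly like this Python sketch. `compression_for` and `SKIP_EXTS` are hypothetical names, Borg 1.1 has no such option, and the returned strings merely mimic Borg's compression spec syntax.

```python
from pathlib import PurePath

# Example skip list of usually pre-compressed formats (illustrative only).
SKIP_EXTS = frozenset({".jpg", ".jpeg", ".png", ".mp4", ".7z", ".zip", ".gz"})


def compression_for(path: str, default: str = "auto,lzma",
                    skip_exts: frozenset = SKIP_EXTS) -> str:
    """Pick a compression spec for *path* from its extension alone."""
    suffix = PurePath(path).suffix.lower()  # also match IMG_0001.JPG
    return "none" if suffix in skip_exts else default
```

Extensions are only a proxy for the real format, but they are cheap to check and easy for users to reason about.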
Marcin
--
https://blog.solidsoft.info/ - Working code is not enough