Working with various compressed files on a daily basis, I found I didn’t actually know how the different tools performed compared to each other. I know different compression will best fit different types of data, but I wanted to compare using a large generic file.
## The setup
The file I chose was a 4194304000 byte (4.0 GB) Ubuntu installation disk image.
The machine tasked with doing the bit-mashing was an Ubuntu system with an AMD Ryzen 9 5900X 12-core CPU, 64 GB RAM, and a WD Black SN850 NVMe drive.
Since the execution time will vary a lot between systems, the timing values themselves are not interesting. It’s the differences between them I was interested in, in addition to how small the files got and how much processing power was used.
## Maximize compression
The first test was to “maximize” compression, so I selected options accordingly.
- `gzip` and `bzip2` are not multi-threaded, so they cannot take advantage of the 24 virtual cores.
- `xz` and `zstd` support multi-threading, but it’s not the default.
- `lzma` was supposed to support multi-threading, but I didn’t get it to use more than one core.
- `pigz` is multi-threaded by default, which is nice.
Method | File size | Size % | Peak memory usage | User time | System time | Wall time | CPU usage |
---|---|---|---|---|---|---|---|
gzip -9 | 519492454 | 12.4% | 2.0 MB | 167.92 sec | 1.22 sec | 2:49.16 | 99% |
bzip2 -9 | 461727440 | 11.0% | 8.7 MB | 86.21 sec | 1.64 sec | 1:27.86 | 99% |
xz -9 -e -T 0 | 300608284 | 7.2% | 17.3 GB | 563.23 sec | 10.17 sec | 1:34.13 | 609% |
lzma -9 -e -T 0 | 299004234 | 7.1% | 675.8 MB | 465.10 sec | 2.17 sec | 7:47.30 | 99% |
zstd --ultra -22 -T0 | 320193687 | 7.6% | 9.2 GB | 496.78 sec | 6.10 sec | 3:45.29 | 223% |
pigz -9 | 519405554 | 12.4% | 21.3 MB | 187.17 sec | 6.02 sec | 0:11.80 | 1636% |
“Peak memory usage” is the “Maximum resident set size” value from `/usr/bin/time -v`.

What I find interesting is the memory usage for `xz`, and how much more CPU `pigz` can utilize. There’s also a clear correlation between CPU usage and time spent.
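For reference, each row above can be reproduced with an invocation along these lines. This is a minimal sketch; the file name is a placeholder, and you substitute the flags from the table for each method.

```shell
#!/bin/sh
# Minimal sketch of the measurement harness. FILE is a placeholder name.
FILE=ubuntu.iso

# -k keeps the input file around for the next run; GNU time's -v output
# (on stderr) includes "Maximum resident set size", the peak memory column.
/usr/bin/time -v gzip -9 -k "$FILE"

# Same pattern for the multi-threaded runs, e.g.:
# /usr/bin/time -v xz -9 -e -T0 -k "$FILE"
# /usr/bin/time -v zstd --ultra -22 -T0 -k "$FILE"
```

Note that `/usr/bin/time` is the standalone GNU time binary, not the shell builtin `time`, which does not report memory usage.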
## Default compression
Mostly, one uses the tools with default settings, so the next tests were run without command-line arguments. Not maximizing compression greatly speeds up the process, but the single-core default affected `xz` and `zstd`. (As mentioned above, `lzma` didn’t use more than one core anyway.)
Method | File size | Size % | Peak memory usage | User time | System time | Wall time | CPU usage |
---|---|---|---|---|---|---|---|
gzip | 522133876 | 12.4% | 2.1 MB | 48.67 sec | 1.28 sec | 0:49.96 | 99% |
bzip2 | 461727440 | 11.0% | 8.7 MB | 85.54 sec | 1.61 sec | 1:27.16 | 99% |
xz | 310830856 | 7.4% | 95.8 MB | 269.40 sec | 1.51 sec | 4:30.94 | 99% |
lzma | 311065247 | 7.4% | 95.7 MB | 265.00 sec | 1.92 sec | 4:26.95 | 99% |
zstd | 428213344 | 10.2% | 51.0 MB | 4.84 sec | 1.75 sec | 0:05.21 | 126% |
pigz | 521830464 | 12.4% | 21.4 MB | 67.71 sec | 5.98 sec | 0:05.75 | 1280% |
There is no doubt that `zstd` is the clear winner! `pigz` does match it on wall time, but uses ten times the CPU to do so.
## Comparison of multi-core default compression
To better compare the multi-threaded compression, I ran another test with default settings.
Method | File size | User time | System time | Wall time | CPU usage |
---|---|---|---|---|---|
xz -T0 | 315784172 | 498.57 sec | 4.00 sec | 0:25.06 | 2005% |
zstd -T0 | 428213344 | 6.43 sec | 0.83 sec | 0:01.39 | 522% |
pigz | 521830464 | 64.89 sec | 5.52 sec | 0:05.16 | 1363% |
Again, `zstd` is the clear winner. Thanks to the disk cache, `zstd` completes in just over one second! Without the disk cache, the machine used 4 seconds just to `cat` the file to `/dev/null`.
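To get a cold-cache number like that, the page cache has to be dropped between runs. A common way to do this on Linux (it requires root, and the file name below is a placeholder) is:

```shell
#!/bin/sh
# Sketch: measure read speed with a cold cache. Requires root for the
# drop_caches write; the file name is a placeholder.
sync                                         # flush dirty pages to disk
echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null

# The first read now comes from the drive, not from RAM:
time cat ubuntu.iso > /dev/null
```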
## Decompression times
Fast compression is not the full picture. We also have to consider the decompression time.
Method | User time | System time | Wall time | CPU usage |
---|---|---|---|---|
gunzip | 14.46 sec | 2.98 sec | 0:17.44 | 99% |
bunzip2 | 32.58 sec | 3.79 sec | 0:36.37 | 99% |
xz -d | 16.43 sec | 1.17 sec | 0:17.60 | 99% |
lzma -d | 13.28 sec | 1.25 sec | 0:14.53 | 99% |
zstd -d | 1.73 sec | 1.15 sec | 0:02.89 | 99% |
pigz -d | 11.70 sec | 5.20 sec | 0:08.26 | 204% |
## Conclusion
I don’t think we’ll stop using `gzip` and `bzip2` for a long while yet. While `xz` and `lzma` have impressive compression levels, the time spent will probably limit their usefulness. The lack of working multi-threading for `lzma` doesn’t help either.
`pigz` is an interesting replacement for `gzip`, especially on systems with several CPU cores, but since it’s not installed by default (at least on Ubuntu), its use will probably be limited. That said, if it is installed, there’s no reason not to use it instead of `gzip`, as they work on the same format.
`zstd` is the tool I will use more in the future. It is impressively fast even without multi-core, and the compression is still better than `bzip2`’s.
## Practical use case
A not uncommon task is to pack a directory into a single compressed file. Traditionally, one would use `tar` for this, like `tar czf packedfile.tar.gz directory`, which results in a `gzip`-compressed `tar` file. Well, “modern” `tar` supports more compression methods, and can even auto-detect what you want based on the file extension. So the alternative command could look like `tar caf packedfile.tar.zst directory`, which results in a `zstd`-compressed file instead, which will probably be a bit smaller, and a lot faster.
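A minimal sketch of the difference, using GNU `tar`’s `-a` (auto-compress) option. The directory and file names are placeholders, and the `.zst` variant assumes `zstd` is installed:

```shell
#!/bin/sh
set -e
# Placeholder directory with some content to pack.
mkdir -p directory
echo "example" > directory/file.txt

# -a picks the compressor from the output file's extension (GNU tar):
tar caf packedfile.tar.gz directory     # gzip, auto-detected from .gz
# tar caf packedfile.tar.zst directory  # zstd, if it's installed

tar tf packedfile.tar.gz                # list contents to verify
```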