Working with various compressed files on a daily basis, I found I didn’t actually know how the different tools performed compared to each other. I know different compression algorithms fit different types of data best, but I wanted to compare them using a large, generic file.

The setup

The file I chose was a 4194304000 byte (4.0 GB) Ubuntu installation disk image.

The machine tasked with the bit-mashing was an Ubuntu system with an AMD Ryzen 9 5900X 12-core CPU, 64 GB RAM, and a WD Black SN850 NVMe drive.

Since the execution time will vary a lot between systems, the timing values themselves are not interesting. It’s the differences between them I was interested in, in addition to how small the files got and how much processing power was used.

Maximize compression

The first test was to “maximize” compression, so I selected the options for that.

  • gzip and bzip2 are not multi-threaded, so cannot take advantage of the 24 virtual cores.
  • xz and zstd support multi-threading, but it’s not the default.
  • lzma was supposed to support multi-threading, but I didn’t get it to use more than one core.
  • pigz is multi-threaded by default, which is nice.
| Method               | File size | Size % | Peak memory usage | User time  | System time | Wall time | CPU usage |
|----------------------|-----------|--------|-------------------|------------|-------------|-----------|-----------|
| gzip -9              | 519492454 | 12.4%  | 2.0 MB            | 167.92 sec | 1.22 sec    | 2:49.16   | 99%       |
| bzip2 -9             | 461727440 | 11.0%  | 8.7 MB            | 86.21 sec  | 1.64 sec    | 1:27.86   | 99%       |
| xz -9 -e -T 0        | 300608284 | 7.2%   | 17.3 GB           | 563.23 sec | 10.17 sec   | 1:34.13   | 609%      |
| lzma -9 -e -T 0      | 299004234 | 7.1%   | 675.8 MB          | 465.10 sec | 2.17 sec    | 7:47.30   | 99%       |
| zstd --ultra -22 -T0 | 320193687 | 7.6%   | 9.2 GB            | 496.78 sec | 6.10 sec    | 3:45.29   | 223%      |
| pigz -9              | 519405554 | 12.4%  | 21.3 MB           | 187.17 sec | 6.02 sec    | 0:11.80   | 1636%     |

“Peak memory usage” is the “Maximum resident set size” value from /usr/bin/time -v.

What I find interesting is the memory usage of xz, and how much more CPU pigz can utilize. There’s also a clear correlation between higher CPU usage and lower wall time.
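The derived columns can be reproduced from the raw numbers. As a quick sanity check against the xz row (compressed size relative to the 4194304000-byte original, and combined CPU time divided by the wall time of 1:34.13, i.e. 94.13 seconds):

```shell
# Size %: compressed size relative to the 4194304000-byte original.
awk 'BEGIN { printf "%.1f%%\n", 300608284 / 4194304000 * 100 }'    # → 7.2%
# CPU usage: (user + system time) divided by wall time.
awk 'BEGIN { printf "%d%%\n", (563.23 + 10.17) / 94.13 * 100 }'    # → 609%
```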

Default compression

Most of the time, the tools are used with default settings, so the next tests were run without command line arguments. Not maximizing compression greatly speeds up the process, but the single-core default affected xz and zstd. (As mentioned above, lzma didn’t use more than one core anyway.)

| Method | File size | Size % | Peak memory usage | User time  | System time | Wall time | CPU usage |
|--------|-----------|--------|-------------------|------------|-------------|-----------|-----------|
| gzip   | 522133876 | 12.4%  | 2.1 MB            | 48.67 sec  | 1.28 sec    | 0:49.96   | 99%       |
| bzip2  | 461727440 | 11.0%  | 8.7 MB            | 85.54 sec  | 1.61 sec    | 1:27.16   | 99%       |
| xz     | 310830856 | 7.4%   | 95.8 MB           | 269.40 sec | 1.51 sec    | 4:30.94   | 99%       |
| lzma   | 311065247 | 7.4%   | 95.7 MB           | 265.00 sec | 1.92 sec    | 4:26.95   | 99%       |
| zstd   | 428213344 | 10.2%  | 51.0 MB           | 4.84 sec   | 1.75 sec    | 0:05.21   | 126%      |
| pigz   | 521830464 | 12.4%  | 21.4 MB           | 67.71 sec  | 5.98 sec    | 0:05.75   | 1280%     |

There is no doubt that zstd is the clear winner! pigz does match it on wall time, but uses ten times the CPU to do so.

Comparison of multi-core default compression

To better compare the multi-threaded compression, I ran another test with default settings.

| Method   | File size | User time  | System time | Wall time | CPU usage |
|----------|-----------|------------|-------------|-----------|-----------|
| xz -T0   | 315784172 | 498.57 sec | 4.00 sec    | 0:25.06   | 2005%     |
| zstd -T0 | 428213344 | 6.43 sec   | 0.83 sec    | 0:01.39   | 522%      |
| pigz     | 521830464 | 64.89 sec  | 5.52 sec    | 0:05.16   | 1363%     |

Again, zstd is the clear winner. Thanks to the disk cache, zstd completed in just over one second! Without the disk cache, the machine needed 4 seconds just to cat the file to /dev/null.
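Put differently, that wall time corresponds to compressing the 4194304000-byte image at roughly 3 GB/s when it is served from the page cache, a rough back-of-the-envelope figure:

```shell
# Effective throughput of zstd -T0 with the image in page cache:
# 4194304000 bytes processed in 1.39 seconds.
awk 'BEGIN { printf "%.1f GB/s\n", 4194304000 / 1.39 / 1e9 }'    # → 3.0 GB/s
```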

Decompression times

Fast compression is not the full picture. We also have to consider the decompression time.

| Method  | User time | System time | Wall time | CPU usage |
|---------|-----------|-------------|-----------|-----------|
| gunzip  | 14.46 sec | 2.98 sec    | 0:17.44   | 99%       |
| bunzip2 | 32.58 sec | 3.79 sec    | 0:36.37   | 99%       |
| xz -d   | 16.43 sec | 1.17 sec    | 0:17.60   | 99%       |
| lzma -d | 13.28 sec | 1.25 sec    | 0:14.53   | 99%       |
| zstd -d | 1.73 sec  | 1.15 sec    | 0:02.89   | 99%       |
| pigz -d | 11.70 sec | 5.20 sec    | 0:08.26   | 204%      |
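For reference, a minimal compress-and-decompress round-trip looks like this (shown with gzip since it is installed everywhere; the file names are made up for the example):

```shell
# Create a small sample, compress it, decompress to a copy, and compare.
printf 'some sample data\n' > sample.txt
gzip -k sample.txt                        # -k keeps sample.txt; writes sample.txt.gz
gzip -dc sample.txt.gz > roundtrip.txt    # -d decompress, -c to stdout
cmp sample.txt roundtrip.txt && echo OK   # → OK
```

The same pattern applies to the other tools, which all accept `-d` (or have a dedicated un-prefixed command) and `-c`.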

Conclusion

I don’t think we will stop using gzip and bzip2 for a good while yet. While xz and lzma reach impressive compression levels, the time spent will probably limit their usefulness. The lack of working multi-threading for lzma doesn’t help either.

pigz is an interesting replacement for gzip, especially on systems with several CPU cores, but since it’s not installed by default (at least on Ubuntu), its use will probably be limited. That said, if it is installed, there’s no reason not to use it instead of gzip, as they work on the same format.

zstd is the tool I will use more in the future. It is impressively fast even without multi-core, and the compression is still better than bzip2.

Practical use case

A not uncommon task is to pack a directory into a single compressed file. Traditionally, one would use tar for this, like tar czf packedfile.tar.gz directory, which results in a gzip-compressed tar file. Well, “modern” tar supports more compression methods, and can even auto-detect what you want based on the file extension. The alternative command could then look like tar caf packedfile.tar.zst directory, which results in a zstd-compressed file instead, which will probably be a bit smaller, and a lot faster to produce.
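As a sketch of the two variants (the directory and archive names are placeholders; the zstd line is commented out since it requires the zstd binary, which tar calls for `.zst` archives):

```shell
# Pack a directory; with -a (--auto-compress), GNU tar picks the
# compressor from the output file's extension.
mkdir -p directory && echo 'example' > directory/file.txt
tar czf packedfile.tar.gz directory       # classic: explicit gzip (-z)
tar caf packedfile2.tar.gz directory      # -a: gzip chosen from .gz
# tar caf packedfile.tar.zst directory    # -a: zstd chosen from .zst
tar tzf packedfile.tar.gz                 # list contents to verify
```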

Are Tysland

at Redpill Linpro
