Working with various compressed files on a daily basis, I found I didn’t actually know how the different tools performed compared to each other. I know different compression will best fit different types of data, but I wanted to compare using a large generic file.

The setup

The file I chose was a 4194304000 byte (4.0 GB) Ubuntu installation disk image.

The machine tasked with doing the bit-mashing was an Ubuntu box with an AMD Ryzen 9 5900X 12-core CPU, 64 GB RAM, and a WD Black SN850 NVMe drive.

Since execution time will vary a lot between systems, the absolute timing values themselves are not interesting. What I was interested in is the differences between them, along with how small the files got and how much processing power was used.

Maximize compression

The first test was to “maximize” compression, so I selected options accordingly.

  • gzip and bzip2 are not multi-threaded, so cannot take advantage of the 24 virtual cores.
  • xz and zstd support multi-threading, but it’s not the default.
  • lzma was supposed to support multi-threading, but I didn’t get it to use more than one core.
  • pigz is multi-threaded by default, which is nice.
| Method | File size | Size % | Peak memory usage | User time | System time | Wall time | CPU usage |
|---|---|---|---|---|---|---|---|
| gzip -9 | 519492454 | 12.4% | 2.0 MB | 167.92 sec | 1.22 sec | 2:49.16 | 99% |
| bzip2 -9 | 461727440 | 11.0% | 8.7 MB | 86.21 sec | 1.64 sec | 1:27.86 | 99% |
| xz -9 -e -T 0 | 300608284 | 7.2% | 17.3 GB | 563.23 sec | 10.17 sec | 1:34.13 | 609% |
| lzma -9 -e -T 0 | 299004234 | 7.1% | 675.8 MB | 465.10 sec | 2.17 sec | 7:47.30 | 99% |
| zstd --ultra -22 -T0 | 320193687 | 7.6% | 9.2 GB | 496.78 sec | 6.10 sec | 3:45.29 | 223% |
| pigz -9 | 519405554 | 12.4% | 21.3 MB | 187.17 sec | 6.02 sec | 0:11.80 | 1636% |

“Peak memory usage” is the “Maximum resident set size” value from /usr/bin/time -v.

What I find interesting is the memory usage for xz, and how much more CPU pigz can utilize. There’s also a clear correlation between CPU usage and time spent.

Default compression

Mostly, the tools are used with default settings, so the next tests were run without command-line arguments. Not maximizing compression greatly speeds up the process, but the single-core default affected xz and zstd. (As mentioned above, lzma didn’t use more than one core anyway.)

| Method | File size | Size % | Peak memory usage | User time | System time | Wall time | CPU usage |
|---|---|---|---|---|---|---|---|
| gzip | 522133876 | 12.4% | 2.1 MB | 48.67 sec | 1.28 sec | 0:49.96 | 99% |
| bzip2 | 461727440 | 11.0% | 8.7 MB | 85.54 sec | 1.61 sec | 1:27.16 | 99% |
| xz | 310830856 | 7.4% | 95.8 MB | 269.40 sec | 1.51 sec | 4:30.94 | 99% |
| lzma | 311065247 | 7.4% | 95.7 MB | 265.00 sec | 1.92 sec | 4:26.95 | 99% |
| zstd | 428213344 | 10.2% | 51.0 MB | 4.84 sec | 1.75 sec | 0:05.21 | 126% |
| pigz | 521830464 | 12.4% | 21.4 MB | 67.71 sec | 5.98 sec | 0:05.75 | 1280% |

There is no doubt that zstd is the clear winner! pigz does match it on wall time, but uses ten times the CPU to do so.

Comparison of multi-core default compression

To better compare the multi-threaded compression, I ran another test with default settings.

| Method | File size | User time | System time | Wall time | CPU usage |
|---|---|---|---|---|---|
| xz -T0 | 315784172 | 498.57 sec | 4.00 sec | 0:25.06 | 2005% |
| zstd -T0 | 428213344 | 6.43 sec | 0.83 sec | 0:01.39 | 522% |
| pigz | 521830464 | 64.89 sec | 5.52 sec | 0:05.16 | 1363% |

Again, zstd is the clear winner. Thanks to the disk cache, zstd completes in just over one second! Without the disk cache, the machine took 4 seconds just to cat the file to /dev/null.

Decompression times

Fast compression is not the full picture. We also have to consider the decompression time.

| Method | User time | System time | Wall time | CPU usage |
|---|---|---|---|---|
| gunzip | 14.46 sec | 2.98 sec | 0:17.44 | 99% |
| bunzip2 | 32.58 sec | 3.79 sec | 0:36.37 | 99% |
| xz -d | 16.43 sec | 1.17 sec | 0:17.60 | 99% |
| lzma -d | 13.28 sec | 1.25 sec | 0:14.53 | 99% |
| zstd -d | 1.73 sec | 1.15 sec | 0:02.89 | 99% |
| pigz -d | 11.70 sec | 5.20 sec | 0:08.26 | 204% |
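The decompression commands are simply the inverse of the compression ones, and it is worth sanity-checking that the round trip restores the original bytes. A minimal gzip-only sketch (file names here are placeholders, not from the actual test setup):

```shell
# Create a test file and record its checksum.
dd if=/dev/urandom of=data.bin bs=1M count=8 2>/dev/null
sha256sum data.bin > before.sha

# Compress, then decompress again.
gzip -9 data.bin      # replaces data.bin with data.bin.gz
gunzip data.bin.gz    # restores data.bin

# Verify the round trip restored the exact original bytes.
sha256sum -c before.sha
```

The same pattern works for the other tools (bzip2/bunzip2, xz -d, zstd -d, pigz -d).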

Conclusion

I don’t think we’ll stop using gzip and bzip2 for a good while yet. While xz and lzma achieve impressive compression ratios, the time spent will probably limit their usefulness. The lack of working multi-threading in lzma doesn’t help either.

pigz is an interesting replacement for gzip, especially on systems with several CPU cores, but since it’s not installed by default (at least on Ubuntu), its use will probably be limited. That said, if it is installed, there’s no reason not to use it instead of gzip, as they work on the same format.

zstd is the tool I will use more in the future. It is impressively fast even on a single core, and the compression is still better than bzip2’s.

Practical use case

A not uncommon task is to pack a directory into a single compressed file. Traditionally, one would use tar for this, like tar czf packedfile.tar.gz directory, which results in a gzip-compressed tar file. Well, “modern” tar supports more compression methods, and can even auto-detect what you want based on the file extension. The alternative command could then look like tar caf packedfile.tar.zst directory, which results in a zstd-compressed file instead, which will probably be a bit smaller, and a lot faster.
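As a sketch of the two variants (directory and file names are placeholders; the runnable example sticks to gzip, since .tar.zst requires zstd to be installed):

```shell
# A small directory to pack.
mkdir -p directory && echo "hello" > directory/file.txt

# Classic: -z selects gzip explicitly.
tar czf packedfile.tar.gz directory

# Modern GNU tar: -a (--auto-compress) picks the compressor from the
# file extension, so .tar.gz gives gzip and .tar.zst would give zstd.
tar caf packedfile2.tar.gz directory
```

Both commands produce archives in the same format here; only the way the compressor is chosen differs.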

Are Tysland, Redpill Linpro

