Go Compression Benchmark Results
This is an LLM-generated post summarizing the benchmarks used to choose a compression library for localtimezone, a lat/lng -> timezone lookup library.
A benchmark of pure-Go compression libraries against real-world binary data: H3 timezone cells from localtimezone, a library that needed fast and efficient decompression of an ~8.5 MB binary file. All ten libraries tested are pure Go with no CGo dependencies, making the results directly applicable to any Go project that needs cross-platform compression.
The benchmark code is available at github.com/albertyw/go-compression-benchmark. Full per-run result tables: data.h3 results · data_mock.h3 results.
Key Findings
Best compression ratio: XZ/LZMA2 — but at a steep cost
XZ achieves a 10.33x ratio on the 8.5 MB file, shrinking it to 848 KB. The catch: compression takes 450ms and uses 64 MB of memory with nearly a million allocations. Decompression is more reasonable at 97ms. If you compress offline and only pay the decompression cost at runtime, this can work — but the memory spike and allocation count during compression rule it out for any hot path.
Best balance: Zstd
Zstd is the standout. At SpeedFastest it achieves a 5.78x ratio in just 31ms of compression time with only 16 MB of memory, and decompression needs only 7 allocations. Stepping up to SpeedBestCompression improves the ratio to 6.67x at the cost of 357ms compression time. Decompression is consistently fast at ~13ms across all levels. For most use cases, Zstd SpeedFastest or SpeedDefault is the right call.
Fastest compression throughput: pgzip and S2
pgzip (parallel gzip) compresses the 8.5 MB file in 5.6ms at BestSpeed by using multiple CPU cores. S2 Better is close at 4.9ms single-threaded. Both pay for this speed in ratio: pgzip gets ~5.16x, S2 Better only 2.40x. These are strong choices for pipelines where compression is on the critical path and throughput matters more than size.
Fastest decompression: Snappy and S2
Snappy decompresses in 4.6ms with just 1 allocation — the lowest overhead of any library. S2 Best is similarly fast at 5ms. These are the right choices if decompression latency is the primary concern and a lower compression ratio is acceptable.
Brotli: excellent ratio, terrible compression speed
Brotli Best (11) achieves 7.79x — the second-best ratio — but takes 10.65 seconds and 802 MB of memory to compress. Even Default (6) takes 302ms. Brotli is designed for serving pre-compressed static assets, not on-the-fly compression. Use it only when you compress once and serve many times.
Gzip stdlib: the safe default
If you want zero dependencies beyond the standard library, compress/gzip at Default gives a solid 5.36x ratio in 197ms. The klauspost drop-in replacement is faster (35ms at Default) with identical output format, making it a worthwhile swap whenever gzip interoperability is required.
Summary Table (8.5 MB file)
| Library | Level | Ratio | Compress | Decompress | Notes |
|---|---|---|---|---|---|
| XZ/LZMA2 | — | 10.33x | 450ms | 97ms | Best ratio, slow |
| Brotli | Best (11) | 7.79x | 10.65s | 36ms | Offline use only |
| Zstd | SpeedBestCompression | 6.67x | 357ms | 13ms | Best ratio+speed tradeoff |
| Zstd | SpeedFastest | 5.78x | 31ms | 13ms | Best all-around |
| Gzip stdlib | Default | 5.36x | 197ms | 24ms | Zero dependencies |
| Gzip klauspost | Default | 5.26x | 35ms | 15ms | Fast stdlib replacement |
| pgzip | BestSpeed | 5.16x | 5.6ms | 17ms | Fastest compress (parallel) |
| S2 | Better | 2.40x | 4.9ms | 7ms | Fast, low ratio |
| Snappy | — | 2.19x | 7.6ms | 4.6ms | Fastest decompress |
When to use each library
- Zstd — the default choice for new projects. Excellent ratio across all speed levels, fast decompression, low allocations, and a stable RFC-standardized format.
- XZ/LZMA2 — when ratio is paramount, compression is offline, and you can tolerate slow compression and high memory usage.
- Brotli — HTTP serving of static assets (CSS, JS, fonts). Pre-compress and cache; never compress on-the-fly.
- Gzip stdlib — when you need zero non-stdlib dependencies or interoperability with existing gzip consumers. Swap to klauspost for free speed gains.
- pgzip — when you need gzip output but compression throughput is a bottleneck; scales with core count.
- LZ4 — when decompression throughput is the absolute priority and you’re operating at memory-bandwidth speeds. Common in storage systems and databases.
- Snappy / S2 — lightweight, fast decompression with minimal allocations. S2 is strictly better than Snappy in pure-Go contexts.
Library Backgrounds
gzip / DEFLATE
DEFLATE is a lossless compression algorithm invented by Phil Katz in 1993 and formally specified in RFC 1951 (1996). It combines LZ77 — which replaces repeated byte sequences with back-references using a sliding window — and Huffman coding, which assigns shorter bit strings to more frequent symbols. The gzip file format was created by Jean-Loup Gailly in 1992 as a patent-free replacement for the Unix compress utility, whose LZW algorithm was encumbered by Unisys patents. DEFLATE became the backbone of a generation of internet infrastructure: it is the compression algorithm inside ZIP archives, PNG images, TLS connections, and HTTP content-encoding. The Go standard library provides compress/gzip and compress/flate; the klauspost/compress library offers drop-in replacements with assembly-optimized paths on amd64 that are substantially faster.
Zstandard (Zstd)
Zstandard was developed by Yann Collet at Facebook (now Meta) and open-sourced in August 2016. Its goal was to be a modern replacement for zlib that improves on all metrics simultaneously — compression speed, decompression speed, and ratio. Like DEFLATE it uses LZ77-style dictionary matching, but pairs it with a larger search window and a fast entropy coder based on Finite State Entropy (FSE), a variant of Asymmetric Numeral Systems (ANS). The algorithm was standardized as RFC 8478 in 2018. Adoption has been sweeping: Facebook uses it across its entire data infrastructure, the Linux kernel adopted it for module and filesystem compression, Fedora switched RPM package compression to Zstd in 2019, and Chrome and Firefox both added Content-Encoding: zstd HTTP support in 2024.
Brotli
Brotli was created at Google by Jyrki Alakuijala and Zoltán Szabadka in 2013, originally to reduce the size of WOFF2 web font transfers. Unlike Google’s earlier Zopfli (a superior DEFLATE compressor), Brotli introduced an entirely new format using a modern LZ77 variant, Huffman coding, second-order context modeling, and a large static dictionary of common words drawn from web content. It was generalized for HTTP content-encoding and standardized as RFC 7932 in 2016. Brotli is today the dominant HTTP compression algorithm: all major browsers support it, and Cloudflare, Akamai, AWS CloudFront, Nginx, and Apache all serve it. The WOFF2 font format — which depends on Brotli — received a Technology and Engineering Emmy Award in 2021. Its extreme compression ratios come at a steep compression-time cost, making it best suited for pre-compressing static assets offline rather than on-the-fly.
Snappy
Snappy (originally called “Zippy” internally) was developed at Google by Jeff Dean and Sanjay Ghemawat and open-sourced in March 2011. It was designed not for maximum ratio but for very high throughput, targeting CPU-bound scenarios inside Google’s own infrastructure — MapReduce, Bigtable, and internal RPC systems — where decompression speed is the bottleneck. The algorithm is LZ77-inspired and deliberately avoids entropy coding, accepting a lower ratio in exchange for simplicity and speed. Its wide adoption in open-source infrastructure is notable: Snappy is the default compression algorithm for MongoDB, RocksDB, LevelDB, Apache Cassandra, Hadoop, Apache Parquet, and InfluxDB.
S2
S2 is an extension and improvement of the Snappy format developed by Klaus Post as part of his klauspost/compress Go library, first introduced in August 2019. While Snappy already prioritized speed, S2 redesigns the block format and encoding strategy to simultaneously improve both ratio and throughput. S2 can decompress all valid Snappy data (backward compatible as a reader), but its own output is not readable by the original Snappy library — though it can optionally emit Snappy-compatible output at higher speed than Snappy itself. On typical machine-generated data, S2 in default mode can reduce compressed size by up to 35% compared to Snappy while improving decompression speed. On AMD64 with assembly-optimized paths, S2 stream compression exceeds 10 GB/s.
LZ4
LZ4 was developed by Yann Collet (who later also created Zstandard) and first released in April 2011. Its singular design goal is extreme speed: compression throughput routinely exceeds 500 MB/s per core and decompression can exceed 1 GB/s per core, making it one of the fastest compressors ever published. Like its LZ-family relatives it uses dictionary matching, but with a deliberately simple scheme that minimizes branch mispredictions and memory accesses. LZ4 trades ratio for that speed and is not competitive with gzip or Zstd on ratio. It was integrated into the Linux kernel in version 3.11 for SquashFS, pstore, and crypto layer compression, and ZFS on Linux, FreeBSD, and macOS supports it for transparent filesystem compression.
XZ / LZMA2
LZMA (Lempel–Ziv–Markov chain algorithm) was developed by Igor Pavlov starting in 1998 and became the compression engine powering the 7-Zip archiver's 7z format. LZMA achieves exceptional ratios by combining a very large dictionary (up to 4 GB), a sophisticated match finder, and range encoding (an arithmetic-coding variant). LZMA2 adds multi-threaded compression by splitting data into independently compressed LZMA streams. The xz file format and XZ Utils were released in 2009 by Lasse Collin as a bzip2 successor, and XZ became the standard for distributing Linux kernel sources and software packages across Fedora, Debian, and Ubuntu — though several of these have since migrated to Zstandard. XZ gained unwanted notoriety in March 2024 when a supply-chain backdoor was discovered in XZ Utils 5.6.0 and 5.6.1.
pgzip
pgzip is a pure-Go parallel gzip library developed by Klaus Post (github.com/klauspost/pgzip). It is a drop-in replacement for the standard library’s compress/gzip, producing fully standard-compliant output that any gzip reader can decompress — the parallelism is transparent to consumers. Internally, pgzip splits input into independent blocks (defaulting to 1 MB each) and compresses them concurrently across available CPU cores, then stitches the resulting gzip members together. On multi-core hardware, compression throughput scales roughly linearly with core count, and pgzip also offers a Huffman-only mode reaching ~450 MB/s per core when ratio is secondary to predictable speed. It is the natural choice when gzip compatibility is required but single-threaded gzip becomes a bottleneck.
Methodology
- Data: `sample_data/data.h3` (~8.5 MB real H3 timezone binary from localtimezone), `sample_data/data_mock.h3` (~1.2 KB mock)
- Iterations: 2 warmup + 10 measured; median values reported
- Timing and memory measured in separate passes to avoid `ReadMemStats` stop-the-world pauses distorting timing
- Environment: AMD Ryzen 9 7900X, amd64, linux, Go 1.26.1
- All libraries are pure Go — no CGo dependencies