GridDB v5.6 Compression Type (ZSTD). Querying much faster?

Question

GridDB 5.6 has a new compression method that I wanted to test. I made a simple test where I ingested X amount of rows and tested the compression method against the old compression available prior to 5.6 (ZLIB) and against no compression.

The results were what you would expect: no compression had the highest data footprint for 100,000,000 rows, ZLIB next, and finally, the new compression method ZSTD had the smallest footprint.

I also tested the query speed of these compression methods and to my surprise, the one with the smallest footprint (ZSTD) also had the quickest lookup times.

I am curious as to how this could be -- from my understanding, there must be some tradeoff when doing a more advanced method of compressing similar data. I'd at the very least expect that the newest compression method would be on par with ZLIB but with a smaller footprint.

And now for the results. As explained above, I inserted 100m rows of 'random' data and timed ingestion, the directory size of the data dir, and lookup times. Here are the results:

	NO_COMPRESSION	COMPRESSION_ZLIB	COMPRESSION_ZSTD
Search (ms)	32644	20666	11475
Agreggation (ms)	30261	13302	8402
Storage (gridstore)	11968312 (17GB)	7162824 (6.9GB)	6519520 (6.3GB)
Storage (/data)	17568708 (12GB)	1141152 (1.1GB)	1140384 (1.1GB)
Insert (m:ss.mmm)	14:42.452	15:02.748	15:05.404

If anybody has any insight into this perplexing compression issue, please share any expertise.

score 2 · Answer 1 · answered Aug 28 '24 at 15:12

Your system seems to be I/O-bound, judging by the fact that compression has very little overhead when inserting data -- most of the time is spent on I/O.

Subsequently, reading compressed data off that relatively slow disk is more efficient because there are fewer blocks to read, which results in the significant improvement of query performance, compared to an uncompressed database.

Read performance difference between zstd and zlib comes from the fact that the former has much better decompression performance:

Image source: https://engineering.fb.com/2016/08/31/core-infra/smaller-and-faster-data-compression-with-zstandard/

GridDB v5.6 Compression Type (ZSTD). Querying much faster?

1 Answers1