GridDB 5.6 ships a new compression method (ZSTD) that I wanted to test. I made a simple benchmark where I ingested 100 million rows and compared the new method against the compression available prior to 5.6 (ZLIB) and against no compression.
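For anyone who wants to reproduce this: I switched modes between runs by editing the compression setting in gs_node.json and restarting the node. A sketch of the relevant fragment is below; I'm quoting the dataStore/storeCompressionMode key from memory, so double-check it against the 5.6 documentation (the three values are the ones in the column headers of the results table further down):

```json
{
  "dataStore": {
    "storeCompressionMode": "COMPRESSION_ZSTD"
  }
}
```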
The results were what you would expect: for 100,000,000 rows, no compression had the largest data footprint, ZLIB came next, and the new compression method ZSTD had the smallest footprint.
I also tested the query speed of these compression methods, and to my surprise, the one with the smallest footprint (ZSTD) also had the fastest lookup times.
I am curious as to how this could be -- from my understanding, there must be some tradeoff when using a more advanced compression method. I'd at the very least expect the newest method to be on par with ZLIB on query speed, just with a smaller footprint.
And now for the results. As explained above, I inserted 100m rows of 'random' data and measured ingestion time, the on-disk size of the storage directories, and lookup times. Here are the results:
| | NO_COMPRESSION | COMPRESSION_ZLIB | COMPRESSION_ZSTD |
|---|---|---|---|
| Search (ms) | 32644 | 20666 | 11475 |
| Aggregation (ms) | 30261 | 13302 | 8402 |
| Storage (gridstore, KB) | 17568708 (17GB) | 7162824 (6.9GB) | 6519520 (6.3GB) |
| Storage (/data, KB) | 11968312 (12GB) | 1141152 (1.1GB) | 1140384 (1.1GB) |
| Insert (m:ss.mmm) | 14:42.452 | 15:02.748 | 15:05.404 |
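For reference, the Search and Aggregation timings were collected with a small script along these lines (a sketch using the Python client; the container name, column name, and connection details are placeholders for my setup, nothing special):

```python
import time

import griddb_python as griddb

# Connect to the cluster (connection details below are placeholders).
factory = griddb.StoreFactory.get_instance()
store = factory.get_store(
    notification_member="127.0.0.1:10001",
    cluster_name="myCluster",
    username="admin",
    password="admin",
)

# "samples" is a placeholder for the container holding the 100m rows.
container = store.get_container("samples")

def timed_tql(tql):
    """Run a TQL statement, drain the row set, and return elapsed ms."""
    start = time.perf_counter()
    rs = container.query(tql).fetch()
    while rs.has_next():
        rs.next()
    return (time.perf_counter() - start) * 1000

# "Search": a filtered scan ("value" is a placeholder column).
print("search ms:", timed_tql("select * where value > 0.5"))

# "Aggregation": a TQL aggregation over the same column.
print("aggregation ms:", timed_tql("select avg(value)"))
```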
If anybody has any insight into this perplexing compression result, please share your expertise.
