1

We have a hive installation that has MariaDB as metastore database. MariaDB has around ~250 GB metadata with ~100GB indexes. It becomes terribly slow during the peak load of 40-60K QPS.

Looking from the community to share similar experiences if any and what they did to scale out the meta store or fix it?

Some of the ideas i am looking at currently are:

  • Application Caching at HMS level: Didn't found out of box capability in my current v2.0.1. Is there support for it in higher versions?
  • Read replicas and routing SELECTS to it: Facing some failure if there is replication lag and i am trying to read back the value.
  • Horizontal sharding of Mysql: founding it way complex. Saw some recommendations of TiDB but not sure of its experience.
Shakti Garg
  • 111
  • 1

1 Answers1

2

An answer to this would greatly depend on things such as the type of workload (percentage of reads vs writes, etc.), it also depends on your hardware. From your opening question it seems like the data is time critical (once data is committed it should be read).

There are several approaches you can take including:

  • Application level caching, or a caching layer such as Memcached / Redis
  • MariaDB Galera cluster with a load balancer for the read loads
  • Sharding such as Spider engine
  • Tuning your data / schemas / database for the workload

If you don't have the resources to try these things out on a simulated identical setup then I recommend getting some kind of MariaDB consulting services involved. These services will help you make the best decisions for your specific workload. There are many companies out there who will provide this (including MariaDB Corporation themselves).

Edit: Declaration requested... I declare that I am not affiliated with MariaDB Corporation. I work for the MariaDB Foundation which is an independent non-profit entity.

LinuxJedi
  • 46
  • 3