
I recently (July 2024) set up a Kafka cluster (3 Kafka instances + ZooKeeper) with Docker Compose on my server, using the latest tag for all Docker images.
I've set up jmx_exporter and node-exporter to monitor the Kafka cluster and the server host, along with Prometheus and Grafana.
As of July, over 1.2K clients (99% producers, 1% consumers) are connected to it and the application works fine, but since the number of clients keeps increasing, I'm concerned about when the server host will reach its limit.

These are the charts I currently have, but I'm not quite sure what I can learn from them:

node exporter, watching the server host: [image]

top on the server host: [image]

jmx_exporter, watching the inside of the Kafka instances (overall Grafana snapshot)

Topic related: [image]

JVM related

Result of vmstat 1 5 inside one of the Kafka Docker containers:

root@ecs-01796520-002:~# docker exec -it 70314ef51864 bash
[appuser@70314ef51864 ~]$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
12  0      0 723444 169068 8917012    0    0    60  1535    5    8 50 22 26  2  1
15  0      0 722616 169080 8917808    0    0     4  2760 41451 36304 52 21 24  1  1
 8  0      0 721140 169108 8919176    0    0    84  4480 40002 33518 56 20 23  1  0
12  0      0 719684 169108 8920820    0    0   152     8 40462 35086 56 19 25  0  0
 8  0      0 716452 169140 8921556    0    0     0  4468 40856 35174 55 22 23  1  0

Questions:

  1. How many more clients can my server handle, judging by these performance charts?
  2. What is the suggested way to scale to support more clients, say 5K?
Shawn
  • 141

1 Answer


In short: your Kafka system is under serious load.

  1. The processors are heavily loaded: see the CPU load in the first image, and vmstat shows idle at about 25%. The good news is that there are no blocked processes (the b column is 0) and only a small amount of I/O wait. So from a CPU point of view the system is balanced and well utilized.

  2. vmstat shows a high number of context switches (cs) and interrupts (in), which means your software makes a lot of system calls and does a lot of network communication. The values are high but not unheard of.

  3. There is not much disk activity and you have a good amount of available memory, so you are OK on both counts.
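The readings in points 1 to 3 can be reproduced from the vmstat sample in the question. The sketch below is one assumed way to do it: `parse_vmstat` and `summarize` are illustrative helper names, not part of any tool. The first data row is skipped because vmstat reports averages since boot on that row.

```python
# vmstat output copied from the question.
VMSTAT_OUTPUT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
12  0      0 723444 169068 8917012    0    0    60  1535    5    8 50 22 26  2  1
15  0      0 722616 169080 8917808    0    0     4  2760 41451 36304 52 21 24  1  1
 8  0      0 721140 169108 8919176    0    0    84  4480 40002 33518 56 20 23  1  0
12  0      0 719684 169108 8920820    0    0   152     8 40462 35086 56 19 25  0  0
 8  0      0 716452 169140 8921556    0    0     0  4468 40856 35174 55 22 23  1  0
"""


def parse_vmstat(text):
    """Turn vmstat text into a list of {column_name: int} dicts, one per sample."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    columns = lines[1].split()  # lines[0] is the group banner, lines[1] the column names
    return [dict(zip(columns, map(int, ln.split()))) for ln in lines[2:]]


def summarize(rows, skip_first=True):
    """Average the columns the answer relies on, skipping the since-boot row."""
    samples = rows[1:] if skip_first else rows

    def mean(key):
        return sum(r[key] for r in samples) / len(samples)

    return {
        "idle_pct": mean("id"),
        "busy_pct": mean("us") + mean("sy"),
        "iowait_pct": mean("wa"),
        "blocked_procs": mean("b"),
        "context_switches_per_s": mean("cs"),
    }


summary = summarize(parse_vmstat(VMSTAT_OUTPUT))
for name, value in summary.items():
    print(f"{name}: {value}")
```

On this sample the averages come out at about 24% idle, about 75% user plus system time, under 1% I/O wait, no blocked processes, and roughly 35,000 context switches per second, which matches the reading above.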

From the above: your system can handle about 150 more clients (but it will be at the edge of its performance). To handle 5K clients (producers + consumers) you need to triple the number of Kafka hosts in the cluster. I also recommend checking that the load is balanced across the brokers, and checking whether your topics are replicated (probably yes) so that the replicas are balanced too. Of course, do not keep two replicas on the same instance.
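The balance check can be sketched offline once you have the partition-to-replica assignment, for example copied by hand from the describe output of the kafka-topics tool. Everything in the sketch below is hypothetical: the topic name `events`, the broker ids, the host names, and the helper names `broker_load` and `colocated_partitions`. It also assumes the first replica in each list is the preferred leader, which may differ from the current leader.

```python
from collections import Counter

# Hypothetical assignment: (topic, partition) -> list of broker ids holding a replica.
ASSIGNMENT = {
    ("events", 0): [1, 2],
    ("events", 1): [1, 3],
    ("events", 2): [1, 2],
}

# Hypothetical mapping of broker id -> physical host it runs on.
BROKER_HOST = {1: "host-a", 2: "host-a", 3: "host-b"}


def broker_load(assignment):
    """Count preferred leaders and total replicas per broker."""
    leaders = Counter()
    replicas = Counter()
    for replica_list in assignment.values():
        leaders[replica_list[0]] += 1  # first replica is the preferred leader
        replicas.update(replica_list)
    return leaders, replicas


def colocated_partitions(assignment, broker_host):
    """Return partitions that have two or more replicas on the same physical host."""
    flagged = []
    for partition, replica_list in assignment.items():
        hosts = [broker_host[b] for b in replica_list]
        if len(set(hosts)) < len(hosts):
            flagged.append(partition)
    return flagged


leaders, replicas = broker_load(ASSIGNMENT)
print("preferred leaders per broker:", dict(leaders))
print("replicas per broker:", dict(replicas))
print("replicas sharing a host:", colocated_partitions(ASSIGNMENT, BROKER_HOST))
```

In this made-up data broker 1 leads every partition, so it would carry all the produce traffic, and two partitions keep both replicas on `host-a`. When all brokers run on a single server, every replicated partition would be flagged by the second check.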

P.S. About the JVM: first optimize and balance at the OS level, then move on to tuning the JVM.

P.P.S. We can't tell whether this load is normal; it depends on too many parameters.

About my suggestion to triple the cluster: I do not know the load (CPU and memory) of the host machine. But if it is close to the numbers above, you should add at least one physical host.
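For a rough feel of where such numbers can come from, here is a back-of-the-envelope sketch. It is not the method used above, and it rests on strong assumptions: CPU use grows linearly with the number of clients, the current state is about 1,200 clients at about 75% busy (100 minus the roughly 25% idle from vmstat), and the 85% ceiling is an arbitrary choice. The function names are made up for illustration.

```python
def extra_clients(current_clients, busy_pct, ceiling_pct):
    """Clients that can still be added before CPU busy reaches ceiling_pct,
    assuming CPU use is proportional to the number of clients."""
    max_clients = current_clients * ceiling_pct // busy_pct
    return max(0, max_clients - current_clients)


def capacity_multiple(current_clients, target_clients, busy_pct, target_busy_pct):
    """How many times the current CPU capacity is needed to serve target_clients
    while staying at target_busy_pct, under the same linear assumption."""
    projected_busy_pct = busy_pct * target_clients / current_clients
    return projected_busy_pct / target_busy_pct


# Assumed inputs: 1,200 clients, 75% busy, 85% chosen as the ceiling.
print(extra_clients(1200, 75, 85))  # 160

# Capacity needed for 5,000 clients at different target utilizations.
for target in (75, 85, 100):
    print(target, round(capacity_multiple(1200, 5000, 75, target), 2))
```

Under these assumptions the headroom is on the order of 150 to 200 clients, and 5K clients need roughly three to four times the current CPU capacity depending on how much idle time you want to keep. Real load is rarely linear in the client count, so treat the result as an order of magnitude and confirm it with a load test.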

Romeo Ninov
  • 6,677