(N.B. this answer is, in the spirit of the question, high-level and not very concrete...)
How can I calculate throughput of our system
If by "calculate" you mean taking some data points, dry-running some formulas, and getting a close approximation of your throughput: that is pretty much impossible unless you have a lot more information - and some of that information may be effectively random, which makes a theoretical calculation quite hard.
If you mean "measure", then it could be as simple as checking your logfiles and counting lines per time unit.
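As a minimal sketch of that log-counting approach (the timestamp format and log layout are assumptions - adapt them to whatever your logs actually look like):

```python
from collections import Counter
from datetime import datetime

def requests_per_minute(log_path):
    """Count log lines per minute of wall-clock time.

    Assumes each line starts with an ISO-8601 timestamp like
    '2024-05-01T12:34:56' - adjust the slice and parsing to match
    your real log format.
    """
    per_minute = Counter()
    with open(log_path) as f:
        for line in f:
            try:
                ts = datetime.fromisoformat(line[:19])
            except ValueError:
                continue  # skip lines without a leading timestamp
            per_minute[ts.replace(second=0, microsecond=0)] += 1
    return per_minute
```

The peak value in the returned counter is a rough lower bound on what your system has already handled - useful as a baseline for monitoring thresholds.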
and build monitoring around max load that our system can handle?
Unless you're NASA, that's probably a case for trial and error. Run your system(s) and see how much throughput you get. Increase worker nodes. See what happens.
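That trial-and-error loop can be as simple as this sketch - `task` here is a stand-in for one unit of work against your system (an HTTP call, a DB query, whatever you actually do):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_throughput(task, n_requests, n_workers):
    """Fire n_requests at `task` from n_workers threads and return
    completed requests per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in pool.map(lambda _: task(), range(n_requests)):
            pass
    return n_requests / (time.perf_counter() - start)

# Increase workers and watch where throughput stops improving -
# that knee is your bottleneck showing itself.
for workers in (1, 2, 4, 8):
    rate = measure_throughput(lambda: time.sleep(0.01), 100, workers)
    print(f"{workers:2d} workers: {rate:6.0f} req/s")
```

For an I/O-bound fake task like the `sleep` above, throughput scales with worker count; against a real system it will flatten out at some point, and that plateau is the number you build your monitoring around.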
If you already know the behaviour of your overall system - i.e., whether it scales linearly, whether there are bottlenecks like databases, lock contention and such - then you can take shortcuts or make good guesses.
That's one of the reasons for doing the container-based scaling we do these days - you can throw a few more workers at it relatively easily.
How you actually (technically) do that depends on what platform you are using. AWS, Kubernetes/OpenShift etc. come with techniques or settings to do it automatically for you. I assume those work mainly with the metric of "free workers" - trying to get the "free:busy worker" ratio into some target corridor so that every new request has a very high likelihood of hitting a free worker at any time, thus getting (theoretically) constant time for each request.
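To make that "target corridor" idea concrete, here is a toy scaling decision - this is my assumption about how such schedulers behave, not any platform's actual algorithm, and the corridor bounds are made-up parameters:

```python
def scale_decision(free, busy, target_low=0.2, target_high=0.5):
    """Toy ratio-based autoscaling decision (illustrative only).

    Keep the free/total worker ratio inside [target_low, target_high]:
    return +1 to add a worker, -1 to remove one, 0 to hold steady.
    """
    total = free + busy
    ratio = free / total if total else 1.0
    if ratio < target_low:
        return +1   # too few free workers: new requests risk queueing
    if ratio > target_high and total > 1:
        return -1   # plenty of idle workers: save the resources
    return 0        # inside the corridor: do nothing
```

Real autoscalers (e.g. the Kubernetes Horizontal Pod Autoscaler) work on metrics like CPU utilisation or custom per-pod metrics rather than a literal free:busy count, but the control loop is the same shape: measure, compare against a target, nudge the worker count toward it.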