
Here is the design step by step:

  • User opens a webpage
  • Inputs a few details in the form
  • Clicks submit
  • Request goes to API server
  • API server creates a pod in Kubernetes
  • Pod executes a script and stores the output in shared storage
  • Another pod keeps running and attached to the shared storage
  • API server waits for pod execution to complete
  • API server copies the file from Kubernetes API using the always running pod
  • API server parses the file and returns the result to the UI
  • User sees a loading screen until all the steps above complete

The main challenge with this pattern is autoscaling. When a pod goes into the Pending state because no capacity is available, the user has to wait 2-5 minutes for autoscaling to kick in before the pod can even start executing.
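For reference, the blocking pod-per-request flow described above can be sketched like this. The function and callable names (`run_job`, `create_pod`, `get_pod_phase`) are illustrative stand-ins for real Kubernetes API calls, so the sketch runs without a cluster; the polling loop is the part the API server blocks on while the user watches the spinner:

```python
import time

# Minimal sketch of the pod-per-request flow, with the Kubernetes API
# abstracted behind two injected callables. `create_pod` and
# `get_pod_phase` are hypothetical stand-ins for real API calls
# (e.g. create_namespaced_pod / read_namespaced_pod_status in the
# official Python client).

def run_job(details, create_pod, get_pod_phase, poll_interval=0.0):
    """Create a pod for one request and block until it finishes.

    This is the pattern the question describes: the API server cannot
    answer the HTTP request until the pod reaches a terminal phase,
    so a Pending pod (no capacity) stalls the user for minutes.
    """
    pod_name = create_pod(details)          # API server creates a pod
    while True:                             # API server waits...
        phase = get_pod_phase(pod_name)
        if phase in ("Succeeded", "Failed"):
            return phase
        time.sleep(poll_interval)           # user stares at a spinner


# Simulated cluster: the pod sits in Pending twice before running.
phases = iter(["Pending", "Pending", "Running", "Succeeded"])
result = run_job(
    {"input": "form data"},
    create_pod=lambda details: "job-pod-1",
    get_pod_phase=lambda name: next(phases),
)
print(result)  # Succeeded
```

Every `Pending` iteration here is a real-world sleep; with cluster autoscaling that loop can spin for minutes before the first `Running`.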

karthikeayan

1 Answer


On the face of it this is a terrible design.

  • User opens a webpage

  • Inputs a few details in the form

  • Clicks submit

  • Request goes to API server

    fine so far

  • API server creates a pod in Kubernetes

    This seems pointless. Have a worker process continually running and listening to a queue.

  • Pod executes a script and stores the output in shared storage

    Instead of shared storage, post the result back to another queue or database.

  • Another pod keeps running and attached to the shared storage

    You don't need a separate pod for everything.

  • API server waits for pod execution to complete

    If the API server is waiting the whole time, just get it to do the work. The API should return immediately after creating the offline job.

  • API server copies the file from Kubernetes API using the always running pod

    ?? why pass this data around so much? why have a pod for everything? why files??

  • API server parses the file and returns the result to the UI

    why parse the file only to reserialise it again?

  • User sees a loading screen until all the steps above complete

    what if they kill the browser or click refresh or back?
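The "return immediately" and "killed browser" points can be shown with a minimal async-job sketch. All names here (`submit`, `complete`, `status`, the in-memory `jobs` dict) are hypothetical: the API hands back a job id at once, and because the client polls by id, a refresh or closed tab doesn't lose the result:

```python
import uuid

# In-memory job store; a real service would use a database or queue.
jobs = {}

def submit(details):
    """API handler: record the job and return immediately with an id."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "processing", "result": None}
    # ...a real API would enqueue `details` for a worker here...
    return {"job_id": job_id, "status": "processing"}

def complete(job_id, result):
    """Called by the worker when the offline job finishes."""
    jobs[job_id].update(status="done", result=result)

def status(job_id):
    """API handler: lets the client poll (survives refresh/back)."""
    return jobs[job_id]

resp = submit({"input": "form data"})
print(status(resp["job_id"])["status"])   # processing
complete(resp["job_id"], "42 rows parsed")
print(status(resp["job_id"])["result"])   # 42 rows parsed
```

The HTTP request and the job now have independent lifetimes, which is exactly what the blocking design lacks.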

However, you don't really say anything about why you are using this design over:

  1. Just the API doing all the work.

    As you wait for the result anyway, there's no benefit to all this sending stuff around

  2. A queue + worker processes

    Have the API write to a queue and return an immediate "Processing Your Message" response

    Have a constantly running worker app pick up from the queue and do the work. Post back to another queue when done.

    Have the API listen to the "done" queue and push the messages back to the correct user via websocket.
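Option 2 can be sketched end to end with stdlib queues standing in for a real broker (RabbitMQ, SQS, etc.); the handler names are illustrative, and the websocket push back to the user is left as a comment:

```python
import queue
import threading

# Sketch of option 2: API -> work queue -> worker -> done queue -> API.
work_queue = queue.Queue()
done_queue = queue.Queue()

def api_submit(user_id, details):
    """API handler: enqueue the job and return immediately."""
    work_queue.put({"user": user_id, "details": details})
    return "Processing Your Message"

def worker():
    """Constantly running worker: pick up jobs, post results back."""
    while True:
        job = work_queue.get()
        if job is None:                      # shutdown sentinel
            break
        result = job["details"].upper()      # stand-in for the script
        done_queue.put({"user": job["user"], "result": result})

threading.Thread(target=worker, daemon=True).start()

ack = api_submit("alice", "form data")
done = done_queue.get(timeout=5)             # API side: listen on "done"
# ...here the real API would push done["result"] to alice's websocket...
print(ack)              # Processing Your Message
print(done["result"])   # FORM DATA
work_queue.put(None)    # stop the worker
```

Note the worker outlives any single request, so there is no per-request pod to schedule and nothing for the autoscaler to block on; you scale by running more worker replicas.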

Also, there is a scenario where your design might be good, or at least the only thing that would work. That's where you have to run some third-party application which doesn't like running multiple instances at the same time or starting and stopping cleanly (I'm looking at you, Microsoft Excel)

In that kind of scenario you need to effectively spin up a new clean machine with no leftover state, write and read files because that's the only thing the application understands, and then clean up everything afterwards. But even then your unit is a container, not a pod?

Ewan