
I am looking to perform a large overhaul of a complex system that simulates several instances of several vehicle models in a classroom training environment. For example, 24 students may be running simulations of three different vehicles for maintenance and operation training. Instructors will be required to have tablets that can connect to any of the 24 active simulations to control the training scenario.

The primary system will run on Linux, but there are no other OS requirements, and machine specs can be chosen as needed. Each simulation pass must run consistently at ~10 ms intervals, with a ±2 ms tolerance.

A primary goal is to make this system very modular so that it can be extended and reused by other training facilities with unique vehicles and needs.

My thought was to use a layered architecture (system, business, UI). The definition of each vehicle model can be stored in a database and therefore edited independently by a superuser (modularity/extensibility of the vehicles). Each layer would likely have to read this database to dynamically allocate the resources that particular layer will require.

Originally I planned to use shared memory for the system layer, setting up permissions and authentication so that any business layer can attempt to log in. The primary simulation business logic would then continually update the vehicle details according to the active data. The instructor interface would have a business-layer server that connects to all 24 clients and also logs into the system layer to modify simulation parameters. Student inputs and visual outputs would each have a business layer that logs into the shared-memory system layer as well. All of these would act as separate applications so they can be removed, added, or extended as needed.

The problem came when I realized that classes do not work well with shared memory: I would need to serialize every get/set of shared memory into a flat memory structure. Having not worked with this architecture before, I am unsure whether this plan will create a large performance hit. Technically I can dedicate some cores to the primary business-layer logic that performs the simulation.
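For concreteness, the flat, trivially copyable layout I would be serializing into might look something like this sketch (the vehicle fields and the seqlock-style counter are placeholders, not a settled design):

```cpp
// Sketch of a flat layout suitable for shared memory. All fields are
// placeholders; the real vehicle state would come from the database.
#include <atomic>
#include <cstdint>
#include <type_traits>

struct VehicleState {          // plain-old-data: no pointers, no virtual methods
    std::uint32_t vehicle_id;
    double        position[3];
    double        velocity[3];
    double        engine_rpm;
    std::uint8_t  fault_flags[16];
};

static_assert(std::is_trivially_copyable_v<VehicleState>,
              "must be memcpy-safe to live in shared memory");

struct SharedRegion {
    // Seqlock-style counter: the writer increments it before and after each
    // update so readers can detect torn reads. Assumes lock-free atomics
    // behave correctly across processes on the target platform.
    std::atomic<std::uint64_t> sequence;
    VehicleState               vehicles[24];
};
```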

Would using shared memory with a suite of applications be an appropriate way to build this system? Would another cross-process communication mechanism, such as pipes, be more advisable than shared memory? Would it be better to keep the system and business logic in a single application and simply use mutexes and multithreading to ensure performance? Am I going about this all wrong?

1 Answer


I've worked on exactly one simulation application, and it had looser requirements (i.e. everything had to update once per second). Feel free to take this with a large grain of salt, but it is someplace to start. I think you are going to have difficulty realizing your timing goals with stock operating systems. I can only highlight some lessons learned:

  • Sometimes the requirements are overstated
    • Example: the stated requirement is 10 ms intervals with a ±2 ms tolerance, but the real requirement may be more like 100 ms intervals with ±20 ms. Same relative tolerance, but much more reasonable to design for.
    • Example: your external clients probably don't have the same near-real-time requirements as your core application
  • You need to be able to stop processing early: either stop pulling from the queue as you near the end of your window, or actually cancel in-flight work (see the sketch after this list)
  • Depending on what is being simulated, the Actor Model or some similar approach might be the best bet for your overall design
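To illustrate the point about stopping early, a frame loop can simply refuse to pull more work once it nears its deadline. This is only a sketch; the queue type, task type, and the 1 ms safety margin are illustrative assumptions:

```cpp
// Sketch: drain a work queue, but stop pulling once the frame deadline nears.
#include <chrono>
#include <functional>
#include <queue>

using Clock = std::chrono::steady_clock;

void run_frame(std::queue<std::function<void()>>& tasks,
               Clock::time_point frame_start,
               std::chrono::milliseconds frame_budget)
{
    const auto deadline = frame_start + frame_budget;
    const auto margin   = std::chrono::milliseconds(1); // time to finish cleanly

    while (!tasks.empty() && Clock::now() + margin < deadline) {
        tasks.front()();   // process one unit of work
        tasks.pop();
    }
    // Anything left in the queue is deferred or dropped rather than
    // blowing the timing window.
}
```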

In the simulation I was working on, we were modeling communications with signal devices and towers, and the effects of devices moving in near real time. Given the number of devices we needed to simulate, we couldn't devote a full thread to each actor (device and tower), but we could have a finite number of threads work through the corpus to process each actor's messages up to the current simulation timestamp.
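In rough outline, that looked something like the sketch below. Actor, its advance_to method, and the dispatch scheme are placeholders rather than our actual code:

```cpp
// Sketch: a fixed pool of threads advancing many actors to the current
// simulation timestamp. Actor stands in for a device or tower.
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

struct Actor {
    void advance_to(double /*sim_time*/) { /* process queued messages up to sim_time */ }
};

void step_all(std::vector<Actor>& actors, double sim_time, unsigned n_threads)
{
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> pool;

    for (unsigned i = 0; i < n_threads; ++i) {
        pool.emplace_back([&] {
            // Each thread claims the next unprocessed actor until none remain.
            for (std::size_t j = next.fetch_add(1); j < actors.size();
                 j = next.fetch_add(1)) {
                actors[j].advance_to(sim_time);
            }
        });
    }
    for (auto& t : pool) t.join();
}
```

The point is that the thread count stays fixed no matter how many actors exist, so scheduling overhead doesn't grow with the simulated population.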

The main thing I would recommend is to start with a stable foundation and stress-test your simulation. It's those stress tests that will force you to make the hard decisions about where you can trade absolute fidelity for something that gives the same overall effect. I guarantee you that no human user will notice the difference between 10 ms and 100 ms.

Any simulation is a trade-off between fidelity to your working model and something that approximates it.

Also, don't decide between shared memory, pipes, or sockets up front. Focus on the core behavior first.


Something I'm learning with some of my newer endeavors is that a shared-nothing approach sometimes provides the best overall scalability. To borrow an idea from microservices, you would have an architecture where a single simulation service provides the entire environment for the high-precision timing events. The trick is to safely partition the overall environment into smaller ones. That can be done per vehicle or per airspace.

We have the following knowns:

  • Your critical code, run in isolation, can meet the timing requirements
  • Not everything has the strict timing requirements (based on your comments above and below)
  • Socket communications are slow, even on the same machine
  • Shared memory can be tricky and still has serialization costs, and it limits you to one machine

Essentially, to have several simulation nodes work together, you need to package and transmit state to the affected nodes at regular intervals. That transmission needs to be fast and dense. The options are:

  • Named pipes: fairly high speed, high reliability
  • UDP: fairly high speed, but packets can be fragmented by routers; keep the payload to a safe maximum of about 1134 bytes
  • TCP: slower, but much more reliable

Communication can be done asynchronously, invoking events in the receiving node. In my case we opted for UDP since the entire system was intended to run on one subnet. We also used a binary wire format like Google Protocol Buffers to maximize information per byte.
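A stripped-down version of the sending side might look like the following on Linux. The address, port, and StateSnapshot struct are placeholders; in the real system the payload was a serialized binary message, not a raw struct:

```cpp
// Sketch: broadcasting a state snapshot over UDP on one subnet (POSIX sockets).
#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>

struct StateSnapshot { double position[3]; double velocity[3]; };

int main()
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) return 1;

    int yes = 1;
    setsockopt(sock, SOL_SOCKET, SO_BROADCAST, &yes, sizeof(yes));

    sockaddr_in dest{};
    dest.sin_family = AF_INET;
    dest.sin_port   = htons(5555);                        // placeholder port
    inet_pton(AF_INET, "192.168.1.255", &dest.sin_addr);  // placeholder broadcast address

    StateSnapshot snap{{1.0, 2.0, 3.0}, {0.1, 0.0, 0.0}};
    sendto(sock, &snap, sizeof(snap), 0,
           reinterpret_cast<sockaddr*>(&dest), sizeof(dest));

    close(sock);
    return 0;
}
```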

In my project, most simulations could be run on one machine, but the sensor visualization software was typically run remotely.

In this case it was a trade-off between how frequently we updated the external system and how much data we sent. Each node had a pair of sockets open: one for sending and one for receiving.

In most cases we could mathematically project our movement with enough accuracy that the course corrections we received went unnoticed and the simulation remained valid.
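Dead reckoning is the classic way to do that projection. A minimal sketch, with hypothetical field names:

```cpp
// Sketch: dead-reckoning projection between authoritative updates. If the
// correction that eventually arrives is small, applying it goes unnoticed.
struct Kinematics { double pos[3]; double vel[3]; double acc[3]; };

// Project state forward dt seconds from the last authoritative update.
void project(Kinematics& k, double dt)
{
    for (int i = 0; i < 3; ++i) {
        k.pos[i] += k.vel[i] * dt + 0.5 * k.acc[i] * dt * dt;
        k.vel[i] += k.acc[i] * dt;
    }
}
```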

But again, an approach like this has to be tested against the known criteria for your system.