
Let us assume that we have a pool of some 50 computers with 6 cores and 12 threads each.

If someone plans to use it for intensive astrophysics simulations, using all of its logical CPUs (50*12 = 600) 24x7, how long will it be able to sustain that without any physical damage? Cooling is simple: air conditioning, plus the fans that come with the CPUs. Can there be any performance degradation over time? If yes, what is the solution?

Please note the two main requirements:

  1. 100% CPU usage for all CPUs and,
  2. the concern is about continuous running over say, years.

1 Answer


how long will it be able to sustain without any physical damage?

If you buy decent production-quality servers, they should run at full load indefinitely without physical damage. In fact, there's an argument that you'd see less damage than on servers that alternate between running hot and cold, as thermal shock can harm components more than being on all the time.

Can there be any performance degradation over time?

Not really, not on any solid-state components anyway. I suppose your PSUs might get slightly less efficient, and your fans may degrade a bit as they get covered in dust.
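If you want to confirm that rather than take it on trust, one option is to time the same fixed benchmark job on each node every so often and compare it with the time recorded when the node was new. The following is only a minimal sketch of that comparison: the function name `is_degraded`, the 10% tolerance, and the idea of a recorded baseline are all assumptions for illustration, not something your scheduler provides.

```python
from statistics import median


def is_degraded(baseline_seconds, recent_seconds, tolerance=0.10):
    """Return True if recent benchmark runs are noticeably slower than the baseline.

    baseline_seconds: runtime of a fixed benchmark when the node was new (assumed recorded).
    recent_seconds:   runtimes of the same benchmark from recent runs.
    tolerance:        allowed slowdown as a fraction (hypothetical default of 10%).
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    if not recent_seconds:
        raise ValueError("recent_seconds must contain at least one sample")
    # The median ignores the odd outlier caused by a noisy neighbour or a one-off stall.
    return median(recent_seconds) > baseline_seconds * (1 + tolerance)
```

A node that trips this check is a candidate for a look at its fans and heatsinks before anything else.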

Obviously no amount of planning will stop components failing mid-life, but if you design your clusters to handle that sort of thing it doesn't have to be business-impacting.
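As a rough illustration of what "handle that sort of thing" can mean for a batch workload: if the simulation is split into independent work units, a unit that was on a failed node can simply be put back in the queue and run elsewhere. This is a sketch under that assumption, not a real scheduler. `NodeFailure`, `run_with_requeue`, and the `run_on_node` callback are hypothetical names made up for the example.

```python
from collections import deque


class NodeFailure(Exception):
    """Raised by run_on_node when a node dies mid-job (hypothetical)."""


def run_with_requeue(work_units, nodes, run_on_node):
    """Run every work unit, re-queueing units whose node fails.

    run_on_node(node, unit) is a caller-supplied function that returns a
    result or raises NodeFailure. Returns (results, healthy_nodes).
    """
    pending = deque(work_units)
    healthy = list(nodes)
    results = {}
    turn = 0
    while pending:
        if not healthy:
            raise RuntimeError("no healthy nodes left to run remaining work")
        node = healthy[turn % len(healthy)]  # simple round-robin placement
        unit = pending.popleft()
        try:
            results[unit] = run_on_node(node, unit)
            turn += 1
        except NodeFailure:
            # Take the dead node out of rotation and retry the unit elsewhere.
            healthy.remove(node)
            pending.appendleft(unit)
    return results, healthy
```

In practice a job scheduler or the simulation framework's own checkpointing would do this for you; the point is only that losing one box out of 50 costs you a re-run of its unit, not the whole simulation.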

Chopper3
  • 101,808