
We have installed Torque on a dual-Xeon server (26 cores, 52 available with hyperthreading). The node is configured with np=104. If I launch an MPI calculation from the command line, I get nearly 100% CPU usage:

%Cpu(s): 53.9 us, 44.6 sy, 0.0 ni, 1.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

However, if I launch the same calculation with this Torque submit file:

#!/bin/bash
#PBS -l walltime=20:00:00:00
#PBS -l nodes=1:ppn=104
#PBS -q batch
#PBS -N QE_test
cd "$PBS_O_WORKDIR"
/usr/lib64/openmpi/bin/mpirun -np 104 /opt/qe-6.3/bin/pw.x -inp scf.in > scf.out

the CPU usage drops to about 50%:

%Cpu(s): 32.5 us, 22.9 sy, 0.0 ni, 44.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

Does anyone know why?

Output of pbsnodes -a:

servername
     state = free
     np = 104
     ntype = cluster
     status = rectime=1540890927,varattr=,jobs=,state=free,netload=? 0,gres=,loadave=0.00,ncpus=52,physmem=199919700kb,availmem=193132384kb,totmem=199919700kb,idletime=343335,nusers=0,nsessions=0,uname=Linux servername 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
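One way to see how np=104 relates to the actual hardware is to count online logical CPUs and distinct physical cores on the node. The sketch below is an illustration, not part of the original post: it assumes the standard Linux sysfs layout under /sys/devices/system/cpu (topology/physical_package_id, topology/core_id and the optional online file), and the function name count_cpus and its optional directory argument are hypothetical.

```shell
# Hypothetical helper: count online logical CPUs and distinct physical cores.
# Optional argument: a directory laid out like /sys/devices/system/cpu
# (useful for testing against a copy of the tree).
count_cpus() {
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*; do
        # Skip entries without topology information.
        [ -r "$dir/topology/core_id" ] || continue
        # A CPU with online=0 has been taken offline; cpu0 often has no
        # online file at all and is treated as online.
        if [ -r "$dir/online" ] && [ "$(cat "$dir/online")" = "0" ]; then
            continue
        fi
        # One line per logical CPU: "<socket>:<core>".
        echo "$(cat "$dir/topology/physical_package_id"):$(cat "$dir/topology/core_id")"
    done | awk '{ logical++; if (!seen[$0]++) physical++ }
                END { printf "logical=%d physical=%d\n", logical, physical }'
}
```

After sourcing this (for example from a hypothetical file count_cpus.sh), calling count_cpus on the node and comparing the result with np in the pbsnodes output shows whether Torque is configured for more slots than there are logical CPUs or physical cores.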

Answer


I solved the problem. First, I disabled hyperthreading (see "Disable hyperthreading from within Linux (no access to BIOS)"). With hyperthreading enabled, the command-line MPI run showed about 50% user and 50% system CPU time; with it disabled, it shows nearly 100% usage.

Second, I downgraded Torque to a build without NUMA support (from torque-4.2.10-10.el7.x86_64 to torque-4.2.10-5.el7.x86_64). After that, pbsnodes -a reports ncpus=52, whereas the NUMA-enabled build reported 26. Running mpirun -np 52 through Torque now gives the same result as running it from the command line.
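The linked question covers disabling hyperthreading without BIOS access. A minimal sketch of one common approach, assuming the standard Linux CPU-hotplug sysfs files (topology/thread_siblings_list and online), is to keep the lowest-numbered hardware thread on each physical core and take the others offline. The function names secondary_threads and disable_ht and the DRY_RUN variable are hypothetical; the real change needs root and does not persist across a reboot.

```shell
# Hypothetical helper: print the logical CPUs that are extra hardware threads.
# thread_siblings_list names every thread sharing a physical core, in
# ascending order (e.g. "0,52" or "0-1"); the first entry is kept as primary.
secondary_threads() {
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*; do
        list_file="$dir/topology/thread_siblings_list"
        [ -r "$list_file" ] || continue
        cpu=${dir##*/cpu}               # ".../cpu52" -> "52"
        siblings=$(cat "$list_file")
        first=${siblings%%[,-]*}        # "0,52" -> "0", "0-1" -> "0"
        if [ "$cpu" != "$first" ]; then
            echo "$cpu"
        fi
    done
}

# Hypothetical helper: take every secondary thread offline (requires root on
# a real system). With DRY_RUN=1 it only prints what it would do.
disable_ht() {
    root=${1:-/sys/devices/system/cpu}
    # The list is computed once, before any CPU is taken offline.
    for cpu in $(secondary_threads "$root"); do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "would offline cpu$cpu"
        else
            echo 0 > "$root/cpu$cpu/online"
        fi
    done
}
```

Running DRY_RUN=1 disable_ht first shows which CPUs would be affected. cpu0 is never touched, since it is always the first thread on its core. Afterwards, nproc should report one logical CPU per physical core.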