We have an InfluxDB VM whose swap is constantly at 100% usage. Even if we restart the VM, swap usage reaches 100% again in about 20 minutes, while memory usage is only about 50%. (The VM has 32 CPU cores and 128 GB of memory.)
Running free -h:
total used free shared buff/cache available
Mem: 123Gi 70Gi 567Mi 551Mi 52Gi 59Gi
Swap: 9Gi 9Gi 0B
This shows that we have at least 59 GB of available memory, yet 100% of the swap is still in use.
Running atop shows the disk at 100% busy (both the swap and disk lines are red):
SWP | tot 10.0G | | free 0.0M | swcac 505.9M
DSK | nvme2n1 | busy 100% | read 33115 | write 527 | discrd 0 | KiB/r 19 | KiB/w 173 | | KiB/d 0 | MBr/s 63.3 | MBw/s 8.9 | avq 88.19 | avio 0.30 ms
My guess is that this is the constant inflow of data events... (but then why are the reads so high?)
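To see which processes actually own the swapped-out pages (rather than just the total), summing VmSwap from /proc is a quick check; a minimal sketch:
for pid in /proc/[0-9]*; do
    awk -v p="$pid" '/^VmSwap:/ {print $2 " kB", p}' "$pid/status" 2>/dev/null
done | sort -rn | head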
Memory and I/O pressure from PSI:
cat /proc/pressure/memory
some avg10=32.65 avg60=32.74 avg300=31.25 total=35534063966
full avg10=32.25 avg60=32.34 avg300=30.87 total=35182532561
cat /proc/pressure/io
some avg10=84.83 avg60=78.83 avg300=78.96 total=70337558807
full avg10=84.38 avg60=78.05 avg300=78.08 total=69619870053
Memory pressure doesn't seem high but IO pressure is.
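To keep an eye on how these pressure figures evolve (for example right after a restart, as swap fills up again), the PSI files can simply be polled; a minimal sketch:
watch -n 10 'grep -H . /proc/pressure/memory /proc/pressure/io'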
Running iotop, it is clear that the disk activity comes from influxdb:
4272 be/3 root 0.00 B/s 94.47 K/s ?unavailable? [jbd2/nvme2n1p1-8]
36921 be/2 vcap 1169.95 K/s 0.00 B/s ?unavailable? influxd -config /var/vcap/jobs/influxdb/config/influxdb.conf -pidfile /var/vcap/sys/run/influxdb/influxdb.pid
36927 be/2 vcap 323.37 K/s 0.00 B/s ?unavailable? influxd -config /var/vcap/jobs/influxdb/config/influxdb.conf -pidfile /var/vcap/sys/run/influxdb/influxdb.pid
36928 be/2 vcap 2038.33 K/s 0.00 B/s ?unavailable? influxd -config /var/vcap/jobs/influxdb/config/influxdb.conf -pidfile /var/vcap/sys/run/influxdb/influxdb.pid
36941 be/2 vcap 1936.59 K/s 0.00 B/s ?unavailable? influxd -config /var/vcap/jobs/influxdb/config/influxdb.conf -pidfile /var/vcap/sys/run/influxdb/influxdb.pid
37020 be/2 vcap 385.14 K/s 0.00 B/s ?unavailable? influxd -config /var/vcap/jobs/influxdb/config/influxdb.conf -pidfile /var/vcap/sys/run/influxdb/influxdb.pid
...
(lots of influxd threads)
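To attribute those reads per process (and per thread) over an interval, rather than from iotop snapshots, pidstat from the sysstat package (already present, since sar is being used) can help; a minimal sketch:
pidstat -d 10 3
pidstat -dt -p $(pgrep -o influxd) 10 3    # per-thread breakdown for the oldest influxd process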
SAR output
sar -d 10 6
Linux 6.2.0-39-generic (ac2f95dd-14d9-4eed-8e2f-060615e24dce) 03/24/2024 _x86_64_ (32 CPU)
06:45:57 AM DEV tps rkB/s wkB/s dkB/s areq-sz aqu-sz await %util
06:46:07 AM nvme1n1 0.30 12.80 1.60 0.00 48.00 0.00 1.33 0.12
06:46:07 AM nvme0n1 0.30 0.00 3.20 0.00 10.67 0.00 1.00 0.12
06:46:07 AM nvme2n1 3420.80 67438.40 3687.20 0.00 20.79 106.47 31.13 100.00
06:46:07 AM DEV tps rkB/s wkB/s dkB/s areq-sz aqu-sz await %util
06:46:17 AM nvme1n1 1.00 0.00 9.20 0.00 9.20 0.00 0.90 0.16
06:46:17 AM nvme0n1 0.90 16.00 9.60 0.00 28.44 0.00 0.67 0.20
06:46:17 AM nvme2n1 3404.80 68434.40 7868.00 0.00 22.41 102.23 30.03 100.00
06:46:17 AM DEV tps rkB/s wkB/s dkB/s areq-sz aqu-sz await %util
06:46:27 AM nvme1n1 9.70 26.40 20.40 0.00 4.82 0.02 1.69 1.24
06:46:27 AM nvme0n1 0.30 0.00 4.40 0.00 14.67 0.00 0.67 0.08
06:46:27 AM nvme2n1 3215.40 46037.20 12006.40 0.00 18.05 66.12 20.56 100.00
^C
Average: DEV tps rkB/s wkB/s dkB/s areq-sz aqu-sz await %util
Average: nvme1n1 3.67 13.07 10.40 0.00 6.40 0.01 1.61 0.51
Average: nvme0n1 0.50 5.33 5.73 0.00 22.13 0.00 0.73 0.13
Average: nvme2n1 3347.00 60636.67 7853.87 0.00 20.46 91.61 27.37 100.00
Queries running in InfluxDB:
It seems like this swap issue occurs even when no queries are running?
> show queries
qid query database duration status
--- ----- -------- -------- ------
265 SHOW QUERIES metrics 53µs running
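Since the influx shell is already open, the server's own runtime statistics (Go heap, shard/cache memory, etc.) might also be worth capturing alongside the OS-level numbers; these statements exist in InfluxDB 1.x:
influx -execute 'SHOW STATS'
influx -execute 'SHOW DIAGNOSTICS'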
vmstat output:
vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 32 10485756 541300 8784 108148928 11 140 3563 194 76 217 1 1 58 40 0
0 32 10485756 638500 8764 108060800 0 0 128216 60 5181 3351 0 1 59 40 0
1 31 10485756 505964 8780 108189872 0 0 128252 256 5077 3769 0 1 54 45 0
0 32 10485756 663736 8744 108035424 0 0 128332 0 5047 3327 0 1 50 50 0
0 32 10485756 536476 8752 108164376 0 0 127776 24 4087 3335 0 0 53 46 0
/proc/meminfo:
MemTotal: 129202084 kB
MemFree: 486060 kB
MemAvailable: 71279440 kB
Buffers: 24116 kB
Cached: 59442056 kB
SwapCached: 489676 kB
Active: 51318648 kB
Inactive: 75364416 kB
Active(anon): 27646572 kB
Inactive(anon): 28055976 kB
Active(file): 23672076 kB
Inactive(file): 47308440 kB
Unevictable: 24 kB
Mlocked: 24 kB
SwapTotal: 10485756 kB
SwapFree: 4 kB
Zswap: 0 kB
Zswapped: 0 kB
Dirty: 102236 kB
Writeback: 6156 kB
AnonPages: 66728116 kB
Mapped: 43055064 kB
Shmem: 127816 kB
KReclaimable: 855024 kB
Slab: 971400 kB
SReclaimable: 855024 kB
SUnreclaim: 116376 kB
KernelStack: 10976 kB
PageTables: 747920 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 75086796 kB
Committed_AS: 95698296 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 151392 kB
VmallocChunk: 0 kB
Percpu: 17920 kB
HardwareCorrupted: 0 kB
AnonHugePages: 7997440 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 202656 kB
DirectMap2M: 6404096 kB
DirectMap1G: 124780544 kB
I am also adding some excerpts from the pmap -x output:
Address Kbytes RSS Dirty Mode Mapping
0000000000400000 15232 3684 0 r-x-- influxd
00000000012e0000 31428 6552 0 r---- influxd
0000000003191000 4668 4380 396 rw--- influxd
0000000003620000 180 92 92 rw--- [ anon ]
0000000004436000 132 0 0 rw--- [ anon ]
000000c000000000 16384 9864 9864 rw--- [ anon ]
000000c001000000 47104 28172 28172 rw--- [ anon ]
000000c003e00000 6144 5016 5016 rw--- [ anon ]
000000c004400000 2048 1616 1616 rw--- [ anon ]
000000c004600000 2048 1620 1620 rw--- [ anon ]
.
.
.
000000c033a00000 155648 120028 120028 rw--- [ anon ]
000000c03d200000 8192 8192 8192 rw--- [ anon ]
000000c03da00000 114688 92768 92768 rw--- [ anon ]
.
.
.
000000c07d000000 270336 234948 234948 rw--- [ anon ]
.
000000cecc000000 176128 174080 174080 rw--- [ anon ]
.
.
000000ced8e00000 2048 2048 2048 rw--- [ anon ]
000000ced9000000 137216 135168 135168 rw--- [ anon ]
.
.
(towards the lower addresses)
.
.
00007fa61fdef000 2116 2044 2044 rw--- [ anon ]
00007fa620000000 9664 0 0 r--s- L3-00000023.tsi
00007fa620a00000 40048 0 0 r--s- L5-00000032.tsi
00007fa623200000 40212 0 0 r--s- L5-00000032.tsi
.
.
.
00007fa6a2c00000 9772 0 0 r--s- L3-00000023.tsi
00007fa6a3600000 2098160 0 0 r--s- 000024596-000000002.tsm
00007fa723800000 9920 0 0 r--s- L3-00000023.tsi
00007fa724200000 615764 0 0 r--s- 000024596-000000005.tsm
00007fa749c00000 2100756 0 0 r--s- 000024596-000000004.tsm
00007fa7ca000000 9768 0 0 r--s- L3-00000023.tsi
.
.
.
00007fce82403000 28660 5412 5412 rw--- [ anon ]
00007fce84000000 4194308 2575504 0 r--s- index
00007fcf84001000 4 0 0 r--s- L0-00000001.tsl
00007fcf84002000 4 0 0 r--s- L0-00000001.tsl
00007fcf84003000 4 0 0 r--s- L0-00000001.tsl
.
.
00007fcfc48f7000 1060 0 0 r--s- L0-00000002.tsl
00007fcfc4a00000 262144 35444 0 r--s- 0046
00007fcfd4a00000 2048 1988 1988 rw--- [ anon ]
00007fcfd4c00000 262144 35948 0 r--s- 0045
.
.
00007fd055a00000 4 0 0 r--s- L0-00000001.tsl
00007fd055a01000 4 0 0 r--s- L0-00000001.tsl
00007fd055a02000 4 0 0 r--s- L0-00000001.tsl
.
.
00007fd065c0f000 960 924 924 rw--- [ anon ]
00007fd065cff000 1028 0 0 r--s- L0-00000005.tsl
00007fd065e00000 262144 31952 0 r--s- 003c
.
.
00007fda27fee000 8192 8 8 rw--- [ anon ]
00007fda287ee000 4 0 0 ----- [ anon ]
00007fda287ef000 43076 1164 1164 rw--- [ anon ]
00007fda2b200000 160 160 0 r---- libc.so.6
00007fda2b228000 1620 780 0 r-x-- libc.so.6
00007fda2b3bd000 352 64 0 r---- libc.so.6
00007fda2b415000 16 0 0 r---- libc.so.6
00007fda2b419000 8 0 0 rw--- libc.so.6
00007fda2b41b000 52 0 0 rw--- [ anon ]
00007fda2b428000 4 0 0 r--s- L0-00000001.tsl
00007fda2b429000 4 0 0 r--s- L0-00000001.tsl
00007fda2b42a000 4 0 0 r--s- L0-00000001.tsl
00007fda2b42b000 4 0 0 r--s- L0-00000001.tsl
00007fda2b42c000 4 0 0 r--s- L0-00000001.tsl
00007fda2b42d000 4 0 0 r--s- L0-00000001.tsl
00007fda2b42e000 452 452 452 rw--- [ anon ]
00007fda2b49f000 16 0 0 r--s- L0-00000018.tsl
00007fda2b4af000 268 112 112 rw--- [ anon ]
00007fda2b4f2000 4 0 0 r---- libpthread.so.0
00007fda2b4f3000 4 0 0 r-x-- libpthread.so.0
00007fda2b4f4000 4 0 0 r---- libpthread.so.0
00007fda2b4f5000 4 0 0 r---- libpthread.so.0
00007fda2b4f6000 4 0 0 rw--- libpthread.so.0
00007fda2b4f7000 4 0 0 r--s- L0-00000001.tsl
00007fda2b4f8000 8 0 0 r--s- L0-00000001.tsl
00007fda2b4fa000 4 0 0 r--s- L0-00000001.tsl
00007fda2b4fb000 8 0 0 rw--- [ anon ]
00007fda2b4fd000 8 8 0 r---- ld-linux-x86-64.so.2
00007fda2b4ff000 168 168 0 r-x-- ld-linux-x86-64.so.2
00007fda2b529000 44 40 0 r---- ld-linux-x86-64.so.2
00007fda2b534000 4 0 0 r--s- L0-00000001.tsl
00007fda2b535000 8 0 0 r---- ld-linux-x86-64.so.2
00007fda2b537000 8 0 0 rw--- ld-linux-x86-64.so.2
00007fff74913000 132 12 12 rw--- [ stack ]
00007fff7499b000 16 0 0 r---- [ anon ]
00007fff7499f000 8 4 0 r-x-- [ anon ]
ffffffffff600000 4 0 0 --x-- [ anon ]
---------------- ------- ------- -------
total kB 534464172 112696540 74590512
The series cardinality is 252,390,866 (so is the VM size inadequate?)
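As a rough check on how much of that address space is memory-mapped TSM/TSI data (as opposed to the Go heap), the pmap output can be summed; a minimal sketch, assuming a single influxd process:
pmap -x $(pgrep -o influxd) | awk '/\.tsm|\.tsi/ {kb += $2} END {printf "%.1f GB mapped in .tsm/.tsi files\n", kb/1024/1024}'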
VM details:
Influxdb: 1.8.10
CPU Count: 32
Memory: 128 GB
Disk: 1TB (Only 50% used)
AWS VM Type: m6a.8xlarge (32 vCPU, 128 GB memory)... EBS bandwidth is 10 Gbps according to https://aws.amazon.com/ec2/instance-types/m6a/
Linux Version: Linux 6.2.0-39-generic #40~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 16 10:53:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
The swappiness of the VM is 60 (the default). (What does this value actually mean? Initially I thought it was a percentage, but apparently it's more of a relative weight than an absolute number?)
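If tuning it turns out to help, swappiness (and the related reclaim/dirty-writeback knobs) can be inspected and changed with sysctl; a minimal sketch, to be validated before applying in production:
# current values
sysctl vm.swappiness vm.overcommit_memory vm.dirty_ratio vm.dirty_background_ratio
# lower it at runtime (reverts on reboot)
sudo sysctl -w vm.swappiness=10
# persist across reboots
echo 'vm.swappiness=10' | sudo tee /etc/sysctl.d/99-swappiness.conf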
How do we debug this disk usage and determine whether the IOPS/throughput limit has been reached? And what is causing so many reads rather than writes?
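For the IOPS/throughput-limit part, the EBS volume's CloudWatch metrics can be compared against the volume's provisioned limits; a hedged sketch using the AWS CLI (the volume ID below is a placeholder):
aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS \
  --metric-name VolumeReadOps \
  --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
  --start-time 2024-03-24T06:00:00Z --end-time 2024-03-24T07:00:00Z \
  --period 300 --statistics Sum
Dividing Sum by the period gives average read IOPS over each 5-minute window; the same query with VolumeWriteOps, VolumeReadBytes, and VolumeWriteBytes covers the write and throughput side.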
Update: the VM size was doubled in memory (to 256 GB).
Observations:
meminfo:
MemFree: 9436328 kB
MemAvailable: 246346788 kB
Buffers: 829708 kB
Cached: 171495864 kB
SwapCached: 124960 kB
Active: 78087852 kB
Inactive: 167324320 kB
Active(anon): 6396424 kB
Inactive(anon): 2389588 kB
Active(file): 71691428 kB
Inactive(file): 164934732 kB
vmstat:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
3 0 2379520 10251664 835112 172756112 1 2 196 596 7 4 2 0 93 5 0
Disk busy in atop has dropped significantly, to about 20%:
DSK | nvme2n1 | busy 20% | read 51 | write 2103 | discrd 0 | KiB/r 18 | KiB/w 165 | | KiB/d 0 | MBr/s 0.1 | MBw/s 34.0 | avq 13.95 | avio 0.94 ms