One of the servers monitored by Zabbix is not reachable. I have no idea why, since monitoring works normally with the other servers.

  • The zabbix-agent service on the monitored server is running.
  • We have several servers, all monitored by zabbix. In /etc/zabbix/zabbix_agentd.conf I see no difference between this problematic server and another one that works normally.
  • Both the zabbix server and the monitored server (agent-server) are hosted by Amazon.
  • All zabbix monitored servers are linked to a security group with two inbound rules for port 10050 and 10051 for the zabbix-server IP. So incoming requests from the zabbix-server to the zabbix-agents on these servers should be allowed. They work on several servers, but not on this one.
  • The zabbix-server has a different security group, and no rules set for ports 10050 and 10051, so they should be blocked. Iptables returns no rules.
  • I can open a telnet session from the zabbix-server to the agent. It disconnects automatically, but it connects. So I guess the firewall is not the problem.
  • Server: Amazon Linux (Centos like)
  • Installed file: http://repo.zabbix.com/zabbix/2.2/rhel/6/x86_64/zabbix-release-2.2-1.el6.noarch.rpm
  • SELinux is disabled on all these agents and on the server.

Agent log after restart of zabbix-agent service

 10939:20151127:093938.268 Starting Zabbix Agent [agent-server.test]. Zabbix 2.2.11 (revision 56693).
 10939:20151127:093938.268 using configuration file: /etc/zabbix/zabbix_agentd.conf
 10942:20151127:093938.269 agent #1 started [listener #1]
 10945:20151127:093938.269 agent #4 started [active checks #1]
 10941:20151127:093938.270 agent #0 started [collector]
 10944:20151127:093938.270 agent #3 started [listener #3]
 10943:20151127:093938.271 agent #2 started [listener #2]
 10945:20151127:141742.930 active check configuration update from [zabbix-server-ip:10051] started to fail 
 (cannot connect to [[zabbix-server-ip]:10051]: [4] Interrupted system call)

When I telnet to the agent-server, then enter agent.version, it returns: ZBXD2.2.11
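The ZBXD prefix in that reply is the Zabbix protocol header: the signature, a flags byte, and an 8-byte little-endian payload length that telnet does not display. As a minimal sketch, assuming the plain-text passive request that the 2.2 agent accepts (the same thing typed into telnet above), the check could be scripted roughly like this. The function names `parse_zbx_reply` and `query_agent` are made up for illustration; newer agent versions may expect a header on the request as well.

```python
import socket
import struct

ZBX_HEADER = b"ZBXD\x01"  # protocol signature followed by the flags byte 0x01


def parse_zbx_reply(raw: bytes) -> str:
    """Strip the Zabbix header from a raw agent reply and return the payload."""
    if not raw.startswith(ZBX_HEADER):
        # An empty reply usually means the agent closed the connection,
        # for example because the caller's IP is not listed in Server=.
        raise ValueError("not a Zabbix reply")
    header_len = len(ZBX_HEADER) + 8
    if len(raw) < header_len:
        raise ValueError("truncated header")
    # 8-byte little-endian payload length follows the signature and flags.
    (length,) = struct.unpack("<Q", raw[len(ZBX_HEADER):header_len])
    payload = raw[header_len:header_len + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload.decode("utf-8")


def query_agent(host: str, key: str, port: int = 10050, timeout: float = 3.0) -> str:
    """Send a plain-text passive check (assumed 2.2-style) and return the value."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(key.encode("utf-8") + b"\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the agent closes the connection after replying
                break
            chunks.append(chunk)
    return parse_zbx_reply(b"".join(chunks))


# Usage, run from the Zabbix server (hostname is a placeholder):
# print(query_agent("agent-server.test", "agent.version"))
```

Run from the Zabbix server, a successful call would show that the passive path to the agent itself works, which points the search toward the server-side host configuration.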

Contents of /etc/zabbix/zabbix_server.conf (server):

ListenPort=10051
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
DBName=zabbix
DBUser=zabbix
DBPassword=******
DBSocket=/var/lib/mysql/mysql.sock
SNMPTrapperFile=/var/log/snmptt/snmptt.log
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts

Contents of /etc/zabbix/zabbix_agentd.conf (agent)

PidFile=/var/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
EnableRemoteCommands=1
Server=zabbix-server-ip
ListenPort=10050
StartAgents=3
# ServerActive=zabbix-server-ip # commented out
Hostname=server.test
Timeout=3
AllowRoot=1
Include=/etc/zabbix/zabbix_agentd.d/

Netstat on zabbix server

$ sudo netstat -lpn | grep zabbix
tcp        0      0 0.0.0.0:10051               0.0.0.0:*                   LISTEN      7624/zabbix_server  
tcp        0      0 :::10051                    :::*                        LISTEN      7624/zabbix_server

Netstat on problematic agent

$ sudo netstat -lpn | grep zabbix
tcp        0      0 0.0.0.0:10050               0.0.0.0:*                   LISTEN      3248/zabbix_agentd  
tcp        0      0 :::10050                    :::*                        LISTEN      3248/zabbix_agentd 

Netstat on working agent

$ sudo netstat -lpn | grep zabbix
tcp        0      0 0.0.0.0:10050               0.0.0.0:*                   LISTEN      24242/zabbix_agentd 
tcp        0      0 :::10050                    :::*                        LISTEN      24242/zabbix_agentd

Active vs passive agent

  • I've opened port 10051 on the server for the problematic agent IP.
  • Telnet from the agent to the server shows that this works.
  • I've activated the ServerActive option with the zabbix-server-ip as value. The error message is gone from the log after restarting the agent.
  • The problem is still there...

Next try:

  • I did the same for a working agent, and can telnet from agent to server.
  • ServerActive is set to the zabbix-server-ip, and the agent is restarted.
  • StartAgents is set to 0, to force using the active agent.
  • Zabbix reports that this server is unreachable...
  • Then I reset to passive.

All in all, active mode may have been set in the agent config on several servers, but it has never worked. All reports are from passive agents.

Agent Interfaces

  • Under Monitoring > Latest data, with host=all selected, I click the server name and choose Host Inventory.
  • The working agent displays its own IP address.
  • The problematic agent displays the zabbix-server-ip.

I don't know why this happens, but it seems strange.

What can cause this connection problem? How can I reconnect the server with the agent?

Solution

It turns out that the IP address set in the host configuration (via the web interface) was that of the zabbix-server itself. This should of course be the address of the agent-server.
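A misconfiguration like this could be caught by scanning the host list for agent interfaces that point at the Zabbix server's own address. The sketch below is only an illustration: `hosts_pointing_at` is a made-up helper, the sample data and IP addresses are invented, and the dictionary shape is assumed to resemble what the Zabbix API's host.get returns when interfaces are selected.

```python
def hosts_pointing_at(hosts, suspect_ip):
    """Return the names of hosts that have an interface with IP suspect_ip."""
    return [
        h["host"]
        for h in hosts
        if any(i.get("ip") == suspect_ip for i in h.get("interfaces", []))
    ]


# Invented sample data; 10.0.0.5 stands in for the zabbix-server-ip.
hosts = [
    {"host": "working-agent", "interfaces": [{"ip": "10.0.0.21", "port": "10050"}]},
    {"host": "agent-server.test", "interfaces": [{"ip": "10.0.0.5", "port": "10050"}]},
]

print(hosts_pointing_at(hosts, "10.0.0.5"))  # prints ['agent-server.test']
```

The entry for the Zabbix server itself may legitimately match, so any hit still needs a manual look.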

SPRBRN

3 Answers

What are the current SELinux and iptables settings on the agent box? Can you telnet from the agent to the server on port 10051?

You can check the connectivity between the boxes by running tcpdump on the agent: tcpdump -i your_interface tcp port 10050. This shows the incoming and outgoing packets.
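If telnet is not installed on the agent box, the same reachability test can be scripted. This is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper name, and the hostnames in the usage comments are placeholders.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage (placeholders, run on the box named in each comment):
# can_connect("zabbix-server-ip", 10051)   # on the agent: active checks
# can_connect("agent-server.test", 10050)  # on the server: passive checks
```

A True result only shows that the TCP handshake succeeds. It says nothing about whether the other side accepts the request, so tcpdump remains useful for seeing what actually arrives.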

cuongnv23

I think you need to understand Zabbix's active and passive connection modes to resolve the problem. From the Zabbix documentation:

Passive and active checks

Zabbix agents can perform passive and active checks.

In a passive check the agent responds to a data request. Zabbix server (or proxy) asks for data, for example, CPU load, and Zabbix agent sends back the result.

Active checks require more complex processing. The agent must first retrieve a list of items from Zabbix server for independent processing. Then it will periodically send new values to the server.

For active mode to work, port 10051 must be open on the Zabbix server so that the agents on the clients can connect to it. Judging from the error you are getting, this is the problem:

10945:20151127:141742.930 active check configuration update from [zabbix-server-ip:10051] started to fail (cannot connect to [[zabbix-server-ip]:10051]: [4] Interrupted system call)

The tests you have done cover the connection from the Zabbix server to the client, which seems to work without any problem. But that's not enough for active mode to function. The connection from the client agent to the server on port 10051 is not working in your case, and you need to focus on that.

The information you have provided is misleading:

The zabbix-server has a different security group, and no rules set for ports 10050 and 10051, so they should be blocked. Iptables returns no rules.

The statement above about the ports cannot be true if you are using active mode. The server must have port 10051 open for the clients to connect, or you have to use passive mode.

So please check the necessary firewall rules in between and make sure the client/agent can reach the server on this port. I am sure the agent on the other, working server can reach the Zabbix server on port 10051.

Diamond

It turns out that the IP address set in the host configuration (via the web interface) was that of the zabbix-server itself. This should of course be the address of the agent-server.

SPRBRN