
Entries in the startup log indicate that autovacuum is not working. When I query pg_stat_user_tables, the last_vacuum and last_autovacuum columns are empty, despite the vacuum query I ran just before. Connecting pgAdmin to the database also indicates that vacuum is not working.

I am running PostgreSQL on two Ubuntu Azure VMs. One VM is set up as the master; the second is the replicated database, by means of streaming replication, roughly as described in https://www.digitalocean.com/community/tutorials/how-to-set-up-master-slave-replication-on-postgresql-on-an-ubuntu-12-04-vps.

Everything seems to work except autovacuum. During startup the following error is logged:

LOG:  test message did not get through on socket for statistics collector
LOG:  disabling statistics collector for lack of working socket
WARNING:  autovacuum not started because of misconfiguration
HINT:  Enable the "track_counts" option.
LOG:  database system was shut down at 2017-01-19 14:07:13 UTC
DEBUG:  checkpoint record is at 38/F6000028

In postgresql.conf I use the following settings:

track_counts = on  
autovacuum = on
log_autovacuum_min_duration = 200 
autovacuum_max_workers = 1  
autovacuum_naptime = 960
autovacuum_vacuum_threshold = 128 
autovacuum_analyze_threshold = 256

A query on the database to find the last (auto)vacuum (select * from pg_stat_user_tables) gives empty columns for last_vacuum and last_autovacuum instead of a datetime, even though just before I had run VACUUM FULL VERBOSE; and it gave me vacuum results.
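For reference, a more targeted version of that check (mydb is a hypothetical database name here):

psql -d mydb -c "SELECT relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables ORDER BY relname;"
# last_vacuum is only set by a manual VACUUM, last_autovacuum only by autovacuum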

If I query for the vacuum settings with:

select *
from pg_settings 
where name like 'autovacuum%'

This is the result:

"autovacuum";"on"<br />
"autovacuum_analyze_scale_factor";"0.1"
"autovacuum_analyze_threshold";"256"
"autovacuum_freeze_max_age";"200000000"
"autovacuum_max_workers";"1"<br />
"autovacuum_multixact_freeze_max_age";"400000000"
"autovacuum_naptime";"960"<br />
"autovacuum_vacuum_cost_delay";"20"
"autovacuum_vacuum_cost_limit";"-1"
"autovacuum_vacuum_scale_factor";"0.2"
"autovacuum_vacuum_threshold";"128"
"autovacuum_work_mem";"-1"

These are the results for the 'track_%' settings:

"track_activities";"on"
"track_activity_query_size";"1024"
"track_commit_timestamp";"off"
"track_counts";"off"
"track_functions";"none"
"track_io_timing";"off"

The pg_hba.conf (without the replication and network/user settings) looks like this:

local   all             all                                     trust
host    all             all             localhost               trust
host    all             all             10.1.1.5/32             md5
host    all             all             127.0.0.1/32            md5
host    all             all             0.0.0.0 0.0.0.0         md5

The /etc/hosts:

127.0.0.1       localhost
127.0.1.1       ubuntu
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
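Since the statistics collector binds to whatever address localhost resolves to, a quick sanity check of name resolution on the box doesn't hurt; a minimal sketch:

getent hosts localhost
# should print: 127.0.0.1       localhost
# if ::1 is returned first, the collector may try IPv6 instead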

This is the result of netstat -ant | grep 5432. It is cleaned up and formatted:

User@Machine:/datadrive/log/postgresql/pg_log$ netstat -ant|grep 5432
tcp        0      0 0.0.0.0:5432            0.0.0.0:*               LISTEN
tcp       39      0 InternIpMaster:5432           InternIpSlave:36338          ESTABLISHED
tcp        0      0 InternIpMaster:5432           IpJob:63814     TIME_WAIT
tcp        0      0 InternIpMaster:5432           IpJob:22192      TIME_WAIT
tcp        0      0 InternIpMaster:5432           IpJob:47729      TIME_WAIT
tcp        0      0 InternIpMaster:5432           IpJob:55663      TIME_WAIT
tcp6       0      0 :::5432                 :::*                    LISTEN
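Note that netstat -ant only shows TCP sockets, while the statistics collector talks to itself over a loopback UDP socket; to check whether that socket exists at all, something like this (root is needed to see process names):

sudo netstat -anup | grep postgres
# a working collector shows up as a UDP socket bound to 127.0.0.1
# on a random high port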

I don't expect autovacuum to have needed to do any work yet, given the thresholds configured above. The point remains that during startup, track_counts is disabled at runtime.

I have been looking for solutions involving iptables: even without any iptables rules it does not work. I have connected to localhost as a host. I have changed the firewall settings in Azure and opened port 5432 so the VM is accessible from all IPs; I am able to access the database from other systems. I have reset the configuration to the defaults, with only the replication changes, and I have restarted the service many times.

What am I missing?


3 Answers


You want to fix this:

LOG: test message did not get through on socket for statistics collector
LOG: disabling statistics collector for lack of working socket

The stats collector expects UDP packets from localhost. Given that localhost looks fine in your /etc/hosts (specifically, it does not resolve to IPv6), the next most plausible explanation is that a firewall is filtering these packets.

Related: a similar "problem in creating UDP sockets" was solved the same way: "Found and resolved the problem in creating UDP sockets. It was because of the OS firewall (iptables) restricting the creation of UDP sockets."
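A quick way to check for that, assuming iptables is the firewall in play, is to list the INPUT chain with counters and verify loopback traffic is accepted before any catch-all DROP:

sudo iptables -L INPUT -v -n
# look for an ACCEPT rule on interface lo above any blanket DROP;
# the packet counters reveal which rule the collector's UDP
# packets actually hit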


According to your link, "You should now be able to ssh freely between your two servers as the postgres user." So you need to set up a trust relationship for the postgres user, from the master to the slave and from the slave to the master.

You could use ssh-keygen to create a key pair with a blank passphrase.

shui@shui:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/shui/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/shui/.ssh/id_rsa.
Your public key has been saved in /home/shui/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:mCyBHNLeEdCH2VqBjhtOC8njVLSXnjU7V9GbufK+hlE shui@shui
The key's randomart image is:
+---[RSA 2048]----+
|..++.*..  ..     |
| o.+B = ..       |
|.o+=.B o .  +    |
|o+= *oooo .  E   |
|o+.+.o+oS. .   . |
| .+ . o o .      |
|   =             |
|  . o            |
|   oo.           |
+----[SHA256]-----+

For more information, please refer to this link.
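To then install the postgres user's public key on the other server, something like the following (the host is a placeholder for the peer VM):

# run as the postgres user on each server, pointing at the other one
ssh-copy-id postgres@InternIpSlave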

Also, you need to open port 5432 in the Azure NSG (network security group).
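With the Azure CLI that could look roughly like this (resource group, NSG name, rule name, and priority are placeholders):

az network nsg rule create \
  --resource-group myResourceGroup \
  --nsg-name myNsg \
  --name AllowPostgres \
  --priority 1000 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 5432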


I want to elaborate on the answer @Daniel gave and describe the solution to my problem.

I had set up iptables to get access to PostgreSQL like this:

sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -i lo -j ACCEPT
sudo iptables -A OUTPUT -o lo -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 443 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 5432 -m state --state NEW,ESTABLISHED -j ACCEPT
sudo iptables -A INPUT -j DROP

I assumed this was plenty. However, when I ran sudo iptables --flush and restarted the PostgreSQL server, the error "disabling statistics collector for lack of working socket" was gone.

I also used iptraf to investigate the traffic (sudo apt-get install iptraf; sudo iptraf). I noticed traffic originating from the local (subnet) IP address of the server, but on different ports. This is the traffic on the slave machine (without the Azure traffic):

SubnetIpSlave:22
SubnetIpSlave:45622
SubnetIpSlave:44770
SubnetIpSlave:48948
SubnetIpMaster:5432

I assume this traffic was blocked by iptables, since it does not go via the loopback interface. Therefore I rebuilt the iptables rules. This is the result:

sudo iptables -A INPUT -i lo -j ACCEPT
sudo iptables -A OUTPUT -o lo -j ACCEPT
sudo iptables -A INPUT -p icmp -j ACCEPT
sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 443 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 5432 -j ACCEPT
sudo iptables -A INPUT -s 10.1.1.0/24 -j ACCEPT
sudo iptables -A INPUT -j DROP

I included the subnet rule. I think this is what makes it work, since SubnetIpSlave and SubnetIpMaster are in that range. I can probably remove the ESTABLISHED,RELATED rule.
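After restarting, it is easy to confirm that the fix took; a minimal check:

# the collector's UDP socket should now exist
sudo netstat -anup | grep postgres
# and track_counts should report 'on' again
psql -c "SHOW track_counts;"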

The log looks like it should:

2017-01-24 09:19:38 UTC [1482-1] LOG:  database system was shut down in recovery at 2017-01-24 09:17:41 UTC
2017-01-24 09:19:38 UTC [1483-1] [unknown]@[unknown] LOG:  incomplete startup packet
2017-01-24 09:19:38 UTC [1482-2] LOG:  entering standby mode
2017-01-24 09:19:38 UTC [1482-3] DEBUG:  checkpoint record is at 5D/F2042CA8

I am happy ;)
