First of all, here is what my infrastructure looks like and how it works:
Controller1/2 and Compute1/2 all run VMs and are linked to each other via a VPN. On each server, the ext interface (the VPN one) is plugged into the br-ext bridge. All servers can communicate with each other, and so can the VMs on their private interfaces.
I have two Ubuntu 16.04 routers (the two boxes with ETH3 and BR-ext). Only one is active at a time (the second is a failover handled by keepalived), and the active one owns both the public subnet (51.38.X.Y/27) and the IP 10.38.166.190, which acts as the gateway for all VMs.
I use iptables and iproute2 so that traffic to, say, 51.38.X.YYA reaches 10.38.X.YYA, and traffic from 10.38.X.YYA goes out as 51.38.X.YYA.
From any of the VMs I can reach the outside without issue, and if I run curl ifconfig.co I get the VM's public IP back, which is the behavior I want.
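For reference, the 1:1 mapping described above boils down to one DNAT rule and one SNAT rule per VM. The following is a minimal sketch that only prints the commands rather than running them. The helper name nat_pair is hypothetical, and the exact options used to create the real rules are an assumption inferred from the rule listing further down.

```shell
#!/bin/sh
# nat_pair is a hypothetical helper: it prints (does not execute) the
# 1:1 NAT rule pair for one VM, so the commands can be reviewed first.
nat_pair() {
    private_ip=$1
    public_ip=$2
    # Inbound: rewrite the public destination to the VM's private address.
    echo "iptables -t nat -A PREROUTING -d $public_ip -j DNAT --to-destination $private_ip"
    # Outbound: rewrite the VM's private source to its public address.
    echo "iptables -t nat -A POSTROUTING -s $private_ip -j SNAT --to-source $public_ip"
}

nat_pair 10.38.166.166 51.38.166.166
nat_pair 10.38.166.167 51.38.166.167
```

Printing instead of executing keeps the sketch safe to run on any machine; piping the output to sh as root would apply the rules.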
My issue:
If I try to reach VM2 from VM1 using its public IP, it doesn't work at all.
I will take two VMs to illustrate my issue and give all the configuration for them:
VM1: 10.38.166.167 / 51.38.166.167
VM2: 10.38.166.166 / 51.38.166.166
What I've done so far:
On router1:
ETH1 = main interface (management)
ETH3 = interface that holds all the public IPs and NATs to the VMs
br-ext = bridge that contains the VPN interface
ext = VPN interface (plugged into the bridge br-ext)
[root@network3] ~# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether fa:16:3e:19:3e:41 brd ff:ff:ff:ff:ff:ff
inet 51.38.166.162/32 brd 51.38.x.162 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe19:3e41/64 scope link
valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether fa:16:3e:72:94:cb brd ff:ff:ff:ff:ff:ff
inet 51.38.166.163/32 brd 51.38.x.163 scope global eth3
valid_lft forever preferred_lft forever
inet 51.38.166.166/32 scope global eth3
valid_lft forever preferred_lft forever
inet 51.38.166.167/32 scope global eth3
valid_lft forever preferred_lft forever
7: br-ext: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether d2:f8:64:36:64:f2 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.103/9 brd 10.127.255.255 scope global br-ext
valid_lft forever preferred_lft forever
inet 10.0.0.120/32 scope global br-ext
valid_lft forever preferred_lft forever
inet 10.38.166.190/32 scope global br-ext
valid_lft forever preferred_lft forever
10: ext: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br-ext state UNKNOWN group default qlen 1000
link/ether d2:f8:64:36:64:f2 brd ff:ff:ff:ff:ff:ff
I've set up a number of rules and routes so that packets coming from outside to 51.38.x.160/27 are routed to 10.38.x.y/27:
[root@network3] ~# ip ru l | grep "lookup 103"
9997: from 10.38.x.167 lookup 103
9998: from 10.38.x.166 lookup 103
# rules telling each IP of the /27 to use table 103
10301: from 51.38.166.163 lookup 103
10302: from all to 51.38.166.163 lookup 103
10307: from 51.38.166.166 lookup 103
10308: from all to 51.38.166.166 lookup 103
10309: from 51.38.166.167 lookup 103
10310: from all to 51.38.166.167 lookup 103
[root@network3] ~# ip r s table 103
default via 51.38.166.190 dev eth3
51.38.166.160/27 dev eth3 scope link
[root@network3] ~# ip r s
default via 51.38.166.190 dev eth1 onlink
10.0.0.0/9 dev br-ext proto kernel scope link src 10.0.0.103
172.16.0.0/16 dev br-manag proto kernel scope link src 172.16.0.103
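As a rough sketch, the policy-routing state listed above could be recreated with commands like the ones printed below. The helper name policy_rules is hypothetical, and the exact commands originally used are an assumption; only the resulting rules, priorities, and routes are taken from the listings.

```shell
#!/bin/sh
# policy_rules is a hypothetical helper: it prints (does not run) the two
# "ip rule" commands that send one public IP through a given routing table.
policy_rules() {
    public_ip=$1
    table=$2
    prio=$3
    # Traffic sourced from the public IP.
    echo "ip rule add from $public_ip lookup $table priority $prio"
    # Traffic destined to the public IP (listed as "from all to ...").
    echo "ip rule add to $public_ip lookup $table priority $((prio + 1))"
}

# Contents of table 103, matching the "ip r s table 103" listing.
echo "ip route add default via 51.38.166.190 dev eth3 table 103"
echo "ip route add 51.38.166.160/27 dev eth3 scope link table 103"

policy_rules 51.38.166.166 103 10307
policy_rules 51.38.166.167 103 10309
```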
My iptables rules look like this:
[root@network3] ~# iptables -nvL
Chain INPUT (policy ACCEPT 21334 packets, 1015K bytes)
pkts bytes target prot opt in out source destination
91877 4376K ACCEPT icmp -- * * 0.0.0.0/0 0.0.0.0/0 /* 000 accept all icmp */
18 1564 ACCEPT all -- lo * 0.0.0.0/0 0.0.0.0/0 /* 001 accept all to lo interface */
0 0 REJECT all -- !lo * 0.0.0.0/0 127.0.0.0/8 /* 002 reject local traffic not on loopback interface */ reject-with icmp-port-unreachable
343K 123M ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 state ESTABLISHED /* 003 accept related established rules */
243 14472 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 multiport dports 1022 /* 030 allow SSH */
481M 42G ACCEPT udp -- * * 0.0.0.0/0 0.0.0.0/0 multiport dports 3210:3213 /* 031 allow VPNtunnel */
4155 241K DROP all -- eth0 * 0.0.0.0/0 0.0.0.0/0 /* 999 drop all */
Chain FORWARD (policy ACCEPT 98325 packets, 8874K bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 964M packets, 93G bytes)
pkts bytes target prot opt in out source destination
Iptables NAT rules
[root@network3] ~# iptables -t nat -nvL --line
Chain PREROUTING (policy ACCEPT 156K packets, 6455K bytes)
num pkts bytes target prot opt in out source destination
31 11228 771K DNAT all -- * * 0.0.0.0/0 51.38.166.166 /* 112 NAT for 10.38.166.166 */ to:10.38.166.166
32 11624 809K DNAT all -- * * 0.0.0.0/0 51.38.166.167 /* 112 NAT for 10.38.166.167 */ to:10.38.166.167
Chain INPUT (policy ACCEPT 85077 packets, 3527K bytes)
num pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 16505 packets, 1294K bytes)
num pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 105K packets, 4357K bytes)
num pkts bytes target prot opt in out source destination
31 17 1196 SNAT all -- * * 10.38.166.166 0.0.0.0/0 to:51.38.166.166
32 8 549 SNAT all -- * * 10.38.166.167 0.0.0.0/0 to:51.38.166.167
I also inserted some rules in the raw table to help me trace packets:
[root@network3] ~# iptables -t raw -nvL
Chain PREROUTING (policy ACCEPT 3765 packets, 227K bytes)
pkts bytes target prot opt in out source destination
0 0 TRACE all -- * * 51.38.166.167 0.0.0.0/0
185 12988 TRACE all -- * * 0.0.0.0/0 51.38.166.167
Chain OUTPUT (policy ACCEPT 7941 packets, 837K bytes)
pkts bytes target prot opt in out source destination
Testing from VM1:
ubuntu@test-1:~$ ip a l dev ens3
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
link/ether fa:16:3e:51:0a:0b brd ff:ff:ff:ff:ff:ff
inet 10.38.166.167/24 brd 10.38.166.255 scope global ens3
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe51:a0b/64 scope link
valid_lft forever preferred_lft forever
ubuntu@test-1:~$ curl ifconfig.co
51.38.166.167
ubuntu@test-1:~$ ping 51.38.166.166 -c 4
PING 51.38.166.166 (51.38.166.166) 56(84) bytes of data.
--- 51.38.166.166 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3031ms
Testing from VM2:
ubuntu@test-2:~$ ip a l dev ens3
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
link/ether fa:16:3e:9d:79:ce brd ff:ff:ff:ff:ff:ff
inet 10.38.166.166/24 brd 10.38.166.255 scope global ens3
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe9d:79ce/64 scope link
valid_lft forever preferred_lft forever
ubuntu@test-2:~$ curl ifconfig.co
51.38.166.166
ubuntu@test-2:~$ ping 51.38.166.167 -c 4
PING 51.38.166.167 (51.38.166.167) 56(84) bytes of data.
--- 51.38.166.167 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3023ms
Logs from network3:
[root@network3] ~# tail -f /var/log/kern.log | grep "SRC=10.38.166.166 DST=51.38.166.167"
Jul 5 11:58:12 network3 kernel: [79540.314496] TRACE: nat:PREROUTING:rule:32 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49094 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=57
Jul 5 11:58:13 network3 kernel: [79541.322501] TRACE: raw:PREROUTING:policy:3 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49203 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=58
Jul 5 11:58:13 network3 kernel: [79541.322543] TRACE: mangle:PREROUTING:policy:1 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49203 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=58
Jul 5 11:58:13 network3 kernel: [79541.322574] TRACE: nat:PREROUTING:rule:32 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49203 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=58
Jul 5 11:58:14 network3 kernel: [79542.330582] TRACE: raw:PREROUTING:policy:3 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
Jul 5 11:58:14 network3 kernel: [79542.330615] TRACE: mangle:PREROUTING:policy:1 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
Jul 5 11:58:14 network3 kernel: [79542.330639] TRACE: nat:PREROUTING:rule:32 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
^C
Since the IP ID does not change for a given SEQ, I can search the log for everything related to that ID/SEQ:
[root@network3] ~# grep "ID=49367" /var/log/kern.log
Jul 5 11:58:14 network3 kernel: [79542.330582] TRACE: raw:PREROUTING:policy:3 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
Jul 5 11:58:14 network3 kernel: [79542.330615] TRACE: mangle:PREROUTING:policy:1 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
Jul 5 11:58:14 network3 kernel: [79542.330639] TRACE: nat:PREROUTING:rule:32 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
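To make the last hook reached easier to see, the TRACE lines can be condensed into one line per packet. The sketch below groups lines by IP ID and lists the table:chain hops in logged order. The helper name trace_path is hypothetical, and it assumes the log format shown above, where the first ID= field on a line is the IP header ID.

```shell
#!/bin/sh
# trace_path is a hypothetical helper: it reads kernel TRACE lines on stdin
# and prints, per IP ID, the table:chain hops in the order they were logged.
trace_path() {
    awk '
    /TRACE:/ {
        hop = ""; id = ""
        for (i = 1; i <= NF; i++) {
            # The token after "TRACE:" looks like table:CHAIN:type:num.
            if ($i == "TRACE:") hop = $(i + 1)
            # The first ID= field is the IP header ID; the second is the ICMP ID.
            if (id == "" && $i ~ /^ID=/) id = substr($i, 4)
        }
        split(hop, p, ":")
        step = p[1] ":" p[2]
        if (!(id in path)) { order[++n] = id; path[id] = step }
        else path[id] = path[id] " -> " step
    }
    END { for (k = 1; k <= n; k++) print order[k] ": " path[order[k]] }
    '
}

# Usage (on the router):
#   grep "SRC=10.38.166.166 DST=51.38.166.167" /var/log/kern.log | trace_path
```

For the three lines with ID=49367 above, this prints `49367: raw:PREROUTING -> mangle:PREROUTING -> nat:PREROUTING`, which shows that no FORWARD or POSTROUTING hook is ever logged for the packet.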
If I refer to this diagram: http://inai.de/images/nf-packet-flow.png
it seems to be stuck at the routing decision. (I've ruled out the bridging decision, because the behavior is exactly the same if I do the same thing without any bridge involved.)
The other possibility would be that the packet matches NAT PREROUTING rule 32 but the rule is not applied, though I can't figure out why.
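One way to look at the routing-decision hypothesis is to ask the kernel directly which route it would choose for the packet once the destination has been rewritten. The sketch below only prints read-only commands. The helper name routing_checks is hypothetical, and the choice of checks (route lookup, forwarding, reverse-path filtering) is an assumption about where to look, not a confirmed diagnosis.

```shell
#!/bin/sh
# routing_checks is a hypothetical helper: it prints (does not run) read-only
# commands for inspecting the routing decision taken after DNAT.
routing_checks() {
    src=$1      # private source address of the sending VM
    dst=$2      # private destination address after DNAT
    iif=$3      # interface the packet arrived on
    # Ask the kernel which route it would pick for this forwarded packet.
    echo "ip route get $dst from $src iif $iif"
    # Forwarding and reverse-path filtering settings that affect that decision.
    echo "sysctl net.ipv4.ip_forward net.ipv4.conf.all.rp_filter net.ipv4.conf.$iif.rp_filter"
}

routing_checks 10.38.166.166 10.38.166.167 br-ext
```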
Any clue as to what I'm missing here?
