I have a Linux server that is an OpenVPN endpoint, but also hosts a webserver. When my client connects to the server address for the webserver, the packets travel outside the VPN. Rightly so, since the route to the server set by OpenVPN is more specific than the default route to enter the VPN. However I see that as a "leak".

Hence I tried to replicate the setup WireGuard uses (WireGuard is great, but I need OpenVPN because the tunnel has to run over TCP).

I based my setup on the WireGuard page, as well as on other questions:

  • Prevent routing loop with FwMark in Wireguard (hats off for the lecture given there!)
  • Routing fwmark to VPN gateway using nftables mark

Despite the setup, Wireshark shows the http/https requests still go through the physical interface and not through the vpn tun0 interface. When I look at the packet marks with nft monitor trace, it seems the meta mark is properly set and only the appropriate packets (to/from port 1194) appear.

So I suspect one of the following:

  • the policy-based routing (PBR) rule does not work as expected;
  • the packet marking does not happen early enough.

I tried to change the chain to mark outgoing packets as:

  • type route hook output
  • type filter hook output

Neither made any difference.

These commands return the following:

  • ip rule:

0:  from all lookup local
32764:  from all lookup main suppress_prefixlength 0
32765:  not from all fwmark 0x4 lookup vpn
32766:  from all lookup main
32767:  from all lookup default
  • ip route show table vpn:

default dev tun0 scope link

  • ip route:

default via 10.8.0.1 dev tun0 proto static metric 50
default via 192.168.1.1 dev wlp4s0 proto dhcp src 192.168.1.10 metric 600
10.8.0.0/24 dev tun0 proto kernel scope link src 10.8.0.2 metric 50
END.POINT.IP.ADDRESS via 192.168.1.1 dev wlp4s0 proto static metric 50
192.168.1.0/24 dev wlp4s0 proto kernel scope link src 192.168.1.10 metric 600

  • nft list ruleset:

table inet vpn {
    chain premangle {
        type filter hook prerouting priority mangle; policy accept;
        ip saddr END.POINT.IP.ADDRESS tcp sport 1194 meta nftrace set 1 meta mark set ct mark
    }

    chain postmangle {
        type filter hook postrouting priority mangle; policy accept;
        ip daddr END.POINT.IP.ADDRESS tcp dport 1194 meta nftrace set 1
        ip daddr END.POINT.IP.ADDRESS tcp dport 1194 meta mark set 0x00000004
        meta mark 0x00000004 ct mark set meta mark
    }
}

  • traceroute -n --fwmark=0x4 END.POINT.IP.ADDRESS shows it goes via the physical interface out of the vpn (as expected)

  • traceroute -n END.POINT.IP.ADDRESS shows it goes via the physical interface out of the vpn (UNWANTED)

Thank you so much in advance !

2 Answers

If Strict Reverse Path Forwarding ("SRPF") is not in use, no nftables rules are needed at all.

While routed (forwarded) traffic usually works fine when marks are handled in iptables or nftables, locally initiated traffic that is rerouted because of a mark (set in a type route hook output chain) usually runs into trouble: the reroute check that happens in the type route hook output chain won't magically change the local source IP address that was already chosen for the client socket, and that address is usually the wrong one for the new route. Fixing this usually requires a NAT band-aid (that would be needed in type nat hook output) and will probably make UDP handling even more difficult than it already is in a multi-homed environment. Using nftables for this should be avoided whenever possible.

Just like WireGuard, OpenVPN can set the firewall mark itself on its outgoing envelope traffic, so the mark is already in place before any route lookup happens for locally generated traffic:

--mark value

Mark encrypted packets being sent with value. The mark value can be matched in policy routing and packetfilter rules. This option is only supported in Linux and does nothing on other operating systems.

This works the same as WireGuard: the outgoing envelope packets, on the real interface, get the mark, probably by having the client use SO_MARK on its socket before connecting to the server:

SO_MARK (since Linux 2.6.25)

Set the mark for each packet sent through this socket (similar to the netfilter MARK target but socket-based). Changing the mark can be used for mark-based routing without netfilter or for packet filtering.

Of course, if neither rerouting nor policy routing with direct marking (SO_MARK or an equivalent method) is in place, chances are it won't work at all.


So delete all nftables rules:

nft delete table inet vpn

and instead add in the client configuration:

mark 4

Keep the routing rules and table (they should probably be integrated in VPN hooks):

- ip rule:
0:  from all lookup local
32764:  from all lookup main suppress_prefixlength 0
32765:  not from all fwmark 0x4 lookup vpn
32766:  from all lookup main
32767:  from all lookup default
- ip route show table vpn:
default dev tun0 scope link

Note: in the SRPF case only, the sysctl and nftables parts given at the end of this answer should be applied before adding the routing table entry above, to avoid temporary disruption.

Do not add a default route through the VPN, nor an explicit route to the remote endpoint. Either don't have the server push this configuration, or have the client ignore it with:

pull-filter ignore redirect-gateway

or:

route-nopull

so that these routes no longer appear:

default via 10.8.0.1 dev tun0 proto static metric 50 
END.POINT.IP.ADDRESS via 192.168.1.1 dev wlp4s0 proto static metric 50 

but only this one gets added:

10.8.0.0/24 dev tun0 proto kernel scope link src 10.8.0.2 metric 50 

Instead, the policy routing rules handle the default route by selecting the routing table vpn only when appropriate.
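As a minimal sketch, the rules and the table shown above could be created by hand (or from an OpenVPN up script) roughly as follows. The run helper and the DRY_RUN variable are purely illustrative and belong to no tool; the table name vpn is assumed to be declared in /etc/iproute2/rt_tables, and the priorities are simply the ones visible in the ip rule output above. In the strict rp_filter case, the sysctl and nftables steps of this answer would have to come before the final route command.

```shell
#!/bin/sh
# Sketch only: policy routing for a client whose OpenVPN process marks its
# own envelope packets ("mark 4" in the client configuration).
# Assumption: table "vpn" is declared in /etc/iproute2/rt_tables.

MARK=0x4
TABLE=vpn
TUN_DEV=tun0

# Illustrative helper: print the command when DRY_RUN=1, else run it (root).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_policy_routing() {
    # Anything NOT carrying the mark is looked up in table vpn.
    run ip rule add not fwmark "$MARK" table "$TABLE" priority 32765
    # Consult main first, but ignore its default route (prefix length 0),
    # so that LAN and other specific routes keep working.
    run ip rule add table main suppress_prefixlength 0 priority 32764
    # The only route in table vpn: default through the tunnel.
    run ip route add default dev "$TUN_DEV" table "$TABLE"
}
```

Calling setup_policy_routing with DRY_RUN=1 only prints the three commands, which makes it easy to review them before running them as root.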


As explained in my answer to the 1st linked Q/A, most of the nftables ruleset for WireGuard's Table = auto + AllowedIPs = 0.0.0.0/0 exists to handle SRPF for reply traffic. There are a few cases:

  • rp_filter=0 everywhere

    including net.ipv4.conf.default.rp_filter and net.ipv4.conf.all.rp_filter. No RPF check: nothing to do. No nftables needed.

  • rp_filter=1

    Now envelope reply traffic can fail SRPF

    • Either choose Loose RPF on the main interface:

      sysctl -w net.ipv4.conf.wlp4s0.rp_filter=2
      

      and be done with it. No nftables needed,

    • or implement all the logic to mark return envelope traffic just as is done in WireGuard

      • Have the fwmark also be used in reverse path lookup

        by enabling src_valid_mark on the main interface (it could have been set on all instead), thus allowing SRPF to pass:

        sysctl -w net.ipv4.conf.wlp4s0.src_valid_mark=1
        
      • Transpose (IPv4 only here) WireGuard's setup

        as seen in linked Q/A with additional corner cases described at the end also accounted for, so reply traffic gets the fwmark

        table ip vpn {
            chain preraw {
                type filter hook prerouting priority raw; policy accept;
                iifname != "tun0" ip daddr 10.8.0.2 fib saddr type != local drop
            }

            chain premangle {
                type filter hook prerouting priority mangle; policy accept;
                ct mark 4 meta mark set ct mark
            }

            chain postmangle {
                type filter hook postrouting priority mangle; policy accept;
                meta mark 4 ct mark set meta mark
            }
        }

        Chain preraw is optional and can be removed if needed. It protects against remote (LAN) attempts to access the internal VPN local address.

        The mark is created by OpenVPN on outgoing envelope packets, copied into the connmark at hook postrouting, and re-injected into reply envelope packets at hook prerouting. No endpoint address or port appears anywhere.

        No rerouting is done (no type route hook output nor type nat hook output present).

      Note: the sysctl command and the nftables ruleset above should both be applied before adding the default route in the routing table vpn; otherwise connectivity is lost temporarily until the VPN TCP socket recovers (which still only happens once both are in place).
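To find out which of the cases above applies, the current values can be read from /proc. The sketch below is only an illustration: the function names and the CONF_DIR variable are made up here (CONF_DIR exists so the functions can be tried on sample data). It relies on the documented kernel behaviour that the larger of conf/all/rp_filter and conf/<interface>/rp_filter is used for source validation on that interface.

```shell
#!/bin/sh
# Sketch only: report which rp_filter case applies to an interface.
# CONF_DIR is a hypothetical override, handy for trying this on sample data.

CONF_DIR=${CONF_DIR:-/proc/sys/net/ipv4/conf}

# The kernel uses the larger of conf/all and conf/<iface>.
effective_rp_filter() {
    iface=$1
    all=$(cat "$CONF_DIR/all/rp_filter")
    own=$(cat "$CONF_DIR/$iface/rp_filter")
    if [ "$all" -gt "$own" ]; then
        echo "$all"
    else
        echo "$own"
    fi
}

# Map the effective value to the cases listed in this answer.
describe_rp_filter() {
    case $(effective_rp_filter "$1") in
        0) echo "no RPF check: no nftables needed" ;;
        1) echo "strict RPF: switch to rp_filter=2, or use src_valid_mark plus the connmark rules" ;;
        2) echo "loose RPF: no nftables needed" ;;
        *) echo "unexpected rp_filter value" ;;
    esac
}
```

For example, describe_rp_filter wlp4s0 prints the case that applies to the main interface used in the question.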


The client system can now reach the server from within the tunnel.

Connectivity tests can be done like this:

socat -d -d TCP4:END.POINT.IP.ADDRESS:443 -

OP's traceroute should reach END.POINT.IP.ADDRESS in a single hop: through the VPN.

At least on an amd64 (x86-64) architecture, the VPN can be bypassed (as root) with:

socat -d -d TCP4:END.POINT.IP.ADDRESS:443,setsockopt-listen=1:36:L4 -

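where setsockopt-listen means: set the socket option (here SO_MARK) before connecting (rather than before listening, in this case), 1 and 36 are the numeric values of SOL_SOCKET and SO_MARK on that architecture, and the 4 in L4 is the same mark value as used by OpenVPN.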
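The routing decision itself can also be checked without opening any connection, by asking the kernel with ip route get, once without and once with the mark. The sketch below is only an illustration: route_dev and check_routes are names invented here, and SERVER is a placeholder to be replaced by the real endpoint address. With the setup above, the first lookup is expected to select tun0 and the second the physical interface. This shows the routing decision only; it does not prove that replies make it back.

```shell
#!/bin/sh
# Sketch only: compare the routing decision for unmarked and marked traffic.
# SERVER is a placeholder; substitute the real endpoint address.
SERVER=END.POINT.IP.ADDRESS
MARK=4

# Print the word following "dev" in `ip route get` output read from stdin.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

check_routes() {
    plain=$(ip route get "$SERVER" | route_dev)
    marked=$(ip route get "$SERVER" mark "$MARK" | route_dev)
    echo "unmarked traffic leaves via: $plain"
    echo "marked (envelope) traffic leaves via: $marked"
}
```

Running check_routes prints one line per lookup, so a misplaced rule or a leftover explicit route to the endpoint shows up immediately.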


Note: the specific case of the client querying, through the tunnel, a UDP service on the server at the server's public IP address can hit a common issue that is not really related to VPNs but to using UDP while being multi-homed. This requires the UDP service to be multi-homed aware: usually either by using multiple UDP sockets, binding once for each local address (so usually at least once per interface), or with a single unbound UDP socket by using IP_PKTINFO with additional handling code.

A.B

First, thanks so much for the prompt reply.

Oh, OK. I understand that because the initial routing decision has already been made when the packet leaves the requesting process, rerouting without NAT sends the packets with the wrong source address, and hence fails.

So I did the following:

Concerning rp_filter, my distro default is 2, and I won't add any more complexity there, so I am leaving all of that out of the way as advised. I set all of these to 0 for now.

I kept nftables only to trace packets and confirm that OpenVPN does mark the packets (see below).

- ip rule
0:  from all lookup local
32764:  from all lookup main suppress_prefixlength 0 **I tried with and without that one with no effect on reaching the webserver**
32765:  not from all fwmark 0x4 lookup vpn
32766:  from all lookup main
32767:  from all lookup default
  • ip route show table vpn:

default dev tun0 scope link

  • ip route

default via 192.168.1.1 dev wlp4s0 proto dhcp src 192.168.1.10 metric 600
10.8.0.0/24 dev tun0 proto kernel scope link src 10.8.0.2
192.168.1.0/24 dev wlp4s0 proto kernel scope link src 192.168.1.10 metric 600

  • nft list ruleset (kept just for seeing if openvpn did mark the packets, and it did)

table inet vpn {
    chain premangle {
        type filter hook prerouting priority mangle; policy accept;
        meta mark 0x00000004 meta nftrace set 1
    }

    chain postmangle {
        type route hook output priority mangle; policy accept;
        meta mark 0x00000004 meta nftrace set 1
    }
}

However now, I cannot access the webserver anymore:

traceroute -n --fwmark=0x4 END.POINT.IP.ADDRESS
    Shows it goes via the physical interface and my ISP's IPs, outside the VPN envelope (as expected). Wireshark shows all the ICMP packets.

traceroute -n END.POINT.IP.ADDRESS attempts the 30 hops without success and fails to find a route. Wireshark does not show any ICMP packet on any interface.

socat -d -d TCP4:END.POINT.IP.ADDRESS:443 -
    Prints "socat[56833] N opening connection to AF= **" and stalls until Ctrl-C.

I ran ip route flush cache just in case, with no success, even though everything seems in order with:

ip route get END.POINT.IP.ADDRESS
END.POINT.IP.ADDRESS dev tun0 table vpn src 10.8.0.2 uid 0 cache

ip route get END.POINT.IP.ADDRESS mark 4
END.POINT.IP.ADDRESS via 192.168.1.1 dev wlp4s0 src 192.168.1.10 mark 4 uid 0 cache

Wireshark does not show any packet related to the request, not even one sent to a wrong address or one left unanswered.

Any clue of anything that can get in the way, please ?! Sorry if I misunderstood any point, the setup you proposed makes sense to me frankly. Thanks !