Prevent routing loop with FwMark in Wireguard

Question

I want to set up a VPN server so that the VPN connection is used only when accessing resources within the server. Normally, I'd do this by using the server's internal IP, but I want to use the domain name to access this server.

There are a few ways to achieve this:

1. Use a custom DNS server
2. Bind another IP to this server
3. Set Endpoint to be the same as AllowedIPs in the "client" (I'm aware that wg doesn't use a server-client architecture, but it's easier for me to understand this way). E.g.:
```
# client
[Interface]
PrivateKey = ...
Address = 10.0.0.2/24
[Peer]
PublicKey = ...
AllowedIPs = 1.2.3.4/32
Endpoint = 1.2.3.4:51820
```

Long story short, option 3 works the best for my use case, but it would cause a loop in the routing table. After doing some research ("Improved Rule-based Routing" section in wireguard page and this solution), I learned that using FwMark in the "server" config could resolve the issue.

So I came up with this:

# server
[Interface]
PrivateKey = ...
ListenPort = 51820
Address = 10.0.0.1/24
FwMark = 51820
PostUp = ip route add default dev wg0 table 2468
PostUp = ip rule add not fwmark 51820 table 2468
PostDown = ip route del default dev wg0 table 2468
PostDown = ip rule del not fwmark 51820 table 2468
[Peer]
...

Needless to say, this doesn't work: both the VPN and direct communication time out, so I'm guessing I've messed up the routing rules. My questions are:

Why doesn't this work?
Does setting FwMark = 51820 in the "server" config mark VPN-routed packets? Or do I need something like PostUp = wg set wg0 fwmark 51820?
What happens if I replace 2468 with default, main, or even local? I guess I don't understand why the official doc had to set a new table for this.

Thanks

EDIT: Fix typo with the port

A.B · Accepted Answer · 2023-11-06T13:26:13.420

Introduction

There are two commands: wg the low-level WireGuard ("WG") command that only affects a type wireguard interface's settings (including setting a mark for outgoing envelope traffic), and wg-quick which in addition will configure addresses, routes, and sometimes routing rules, additional routing tables, and even nftables rules to tie all this together.

This answer will explain how to make it work.
FwMark = 51820 marks the outgoing envelope (not the payload).

The appropriate wg command is run when this parameter is set, so no extra PostUp = wg set wg0 fwmark 51820 is needed.
Table = tells wg-quick where to add routes.
- When not provided, the default is Table = auto. It behaves like Table = main unless there's also AllowedIPs = 0.0.0.0/0 (or AllowedIPs = ::/0 for IPv6), which completely changes the behavior by adding an additional routing table, routing rules, marks and nftables (or iptables if nftables is not installed) rules. This altered behavior is what allows a full-tunnel VPN (rather than split VPN) configuration to work properly. I give an example later.
- If Table = off then wg-quick won't touch routes at all. Routing will then have to be handled entirely by the administrator (possibly with PreUp/PostUp rules).
- Another useful case is to pick an arbitrary table that will be reused later with custom routing rules. This answer will use Table = 2468 on the client. wg-quick will add routes to this routing table instead of the main routing table. Another choice could have been to use Table = off and add any relevant route with PostUp entries.
- The local routing table should be left alone. It's populated by the kernel and handles local routes related to addresses assigned on the system, for incoming traffic destined to this system, including broadcasts.
- The default routing table is almost never used and is empty in a standard setup. It's probably used by some routing daemons. It would only make sense to have entries in it if there was no default route in the main routing table; otherwise this routing table would never have a chance to be reached with the default routing rules in place (main's default route, once found, would stop the lookup).

The setup below assumes that the client is running Linux, has nftables installed, and has Strict Reverse Path Forwarding ("SRPF") enabled on its main interface (which will be named eth0) or on all its interfaces (sysctl net.ipv4.conf.all.rp_filter=1). The nftables rules below, including those in the Corner cases for the client paragraph, exist only to mark incoming envelope traffic so that it is seen as using the correct route and passes the SRPF check. If SRPF is not enabled, or if at least the main interface (eth0) is set to Loose RPF (net.ipv4.conf.eth0.rp_filter=2), then all the nftables parts can be ignored: it no longer matters whether incoming packets used the wrong path, since they will be accepted by the routing stack instead of being dropped. Any part below dealing with nftables thus assumes that this is in place on the client:

sysctl -w net.ipv4.conf.default.rp_filter=1
sysctl -w net.ipv4.conf.all.rp_filter=1
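
To see which mode actually applies to a given interface, keep in mind that the kernel uses the larger of the all value and the per-interface value for source validation. Here is a minimal sketch of that check; the helper names effective_rp_filter and rp_filter_mode are made up for illustration, and eth0 is only the assumed main interface name.

```shell
#!/bin/sh
# Effective rp_filter for an interface: the larger of conf/all and conf/<iface>.
effective_rp_filter() {
    all_val=$1
    if_val=$2
    if [ "$all_val" -gt "$if_val" ]; then
        echo "$all_val"
    else
        echo "$if_val"
    fi
}

# Describe the mode in words (0 = off, 1 = strict, 2 = loose).
rp_filter_mode() {
    case $1 in
        0) echo "off" ;;
        1) echo "strict" ;;
        2) echo "loose" ;;
        *) echo "unknown" ;;
    esac
}

# Example on a live system; eth0 is an assumed interface name.
iface=${1:-eth0}
base=/proc/sys/net/ipv4/conf
if [ -r "$base/all/rp_filter" ] && [ -r "$base/$iface/rp_filter" ]; then
    eff=$(effective_rp_filter "$(cat "$base/all/rp_filter")" "$(cat "$base/$iface/rp_filter")")
    echo "$iface: rp_filter=$eff ($(rp_filter_mode "$eff"))"
fi
```

If this reports strict for the main interface, the nftables parts below matter; if it reports loose or off, they can be skipped.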

There's one case already handled by wg-quick that also accounts for rp_filter=1 where the payload address has to be distinguished from the envelope address which is a remote address using the default route: when using the default setting Table = auto and AllowedIPs = 0.0.0.0/0, which is usual on a client deployment. Then wg-quick uses a firewall mark and additional nftables rules for this case. It would appear like this:

# wg-quick up wg0
[#] ip link add wg0 type wireguard
[#] wg setconf wg0 /dev/fd/63
[#] ip -4 address add 10.0.0.2/24 dev wg0
[#] ip link set mtu 1420 up dev wg0
[#] wg set wg0 fwmark 51820
[#] ip -4 route add 0.0.0.0/0 dev wg0 table 51820
[#] ip -4 rule add not fwmark 51820 table 51820
[#] ip -4 rule add table main suppress_prefixlength 0
[#] sysctl -q net.ipv4.conf.all.src_valid_mark=1
[#] nft -f /dev/fd/63
# ip rule
0:  from all lookup local
32764:  from all lookup main suppress_prefixlength 0
32765:  not from all fwmark 0xca6c lookup 51820
32766:  from all lookup main
32767:  from all lookup default
# ip route show table 51820
default dev wg0 scope link 
# nft list table ip wg-quick-wg0
table ip wg-quick-wg0 {
    chain preraw {
        type filter hook prerouting priority raw; policy accept;
        iifname != "wg0" ip daddr 10.0.0.2 fib saddr type != local drop
    }
    chain premangle {
        type filter hook prerouting priority mangle; policy accept;
        meta l4proto udp meta mark set ct mark
    }

    chain postmangle {
        type filter hook postrouting priority mangle; policy accept;
        meta l4proto udp meta mark 0x0000ca6c ct mark set meta mark
    }
}

The firewall mark set by WG for outgoing traffic makes it possible to distinguish (outgoing) envelope traffic from (outgoing) payload traffic in routing rules.
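
Note that the mark is written in decimal in the configuration but shown in hexadecimal by ip rule, ip route get and nft. A tiny helper (the name mark_to_hex is made up for this sketch) makes it easy to match the two forms:

```shell
# Convert a decimal firewall mark to the hex form printed by `ip rule`.
mark_to_hex() {
    printf '0x%x\n' "$1"
}

mark_to_hex 51820   # prints 0xca6c, the mark wg-quick picked in the output above
mark_to_hex 1234    # prints 0x4d2, the mark used for the manual client setup
```

On a live system, wg show wg0 fwmark should report the mark currently set on the interface, which can be compared against the ip rule output.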

The nftables part is for handling envelope's reply/incoming traffic:

chronologically 1st, it updates in the postrouting hook the conntrack flow to the mark set by WG for outgoing envelope traffic.
then, in the prerouting hook this mark is retrieved back from the saved conntrack flow entry to mark incoming reply envelope traffic
the early prerouting hook (preraw) is to prevent receiving non-WG traffic to the address set on the WG interface (not for routing, but security, won't be retained below)

In the end, these nftables rules let incoming envelope traffic be seen as envelope traffic by the routing rules. Along with src_valid_mark=1, all this effort is to keep that traffic from failing SRPF.

Because wg-quick's bash code triggers this only for Table = auto + AllowedIPs = 0.0.0.0/0, it has to be adapted and done manually on the client. That is also the "filtering rules" part of the * remark at the end of the answer in OP's linked Q/A: "unless you set up some fancy packet routing/filtering rules on your client instead of using the defaults the WireGuard client sets up for you".

Configurations on client and server

client's /etc/wireguard/wg0.nft which will be loaded from wg0.conf:

table ip wg-quick-wg0        # for idempotence
delete table ip wg-quick-wg0 # for idempotence
table ip wg-quick-wg0 {
    chain postmangle {
        type filter hook postrouting priority mangle; policy accept;
        meta l4proto udp meta mark 1234 ct mark set meta mark
    }

    chain premangle {
        type filter hook prerouting priority mangle; policy accept;
        meta l4proto udp meta mark set ct mark
    }
}

client's /etc/wireguard/wg0.conf (referencing the previous file):

[Interface]
PrivateKey = clientprivkey
FwMark = 1234
Table = 2468
Address = 10.0.0.2/24
# setting it on eth0 would have been enough
PreUp = sysctl -q -w net.ipv4.conf.all.src_valid_mark=1
PreUp = nft -f /etc/wireguard/wg0.nft
PostUp = ip -4 rule add not fwmark 1234 table 2468
# not actually needed for OP's current setup, but won't hurt 
PostUp = ip -4 rule add table main suppress_prefixlength 0
# wg-quick probably already deleted the table because of the chosen name but just in case...
PostDown = nft delete table wg-quick-wg0 2>/dev/null || true
[Peer]
PublicKey = serverpubkey
AllowedIPs = 1.2.3.4/32
Endpoint = 1.2.3.4:51820
# needed if the client is behind NAT so that it can receive server-initiated WG traffic
PersistentKeepalive = 25
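
The configuration above adds two routing rules in PostUp but has no PostDown counterpart for them. I'm assuming here that wg-quick only removes the rules it created itself (the Table = auto case), so with an explicit Table = 2468 these rules would linger after wg-quick down wg0. The sketch below only prints the commands for each direction so they can be reviewed before being pasted into PostUp/PostDown lines; the helper name client_rules is hypothetical.

```shell
# Print the policy-routing commands for one direction ("add" or "del").
# mark and table default to the values used in the client config above.
client_rules() {
    action=$1
    mark=${2:-1234}
    table=${3:-2468}
    echo "ip -4 rule $action not fwmark $mark table $table"
    echo "ip -4 rule $action table main suppress_prefixlength 0"
}

client_rules add   # what the PostUp lines run
client_rules del   # candidate PostDown lines for symmetric cleanup
```

Keeping add and del generated from the same values avoids leaving a stale rule behind if the mark or table number is changed later.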

Assuming the client's current IP address on its main interface eth0 is (NAT-ed...) 192.168.1.2/24 with a gateway of 192.168.1.1, here's what the outcome would be for the various routing decisions:

# # outgoing payload
# ip route get 1.2.3.4
1.2.3.4 dev wg0 table 2468 src 10.0.0.2 uid 0 
    cache 
# # outgoing envelope: mark is set directly by WG
# ip route get 1.2.3.4 mark 1234
1.2.3.4 via 192.168.1.1 dev eth0 src 192.168.1.2 mark 0x4d2 uid 0 
    cache
# # incoming payload
# ip route get from 1.2.3.4 iif wg0 to 10.0.0.2
local 10.0.0.2 from 1.2.3.4 dev lo table local 
    cache <local> iif wg0
# # incoming envelope without mark: doesn't happen thanks to nftables. Would be rejected by SRPF
# ip route get from 1.2.3.4 iif eth0 to 192.168.1.2
RTNETLINK answers: Invalid cross-device link
# # incoming envelope: reply envelope traffic mark is set by nftables from conntrack entry
# ip route get from 1.2.3.4 iif eth0 to 192.168.1.2 mark 1234
local 192.168.1.2 from 1.2.3.4 dev lo table local mark 0x4d2 
    cache <local> iif eth0

The server setup is much simpler because its own endpoint is a known configuration, and the client's payload address within WG is not routable over the Internet, so it won't clash with the client's envelope address. No nftables is involved on the server.

server's /etc/wireguard/wg0.conf:

[Interface]
PrivateKey = serverprivkey
ListenPort = 51820
#no address set on wg0
[Peer]
PublicKey = clientpubkey
AllowedIPs = 10.0.0.2

If this server's main interface eth0 has address 1.2.3.4/24 and a gateway of 1.2.3.1/24 with a remote WG peer (client) discovered as 192.0.2.2:45678 when it reached the server, the various routing cases, all simple, are:

# # outgoing payload 
# ip route get 10.0.0.2
10.0.0.2 dev wg0 src 1.2.3.4 uid 0 
    cache 
# # outgoing envelope
# ip route get 192.0.2.2
192.0.2.2 via 1.2.3.1 dev eth0 src 1.2.3.4 uid 0 
    cache
# # incoming payload
# ip route get from 10.0.0.2 iif wg0 1.2.3.4
local 1.2.3.4 from 10.0.0.2 dev lo 
    cache <local> iif wg0
# # incoming envelope
# ip route get from 192.0.2.2 iif eth0 1.2.3.4
local 1.2.3.4 from 192.0.2.2 dev lo 
    cache <local> iif eth0

No address was added on wg0 to prevent the usual UDP multi-homed complication. Had this address been added, the source address to reach 10.0.0.2 would have been chosen as 10.0.0.1, thus triggering issues for multi-homed-unaware UDP services answering through the tunnel: UDP replies to queries made to 1.2.3.4 would use 10.0.0.1 as source which would be rejected by the client application (and even before this, dropped by its WG interface since 10.0.0.1 is not allowed). Additional details in these Q/A where I made an answer:

SF: Libvirt - UDP not working between host and VM
UL SE: Server does not respond to ping - ICMP is received and nothing happens (search for Caveat: UDP services)

The best way to never encounter a UDP multi-homed problem is to actually not be in the multi-homed case and not add an IP address on wg0. The source address selection algorithm will select the "main" IP address instead when it can't select an address from wg0. Should the server already be multi-homed and the wrong source address somehow be selected, it's possible to fix the route added by wg-quick with this additional entry in the server's wg0.conf:

PostUp = ip route replace 10.0.0.2/32 dev wg0 src 1.2.3.4

which hints the address selection algorithm to use 1.2.3.4 as the source to reach 10.0.0.2. This also makes it possible to set 10.0.0.1 on wg0 even if it won't be used, and to change this kind of route for the whole /24 in case of multiple peers, for example with this instead of the above:

Address = 10.0.0.1/24
PostUp = ip route replace 10.0.0.0/24 dev wg0 src 1.2.3.4

This route change won't survive or follow something affecting the actual 1.2.3.4 (eg: main interface administratively set down then up). Actually one can just add "again" 1.2.3.4 on wg0. For example instead of above (be sure it's as a /32 so it won't trigger the creation of any additional routes causing network disruption):

Address = 1.2.3.4/32

This is for the server as an end node. If it's acting as router and routing traffic (eg: containers or VMs) rather than generating it itself, then there's no UDP multi-homed issue, but chances are that settings to handle this router case will be missing anyway.
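
To check which source address the server would actually pick for payload traffic toward the client, the src field can be pulled out of the ip route get output. A minimal sketch, with route_src being a made-up helper name:

```shell
# Print the value following "src" in `ip route get` output read from stdin.
route_src() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Sample line taken from the server's "outgoing payload" output above.
echo '10.0.0.2 dev wg0 src 1.2.3.4 uid 0' | route_src

# On the live server (assumes wg0 is up):
#   ip route get 10.0.0.2 | route_src
```

If this prints 10.0.0.1 rather than 1.2.3.4 on the server, UDP services answering through the tunnel are likely to hit the multi-homed problem described above.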

Corner cases for the client

If the client is not using SRPF (rp_filter=1), then, as with all the other nftables settings, the fixes below are not needed.

There are corner cases on the client when bypassing WG to reach the server with an adequate application able to use setsockopt(sockfd, SOL_SOCKET, SO_MARK, ...)

traceroute (--udp)

Need to traceroute toward the server through the actual Internet to figure out what network outage is going on?
```
traceroute -n --fwmark=1234 1.2.3.4
```
This will still time out right before displaying the target because the expected return traffic isn't UDP but a related ICMP error: it has no mark set and is rejected by SRPF. This can be fixed by adding a rule marking related traffic:
```
nft add rule ip wg-quick-wg0 premangle ct state related ct mark 1234 meta mark set ct mark
```

ping, other traceroutes, socat with TCP...

ping -n -m 1234 1.2.3.4
traceroute -n --icmp --fwmark=1234 1.2.3.4
socat exec:'echo foo' TCP4:1.2.3.4:3333,setsockopt-listen=1:36:L1234

Also systematically mark and retrieve conntrack flow's mark, not just for UDP:

nft add rule ip wg-quick-wg0 postmangle meta mark 1234 ct mark set meta mark
nft add rule ip wg-quick-wg0 premangle ct mark 1234 meta mark set ct mark

This makes the 3 commands above work instead of having a timeout.
