
In a newly built AWS VPC (deployed with Terraform to minimise typos), I have one "DMZ" subnet and one internal subnet. A firewall appliance straddles the two, with an interface in each. Both interfaces have IPv4 and IPv6 addresses; the IPv6 addresses are one link-local and one global. The firewall can reach an HTTPS URL over both IPv4 and IPv6, with traffic routed out of the DMZ interface. A test server built in the internal zone has a single interface in the internal subnet, with appropriate IPv4 and IPv6 addresses (one link-local, one global). The server's default route, in both its IPv4 and IPv6 route tables, points at the global address of the firewall's internal interface.
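
For context, a minimal Terraform sketch of the topology as described - the names and CIDR blocks here are illustrative assumptions, not my actual values:

    # Assumed topology sketch: a VPC with an Amazon-provided IPv6 block and
    # two subnets, one "DMZ" and one internal.
    resource "aws_vpc" "main" {
      cidr_block                       = "10.0.0.0/16"
      assign_generated_ipv6_cidr_block = true
    }

    resource "aws_subnet" "dmz" {
      vpc_id                          = aws_vpc.main.id
      cidr_block                      = "10.0.1.0/24"
      ipv6_cidr_block                 = cidrsubnet(aws_vpc.main.ipv6_cidr_block, 8, 1)
      assign_ipv6_address_on_creation = true
    }

    resource "aws_subnet" "internal" {
      vpc_id                          = aws_vpc.main.id
      cidr_block                      = "10.0.2.0/24"
      ipv6_cidr_block                 = cidrsubnet(aws_vpc.main.ipv6_cidr_block, 8, 2)
      assign_ipv6_address_on_creation = true
    }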

The interfaces on the firewall have source/dest. check disabled. Security groups and network ACLs are set up so that internal servers must route outbound web connections via the firewall: the internal subnet's NACL disallows all inbound and outbound packets (so the dual-homed firewall is the only thing that can get traffic out of it). The DMZ NACL permits outbound HTTPS to all destinations (IPv4 and IPv6), and high-port responses from all destinations. The firewall NATs outgoing IPv4 connections from internal, and this works: the server can reach the HTTPS URL over IPv4. Over IPv6, however, the server cannot reach it; the connection simply fails.
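
Roughly, the relevant Terraform looks like this, assuming the firewall's two interfaces are managed as aws_network_interface resources (resource names are illustrative, and the aws_network_acl.dmz referenced is assumed to be defined elsewhere):

    # Disable source/dest. check on both firewall interfaces so the instance
    # may forward traffic it did not originate (names are assumptions).
    resource "aws_network_interface" "fw_dmz" {
      subnet_id         = aws_subnet.dmz.id
      source_dest_check = false
    }

    resource "aws_network_interface" "fw_internal" {
      subnet_id         = aws_subnet.internal.id
      source_dest_check = false
    }

    # DMZ NACL: outbound HTTPS to anywhere over IPv6, high-port replies back in.
    resource "aws_network_acl_rule" "dmz_out_https_v6" {
      network_acl_id  = aws_network_acl.dmz.id
      rule_number     = 100
      egress          = true
      protocol        = "tcp"
      rule_action     = "allow"
      ipv6_cidr_block = "::/0"
      from_port       = 443
      to_port         = 443
    }

    resource "aws_network_acl_rule" "dmz_in_highports_v6" {
      network_acl_id  = aws_network_acl.dmz.id
      rule_number     = 110
      egress          = false
      protocol        = "tcp"
      rule_action     = "allow"
      ipv6_cidr_block = "::/0"
      from_port       = 1024
      to_port         = 65535
    }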

  • The firewall itself can successfully make an outbound IPv6 connection to the same URL, so the remote URL must be working
  • The server successfully resolves the URL to an IPv6 address, just as the firewall does
  • tcpdump on the firewall's DMZ interface shows the packets leaving the appliance with the source IP set to the server's global IPv6 address and the destination IP set to that of the remote URL, so packets are being successfully routed to and through the firewall
  • The same dump shows no response packets from the remote URL, so assuming the outbound packets do reach it (they do when the firewall sends them itself), the firewall isn't receiving the responses
  • The appliance does not attempt NAT or tunnelling of any kind on IPv6; it just routes, forwards and firewalls
  • The appliance logs every connection it firewalls, so I know it is not blocking this connection
  • AWS Reachability Analyzer does not work with IPv6, so I cannot use it to find the problem
  • The routing table for the DMZ subnet lists the DMZ interface of the firewall as the next hop for addresses in the internal subnet (and the DMZ) - see the route sketch after this list
  • The firewall is a Linux instance and IPv6 packet forwarding is enabled (otherwise the packets wouldn't have made it as far as the DMZ interface)
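
A rough Terraform sketch of those subnet routes, with the firewall's ENIs as the targets (route table and interface names are assumptions):

    # Internal subnet: default IPv6 route points at the firewall's internal ENI.
    resource "aws_route" "internal_default_v6" {
      route_table_id              = aws_route_table.internal.id
      destination_ipv6_cidr_block = "::/0"
      network_interface_id        = aws_network_interface.fw_internal.id
    }

    # DMZ subnet: traffic for the internal subnet's IPv6 range goes back via
    # the firewall's DMZ ENI.
    resource "aws_route" "dmz_to_internal_v6" {
      route_table_id              = aws_route_table.dmz.id
      destination_ipv6_cidr_block = aws_subnet.internal.ipv6_cidr_block
      network_interface_id        = aws_network_interface.fw_dmz.id
    }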

I'm wondering if the stateful nature of security group rules is at the heart of the problem. Although the firewall has an inbound security group rule allowing all protocols and ports from anywhere, the firewall sent out a packet whose source IP was not its own, so will the AWS routing layer hand the response packets back to the firewall? It ought to, since that is the stated route for that destination in the subnet's route table. And in any case, shouldn't the disabled source/dest check override any concerns in that area?
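
For reference, the firewall's permissive inbound security group rule is roughly this (the security group name is an assumption):

    # Allow all protocols and ports, from anywhere, inbound to the firewall.
    resource "aws_security_group_rule" "fw_allow_all_in" {
      security_group_id = aws_security_group.firewall.id
      type              = "ingress"
      protocol          = "-1"
      from_port         = 0
      to_port           = 0
      cidr_blocks       = ["0.0.0.0/0"]
      ipv6_cidr_blocks  = ["::/0"]
    }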

Edit: The goal is to provide the function of a VPC NAT gateway at a lower price than the roughly $40 per month per availability zone that AWS charges; assuming two AZs and two regions for availability, that's $160/month, which would be a 100% increase on my AWS spend. The design is the standard internal/DMZ/external design used by companies the world over for decades, and is VPC best practice. I see security groups as analogous to host firewalls: good to have (and, in the case of security groups, with the advantage of some centralised API control) but not sufficient on their own. In a non-virtualised environment you wouldn't give a Windows server an external IP and hope the Windows firewall is never misconfigured; you'd put a separate firewall in front of it if you're at all serious about your infrastructure security. So, assuming I want centralised, declarative "control" of the host firewall (OS unspecified), that is security groups via Terraform, and I need something else in front of that. NACLs are too size-limited, so a NAT instance is the natural choice. It also gives a single egress point where I can inspect traffic for signs of internal compromise.

I have noticed that in this document AWS state that their NAT instance does NAT64, increasing my suspicion that there is some structural reason why IPv6 routing doesn't work in a VPC.

Edit 2: VPC flow logs show the outbound packet flow being ACCEPTed, and there is no corresponding flow for the reply, which means (I think) that the reply packets aren't being routed to the firewall's interface at all.
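
For completeness, the flow logs are enabled along these lines (the log group and IAM role here are illustrative, not my exact resources):

    # Illustrative only: capture all traffic for the VPC into CloudWatch Logs.
    resource "aws_cloudwatch_log_group" "vpc_flow" {
      name = "vpc-flow-logs"
    }

    resource "aws_flow_log" "vpc" {
      vpc_id          = aws_vpc.main.id
      traffic_type    = "ALL"
      log_destination = aws_cloudwatch_log_group.vpc_flow.arn
      iam_role_arn    = aws_iam_role.flow_logs.arn  # assumed to exist elsewhere
    }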

1 Answer


The answer to this was Edge associations. I was trying to do what is described in this document as middlebox routing, and the sentence that had not registered when I read it was:

Associate this route table with your internet gateway

Because Edge associations weren't covered in that document, I had misinterpreted that sentence and was configuring the subnet route tables rather than a route table for the internet gateway (the IGW didn't have one at all, so I suppose it was using some default - maybe the main route table?). As a result, the reply packets coming back in through the IGW were never routed to the firewall. With an edge-associated route table in place, IPv6 now routes happily.
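
For anyone hitting the same thing, the fix looks roughly like this in Terraform: a route table associated with the internet gateway (the "edge association") that sends traffic destined for the internal subnet's IPv6 range back via the firewall's DMZ interface. Resource names are illustrative:

    # Edge route table: associated with the internet gateway rather than a
    # subnet, so that inbound replies for the internal subnet are delivered
    # to the firewall appliance instead of straight to the instances.
    resource "aws_route_table" "igw_edge" {
      vpc_id = aws_vpc.main.id

      route {
        ipv6_cidr_block      = aws_subnet.internal.ipv6_cidr_block
        network_interface_id = aws_network_interface.fw_dmz.id
      }
    }

    # The "edge association": attach the route table to the IGW itself.
    resource "aws_route_table_association" "igw_edge" {
      gateway_id     = aws_internet_gateway.main.id
      route_table_id = aws_route_table.igw_edge.id
    }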

I had never used Edge associations before, so they simply weren't on my radar as a thing.