
Problem: Google's Public DNS returns NXDOMAIN for certain SLDs.

Proof of problem:

dig vpn.example.com @8.8.8.8

; <<>> DiG 9.11.5-P4-5.1+deb10u8-Debian <<>> vpn.example.com @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 8324
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;vpn.example.com.        IN    A

;; AUTHORITY SECTION:
example.com.    1800    IN    SOA    ns1.example.com. root.example.com. 1675851775 28800 7200 604800 86400

;; Query time: 134 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Thu Feb 09 09:52:06 EET 2023
;; MSG SIZE  rcvd: 93

As you can see, the query status is NXDOMAIN. Querying the authoritative DNS server listed in the AUTHORITY section, however, returns a correct answer:

dig vpn.example.com @ns1.example.com

; <<>> DiG 9.11.5-P4-5.1+deb10u8-Debian <<>> vpn.example.com @ns1.example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37073
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 600
;; QUESTION SECTION:
;vpn.example.com.        IN    A

;; ANSWER SECTION:
vpn.example.com.    3600    IN    A    XXX.XX.X.XXX

;; Query time: 128 msec
;; SERVER: XXX.XX.XX.XXX#53(XXX.XX.XX.XXX)
;; WHEN: Thu Feb 09 09:58:05 EET 2023
;; MSG SIZE  rcvd: 64

Other public DNS servers (OpenDNS, Cloudflare, etc.) all resolve the SLD.

The authoritative DNS server (all 4 NS records) is consistent in the responses:

for i in $(seq 1 30)
do
    query=$(dig +short us1.vpn.example.com @ns1.example.com)
    if [[ -z "$query" ]]
    then
        echo "NO ANSWER"
    else
        echo "ANSWER"
    fi
    sleep 2
done | sort | uniq -c

30 ANSWER

I tried the following in two different tabs:

TAB1, client side:

while true; do dig +short vpn.example.com @8.8.8.8; sleep 1; done

TAB2, DNS server side (capturing):

tcpdump -vvvvv -w /tmp/dns.pcap udp and port 53

TAB2, DNS server side (reading the capture back):

tcpdump -n -t -r /tmp/dns.pcap | grep vpn

I then looked in the capture for any source IPs from the subnets listed here: https://developers.google.com/speed/public-dns/faq#locations_of_ip_address_ranges_google_public_dns_uses_to_send_queries

I found none for that specific host. How can I debug this further? Thanks for any forthcoming suggestions!
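
The comparison against Google's published ranges can be scripted rather than done by eye. The following is only a sketch: the function names are made up for illustration, it handles IPv4 only, and the CIDR blocks in the usage comment are documentation placeholders, not Google's actual ranges (those would have to be copied from the FAQ page linked above).

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if IPv4 address $1 lies inside CIDR block $2 (e.g. 192.0.2.0/24).
in_cidr() {
    local ip=$1 net=${2%/*} bits=${2#*/}
    local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Read one IPv4 address per line on stdin and print those that fall
# inside any of the CIDR blocks passed as arguments.
filter_by_ranges() {
    local ip range
    while read -r ip; do
        for range in "$@"; do
            if in_cidr "$ip" "$range"; then
                echo "$ip"
                break
            fi
        done
    done
}

# Possible usage (placeholder ranges; assumes tcpdump prints the source
# as "IP a.b.c.d.port > ..." when run with -n -t):
#   tcpdump -n -t -r /tmp/dns.pcap | grep vpn \
#     | awk '{print $2}' | sed 's/\.[0-9]*$//' | sort -u \
#     | filter_by_ranges 192.0.2.0/24 198.51.100.0/24
```

An empty result from the filter would match what I saw manually: no queries for that host arriving from the listed subnets.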

vidarlo
  • 11,723

3 Answers


The authoritative DNS server (all 4 NS records) is consistent in the responses:

No, it is not. The server ns1.example.com occasionally flips between returning the A record and returning NXDOMAIN for this name. (It seems that making a query via TCP, using dig +vc, is a reliable way to make it start responding with NXDOMAIN over both protocols.)

$ dig vpn.example.com @ns1.example.com
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24350
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

$ dig +vc vpn.example.com @ns1.example.com
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 63787
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

$ dig vpn.example.com @ns1.example.com
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10815
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

In this situation it is normal to have some cache inconsistency (as seen by Tomek), as Google DNS is not just anycast globally – each location has its own multiple resolvers with independent caches behind a load balancer, so even if you're seeing the same NSID you're still getting replies from a different backend server every time. (As a side note, don't forget the cache flush page.)

It is possible that ns1.example.com is similarly handled by more than one server behind a load balancer, some of which give the correct result and some do not.
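
To check whether individual nameservers flip between answers, and whether the transport matters, something like the following could be run against each NS. This is a rough sketch: the helper names dig_status and compare_transports are made up here, and +tcp is used as the current spelling of +vc.

```shell
#!/usr/bin/env bash
# Print the response code (NOERROR, NXDOMAIN, ...) found in the header
# comments of a dig response read from stdin. Prints nothing if there
# is no header, e.g. after a timeout.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Query name $1 at server $2 over UDP and TCP, $3 times each (default 5),
# and print a count per (server, transport, status) combination.
compare_transports() {
    local name=$1 server=$2 rounds=${3:-5} i proto status
    for i in $(seq 1 "$rounds"); do
        for proto in notcp tcp; do
            status=$(dig +noall +comments "+$proto" "$name" "@$server" | dig_status)
            echo "$server $proto ${status:-NO_RESPONSE}"
        done
    done | sort | uniq -c
}

# Possible usage, looping over every NS record of the zone:
#   for ns in $(dig +short NS example.com); do
#       compare_transports vpn.example.com "$ns"
#   done
```

A server that reports both NOERROR and NXDOMAIN for the same name in this tally is the one to look at more closely.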

grawity
  • 17,092

I can intermittently reproduce it on all Google DNS addresses:

triss:~> dig vpn.obfuscated.com @2001:4860:4860::8844 +short
XXX.XX.X.XXX
triss:~> dig vpn.obfuscated.com @2001:4860:4860::8844 +short
XXX.XX.X.XXX
triss:~> dig vpn.obfuscated.com @2001:4860:4860::8844 +short
XXX.XX.X.XXX
triss:~> dig vpn.obfuscated.com @2001:4860:4860::8844 +short
XXX.XX.X.XXX
triss:~> dig vpn.obfuscated.com @2001:4860:4860::8844 +short
triss:~> dig vpn.obfuscated.com @2001:4860:4860::8844 +short
triss:~> dig vpn.obfuscated.com @2001:4860:4860::8844 +short
XXX.XX.X.XXX
triss:~> dig vpn.obfuscated.com @2001:4860:4860::8844 +short
triss:~> dig vpn.obfuscated.com @2001:4860:4860::8844 +short
XXX.XX.X.XXX
triss:~> dig vpn.obfuscated.com @2001:4860:4860::8844 +short
XXX.XX.X.XXX

Looks like some cache inconsistency at Google.
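
To put numbers on the intermittent answers, the responses can be tallied by response code together with the NSID the resolver reports. This is only a sketch: the function names are invented for illustration, and the parsing assumes dig prints the NSID as `; NSID: <hex bytes> ("<text>")`, which may differ between dig versions.

```shell
#!/usr/bin/env bash
# Read one dig response (header comments plus OPT pseudosection) on stdin
# and print "<nsid> <status>". Falls back to placeholder words when a
# field is missing.
summarize_response() {
    awk '
        /status:/ && status == "" {
            s = $0
            sub(/.*status: /, "", s)
            sub(/,.*/, "", s)
            status = s
        }
        /^; NSID:/ && match($0, /\("[^"]*"\)/) {
            # Strip the surrounding ("...") from the matched text.
            nsid = substr($0, RSTART + 2, RLENGTH - 4)
        }
        END {
            if (nsid == "") nsid = "no-nsid"
            if (status == "") status = "NO_RESPONSE"
            print nsid, status
        }
    '
}

# Query name $1 at resolver $2, $3 times (default 20), one second apart,
# and print a count per (NSID, status) pair.
tally_resolver() {
    local name=$1 resolver=$2 rounds=${3:-20} i
    for i in $(seq 1 "$rounds"); do
        dig +nsid +noall +comments "$name" "@$resolver" | summarize_response
        sleep 1
    done | sort | uniq -c
}

# Possible usage:
#   tally_resolver vpn.obfuscated.com 2001:4860:4860::8844
```

Seeing both NOERROR and NXDOMAIN under the same NSID would only show that the answers differ within one location, not where the bad answer originates.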

Tomek
  • 3,776

This turned out to be a PipeBackend issue with the PowerDNS authoritative server. Thanks to all involved!