
I have a standalone, isolated network running mixed Windows and Linux systems, with a Windows 2008 R2 server performing AD duties and DNS.

I'm seeing 5-second delays with the use of getaddrinfo on the Linux systems.

In Wireshark I see (C->S means client to DNS server):

t=0.000   C->S Query A     foo.example.com    ID=0x1111
t=0.000   C->S Query AAAA  foo.example.com    ID=0x2222
t=0.004   S->C Response to 0x2222, No error
          (Query is echoed)
          Authoritative nameservers:
             example.com: type SOA, class IN, mname svr01.example.com
               Name: example.com
               Type: SOA
               Class: IN
               TTL: 1 hour
               Primary name server: svr01.example.com
               Refresh interval: 15 minutes
               Retry interval: 10 minutes
               Expiration limit: 1 day
               Minimum TTL: 1 hour

[5 second delay]

t=5.004   C->S Query A     foo.example.com    ID=0x1111
t=5.005   S->C Query response A  192.168.1.17

If I make the same request again, shortly thereafter, I will see no delay, as expected:

t=0.000   C->S Query A     foo.example.com    ID=0x3333
t=0.000   C->S Query AAAA  foo.example.com    ID=0x4444
t=0.001   S->C Query response A  192.168.1.17

I can continue to get immediate responses for some period of time. After a while (still experimenting) the delay will return.

What is going on here? If I use gethostbyname() (which only does IPv4) or nslookup foo.example.com, there is no delay.
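For anyone who wants to reproduce the comparison without Wireshark, here is a minimal sketch that times one lookup per address family. It assumes Python's socket.getaddrinfo wraps the same system resolver the affected program uses; time_lookup and compare_families are hypothetical helper names, and foo.example.com is just the placeholder name from the question.

```python
import socket
import time


def time_lookup(host, family):
    """Return (elapsed_seconds, sorted unique addresses) for one getaddrinfo call."""
    start = time.monotonic()
    results = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    addresses = sorted({info[4][0] for info in results})
    return elapsed, addresses


def compare_families(host):
    """Time the same name with AF_UNSPEC (typically A + AAAA) and AF_INET (A only)."""
    report = {}
    for label, family in (("AF_UNSPEC", socket.AF_UNSPEC), ("AF_INET", socket.AF_INET)):
        try:
            report[label] = time_lookup(host, family)
        except socket.gaierror as exc:
            # Resolution failures are recorded rather than raised.
            report[label] = (None, [str(exc)])
    return report
```

Calling compare_families("foo.example.com") on an affected client should show whether only the AF_UNSPEC lookup pays the 5-second penalty, which would match the gethostbyname() observation.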

Additional info:

  • IPv6 is disabled on the server NICs

Update:

This answer on Ask Ubuntu suggested adding

options single-request

to /etc/resolv.conf. This seemed to correct the problem for me.
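To confirm the workaround is actually in place on each client, something like the following could be used. It is only a sketch: resolver_options is a hypothetical helper, the nameserver address in SAMPLE is made up, and it assumes comment lines start with # or ;.

```python
def resolver_options(resolv_conf_text):
    """Collect every token that appears on an 'options' line of resolv.conf-style text."""
    options = set()
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "options":
            options.update(fields[1:])
    return options


# Hypothetical file contents; only the options line comes from the question.
SAMPLE = """\
# example only
nameserver 192.168.1.1
search example.com
options single-request
"""

print("single-request" in resolver_options(SAMPLE))
```

On a real client, the text would come from reading /etc/resolv.conf instead of SAMPLE.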

However, I'm still curious:

  • What the SOA record actually means
  • Why the server doesn't respond the first time to the A query

2 Answers


Your DNS server appears to be buggy. Two requests are sent to the DNS server, but it sends only a single reply. The client does what clients are supposed to do in that case: it waits a short while and then retransmits the request.

An initial delay of 5 seconds may be reasonable for non-interactive usage. But for interactive usage I would consider that to be way too high.

The proper fix would be to upgrade the DNS server to a version without the bug or to contact the vendor if no fix has been released yet. Everything else is a workaround.

Running man resolv.conf on an Ubuntu system will explain what the single-request and single-request-reopen options do. Those are two different variations of a workaround for a known bug in certain DNS servers. The drawback of those options is that they slow down name resolution by roughly a factor of two. However, given that the bug appears to slow down name resolution by a factor of about 1000, you may still be better off using the workaround.

When requesting a nonexistent record you may receive a response with a SOA record instead. The reason for sending not just an error code but also a SOA record is that the SOA record contains information which will allow the negative result to be cached.
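As a rough illustration of how that caching information is used: under RFC 2308, a negative answer may be cached for the smaller of the SOA record's own TTL and its Minimum field. The sketch below applies that rule to the values from the capture in the question; negative_cache_ttl is a hypothetical helper name, and whether anything on the client side actually caches the result depends on the setup.

```python
def negative_cache_ttl(soa_record_ttl, soa_minimum):
    """Seconds a negative answer may be cached: min(SOA TTL, SOA MINIMUM) per RFC 2308."""
    return min(soa_record_ttl, soa_minimum)


ONE_HOUR = 60 * 60

# Values from the capture: TTL 1 hour, Minimum TTL 1 hour.
print(negative_cache_ttl(ONE_HOUR, ONE_HOUR))  # 3600
```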

kasperd

The correct way to interpret your packet capture is that you're seeing dropped reply packets for both the A and AAAA record responses.

The SOA record seems to be confusing you and is worth elaboration:

  • The SOA record is actually in the authority section, not the answer section.
  • NXDOMAIN means "there are no records that have that name". If there are other records with the same name, but different types, the response you will see is NOERROR with zero records in the answer section.
  • What you're seeing is a NOERROR response with zero answers and an authority section telling you what zone that answer came from. You can ignore the SOA component entirely. This reply is telling you that there is no AAAA record.
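The distinction in the bullets above can be sketched as a small classifier over the 12-byte DNS header. This is only an illustration: classify_response is a hypothetical name, it looks at nothing beyond the header, and a NOERROR reply with zero answers can also be a referral, which this sketch does not try to tell apart.

```python
import struct

RCODE_NOERROR = 0
RCODE_NXDOMAIN = 3


def classify_response(message):
    """Classify a raw DNS response by its header: 'ANSWER', 'NODATA', 'NXDOMAIN' or 'RCODE n'."""
    if len(message) < 12:
        raise ValueError("DNS header is 12 bytes")
    # Header fields: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (all 16-bit, big-endian).
    _txid, flags, _qdcount, ancount, _nscount, _arcount = struct.unpack("!HHHHHH", message[:12])
    if not flags & 0x8000:
        raise ValueError("QR bit is clear: this is a query, not a response")
    rcode = flags & 0x000F
    if rcode == RCODE_NXDOMAIN:
        return "NXDOMAIN"  # no records of any type exist for the name
    if rcode == RCODE_NOERROR:
        # Name exists, but an empty answer section means no record of the requested type.
        return "NODATA" if ancount == 0 else "ANSWER"
    return f"RCODE {rcode}"


# Header shaped like the AAAA reply in the question: NOERROR, 0 answers, 1 authority (the SOA).
aaaa_reply_header = struct.pack("!HHHHHH", 0x2222, 0x8580, 1, 0, 1, 0)
print(classify_response(aaaa_reply_header))  # NODATA
```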

Now that we've established that the AAAA reply is a correctly formatted packet and what you should be seeing in this scenario, it changes the context of what you're looking at entirely. You are seeing cases where A record replies are being lost, in addition to AAAA replies being lost. Your research suggests that AAAA responses are being lost more frequently, but not exclusively.

Based on the information supplied, we're not going to be able to explain what is going on here. You need to set up packet captures on the DNS servers themselves and identify the following factors:

  1. Do the queries associated with the missing replies actually arrive at the DNS server?
  2. If the queries are arriving at the DNS server, are replies actually being sent?
  3. If the server is not sending replies, is your DNS server having to get this information from a different DNS server that is taking a long time to respond? (times out on the initial attempt, but has the query in cache for the second attempt) Are you seeing heavy enough query load to overflow your socket queue?
  4. If the server is sending the replies, what devices between the server and the client could be losing the packets? Does one of your DNS servers have a routing problem compared to the others? Does it seem like packets are being lost from all of the DNS servers, suggesting a network problem somewhere between the client and server?
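Steps 1 and 2 come down to pairing queries with replies by transaction ID in each capture. A minimal sketch of that bookkeeping follows, assuming the capture has already been exported as (time, kind, transaction ID) tuples; slow_or_unanswered and the tuple layout are hypothetical, and the sample data assumes the reply at t=5.005 carried ID 0x1111, which the capture in the question does not state.

```python
def slow_or_unanswered(events, threshold=1.0):
    """Return [(txid, seconds_waited or None)] for queries answered late or never.

    events: iterable of (time_seconds, kind, txid), where kind is 'query' or 'response'.
    A retransmission reuses the txid, so the wait is measured from the first query.
    """
    pending = {}  # txid -> time the first still-unanswered query was seen
    findings = []
    for when, kind, txid in sorted(events):
        if kind == "query":
            pending.setdefault(txid, when)
        elif kind == "response" and txid in pending:
            waited = when - pending.pop(txid)
            if waited > threshold:
                findings.append((txid, round(waited, 3)))
    # Anything still pending never received a reply in this capture.
    findings.extend((txid, None) for txid in pending)
    return findings


# Client-side capture from the question.
capture = [
    (0.000, "query", 0x1111),
    (0.000, "query", 0x2222),
    (0.004, "response", 0x2222),
    (5.004, "query", 0x1111),
    (5.005, "response", 0x1111),
]
for txid, waited in slow_or_unanswered(capture):
    print(hex(txid), waited)
```

Running the same function over a capture taken on the DNS server would show whether the first 0x1111 query arrived there and whether a reply to it ever left.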

As you can see, there are a lot of things that could be going on here. You're going to need to narrow in on the problem to rule out possibilities. I apologize for this answer not being conclusive, but this was far more than could be covered within a few comments. Feel free to update your question.

Andrew B