Windows DNS Lag and Timeouts

Question

I've recently migrated some 2012 R2 servers to Server 2022, including some domain controllers.

Everything appears to be fine except that one of the VMs is behaving strangely when attempting name resolution.

The behaviour:

I first noticed something strange when trying to ping other servers on the network. A ping would take a few seconds to resolve the IP address, and sometimes it would time out. This is uncharacteristic, because even external hosts usually resolve in under a second. It would also explain why some of the scripts we run are failing, e.g. they cannot connect to a SQL server or cannot resolve external API endpoint URLs.

It also appears that the host isn't maintaining its local DNS cache. For example, when a name does resolve, it seems to take about half a second longer than it should. In addition, internal addresses that resolved recently fail to resolve a few minutes later.
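You can check the caching symptom with a script instead of by eye. Time a few back-to-back lookups through the operating system's resolver, which is the same path ping uses. The following is a minimal sketch that assumes Python 3 is installed on the problem host. `time_lookups` is a hypothetical helper and is not part of any tool mentioned in this thread. On Windows, `socket.getaddrinfo` goes through the system resolver. If the DNS Client cache is working, lookups after the first should normally be much faster.

```python
import socket
import time
from typing import Callable, List, Optional


def time_lookups(
    hostname: str,
    attempts: int = 2,
    resolve: Callable[[str], object] = lambda name: socket.getaddrinfo(name, None),
) -> List[Optional[float]]:
    """Resolve `hostname` several times in a row.

    Returns one entry per attempt: the duration in seconds, or None if that
    lookup failed. By default this uses socket.getaddrinfo, i.e. the OS
    resolver (hypothetical helper, for illustration only).
    """
    durations: List[Optional[float]] = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            resolve(hostname)
        except OSError:
            # socket.gaierror is a subclass of OSError: record a failed lookup.
            durations.append(None)
            continue
        durations.append(time.perf_counter() - start)
    return durations
```

For example, `time_lookups("sqlserver01", attempts=3)` returns three durations in seconds. The host name here is a placeholder. If the later durations are not much shorter than the first, or some are `None`, the cache probably isn't being used. `ipconfig /displaydns` shows what the DNS Client service is currently holding.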

However, when I use nslookup against the same name servers (I've tried the internal DCs and 8.8.8.8), it works as expected and there's no sign of any issue.
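This difference is a useful clue. nslookup builds its own DNS queries and sends them straight to the chosen server, so it bypasses the Windows DNS Client service. ping resolves names through the operating system instead. To reproduce roughly what nslookup does from a script, you can send a hand-built query over UDP. The following is a minimal sketch that uses only the Python standard library. `build_query`, `parse_header` and `query_server` are hypothetical helpers. The sketch assumes ASCII host names, asks only for an A record, and decodes only the reply's header.

```python
import random
import socket
import struct
from typing import Optional, Tuple


def build_query(hostname: str, qtype: int = 1) -> Tuple[int, bytes]:
    """Build a DNS query (RFC 1035 wire format); qtype 1 = A record.

    Returns (transaction_id, packet).
    """
    labels = hostname.rstrip(".").split(".")
    if any(not 0 < len(label) <= 63 for label in labels):
        raise ValueError(f"invalid hostname: {hostname!r}")
    txid = random.randint(0, 0xFFFF)
    # Header: ID, flags 0x0100 (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    return txid, header + question


def parse_header(reply: bytes) -> Optional[dict]:
    """Decode the 12-byte DNS header; None if the reply is too short."""
    if len(reply) < 12:
        return None
    txid, flags, _qd, answers, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return {
        "id": txid,
        "is_response": bool(flags & 0x8000),
        "rcode": flags & 0x000F,  # 0 = NOERROR, 3 = NXDOMAIN
        "answers": answers,
    }


def query_server(server: str, hostname: str, timeout: float = 2.0) -> Optional[dict]:
    """Send one query directly to `server` on UDP port 53.

    Returns the parsed reply header, or None if no matching reply arrived
    within `timeout` seconds.
    """
    txid, packet = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        try:
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes socket.timeout
            return None
    header = parse_header(reply)
    return header if header is not None and header["id"] == txid else None
```

For example, `query_server("8.8.8.8", "example.com")` should return a header with `rcode` 0 when the server answers. A result of `None` means no reply arrived. That corresponds to the unanswered requests visible in a packet capture.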

I installed Wireshark on the problem host and on the name server, and I can see requests being sent and received. However, the domain controller doesn't appear to respond to all of the requests (that might be expected; I'm no DNS expert). When it does respond, the reply makes it back and the client is happy.

What I've tried:

resetting IP and Winsock with netsh
ipconfig /flushdns
changing the host's IP address
using 8.8.8.8 instead of the internal name servers
disabling the firewall
disabling AV software (Defender)
reinstalling the NIC

For now, I've got critical hosts listed in the hosts file and it seems OK.

I'm baffled by this; any ideas would be great. Thanks

score 1 · Answer 1 · answered Oct 12 '23 at 02:19

Try enabling DNS debug logging on the server and analyzing the traffic in the logs, which may provide more information. Response rate limiting could be causing some DNS requests to be dropped. Did the migration recreate the root hints, or lose any forwarders or other settings? Have you tried isolating the issue to one particular name server? You may not be able to reproduce the problem if the application is appending a DNS suffix or something similar, but I'd expect analyzing the DNS logs to make that apparent. You can also use Get-DNSServerCache and Show-DNSServerCache in PowerShell to see whether the TTL for cached records is short. What events do you see in the event logs on the DNS server and on the client?
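You can check the rate-limiting and single-server questions with a small, paced probe that counts unanswered queries per server. The following is a minimal sketch that assumes Python 3 and that UDP port 53 on each server is reachable. `probe` and `unanswered_by_server` are hypothetical helpers. The probe only checks that a reply with a matching transaction ID came back. It does not check whether that reply contains an answer.

```python
import random
import socket
import struct
import time
from typing import Callable, Dict, Iterable


def probe(server: str, hostname: str, timeout: float = 2.0) -> bool:
    """Send one A-record query directly to `server` (UDP/53); True if a reply arrives."""
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    txid = random.randint(0, 0xFFFF)
    # Header (recursion desired, one question) + question (QTYPE A, QCLASS IN).
    packet = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0) + qname + struct.pack("!HH", 1, 1)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        try:
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers socket.timeout and other socket errors
            return False
    # Only count replies that echo our transaction ID.
    return len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == txid


def unanswered_by_server(
    servers: Iterable[str],
    hostnames: Iterable[str],
    rounds: int = 5,
    delay: float = 0.2,
    ask: Callable[[str, str], bool] = probe,
) -> Dict[str, float]:
    """Query every hostname `rounds` times against each server and return,
    per server, the fraction of queries that got no reply."""
    names = list(hostnames)
    result: Dict[str, float] = {}
    for server in servers:
        sent = missed = 0
        for _ in range(rounds):
            for name in names:
                sent += 1
                if not ask(server, name):
                    missed += 1
                time.sleep(delay)  # pace queries so the test itself isn't a flood
        result[server] = missed / sent if sent else 0.0
    return result
```

For example, you might call `unanswered_by_server(["10.0.0.10", "8.8.8.8"], ["example.com"])`, where the addresses are placeholders. A noticeably higher miss rate for one DC would point at that server, for instance at rate limiting or at a setting lost in the migration. Keep `delay` above zero so the probe doesn't trigger rate limiting itself.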

score 1 · Answer 2 · answered Nov 17 '23 at 04:51


Apparently this was caused by a bug in the Nutanix VirtIO drivers.

https://portal.nutanix.com/page/documents/kbs/details?targetId=kA07V000000LYCqSAO

answered by s-twig
