The Setup:
We deployed two multi-session host VMs, each with an NVIDIA GPU and 110 GiB of RAM (VM SKU: NC16as T4 v3).
The session hosts (pooled AVD host pool) are Entra ID joined and receive their policies through Intune. Everything works well most of the time.
AVD doc we followed: https://learn.microsoft.com/en-us/azure/virtual-desktop/azure-ad-joined-session-hosts
Profile storage: an Azure Files storage account (premium tier) with a 1 TB file share and 200 MiB/s throughput. Config doc: https://learn.microsoft.com/en-us/azure/virtual-desktop/create-profile-container-azure-ad
All AVD users were created in our traditional ADDC (deployed in Azure) and then synced to Entra ID with Microsoft Entra Connect. To enable Kerberos authentication on the Azure Files storage account, we followed the doc below and completed every section. The ADDC security groups used for ACL assignment on the storage account share are all synced to Entra ID. https://learn.microsoft.com/en-us/azure/storage/files/storage-files-identity-auth-hybrid-identities-enable?tabs=azure-portal
The Problem:
At random times, a user's profile VHDX (stored in Azure Files) disconnects, and the session crashes because Windows can no longer write to the user profile at C:\Users\username. This happened to three different users who were logged on to the same session host VM.
All three were disconnected within the same hour. They had also all logged on within the same hour, about seven hours earlier.
klist shows no Kerberos ticket for this storage account: the ticket simply disappears and is never refreshed. My guess is that Windows then falls back to NTLM authentication, which fails because no DC can be reached, since the session host is Entra ID joined rather than joined to the AD domain.
Is Azure Files sputtering? Or is our ADDC sputtering and not issuing refreshed Kerberos tickets?
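One way to separate "Azure Files is unreachable" from "authentication is failing" is to log a plain TCP probe of the SMB port (445) from the session host and compare it with the disconnect times. This is a hypothetical helper, not part of AVD or FSLogix, and a successful connection only shows that the share endpoint answers on the network; it says nothing about Kerberos.

```python
import socket
import time
from datetime import datetime


def smb_reachable(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout seconds."""
    try:
        # create_connection resolves the name and opens a TCP socket;
        # closing it immediately is enough to show the port answered.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def monitor(host: str, rounds: int, interval: float = 60.0, port: int = 445) -> list:
    """Probe host:port `rounds` times, `interval` seconds apart, printing timestamped results."""
    results = []
    for i in range(rounds):
        ok = smb_reachable(host, port)
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        print(f"[{stamp}] {host}:{port} reachable={ok}")
        results.append((stamp, ok))
        if i < rounds - 1:
            time.sleep(interval)
    return results
```

For example, monitor("sa.file.core.usgovcloudapi.net", rounds=480) probes once a minute for 8 hours. If the port stays reachable across a disconnect, that points away from the network path and toward authentication.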
From the FSLogix log:
[20:35:08.091][tid:00001270.00001274][INFO] Configuration Read (DWORD): SOFTWARE\FSLogix\Profiles\ReAttachRetryCount. Data: 3
[20:35:08.091][tid:00001270.00001274][INFO] Configuration Read (DWORD): SOFTWARE\FSLogix\Profiles\ReAttachIntervalSeconds. Data: 15
[20:35:08.091][tid:00001270.00001274][INFO] ===== Begin Session: Volume re-attach
[20:35:08.093][tid:00001270.00001274][INFO] Session configuration read (DWORD): SOFTWARE\FSLogix\Profiles\Sessions\S-1-12-8-1199028510-1098096551-2196708500-1227410091\LogonStage = '5'(Logon_Complete)
[20:35:08.094][tid:00001270.00001274][INFO] Session configuration read (DWORD): SOFTWARE\FSLogix\Profiles\Sessions\S-1-12-8-1827290170-1117134380-2978440076-3511415481\LogonStage = '5'(Logon_Complete)
[20:35:08.094][tid:00001270.00001274][INFO] Session configuration read (DWORD): SOFTWARE\FSLogix\Profiles\Sessions\S-1-12-8-946945468-1263498019-3621207431-xxxx\LogonStage = '5'(Logon_Complete)
[20:35:08.095][tid:00001270.00001274][INFO] Attempting re-attach of volume: \\?\Volume{33b768bd-fc58-444c-87ac-b40e906720eb}\ for SID: S-1-12-8-946945468-1263498019-3621207431-xxxx
[20:35:08.095][tid:00001270.00001274][INFO] Configuration setting not found: SOFTWARE\FSLogix\Profiles\LogonSyncMutexTimeout. Using default: 60000
[20:35:08.095][tid:00001270.00001274][INFO] Acquired reattach virtual disk lock for user sturner (SID=S-1-12-8-946945468-1263498019-3621207431-xxxx) (Elapsed time: 0)
[20:35:08.095][tid:00001270.00001274][INFO] VHDPath: \\sa.file.core.usgovcloudapi.net\profiles\S-1-12-8-946945468-1263498019-3621207431-xxxx_sturner\Profile_sturner.VHDX
[20:35:08.105][tid:00001270.00001274][INFO] Username: sturner
[20:35:08.105][tid:00001270.00001274][INFO] Attempting re-attach as the user
[20:35:08.105][tid:00001270.00001274][INFO] Retry Count: 3 Retry Interval (seconds): 15
[20:35:08.113][tid:00001270.00001274][INFO] Unsuccessful re-attach attempt. Retry in 15 seconds.
[20:35:23.115][tid:00001270.00001274][INFO] Retrying re-attach (1 of 3)
[20:35:23.115][tid:00001270.00001274][ERROR:000004f1] Failed to read WindowsSessionID (The system cannot contact a domain controller to service the authentication request. Please try again later.)
[20:38:23.385][tid:00001270.000042e4][ERROR:00000003] Unable to check free disk space for vhd(x): \\sa.file.core.usgovcloudapi.net\profiles\S-1-12-8-946945468-1263498019-3621207431-xxxx_sturner\Profile_sturner.VHDX (The system cannot find the path specified.)
[20:38:23.390][tid:00001270.000042e4][INFO] Profile refcount decremented to: 0
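The excerpt above is easier to read with the timing and error codes pulled out. This is a minimal sketch, assuming every line follows the [time][tid:<thread>][LEVEL(:code)] message layout shown here; parse and errors_with_gaps are hypothetical helper names, and the codes are decoded as hexadecimal Win32 error codes (0x4f1 = 1265, 0x3 = 3, ERROR_PATH_NOT_FOUND).

```python
import re
from datetime import datetime

# Matches lines such as:
# [20:35:23.115][tid:00001270.00001274][ERROR:000004f1] Failed to read WindowsSessionID (...)
LINE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\]"
    r"\[tid:(?P<tid>[0-9a-fA-F.]+)\]"
    r"\[(?P<level>[A-Z]+)(?::(?P<code>[0-9a-fA-F]+))?\]\s*(?P<msg>.*)$"
)


def parse(log_text: str) -> list:
    """Turn FSLogix log lines into dicts; lines that do not match are skipped."""
    entries = []
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        entries.append({
            # Time of day only (the log carries no date), so gaps across midnight are wrong.
            "time": datetime.strptime(m["time"], "%H:%M:%S.%f"),
            "tid": m["tid"],
            "level": m["level"],
            # Hex error code, e.g. 000004f1 -> 1265; None for INFO lines.
            "code": int(m["code"], 16) if m["code"] else None,
            "msg": m["msg"],
        })
    return entries


def errors_with_gaps(entries: list) -> list:
    """Return (seconds since previous entry, decimal code, message) for each ERROR line."""
    out = []
    prev = None
    for e in entries:
        gap = (e["time"] - prev["time"]).total_seconds() if prev else 0.0
        if e["level"] == "ERROR":
            out.append((round(gap, 3), e["code"], e["msg"]))
        prev = e
    return out
```

Applied to the last five lines of the excerpt, the first error follows "Retrying re-attach (1 of 3)" with no delay, and the second appears about 180 seconds later on a different thread. With ReAttachRetryCount = 3 and ReAttachIntervalSeconds = 15, the configured intervals alone add up to about 45 seconds.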