XenDesktop Session Launch Hypervisor Interactions

I got an email recently asking if I knew whether or not a XenDesktop site takes a hosting unit’s load or availability into consideration when brokering session launch requests, especially reconnects to desktops that were ‘In Use’ when a host goes down. This question was posed in the context of Desktop Groups with catalogs that are spread across multiple hosting units.

The simple answer to this question is no. XenDesktop’s interactions with the hypervisor (via the Hypervisor Abstaction Layer) were always intended to be used for power action/status, and MCS/PVS related cloning activities. When a XenDesktop site selects a ‘worker’ to fulfill a session launch request, it only looks at the worker’s registration status, and not that of the host that the worker guest VM is running on.

That said, the selection process for the next available worker is determined via stored procedures. To find out what the ‘next available’ worker is going to be in a XenDesktop 5.x site, you can run following T-SQL against the database, specifying the Desktop Group UID in the first line:

declare @DesktopGroupUid int = 1
declare @Readiness int = 3
declare @Uid int

 update Top(1) chb_State.Workers
 set @Uid = W.Uid
from chb_State.Workers W
            with (readpast,
                  index(IX_Workers_DesktopGroupUid_Usage_DynamicSequence))
         where DesktopGroupUid = @DesktopGroupUid
           and LaunchReadiness >= @Readiness
           and SinBinReleaseTime is null;select * from chb_State.WorkerNames
where Uid = @Uid
go

This script will return the name of the machine that the site will use to satisfy the next pooled-random session launch request to the specified desktop group, and doesn’t care what’s going on with the hosting unit where the worker lives. The site is only concerned with the worker’s registration state, and could care less if the power state of the VM is On, Off, or Unknown, much less does it care about the load of the hosting unit where that machine is running.

To that point, if a worker continues to register when a hosting unit connection becomes inaccessible (vCenter is down, but not the ESX host, for example), the desktop will still be available for session launch, but not for power management. This scenario can cause problems, such as ‘tainted’ workers that don’t get powered off after use, and end up in the unfortunate sounding ‘SinBin’. This process is only temporary, and is only corrected after the machine is rebooted by the XenDesktop site (check out the CTX article I wrote for more info).

As far as a scenario where a session was ‘In Use’ when the host goes down, the broker reaper site service will eventually clean up the failed worker when the ‘DDC Ping’ times out (controlled by the ‘HeartbeatPeriodMs’ value on the DDC running the reaper service). So, by default, you could potentially get into a situation where reconnects for ‘In Use’ session will keep selecting the failed worker until the reaper cleans it up. While this shouldn’t take longer than 5 minutes with the default heartbeat value, it may cause problems if there are frequent outages or service interruptions between geographically dispersed datacenters.

To work around the ~5 minute functionality gap of hosting unit availability awareness, as it relates to session launch anyways, one could easily trigger a XenDesktop PoSH script in the event of an outage (and the reverse when the outage is recovered) to toggle the ‘maintenance mode’ flag on any workers on a failed host. I’d like to hope that the XenDesktop product team has at least considered the potential for expanding the site’s visibility into the status of a guest VM’s host, and would love to see ‘smarter’ brokering logic such as thing in future releases.

Leave a comment