My current project makes heavy use of the Windows Server AppFabric Caching Service, and whilst I think this is a great piece of technology, it does have a pretty big hole in functionality. There’s currently no supported, out-of-the-box way to get a cache host (or the entire cluster) to start automatically, say after a server is rebooted, deliberately or otherwise. This is a bit of a limitation: after you’ve spent time digesting all the configuration options, understood the importance of high availability and set up your cluster perfectly, you still need to be on permanent standby in case one of your production servers goes down. At the minute, if something goes wrong, your cache host remains permanently “out of the cluster” until someone logs on to one of the hosts in the cluster and manually executes some PowerShell commands. Oh dear.
Some trawling around the web throws up some interesting but conflicting discussions. On the one hand, this functionality is not supported; on the other, it reportedly does work if you change the Windows service to start up automatically, with the caveats that the host could take up to 15 minutes to restart and that it will never work with an XML-based configuration store.
Confusing, to say the least. Changing the AppFabric Caching Service to a startup type of Automatic in the Services control panel resulted in some really unpredictable behaviour. We are using an XML-based configuration store, so maybe this was never going to work, but all sorts of errors and crashes started appearing in the Event Log. Basically, the cache became unusable until the startup type was changed back to Manual.
So, a custom approach was needed…
The custom solution is pretty simple, but there are a few moving parts, so I’ve broken it down piece by piece. It involves a custom PowerShell script to start the cache host or cluster, which is triggered by a Windows Scheduled Task upon system startup.
By default, PowerShell ships in a restricted mode that denies the execution of scripts. This always catches me out, and did so when I migrated this solution from running fine in my development environment onto one of our production servers. After it failed silently a few times, I eventually realised what was going on, so make sure your PowerShell is configured correctly before you start so you don’t waste time like I did!
Open the PowerShell console (you may need to Run as Administrator) and execute the following cmdlet to allow script execution:
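The cmdlet that changes this setting is Set-ExecutionPolicy. Which policy was chosen is not stated here, so the sketch below assumes RemoteSigned, which allows locally created scripts to run; Unrestricted would also work but is more permissive.

```powershell
# Assumption: RemoteSigned is used for illustration; the policy actually chosen is not stated.
Set-ExecutionPolicy RemoteSigned

# Confirm which policy is now in effect
Get-ExecutionPolicy
```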
Accept the warning by entering Y and you’re good to go.
The script itself is pretty simple. It basically imports the relevant modules, tells the caching service to look for the local installation to get the cluster settings and then starts the cache host based on the local machine name using the default cache port.
import-module DistributedCacheAdministration
$computer = gc env:computername
use-cachecluster
start-cachehost -hostname $computer -cacheport 22233
I saved this as a new PowerShell script called StartCacheHost.ps1.
In order to call this script at system startup, I created a new scheduled task using the Task Scheduler.
After opening the Task Scheduler, I created a new task as follows:
Make sure that this task is set to run whether the user is logged in or not, and that it’s configured to run for your flavour of OS. I also ticked the “Run with highest privileges” box, as I usually have to run the Caching Administration PowerShell Tool as an Administrator.
Moving to the Triggers tab, I added a new trigger to execute the script on system startup as follows:
I opted to delay execution for 30 seconds after startup. This may not be necessary, but it felt like a minor trade-off to ensure everything’s up and running before we try to start the cache host.
Finally, moving to the Actions tab, I set the action to execute as follows:
The program/script to run is the PowerShell executable, and the argument to pass in is the full path to the saved StartCacheHost.ps1 script.
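The task settings described above can also be registered from an elevated prompt with schtasks.exe rather than through the Task Scheduler UI. This is only a sketch of an equivalent, not the setup used in this post: the task name, script path and account below are hypothetical, and passing the script with -File is an assumption about how the argument is supplied.

```powershell
# Hypothetical values: adjust the task name, script path and account to suit.
$taskName   = 'StartCacheHost'
$scriptPath = 'C:\Scripts\StartCacheHost.ps1'
$runAs      = 'MYDOMAIN\cacheadmin'

# /SC ONSTART     : trigger at system startup
# /DELAY 0000:30  : wait 30 seconds after startup (format is mmmm:ss)
# /RL HIGHEST     : the equivalent of "Run with highest privileges"
# /RU with /RP *  : run as the named account and prompt for its password,
#                   so the task runs whether or not that user is logged on
schtasks.exe /Create /TN $taskName /SC ONSTART /DELAY 0000:30 /RL HIGHEST `
    /RU $runAs /RP '*' `
    /TR "powershell.exe -File $scriptPath"
```

If the script path contains spaces it needs extra quoting inside the /TR value. Once created, the task can be started on demand with schtasks.exe /Run /TN StartCacheHost.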
In order to test that everything’s hanging together nicely, I wanted to start at the bottom and build upwards. So firstly, I wanted to test that the script itself would work.
On a running cache cluster, I stopped the current host by executing the following cmdlet in the Caching Administration PowerShell Tool:
stop-cachehost -hostname xxxxxxx -cacheport 22233
where xxxxxxx is the local machine name (the current host).
Running get-cachehost returns information about the hosts in the cluster, and it indicated that my current host now had a status of “DOWN”.
I then opened the Windows PowerShell console and executed my StartCacheHost.ps1 script at the command prompt. Running get-cachehost again indicated that my host was back up and running, i.e. it had rejoined the cluster.
I then moved on to testing the scheduled task by stopping the current cache host again. This time I selected my new task in Task Scheduler and manually executed it by clicking Run.
Again, running get-cachehost indicates that my host is back up and running again.
Finally, to piece everything together, I restarted my cache host server. 30 seconds after the server came back up, my script executed and the host rejoined the cluster. Perfect.
The script above assumes that you’re running in a cluster (i.e. more than one cache host) and that the cluster is in a working state so that the current host can rejoin it. If the cluster were down (e.g. if too many lead hosts had gone down), the host would not be able to rejoin the cluster.
I would treat these cases as critical failures, however, as the entire cluster has gone down and manual intervention is probably required anyway. Ideally, I’d like to be able to check whether the cluster is running before executing the start-cachehost command. If there’s no cluster running, this could be swapped for start-cachecluster, although whether that would work depends on the configuration of lead hosts etc.
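One way that check might look is sketched below. It is untested against a real cluster and rests on several assumptions: that get-cachehost still returns host records when the cluster is down, that each record exposes HostName and Status properties, and that the host names match the local machine name. The function name Select-StartCommand is made up for this example.

```powershell
# Hypothetical helper: decide which start command to use from a list of host records.
# Assumption: each record has HostName and Status properties, as get-cachehost output appears to.
function Select-StartCommand {
    param(
        [object[]]$Hosts,
        [string]$LocalHost
    )
    # Any *other* host reporting "Up" means there is a running cluster to rejoin.
    $othersUp = @($Hosts | Where-Object {
        $_.HostName -ne $LocalHost -and "$($_.Status)" -eq 'Up'
    })
    if ($othersUp.Count -gt 0) { 'start-cachehost' } else { 'start-cachecluster' }
}

# Usage on a cache host (requires the AppFabric administration module):
#   import-module DistributedCacheAdministration
#   use-cachecluster
#   $computer = gc env:computername
#   if ((Select-StartCommand -Hosts (get-cachehost) -LocalHost $computer) -eq 'start-cachehost') {
#       start-cachehost -hostname $computer -cacheport 22233
#   } else {
#       start-cachecluster
#   }
```

Keeping the decision in a small function that only looks at the host records means it can be exercised with fake objects, without touching a real cluster. The lead host caveat above still applies to the start-cachecluster branch.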
I also wanted this behaviour in my development environment, to save me having to start the caching service whenever I rebooted my laptop. In this case, where I know that there’s only ever one host running in the cluster, I would change the script to execute the start-cachecluster command instead of start-cachehost, e.g.
import-module DistributedCacheAdministration
use-cachecluster
start-cachecluster