Windows Azure Diagnostics – An Introduction

I’ve sat down at least a dozen times over the last few weeks to write this update, be it sitting in my hotel after work or at the airport waiting for my flight. I either couldn’t get my thoughts collected well enough, or simply couldn’t figure out what I wanted to write about. All that changed this week when I had a moment of true revelation. And I think it’s safe to say that when it comes to Azure Diagnostics… I get it.

First off, I want to thank a true colleague, Neil. Neil’s feedback on the MSDN forums, private messages, Twitter, and his blog has been immensely helpful. Invariably, when I had a question or was stuck on something, he’d sail in with just the right response. Neil, thank you!

November Launch – Azure Diagnostics v1.0

If you worked with Windows Azure during the CTP, you remember the RoleManager.WriteToLog method that used to be there for instrumenting Windows Azure applications. Well, as I blogged last time, RoleManager is gone and has been replaced by ServiceRuntime and Diagnostics. The changes were substantial, and the Azure team has done its best to get the news out about how to use them: a recording of the PDC session (good job Matthew), the MSDN reference materials, and of course lots of community support.

I managed to ignore these changes for several months. That is, until I was engaged to help a client do a Windows Azure POC, which required me to dive quickly and deeply into Windows Azure Diagnostics (WAD from here on out). There are some excellent resources on this subject out there, but what I was lacking for the longest time was a comprehensive idea of how it all worked together. The big picture, if you will. That changed for me this week when the final pieces fell into place.

Azure Diagnostics Configuration

Now if you’ve watched the videos, you know that Azure Diagnostics is not started automatically. You need to call DiagnosticMonitor.Start, and when you do, you need to give it a connection to Azure Storage and, optionally, a DiagnosticMonitorConfiguration object. We can also call DiagnosticMonitor.GetDefaultInitialConfiguration to get a template set of diagnostic configuration settings for us to modify with our initial settings.
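Here’s a minimal sketch of what that looks like in a role’s OnStart. The setting name “DiagnosticsConnectionString” and the five-minute transfer period are just illustrative values along the lines of what the new-role templates give you; adjust them for your own service.

using System;
using Microsoft.WindowsAzure.Diagnostics;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WebRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Start from the template configuration, then adjust it before starting the monitor.
        DiagnosticMonitorConfiguration config = DiagnosticMonitor.GetDefaultInitialConfiguration();

        // Example tweak: push trace logs to Azure Storage every five minutes.
        config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);
        config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Information;

        // "DiagnosticsConnectionString" is the storage connection string setting
        // defined in the service configuration.
        DiagnosticMonitor.Start("DiagnosticsConnectionString", config);

        return base.OnStart();
    }
}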

This is the first thing I want to look at more closely. Why do you need a connection to Azure Storage? Why do you have to call the Start method?

The articles and session recordings have already shown you that WAD will persist information to Azure Storage, so you may assume that’s the only reason it needs the Azure Storage account. What you may not realize is that your settings are also saved in a special container in Azure Storage as an XML document. We can see this by popping open a storage explorer (I’m partial to Cerebrata’s Cloud Storage Studio), dropping into blob storage, and finding the “wad-control-container” container. Within this will be a folder for each deployment that has called DiagnosticMonitor.Start, within that one for each role, and within that a blob containing the configuration file for each instance.
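If you’d rather confirm that in code than with a storage explorer, a quick sketch using the StorageClient library does the job (the account credentials below are placeholders):

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class ControlContainerPeek
{
    static void Main()
    {
        // Placeholder connection string; point this at your diagnostics storage account.
        CloudStorageAccount account = CloudStorageAccount.Parse(
            "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>");

        CloudBlobClient blobClient = account.CreateCloudBlobClient();
        CloudBlobContainer container = blobClient.GetContainerReference("wad-control-container");

        // A flat listing shows one configuration blob per deployment/role/instance.
        foreach (IListBlobItem item in container.ListBlobs(
            new BlobRequestOptions { UseFlatBlobListing = true }))
        {
            Console.WriteLine(item.Uri);
        }
    }
}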

It’s this external storage of the configuration settings that allows us to change the configuration on the fly… REMOTELY. A useful tool when providing support for your hosted service. It’s also why I strongly recommend anyone deploying a hosted service set aside an Azure Storage account specifically for diagnostic storage. By separating your application data from the diagnostic data you can control access a bit better; the folks doing support for your application typically don’t need access to your production data.

DiagnosticMonitor.Start

So what happens when we call this Start method? I’ve done some testing and poked around a bit, and I’ve found two things.

First, our configuration settings are persisted to Azure Storage (as I mentioned above). If we don’t give DiagnosticMonitor a set of configuration settings, the default settings will be used. Secondly, the WAD process is started within the VM that hosts our service. This autonomous process is what loads our configuration settings and acts on them, collecting data from all the configured sources and persisting it to Azure Storage.

Not much is currently known about this process (at least outside of MSFT). The only thing I’ve been able to verify is that it is a separate process that runs within the hosting VM. However, I believe it’s this process’s ability to monitor for changes in the configuration settings stored in Azure Storage that allows us to perform remote management. And that ability is singularly important to my next topic.

Remote Monitor Configuration

WAD also has the DeploymentDiagnosticManager class, which is our entry point for doing remote WAD management. I’m going to dive into the details of the API another day, so for this article I just want to give you an overview of how it works.

Ok, just a little ways above, I talked about how the configurations are stored in Azure Storage and that we can navigate a blob hierarchy (wad-control-container => <deploymentId> => <rolename> => <instanceid>) to get to the actual configuration settings. Remotely, we can traverse this same hierarchy using the DeploymentDiagnosticManager: get a list of roles, then their instances, and finally each instance’s current WAD configuration. At this lowest point, we’re back to an instance of the DiagnosticMonitorConfiguration class, just like we had to begin with. The XML doc kept in Azure Storage is really just a serialized version of this object.

So now that we have this, we can modify its contents and save it back to storage. Once updated, WAD will pick up the changes and act on them, be it capturing a new performance counter or performing an on-demand transfer.
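As a sketch of how that traversal and update fit together (the storage credentials, deployment ID, role name, and the specific counter below are all placeholders I’ve picked for illustration):

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.Diagnostics;
using Microsoft.WindowsAzure.Diagnostics.Management;

class RemoteConfigUpdate
{
    static void Main()
    {
        // Placeholders: the diagnostics storage account, deployment ID, and role name
        // would come from your own environment.
        CloudStorageAccount account = CloudStorageAccount.Parse(
            "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>");
        var deploymentManager = new DeploymentDiagnosticManager(account, "<deploymentId>");

        // Walk the same hierarchy that lives under wad-control-container.
        foreach (var instanceManager in
            deploymentManager.GetRoleInstanceDiagnosticManagersForRole("<roleName>"))
        {
            DiagnosticMonitorConfiguration config = instanceManager.GetCurrentConfiguration();

            // Example change: start sampling CPU and transfer it every minute.
            config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
            {
                CounterSpecifier = @"\Processor(_Total)\% Processor Time",
                SampleRate = TimeSpan.FromSeconds(30)
            });
            config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

            // Saving the configuration writes the XML back to wad-control-container;
            // the monitor on each instance picks it up from there.
            instanceManager.SetCurrentConfiguration(config);
        }
    }
}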

Configuration Best Practices

So this raises the question… what is the best practice for WAD configuration? Here’s my two bits… ALL services should start WAD. Not doing so just neuters your ability to monitor your remote services. It and a connection string are already in the RoleEntryPoint shells for new roles for a reason. I’m sure some folks will have reason not to enable it, but even if you configure it so that it monitors nothing by default, you should at least have it started. Without the WAD process running, your ability to capture any details if things start going wrong is gone.

My second opinion is that your hosted service should configure WAD to include normal production-level monitoring/tracking. Let it do these things automatically when the service starts up; that way you have the basics covered for every deployment you do. If you have multiple roles that all share the same configuration, it’s a simple matter to have a centralized method that all the roles use to create a common configuration (see the sketch below). By having the roles do it during their start, you can ensure that the minimum is there for day-to-day use. Then, should you have need, use remote configuration to alter the log levels and capture any additional details you may need.
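One way to keep that common configuration in one place (the DiagnosticsDefaults helper and the specific settings here are hypothetical, just to show the shape):

using System;
using Microsoft.WindowsAzure.Diagnostics;

// Hypothetical shared helper; every role calls this from OnStart so the
// baseline monitoring is the same across the whole service.
public static class DiagnosticsDefaults
{
    public static DiagnosticMonitorConfiguration BuildCommonConfiguration()
    {
        DiagnosticMonitorConfiguration config = DiagnosticMonitor.GetDefaultInitialConfiguration();

        // Baseline, production-level settings: warnings and above, shipped every 15 minutes.
        config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Warning;
        config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(15);

        return config;
    }
}

// In each role's OnStart:
// DiagnosticMonitor.Start("DiagnosticsConnectionString", DiagnosticsDefaults.BuildCommonConfiguration());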

Next, put the diagnostic storage in the same datacenter as your application. This not only helps with the speed/efficiency of storage persistence, it also eliminates any nasty bandwidth charges.

Lastly, if you don’t need something monitored, DON’T MONITOR IT. WAD saves everything to storage, so you’re going to have transaction charges. It’s pretty easy to rack up significant usage even for a small application if you turn diagnostics up. So be careful about what you’re monitoring and make sure it has value.

So What’s Next?

My crystal ball says we’ll see several things in the future: client tools for doing remote management, both from MSFT (I’m sure we’ll see another version of their MMC snap-in) and from 3rd parties (Cerebrata’s Diagnostic Manager is already in beta). I also feel it’s not too much of a stretch to say that we’ll eventually see the ability to configure WAD from either application configuration or service configuration files. I’ve already had thoughts of my own about how to implement this with tools already available.

As for myself, now that I’ve gotten this out, I have several more articles in mind. Next up will be a more detailed dive into doing WAD configuration, then another on on-demand transfers. Finally, I’m going to touch on something I’ve done for my POC project and show you how to take an existing file-based trace listener and migrate it to a hosted service.

So until then, it’s back to a life of hotels, airport terminals, and take-out food. I’m learning much from this real-world Windows Azure experience and I’m looking forward to sharing it with all of you. Till next time!
