Windows Azure Diagnostics Part 2–Options, Options, Options

It’s a hot and muggy Sunday here in Minnesota. So I’m sitting inside, writing this update while my wife and kids get their bags ready for going back to school. It’s hard to believe that summer is almost over already. Heck, I’ve barely moved my ‘68 Cutlass convertible this year. But enough about my social agenda.

After 4 months I’m finally getting back to my WAD series. Sorry for the delay, folks. It hasn’t been far from my mind since I did part 1 back in April, but I’m back with a post that I hope you’ll enjoy. I’ve also used some of that time to do testing and dig past the surface, in hopes of bringing you something new.

Diagnostic Buffers

If you’ve read up on WAD at all, you’ve probably read that there are several diagnostic data sources that are collected by default. Something that’s not made very clear in the MSDN articles (and even in many other blogs and articles I read while preparing for this) is that this information is NOT automatically persisted to Azure Storage.

So what’s happening is that these data sources are buffers that represent files stored in the local file system. The size of these buffers is governed by a property of the DiagnosticMonitorConfiguration settings, OverallQuotaInMB. This setting represents the total space on the VM that will be used for the storage of all log file information. You can also set quotas for the individual buffers, the sum total of which should be no greater than the overall quota.
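To make that concrete, here’s a minimal sketch of setting the quotas (the property names come from the Microsoft.WindowsAzure.Diagnostics API; the specific values are just illustrative):

// Grab a starting configuration (one of the approaches I covered in part 1).
DiagnosticMonitorConfiguration diagConfig =
    DiagnosticMonitor.GetDefaultInitialConfiguration();

// Total local disk space WAD may use across ALL of its buffers.
diagConfig.OverallQuotaInMB = 4096;

// Optional per-buffer quotas; together they shouldn't exceed the overall quota.
diagConfig.Logs.BufferQuotaInMB = 512;
diagConfig.DiagnosticInfrastructureLogs.BufferQuotaInMB = 256;

Depending on the SDK version, raising the overall quota may also mean bumping the size of the diagnostics local storage resource in your service definition.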

These buffers will continue to grow until their maximum quota is reached, at which time the older entries will be aged off. Additionally, should your VM crash, you will likely lose any buffer information. So the important step is to make sure each of your buffers is configured to persist its logs to Azure Storage in a way that protects the information you are most interested in.

When running in the development fabric, you can actually see these buffers. Launch the development fabric UI, navigate to a role instance, and right-click it as seen below:

[Screenshot: the development fabric UI, showing the right-click menu on a role instance]

Poke around in there a bit and you’ll find the various file buffers I’ll be discussing later in this update.

If you’re curious about why this information isn’t automatically persisted, I’ve been told it was a conscious decision on the part of the Azure team. If all these sources were automatically persisted, the potential costs associated with Azure Storage could present an issue. So they erred on the side of caution.

Ok, with that said, it’s time to move on to configuring the individual data sources.

Windows Azure Diagnostic Infrastructure Logs

Simply put, this data source is the chatter from the WAD processes, the role, and the Azure fabric. You can see it start up, configuration values being loaded and changed, etc. This log is collected by default but, as we just mentioned, not persisted automatically. Like most data sources, configuring it is pretty straightforward. We start by grabbing the current diagnostic configuration in whatever manner suits you best (I covered a couple of ways last time), giving us an instance of DiagnosticMonitorConfiguration that we can work with.

To adjust this data source, we work with the DiagnosticInfrastructureLogs property, which is of type BasicLogsBufferConfiguration. This allows us to adjust the following values (a configuration sketch follows the list):

BufferQuotaInMB – maximum size of this data source’s buffer

ScheduledTransferLogLevelFilter – the LogLevel threshold used to filter which entries get persisted to Azure Storage.

ScheduledTransferPeriod – this TimeSpan value is the interval at which the log should be persisted to Azure Storage.
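Here’s what that looks like in practice, as a minimal sketch that assumes the diagConfig instance from earlier (the filter level and interval are just example values, and the connection string setting name is the one the SDK templates generate):

// Persist anything at Verbose or above every five minutes.
diagConfig.DiagnosticInfrastructureLogs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;
diagConfig.DiagnosticInfrastructureLogs.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);

// Apply the configuration; how you do this depends on which approach from part 1
// you're using. This version starts the monitor with the new settings.
DiagnosticMonitor.Start("DiagnosticsConnectionString", diagConfig);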

Admittedly, this isn’t a log you’re going to have much call for, if ever. But I have to admit, when I looked at it, it was kind of interesting to see more about what was going on under the covers when roles start up.

Windows Azure Logs

The next source that’s collected automatically is Azure trace listener messages. This data source is different from the previous one because it only contains what you put into it. Since it’s based on trace listeners, you have to instrument your application to take advantage of it. Proper instrumentation of any cloud-hosted application is something I consider a best practice.

Tracing is a topic so huge that considerable time can be (and has been) spent discussing it. You have switches, levels, etc. So rather than diving into that extensive topic, let me just link you to another source that does it exceedingly well.

However, I do want to touch on how to get this buffer into Azure Storage. Using the Logs property of DiagnosticMonitorConfiguration, we again access an instance of the BasicLogsBufferConfiguration class, just like the Azure diagnostic infrastructure logs, so the same properties are available. Set them as appropriate, save your configuration, and we’re good to go.
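A minimal sketch, again assuming the diagConfig from earlier; the trace call just shows the kind of instrumentation that ends up in this buffer:

// Standard System.Diagnostics tracing lands in this buffer (the SDK project
// templates wire up the DiagnosticMonitorTraceListener for you).
System.Diagnostics.Trace.TraceInformation("Worker role processing loop started");

// Persist Information-level (and higher) trace entries every minute.
diagConfig.Logs.ScheduledTransferLogLevelFilter = LogLevel.Information;
diagConfig.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);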

IIS Logs (web roles only)

The last data source that’s collected by default, at least for web roles, is the IIS logs. These are a bit of an odd bird in that there’s no way to schedule a transfer or set a quota for them specifically. I’m also not sure if their size counts against the overall quota. What is known is that if you do an on-demand transfer for ‘Logs’, this buffer will be copied to blob storage for you.

FREB Logs

Our next buffer, the Failed Request Event Buffering (FREB) log, is closely related to the IIS logs. It is, of course, the log of failed IIS requests. This web-role-only data source is configured by modifying the web.config file of your role, introducing the following section.

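The original screenshot isn’t available here, but a typical IIS failed request tracing section looks something like the sketch below; treat the providers, areas, and status codes as example values you’d tune for your own role:

<system.webServer>
  <tracing>
    <traceFailedRequests>
      <add path="*">
        <traceAreas>
          <add provider="ASP" verbosity="Verbose" />
          <add provider="ASPNET" areas="Infrastructure,Module,Page,AppServices" verbosity="Verbose" />
          <add provider="WWW Server" areas="Authentication,Security,Filter,StaticFile,CGI,Compression,Cache,RequestNotifications,Module" verbosity="Verbose" />
        </traceAreas>
        <!-- capture anything that comes back as a 4xx or 5xx -->
        <failureDefinitions statusCodes="400-599" />
      </add>
    </traceFailedRequests>
  </tracing>
</system.webServer>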

Unfortunately, my tests for how to extract these logs haven’t been completed as I write this. As soon as they are, I’ll update this post with that information. For the moment, my assumption is that once configured, an on-demand transfer will pull them in along with the IIS logs.

Crash Dumps

Crash dumps, like the FREB logs, aren’t automatically collected or persisted. Again, I believe that doing an on-demand transfer will copy them to storage, but I’m still trying to prove it. Configuring the capture of this data also requires a different step. Fortunately, it’s the easiest of all the logs in that it’s simply an on/off switch that doesn’t even require a reference to the current diagnostic configuration:

// true = collect full dumps; pass false for mini dumps
Microsoft.WindowsAzure.Diagnostics.CrashDumps.EnableCollection(true);

Windows Event Logs

Do I really need to say anything about these? Actually, yes. Namely, the security log… forget about it. Adding custom event types/categories? Not an option. However, what we can do is gather events from the other logs through a simple XPath statement as follows:

diagConfig.WindowsEventLog.DataSources.Add("System!*");

In addition to this, you can also filter on severity level and set up a scheduled transfer.
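A quick sketch of what that looks like; the extra data source, level, and interval here are just examples:

// Collect Application events in addition to System, keep only warnings and
// above, and push them to storage every five minutes.
diagConfig.WindowsEventLog.DataSources.Add("Application!*");
diagConfig.WindowsEventLog.ScheduledTransferLogLevelFilter = LogLevel.Warning;
diagConfig.WindowsEventLog.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);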

Of course, the real challenge is formatting the XPath. Fortunately, the king of Azure evangelists, Steve Marx, has a blog post that helps us out. At this point I’d probably go on to discuss how to gather these, but you know… Steve already does that. And it would be pretty presumptuous of me to think I know better than the almighty “SMARX”. Alright, enough sucking up… I see the barometer is dropping. So let’s move on. :D

Performance Counters

We’re almost there. Now we’re down to performance counters, a topic most of us are likely familiar with. The catch is that as developers, you likely haven’t done much more than hear someone complain about them. Performance counters belong to the world of the infrastructure monitoring types. You know, those folks that sit behind closed doors with a projector aimed at a wall of scrolling graphs and numbers? If things start to go badly, a mysterious email shows up in the inbox of a business sponsor warning that a transaction took 10ms longer than it was supposed to. And the next thing you know, you’re called into an emergency meeting to find out what’s gone wrong.

Well, guess what: mysterious switches on the server are no longer the only way to control these values. Now we can set them ourselves via WAD, as follows:

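The original screenshot isn’t available, but a sketch of the pattern described below might look like this; the counter, sample rate, and transfer interval are illustrative, and note the warning further down about version-specific counter names:

// Define a counter to sample every 30 seconds.
PerformanceCounterConfiguration counterConfig = new PerformanceCounterConfiguration();
counterConfig.CounterSpecifier = @"\ASP.NET\Requests Queued"; // see the versioning caveat below
counterConfig.SampleRate = TimeSpan.FromSeconds(30);

// Avoid adding the same counter twice or you'll collect duplicate data.
// (The duplicate check uses the Any extension from System.Linq.)
if (!diagConfig.PerformanceCounters.DataSources.Any(
        c => c.CounterSpecifier == counterConfig.CounterSpecifier))
{
    diagConfig.PerformanceCounters.DataSources.Add(counterConfig);
}

// Push the samples to Azure Storage every five minutes.
diagConfig.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);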

We create a new PerformanceCounterConfiguration, specify what we’re monitoring, and set a sample rate. Finally, we add that to the diagnostic configuration’s PerformanceCounters data sources and set the TimeSpan for the scheduled transfer. Be careful when adding, though, because if you add the same counter twice, you’ll get twice the data. So check to see if it already exists before adding it.

Something important to note here: my example WON’T WORK as written. As of the release of Azure Guest OS 1.2 (April 2010), we need to use version-specific names for certain performance counters or we won’t necessarily get results. So before you try this, get the right strings for the CounterSpecifier.

Custom Error Logs

*sigh* Finally! We’re at the end. But not so fast! I’ve actually saved the best for last. :) How many of you have applications you may be considering moving to Azure? These likely already have complex file-based logging in them, and you’d rather not have to re-instrument them. Maybe you’re using a worker role to deploy an Apache instance and want to make sure you capture its non-Azure logs. Perhaps it’s just a matter of having an application that captures data from another source and saves it to a file, and you want a simple way to save those values into Azure Storage without having to write more code.

Yeah! You have an option, through WAD’s support for custom logs. They call them logs, but I don’t want you to think of them that way. Think of this option as your escape clause for any time there’s a file in the VM’s local file store that you want to capture and save to Azure Storage! And yes, I speak from experience here. I LOVE this option. It’s my catch-all. The code snippet below shows how to configure a data source to capture a file. In this snippet, “AzureStorageContainerName” refers to the blob container in Azure Storage that these files will be copied to. LogFilePath is of course where the file(s) I want to save are located.
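The original snippet was a screenshot, so here’s a sketch of that configuration; AzureStorageContainerName and LogFilePath are just placeholder values standing in for your own, and the “CustomLogs” local resource is an assumption you’d replace with whatever is in your service definition:

// Hypothetical values standing in for your own container name and file location.
string AzureStorageContainerName = "wad-custom-logs";
string LogFilePath = RoleEnvironment.GetLocalResource("CustomLogs").RootPath;

// Describe the local directory we want WAD to watch and copy to blob storage.
DirectoryConfiguration dirConfig = new DirectoryConfiguration();
dirConfig.Container = AzureStorageContainerName; // destination blob container
dirConfig.Path = LogFilePath;                    // local folder holding the file(s)
dirConfig.DirectoryQuotaInMB = 128;              // example local quota for this buffer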

Then we add it to the diagnostic configuration’s Directories data sources. So simple yet flexible! All that remains is to set a ScheduledTransferPeriod or do an on-demand transfer.
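Roughly like this; the scheduled half uses the same diagConfig and dirConfig as before, while the on-demand half is a sketch from memory of the Microsoft.WindowsAzure.Diagnostics.Management classes, so double-check the names against the SDK version you’re using:

// Scheduled route: register the directory and push it to blob storage every 10 minutes.
diagConfig.Directories.DataSources.Add(dirConfig);
diagConfig.Directories.ScheduledTransferPeriod = TimeSpan.FromMinutes(10);

// On-demand route: ask WAD to transfer the Directories buffer right now.
CloudStorageAccount storageAccount = CloudStorageAccount.DevelopmentStorageAccount;
string deploymentId = RoleEnvironment.DeploymentId;
string roleName = RoleEnvironment.CurrentRoleInstance.Role.Name;
string instanceId = RoleEnvironment.CurrentRoleInstance.Id;

var instanceManager = new DeploymentDiagnosticManager(storageAccount, deploymentId)
    .GetRoleInstanceDiagnosticManager(roleName, instanceId);
Guid transferId = instanceManager.BeginOnDemandTransfer(DataBufferName.Directories);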

Yes, I’m done

Ok, I think that does it. This went on far longer than I had originally intended. I guess I just had more to say than I expected. My only regret is that just when I’m getting some momentum going on this blog again… I’m going to have to take some time away. I’ve got another Azure-related project that needs my attention and is unfortunately under NDA. :P

Once that is finished, I need to dive into preparing several presentations I’m giving in October concerning the Azure AppFabric. If I’m lucky, I’ll have time to share what I learn as I work on those presentations. Until then… stay thirsty my friends.
