Azure Resource Manager Template Tips and Tricks

It's interesting the places technology interests take you. A simple idea can send you down the rabbit hole and help you discover new things you never even imagined were possible. It's one such journey that has led me to this particular posting.

This particular journey begins in mid-February, when I was fortunate enough to participate in a hackfest around the open source solution Nether. As part of this event, I was tasked with helping refactor the Azure Resource Manager deployment templates. They wanted something that had some consistency, as well as increased flexibility. The week following this, I was on-site with one of my ISV partners where we had a similar need. Both these projects helped drive my understanding and skill with ARM templates to an entirely new level. Along the way I learned a few tips/tricks that I figured I'd pass along to you.

JSON is “object” notation

The first learning is to realize that an ARM template isn't just a bunch of strings; it's defining objects that represent resources you want the Azure providers to create for you. An ARM template is a JSON (JavaScript Object Notation) file consisting (for the most part) of key/value pairs, object declarations (stuff inside curly brackets) and arrays (stuff inside square brackets). Furthermore, ARM templates provide us with various functions that can be used to create, manipulate, and insert things in the template.

Now, if you look at something like a simple Windows VM's IP configuration, we can see this.

"ipConfigurations": [
    {
        "name": "ipconfig1",
        "properties": {
             "privateIPAllocationMethod": "Dynamic",
             "publicIPAddress": {
                  "id": "[resourceId('Microsoft.Network/publicIPAddresses',variables('publicIPAddressName'))]"
             },
             "subnet": {
                 "id": "[variables('subnetRef')]"
             }
         }
     }
]

This section is an array (square brackets) of objects (curly brackets). And this particular example is associating the VM (well, its NIC actually) with a public IP address and the subnet by setting the values for those particular properties of the IP configuration “object”. But… what about if you’re using a load balancer?

"ipConfigurations": [
    {
        "name": "ipconfig1",
        "properties": {
            "privateIPAllocationMethod": "Dynamic",
            "subnet": {
                 "id": "[variables('subnetRef')]"
            },
            "loadBalancerBackendAddressPools": [
                 {
                     "id": "[concat(variables('lbID'), '/backendAddressPools/BackendPool1')]"
                 }
            ],
            "loadBalancerInboundNatRules": [
                 {
                     "id": "[concat(variables('lbID'),'/inboundNatRules/RDP-VM', copyindex())]"
                 }
            ]
        }
    }
]

Now the same "object" has a different set of properties. Gone is the publicIPAddress setting, and added are the loadBalancerBackendAddressPools and loadBalancerInboundNatRules. Not a big deal, unless you're trying to create a template for a VM that can be easily deployed in either configuration. But if we look at the sections I've selected above, we realize that our template can actually look more like this.

"ipConfigurations": [
    {
        "name": "ipconfig1",
        "properties": "[parameters('ipConfig')]"
    }
]

In this example, we still have an array with one object, but rather than defining the individual properties, we've instead said that the properties are contained in a parameter that was passed into the template itself. A parameter that looks as follows:

"ipConfig": {
    "value": {
        "privateIPAllocationMethod": "Dynamic",
        "privateIPAddress": "[parameters('privateIP')]",
        "subnet": {
            "id": "[parameters('subnetResourceId')]"
        },
        "publicIPAddress": {
            "id": "[resourceId('Microsoft.Network/publicIPAddresses', variables('publicIPName'))]"
        }
    }
}

We could also just as easily construct the object in a variable. Which can be really helpful if we have a common set of settings we want to share across multiple objects in the same template.
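For instance, a minimal sketch of that approach might look like the following in the variables section (the variable name baseIpConfig and its contents here are just an illustration, not something from the actual templates):

"variables": {
    "baseIpConfig": {
        "privateIPAllocationMethod": "Dynamic",
        "subnet": {
            "id": "[variables('subnetRef')]"
        }
    }
}

Any NIC's ipConfigurations entry could then set "properties": "[variables('baseIpConfig')]" and pick up the same settings.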

This realization also opens up a whole new world of possibilities as we can now pass objects as parameters into a template,

"ipConfig": {
    "type": "object",
    "metadata": {
        "description": "The IP configuration for the VM"
    }
}

and receive as output from a template

"outputs": {
    "subnetIDs" : {
        "type" : "object",
        "value": {
            "frontEnd" : "[variables('subnetFrontEndRef')]",
            "backEnd" : "[variables('subnetBackEndRef')]",
            "management" : "[variables('subnetManagementRef')]"
        }
    }
}

By using objects and not just simple data types (strings, integers, etc…), we make it a bit easier to group values together and pass them around.

Using variables to transform

I mentioned declaring objects in the variables section for reuse. But we can also use variables to transform things. Let's say you're creating a template for a SQL database. The database needs an edition and a requestedServiceObjectiveName (tier). You could have both values passed into the template and then set the properties using those values. But perhaps you want to simplify that for the template's end user, to avoid something like a request for a "Standard" edition with a "P4" service tier. So the template declares an input parameter that looks something like the following.

"databaseSKU": {
    "type": "string",
    "defaultValue": "Basic",
    "allowedValues": [
        "Basic",
        "Standard",
        "Standard S1",
        "Standard S2",
        "Standard S3",
        "Premium P1",
        "Premium P2",
        "Premium P4",
        "Premium P6",
        "Premium P11",
        "Premium P15"
    ],
    "metadata": {
        "description": "Specifies the database pricing/performance."
    }
}

The user just declares that they want a "Basic", or a "Standard S2"… and the template transforms that into the appropriate settings. In the variables section of the template, we then create a collection of objects that we can access using the parameter value as a key. Each object in the collection sets the values that can be used to set the properties of the database.

"databasePricingTiers" : {
    "Basic" : {
        "edition": "Basic",
        "requestedServiceObjectiveName": "Basic"
    },
    "Standard" : {
        "edition": "Standard",
        "requestedServiceObjectiveName": "S0"
    },
    …
}

Since each item in the collection is an object, we can even use it to set an entire section of the database configuration, just like we did with the IP configuration earlier. Something like…

"properties": "[variables('databasePricingTiers')[parameters('databaseSKU')]]"

We can even take this a step further, and have more complex templates use simplified sizings such as “small”, “medium”, and “large”, which are used to control all kinds of individual settings across different resources.
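As a rough sketch of that idea (the names and values below are purely illustrative, not from a real template), a single "deploymentSize" parameter could key into a variable such as:

"deploymentSettings": {
    "small": {
        "vmSize": "Standard_A1",
        "instanceCount": 1,
        "databaseSKU": "Basic"
    },
    "large": {
        "vmSize": "Standard_D2",
        "instanceCount": 4,
        "databaseSKU": "Standard S2"
    }
}

Individual resources would then pull whatever they need via expressions like "[variables('deploymentSettings')[parameters('deploymentSize')].vmSize]", or hand the databaseSKU value off to the database transform we looked at above.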

Linked Templates

IMHO, there are two advantages to the techniques I just mentioned. Used properly, I feel they can make a template easier to maintain. But just as importantly, these techniques allow for reuse. And reuse is most evident when we start talking about linked templates.

A linked template is one that's called from another template. It's accomplished by providing the URL for where the template is located. This means that the template has to be somewhere that it can be linked to. A web site, or the raw GitHub source link, works well. But sometimes you don't want to expose your templates publicly.

This is where the PowerShell script I have in my repo comes in. Among other things, it creates a storage account and uploads all the templates to it.

Get-ChildItem -File $scriptRoot/* -Exclude *params.json -filter deploy-*.json | Set-AzureStorageBlobContent `
    -Context $storageAccount.Context `
    -Container $containerName `
    -Force

This snippet has been designed to go with the naming conventions I’m using. So it will only get files that start with “deploy-“ and end in “.json”. It also ignores any files that end in “params.json”, so I can include parameter files locally for testing purposes and not have to worry about uploading them accidentally. My GitHub repo has taken this a step further and ignores any files that end in privateparams.json so I don’t accidentally check them in.

I'd like to call out the work of Stuart Leeks on this. He did the up-front work as part of the Nether project I mentioned earlier. I just adapted it for my needs and added a few minor enhancements. I've worked with Stuart on a few things over the years and it's always been a pleasure and a great learning experience. So I really appreciate what I learned from him as a result of some of the work on the Nether project. Back to the task at hand.

With the files uploaded, we then have to link to them. This is why some of my templates have you pass in a templateBaseURL and templateSaaSToken. I've parameterized these values to allow me to construct the full URI for where the files will be located. Thus I could pass in the following for templateBaseURL if I just wanted to access them from my GitHub repository:

https://raw.githubusercontent.com/brentstineman/PersonalStuff/master/ARM%20Templates/LinkedTemplateExample/

The templateSaaSToken is there in case you want to use a shared access signature for a blob container to access the files.
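Pulling those together, the parent calls a linked template with a Microsoft.Resources/deployments resource. The sketch below is mine rather than something lifted from the repo, so treat the file name (deploy-vm.json), the parameter being passed, and the API version as assumptions; it also assumes the SAS token value includes its leading '?'.

{
    "name": "vmTemplate",
    "type": "Microsoft.Resources/deployments",
    "apiVersion": "2015-01-01",
    "properties": {
        "mode": "Incremental",
        "templateLink": {
            "uri": "[concat(parameters('templateBaseURL'), 'deploy-vm.json', parameters('templateSaaSToken'))]",
            "contentVersion": "1.0.0.0"
        },
        "parameters": {
            "vmName": { "value": "[parameters('vmName')]" }
        }
    }
}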

Any ARM template can pass values out. But when combined with linked templates, we can now take those outputs and pass them into subsequent templates.  Something like…

"sqlServerFQDN": { "value": "[<strong>reference('SQLDatabaseTemplate').outputs.</strong>databaseServerFQDN.value]" }

In this case "reference" says we are referencing the runtime values of an object in the current template (in this case a linked template). From there we want its outputs, specifically the one named databaseServerFQDN, and finally its value property. In the template that outputs these values, they are declared like…

"outputs": {
    "databaseServerFQDN" : {
        "type" : "string",
        "value": "[reference(variables('sqlDBServerName')).fullyQualifiedDomainName]"
    },
    "databaseName" : {
        "type" : "string",
        "value": "[parameters('databaseName')]"
    }
}

Outputs is an object that contains a collection of other objects. Note the property outputs.databaseServerFQDN.value. We could also get databaseServerFQDN.type if we wanted, or access the databaseName output in the same way.

What's also important here is the reference function. You may have seen this used in other places and thought it was interchangeable with the resourceId function. But it's when you work with linked templates that it really shines. The reference function is really telling the resource provider to wait until the item I'm getting a reference to has completed, then give me access to its runtime properties. This means that you don't even need a "dependsOn" for the other template, as the reference function will already wait for that template to complete. But me, I like putting it in anyways. Just to be safe.
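For what it's worth, that explicit dependency is just a one-liner on the resource that consumes the output (using the linked template name from the reference() example above):

"dependsOn": [
    "SQLDatabaseTemplate"
]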

The other big item here is that when we call a linked template, we have to give it a name. And here's why… Each template is essentially run independently by Azure's resource manager. So if you have a master template that's using 4 linked templates, and you then check the resource group's deployment history, you'd actually see 5 deployments.

(Image: multiple deployments resulting from a single master template with multiple linked templates.)

Now the reason these deployment names are important is because the resource manager will track them and won’t allow two deployments with the same name to run at the same time. This isn’t a big deal most of the time. But earlier in this post, I described creating a reusable virtual machine template. That template is used by a parent template to create a resource. And if I have 2-3 of those parents running, I need to make sure that names don’t collide.

Now the handy way to avoid this is to reference a run-time value within the ARM template… deployment(). This exposes properties about the current deployment, such as its name. So when calling a linked template, we can actually craft a unique name by doing something like…

concat(deployment().name, '-vm')

This allows each deployment template to take the parent's name and add its own unique suffix on, thus (hopefully) helping avoid having to deal with non-unique nested names. If you look at the image to the right, you'll see deployments like jumpboxTemplate and jumpboxTemplate-vm. The latter deployment is a reusable template that is linked from the former, and I'm using the value of deployment() to set the name of the VM template deployment. The same is also present in loadbalancedvmTemplate-lbvms000 and 0001. In that case, this is two VMs being deployed using the same linked template, but being done multiple times as part of a copy loop in the parent template.

Other Misc Learnings

As if all this wasn’t enough, there were a couple other tips I wanted to pass along.

When I was working with Stuart on the Nether project, we wanted a template that would add a consumer group to an existing Service Bus Event Hub. Unfortunately, all the Service Bus templates we could find only showed the creation of the consumer group as part of creating the Event Hub via an approach called nested resources. I was able to quickly figure out how to create the consumer group itself, but the challenge was how to then reference it.

When working with nested resources it is important to understand the paths present in both the resource type and its name. In the case of our consumer group, we were quickly able to determine that the proper resource type would be Microsoft.EventHub/Namespaces/EventHubs/ConsumerGroups.

You might assume that now that you have the type path, you'd just specify the resource name as something like "myconsumer". But with nested resources, it gets more complicated. The above type represents 3 nested tiers. As such, the name needs to follow suit and have the same number of tiers. So I had to actually set the name to something more like NamespaceName/HubName/GroupName.

Stuart pointed me to a tip he learned on another project. Namely that these two values are combined like the teeth on a zipper to create the path to the resource:

Microsoft.EventHub/Namespaces/NamespaceName/EventHubs/HubName/ConsumerGroups/GroupName

Once I realized this, a light bulb went off. This full name actually reflects the same type of value you usually get back from a call to the resourceId function. That function accepts two parameters, a type and a name, and essentially zips them together while also adding on the leading value based on the current resource group (subscription and the like). You can even see this full path when you look at the properties for an existing object in the Azure portal.
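Putting all of that together, a standalone consumer group resource ends up looking something like the sketch below. The parameter names and API version here are my own assumptions rather than the actual Nether template:

{
    "type": "Microsoft.EventHub/Namespaces/EventHubs/ConsumerGroups",
    "name": "[concat(parameters('namespaceName'), '/', parameters('eventHubName'), '/', parameters('consumerGroupName'))]",
    "apiVersion": "2015-08-01",
    "properties": {}
}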

Now the second tip is about the provider API versions. I'm often asked why I put these values into a variable and what the right value should be. Well, I put them in a variable because it means there's less I can accidentally mess up when creating a template. It also means that if/when I want to update the version of an API I'm using, I only have to change it once.
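As a quick illustration (the version string here is just an example, so check what your provider actually supports):

"variables": {
    "computeApiVersion": "2016-03-30"
}

Each resource then just sets "apiVersion": "[variables('computeApiVersion')]" instead of hard-coding the string in a dozen places.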

But as for the big question about how we know what versions of the API exist… I got that tip from Michael Collier (former Azure MVP and currently one of my colleagues), who in turn got it from another old friend, Neil MacKenzie. They pointed out that you can get these pretty easily via PowerShell.

(Get-AzureRmResourceProvider -ProviderNamespace Microsoft.Compute).ResourceTypes | where {$_.ResourceTypeName -eq 'virtualMachines'} | select -ExpandProperty ApiVersions

This PowerShell command will spit out the available API versions for Microsoft.Compute/virtualMachines.

New versions are shipped all the time and it's great to know I can be aware of them without having to wait for someone to publish a sample template with those values in them.

My last item comes from another colleague, Greg Oliver. Greg has found that when you're working on templates, you really get slowed down waiting for each deployment to finish, then get deleted, then start the deployment over again. So he's taken to adding an 'index' parameter to his templates. Then, when he runs them, he simply increments the value (index++). While the new deployment is running, you can go ahead and start deleting the old one. There can be several "old" iterations in the process of deleting while you continue to work on your template. Something like this could also be used as part of my suffix approach, but Greg has gone the extra mile to make the iteration its own parameter. Awesome time saving tip!
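A minimal sketch of how that parameter might be declared (the type, default, and description are my guess at the approach, not Greg's actual code):

"index": {
    "type": "int",
    "defaultValue": 1,
    "metadata": {
        "description": "Increment on each test run so the previous deployment can be deleted while the new one executes."
    }
}

The value then gets concatenated into resource and deployment names so each run stands on its own.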

All in all, I think these are some great, if little known, ARM tips.

Deployment Complete

I wish I could say that these tips and tricks will make building ARM templates easier. Unfortunately, they won’t. Building templates requires lots of hands on practice, patience, and time. But I hope the tips I’ve discussed here might help you craft templates that are easier to maintain and reuse.

To help illustrate all these tricks (and a few less impressive ones), I've created a series of linked templates and put them in a single folder on GitHub. These include a PowerShell script to run the deployment as well as sample parameter files. I hope to continue to tweak these as I learn more, including adding some options to the PowerShell script to help prevent issues with DNS name collisions. Hopefully they'll work without any issues, but if you run into something, please drop me a line and let me know.

Until next time!


Azure Logic Apps, Functions, and Service Bus

Here we are yet again. Me writing something on this blog if for no other reason than to document something I learned. There's no real narrative behind this one other than I built another POC for a partner and in the process found some things I wanted to pull together.

The story here is about digging beyond the Logic App designer and interacting with Service Bus queues, topics, and Event Hubs. Accessing and manipulating the message properties as we start chaining Logic App workflows together with functions and custom code.

Since we’re going beyond what the designer currently supports, we’ll look exclusively at the “code view” for everything.

Sending to a Queue

The first step was to create a workflow that would accept an HTTP request and use that to create a message in an Azure Service Bus queue. In doing this ‘simple’ task I learned two things, how to compose an object and how to set a custom message property.

The compose action allows you to construct a JSON object from various inputs. I wanted to be able to send a message to a queue for further processing as well as to event hub for logging. So being able to compose the object once and reuse it was VERY handy.

 
"ComposeJobMsg": {
   "inputs": {
       "JobID": "@{body('SaveJobtoDatabase')?['OutputParameters']['JobID']}",
       "customer": "@{triggerBody()?['customer']}",
       "job_payload": "@triggerBody()?['job_payload']",
       "job_type": "@{triggerBody()?['job_type']}"
   },
   "runAfter": {
       "SaveJobtoDatabase": [
         "Succeeded"
       ]
   },
   "type": "Compose"
}  

This action takes input from the workflow trigger and the result of a previous stored procedure, SaveJobToDatabase, and constructs a simple JSON object with four properties (with horribly inconsistent naming conventions I know).

With the message object created, I can now send it to a queue, specifying the output of the compose operation as the ContentData for my queue message. The SendToQueue action’s body looks like this:

"body": {
   "ContentData": "@{encodeBase64(string(outputs('ComposeJobMsg')))}",
   "ContentType": "JSON",
   "Properties": {
       "job_type": "@{triggerBody()?['job_type']}"
   }
}

There are a few things going on here I want to point out. We're taking the output of the compose action, outputs('ComposeJobMsg'), and converting it from a JSON object to a string. We then base64 encode that string to ensure it will survive transport through the queue. We're also starting the ContentData value with '@{' to designate that we're using a parameter value and we want to treat it as a string. Using '{' to inform the Logic App that it's a string is unnecessary, but sometimes it's nice to err on the side of caution. You can learn more about the use of expressions like '@' and '{' in the Workflow Definition Language documentation.

Next up, we make sure to set the ContentType to "JSON". And finally I add a custom property, "job_type", and set its value to a parameter that was on the workflow trigger (again treating that value as a string).

Queue as a Trigger

This is where things started to get interesting. I created a second workflow that is triggered "when a message is received" and set it to run at 30 second intervals. But this created a problem with trying to update the workflow. Currently (this is something that's being worked on), the Logic Apps connector takes advantage of Service Bus queues' long polling capabilities. Long polling is great because it helps reduce the latency between when a message arrives and when it can be processed. So even though the workflow was set to check every 30 seconds… when a message was sent to the queue, it triggered the workflow almost immediately.

The reason for this is that the workflow is not actually running at 30 second intervals, but instead starts polling the queue and waits for that to time out, then it'll wait 30 seconds and poll again. Where this creates an issue is that if you are in active development, you're likely going to be changing the workflow every few minutes. Tweak this… run a test… adjust that, run a test. When the workflow starts polling, it's going to wait about 10 minutes for that operation to time out. So even though the "save" works fine, any changes you're making won't take effect until the next time the workflow is triggered (after the long polling times out).

The recommendation I was given, and that worked really great (thanks Jeff), is to change the interval to something like once a day. Then via the portal, we can use the "run trigger" feature to kick off a one-time run of the workflow. So what I would do is modify the workflow, submit a test message to the queue, then manually trigger it. Admittedly, it's not as smooth as I'd like, but it gets the job done. The product team seems aware of this, so I'm hopeful this workaround won't be needed for the long term. Once development is complete, the "production" version of the workflow can use a normal timing setting, as long as we're aware of and OK with the long polling behavior.
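In code view, that workaround is nothing more than stretching out the trigger's recurrence; the values below are just my development-time settings:

"recurrence": {
    "frequency": "Day",
    "interval": 1
}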

Accessing Queue Message Properties

I wanted to be able to consume events both via a Logic App workflow, as well as from some C# code. The workflow portion would look at the job_type property I set above, and use that in a condition to control routing of the message to another queue. If you're using the drag/drop workflow designer, it's pretty easy to get at the queue message ContentData. If you click on it in the designer, the code behind will insert something like this:

triggerBody()['ContentData']

Something was pointed out to me (thanks again Jeff!) and the light bulb went off. Note that ContentData is the exact same property we set when we sent the message to the queue up above. So if we wanted to access the job_type value we set, we simply access the Properties collection like so:

triggerBody()?['Properties']['job_type']

You can’t currently do this via the designer, so you’ll need to flip over to code view if you want to access the individual properties within the Properties collection.

But what if we want to get at the actual payload of ContentData? The JSON object is there, we just have to reverse what we did when we put it into the message. We'll use a couple of Workflow Definition Language functions to undo the base64 encoding and get the string content. That string is JSON, so we use the json method to convert it to an object. Once it's an object again, we can then access any properties within it, such as the JobID we set when we composed the original object.

@json(base64toString(triggerBody()['ContentData']))['JobID']

In C#, if we want to get at the contents of our object, we do a similar process to get the body of the BrokeredMessage object, and transform that JSON payload into an object.

// get the message body
var body = message.GetBody<Stream>();
string jsonJob = new StreamReader(body, true).ReadToEnd();

// convert message body to object
dynamic job = JsonConvert.DeserializeObject(jsonJob);

What about Event Hub?

There isn't a connector for Event Hub (at least as of the authoring of this post). So I created an Azure Function (code on GitHub) to do this for me. It accepted a few parameters and put them into the Event Hub so I could later process them via Stream Analytics. Calling it from the workflow was then pretty straightforward.

"LogToEventHub": {
   "inputs": {
       "body": {
           "JobID": "@{json(base64toString(triggerBody()['ContentData']))['JobID']}",
           "customer": "@{json(base64toString(triggerBody()['ContentData']))['customer']}",
           "job_payload": "@string(outputs('ComposeLogMsg'))",
           "status": "routing"
       },
       "function": {
           "id": "<insert your function reference here>"
       }
   },
   "runAfter": {
       "ComposeLogMsg": [
           "Succeeded"
       ]
   },
   "type": "Function"
}

Just like when we were sending content to the queue, make sure you know what format the objects going into the Event Hub should be in. My function is called via HTTP, so the parameters in the body need to be strings. I opted to use compose to create the payload, then convert that to a string to be output to the event hub. Make sure you know what you're passing and how it needs to be done, as forgetting the proper '{' or '@' can cause a real headache.
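For completeness, the ComposeLogMsg action referenced above isn't shown in this post. A minimal sketch of what it might contain, modeled on the earlier compose example (the fields are my assumption, not the actual POC code):

"ComposeLogMsg": {
    "inputs": {
        "JobID": "@{json(base64toString(triggerBody()['ContentData']))['JobID']}",
        "customer": "@{json(base64toString(triggerBody()['ContentData']))['customer']}",
        "status": "routing"
    },
    "runAfter": {},
    "type": "Compose"
}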

One final Gotcha, WebJobs

Now this one is a truly personal note. When you add an Azure Function to a resource group, it's currently a valid target for a VSTS web publish. In fact, if you look at the Function in the portal, you can click an option to view the hosting Web App. Once in that web app, you could see any web jobs you may or may not have accidentally deployed to the wrong location (yeah, it happened). I've been told that this will eventually be disabled. But in the interim, I wanted to share this little tidbit so nobody else wastes a late night hour trying to figure out why event messages are being consumed when the web jobs that were processing them are all stopped (or so you thought).

Lesson learned. 🙂

Until next time!

Azure File Depot – The BlobWatcher

Recently I was taking a look at WebJobs, the new feature added to Windows Azure Web Sites that lets you run applications continuously, at intervals, or triggered by certain events (such as a new object in Azure storage). One of the questions that popped into my head was, how does the “binding to blobs” work? If I could find an answer to that, perhaps I could add that as a feature to the File Depot project.

After poking around a bit, I found that there’s not really anything magical to what WebJobs was doing. They are leveraging the information that could be available to you or me. In the case of Azure Storage blobs, when you create a blob binding for the job, WebJobs is reaching into the storage account in question and turning on the write logs for blobs in that account and giving those logs a 7 day retention period. It’s these logs that are scanned/monitored by WebJobs so that when new blobs arrive, it can trigger your job based on the bindings you’ve set up.

(Image: FileDepotBlobWatcher-EnableLogs)

 

So how are the logs scanned? Fortunately, Mike Stall has already published a great little write-up. In a nutshell, when a job is started, it does a full scan of the logs for all past data, then does incremental scans for new files. So armed with this information, I set out to create my own implementation of a blob detector, the Azure File Depot Blob Watcher!

Azure Storage Logs

Armed with the info from Mike’s post, the first step is to dig into the storage logs and figure out how they work. The storage team has a great post on using the logs and I recommend you take the time to give it a complete read. But here are the highlights…

When you enable logging, a new "$logs" container will be created. The blobs placed into this container are read only; you can read and delete them, but not alter them or their properties. The logs are buffered up internally and periodically 'flushed' into this container as individual blobs.

In Mike’s post, he mentions that there is latency (5-10 minutes) detecting blobs, and this is because Azure storage buffers the logs for up to 5 minutes or until the buffer hits 4MB in size. At that time, they are written out, and we are able to access them. Thus the latency.

Log files are only written when there are operations we’ve indicated we want to log. But the naming convention always follows the pattern: <service>/YYYY/MM/DD/HHmm/<sequence>.log

So we’ve already identified a couple of requirements for our solution…

  • Don’t scan for new log files more than every 5 minutes
  • Get a list of logs from the $logs container that start with “blob/”
  • Don’t reprocess log files we’ve already examined

Once we have the files, we then have to parse them. I wrote a post last fall that describes using Excel to parse the semi-colon delimited log entries. We're going to need to do that in code, but fortunately it's not that difficult. The logs are semi-colon delimited and use double quotes to denote strings that include semi-colons that we won't want to split/explode on. You could do this using a regular expression, but my own regex skills are so rusty that I opted to just parse the file via a bit of C# code.

int endDelim = 0;
int currentPos = 0;
while(currentPos <= logentry.Length-1)
{
 
    // if a quoted string... 
    if (logentry.Substring(currentPos,1).StartsWith("\""))
    {
        currentPos++; // skip opening quote
        endDelim = logentry.IndexOf("\";", currentPos);
        if (endDelim == -1) // if no delim, jump to end of string
            endDelim = logentry.Length - 1;
        properties.Add(logentry.Substring(currentPos, endDelim - currentPos));
        // skip ending quote and semicolon
        endDelim = endDelim + 2;
    }
    else // not quoted string
    {
        endDelim = logentry.IndexOf(';', currentPos);
        if (endDelim == -1) // if no delim, jump to end of string
            endDelim = logentry.Length - 1;
        properties.Add(logentry.Substring(currentPos, endDelim - currentPos));
        endDelim++;
    }
 
    currentPos = endDelim; // advance position
}

Not as elegant as a regex, I fully admit. But with my unpracticed skills (it's been 10+ years since I had my fingers deep in that), it would have taken me 2-3 times longer to get that working than just brute forcing it.

The final step is knowing what we want out of the logs. There are two key values from the log that I'm after: the OperationType, and the RequestURI. The request URI is self-explanatory enough, that's the URI of the blob that we're trying to detect. The OperationType is the action that was performed against Azure storage. There are only two values we're going to monitor for, PutBlob and PutBlockList.

Now here is a bit of an issue. A small enough blob can be created, or updated, using just the PutBlob call. So if we detect that operation, there is a chance that we may process the same file multiple times. We could resolve this by using a "receipt" pattern as is called out in the comments section of Mike's post, or we could keep a list of processed blobs (perhaps in table storage). The approach really depends on your needs, so I'm going to leave it out of this implementation for now.

NOTE: It should also be noted that since we're only looking for PutBlob or PutBlockList operations, we're not going to be able to detect page blobs and will only catch (via PutBlob) updates to smaller page blobs. Fixing this is definitely on my list, but will need to wait for another day.

The solution

Now that we know how to get at the log information, it's time to start creating a solution. The first decision I made was to separate detecting new log files from their parsing. So we'll have a LogScanner, and a LogParser. I also wanted to make parsing the log entries super easy, so I decided to create a LogEntry class that I can feed the string that is a log entry into, and which exposes the values as properties.

But I still have two issues… It's likely, especially under high volumes, that parsing the logs will take much longer than detecting them. So under most circumstances, I can get by with a single LogScanner. So I'm going to implement a "traffic cop" or "gatekeeper" pattern so that only one LogScanner can run at a time.

My second issue is how to ensure I only alert on a new log file once. I'll be running scans every 5 minutes or so, and listing blobs doesn't really have an option for "give me only the new ones". Fortunately, since I'm already using a gatekeeper, I can have it store the name of the last log file I processed for me. Making it pretty simple to keep track.

The final step of course is having both the LogScanner and LogParser use delegates so whoever is implementing them can create a method to handle when a log file is detected, or a new blob is found. Thus allowing them to control what actions are taken.

I’ll wrap the whole thing up in a reference implementation via a console app. So the final solution looks like this:

(Image: FileDepotBlobWatcher-SolutionLayout)

The BlobLogEntry class exposes the individual fields of the blob log entry (see the parsing code above or Codeplex for all this really does), the Gatekeeper makes sure only one LogScanner is trying to detect new log entries, and the LogParser parses a log once it's been found.

Gatekeeper

I’ve blogged about the gatekeeper pattern before. I’ve known this as a “traffic cop” since long before folks started publishing design patterns on the internet, so to me that’s what it will always be. Regardless of the name, the purpose is to make sure only one process can do something at a time. We’re going to accomplish this by using a lease on an Azure storage blob as our control switch.

The Gatekeeper object needs to be able to start, stop, and renew the underlying blob lease. And because I’m also going to use it to store the last log file processed, I’m going to add SetText and GetText methods to write and retrieve strings to the underlying blob.

This class is fairly simple, so I'm not going to detail code you can look at yourself on Codeplex. So instead I'll just call out a few highlights…

My gatekeeper constructor accepts a CloudBlockBlob for the blob on which we’ll place a lease. This gives the calling process full control over where that blob lives. It then creates a lease on the blob good for up to 60 seconds (the maximum allowed value), and attempts to renew that lease every 45 seconds. This gives me 15 seconds in the case of transient failures to successfully complete getting the lease before I run the risk of another scanner taking over.

In a couple places, we trap for a Storage Exception that has a 409 error code. This indicates that our attempt to get the lease has failed because somebody else already has a lease on the blob in question (aka another scanner has taken over).

Implementing the Gatekeeper is simply a matter of creating the CloudBlockBlob object, handing it off to the class constructor, and then calling start when we want to gain control. We can check periodically to see if we have the lease, optionally getting it if we don’t.

The final bit is to make sure the starting and stopping of a timer to renew the lease is put into the appropriate spots.

Take a look at the gatekeeper code, and if you have questions, please feel free to post them in the comments.

LogParser

Also pretty straightforward is the parser. It takes the CloudBlockBlob object (which would be a log file) as a parameter for its constructor, then we use the ParseFile method to inspect the log file.

public void ParseFile(FoundBlobDelegate callback)
{
    using (Stream stream = logFile.OpenRead())
    {
        // read the log file
        using (StreamReader reader = new StreamReader(stream))
        {
            string logEntry;
            while ((logEntry = reader.ReadLine()) != null)
            {
                // parse the log entry
                BlobLogEntry blobLog = new BlobLogEntry(logEntry);
 
                //NOTE: PutBlockList is the final write for a large block blob
                // PutBlob can also be used for small enough blobs, but also presents an overwrite of an existing one
                if (blobLog.OperationType.Equals("PutBlob") || blobLog.OperationType.Equals("PutBlockList"))
                    callback(blobLog.RequestUrl);
            }
        }
    }
}

This method opens a stream on the blob, and then reads through it line by line. Each line is parsed using the BlobLogEntry object, and if the OperationType is "PutBlob" or "PutBlockList", the callback is invoked with the blob's request URL.

Now I could have put this method into the LogScanner, but as I pointed out earlier, it's highly likely it will take longer to parse the logs than to detect them. So in a real world implementation, the LogScanner may simply notify a pool of parsers, possibly via a queue. So separating the implementations made a certain amount of sense. Especially when I look ahead to having to deal with larger page blobs.

LogScanner

This is where most of my time on the project was spent. It has a few parallels with the LogParser in that we have a constructor that accepts some parameters (a CloudBlobClient and an instance of the Gatekeeper class), as well as Start and Stop methods.

Internally, the LogScanner object will be using the CloudBlobClient to create a CloudBlobContainer object that’s looking at the “$logs” container. We then use the gatekeeper to make sure that if I have multiple processes running log scans, only one of them can actually do the processing. Finally, it uses an internal timer object to make sure we’re scanning for new log files at a regular interval (which defaults to 5 minutes).

When we call the Start method, the LogScanner takes a delegate that the calling process can use to determine what action should be taken when a new log file is detected (such as using the LogParser to digest it). It then starts the gatekeeper process, and attempts to do an initial scan for logs (like Mike’s post said WebJobs does). Once that scan is complete, it will start the timer so we can do additional scans at the specified interval.

The Stop method just reverses these actions, stopping the scan timer and the gatekeeper. So the real meat of this class is what happens when we scan for log files. So let's walk through this a bit before I show you the code.

The first thing we need to be able to do is get a list of blobs in the $logs container. We have two scenarios we have to support with this: get everything (for an initial scan), and get just the new stuff for incremental scans. The challenge is that Azure storage only supports getting a list of blobs based on a filter on the name, not on any metadata or properties. The initial scan is fairly simple, we set our filter criteria to "blob/", which will get all blob service logs in the container.

So let's say we've already done a scan and we stored the last log file we found in our Gatekeeper, so I know where I left off. But how do I pick back up again? I could just filter for all logs and iterate through until we get back to where we left off. Perfectly OK, but it doesn't strike me as particularly efficient. So if we think back to how the logs are named, I can parse the last log I found to get back to the year, month, day, and hour for which that log was produced. So when I pick back up on scanning, I scan for that hour and all the hours between then and UTC now.

Note: You could alternatively scan by day, month, or year. Depending on the frequency of your scans and the production of logs, these options could be more efficient than my hourly approach.

We start by extracting the datetime values from the last log file name (uri in the sample below) we read from our gatekeeper…

int startPOS = uri.IndexOf("blob/") + 5;
int endPOS = uri.LastIndexOf('/');
 
return uri.Substring(startPOS, endPOS - startPOS);

We know all the URIs will have a “blob/” at the beginning since that’s the service we’re monitoring. Furthermore, the file names all end in a six digit sequence number with a ‘.log’ suffix. So if I find the position of the last ‘/’ character in the string, I can now extract the YYYY/MM/DD/HHmm portions from the URI. We can make all these assumptions because the log naming conventions are published and therefore somewhat immutable.

Note: Currently, the mm portion of log URI will always be zero per the published naming convention. This is a key assumption for our processing.

Next, we need to convert this substring to a datetime type

DateTime tmpDT;
// convert prefix to datetime
DateTime.TryParseExact(fileprefix, "yyyy/MM/dd/HHmm", null,
                       DateTimeStyles.None, out tmpDT);
return tmpDT;

This takes our URI substring, and converts it into a DateTime, leaving us to simply calculate the delta between the current UTC datetime and this value to know how many hour periods we need to filter for.

ScanPasses = ((DateTime.UtcNow - PrefixToDateTime(startingPrefix)).TotalHours + 1);

So now we know that we will do one filtered list for each hour from the last hour we found a file to the current datetime. Ideally, this could be optimized so that the gatekeeper stores the last scanned period so we don't have to scan past hours for which there was no traffic. But my assumption is that if we're scanning the logs, we expect traffic at fairly regular intervals. So repetitive scans of empty "hours" shouldn't happen often. And when you add up the cost of those scans versus the programmer time to optimize things, I could scan a few eons of empty logs before the cost would match the programmer cost to fine tune this.

Now that we're armed with what we need to do the scans of the logs, let's look at some of the code…

// get last log file value from gatekeeper
string lastLog = gatekeeper.GetText();
 
// calculate starting prefix
if (!lastLog.Equals("blob/")) // we had a "last log" from previous runs
{
    startingPrefix = getPrefixFromURI(lastLog); // use that prefix as our starting point
    ScanPasses = ((DateTime.UtcNow - PrefixToDateTime(startingPrefix)).TotalHours + 1); // 
    pastPreviousLog = false; // don't start raising "found log" events until we're past the last processed log
}

We start by getting the last log file we found from the gatekeeper. If that value is not “blob/”, then we’re doing a subsequent scan. We’ll get the data/time prefix from the log URI, and use that to calculate the number of scans we need to do. We also set a value that tells us we haven’t yet passed our previously found log file. We need this last part because subsequent scans will always resume in the same hour of the last log file we processed. And it’s possible that new log files have arrived.

Next we will enter into a loop that will execute once or each scan pass we calculated we need. If it’s a first time scan, we’ll only do one pass because our blob list filter will be all available logs.

// List the blobs using the prefix
IEnumerable<IListBlobItem> blobs = 
    logContainer.ListBlobs(string.Format("blob/{0}", startingPrefix), true, BlobListingDetails.Metadata, null);
 
// iterate the list of log files
foreach (IListBlobItem item in blobs)
{
    CloudBlockBlob log = item as CloudBlockBlob;
    if (log != null)
    {
        string LogURI = log.Uri.ToString();
        if (pastPreviousLog)
        {
            // call Delegate to act on log file
            this.callback(log);
 
            // update gatekeeper blob 
            gatekeeper.SetText(LogURI);
            lastLog = LogURI;
        }
        if (lastLog.Equals(LogURI, StringComparison.OrdinalIgnoreCase))
            pastPreviousLog = true;
    }
}

For each log file, we look at the URI. If we're past the last log file (as recorded by the gatekeeper), we call the callback method handed into our object, alerting the calling process that a new log file has been found. We then ask the gatekeeper to save that URI as our new starting point for the next scan. Lastly, in case we had a previous log file recorded, we need to check and see if we're at it, so we can process the additional logs.

And as we exit the log listing loop, we increment our filter criteria (so we can scan the next available hour), and decrement the scanpasses value so we know how many scans remain.

On either side of this, we also enable and disable the timer object. The only purpose of this is that on the off chance it takes us more than 5 minutes to scan the logs, we don’t double up on the scan operations.

Running the Sample

Hopefully you’ll find this solution pretty straightforward. With the classes in place, all that remains is to implement them, in this case as sample console application.

LogScanner and LogParser need some delegate methods. For LogScanner, we’ll use this …

public static void LogFound(CloudBlockBlob LogBlob)
{
    Console.WriteLine(string.Format("Parsing Log File: {0}", LogBlob.Uri));
 
    // Parse the Log
    LogParser myParser = new LogParser(LogBlob);
    //HINT: we could drop the log file into a queue and process asynchronously
    myParser.ParseFile(FoundBlob);
 
    //Option: delete the log once its processed
}

When the LogScanner finds a new log file, it will call this delegate. For my sample I've chosen to write the event to the console output, and immediately parse the file via the LogParser. Just keep in mind that the current implementation is a synchronous blocking call, so in a real production situation, you likely won't want to do this. Instead, write the event to a queue where subscribers can then take and process it.

We follow this up with a delegate for LogParser that will be called as we parse the log files that were found, and locate what we believe to be a blob.

public static void FoundBlob(string newBlobUri)
{
    // filter however you like, by container, file name, etc... 
    if (!newBlobUri.Contains("gatekeeper")) // ignore gatekeeper updates
        Console.WriteLine(string.Format("Found new blob: {0}", newBlobUri));
}

You’ll notice that in this method, I’m doing a wee bit of filtering based on the BlobURI. In a real implementation, you may only want to watch a handful of containers. In my sample implementation, the blob object that’s at the heart of the gatekeeper object will have the name “gatekeeper”, so I went for the simple approach to make sure I ignore any operations related to it. I thought about putting filter criteria (such as container) as an attribute of the LogParser, but ultimately settled on this approach as being far more flexible.

The final step was to go into the console app and set things in motion…

// set up our private variables.
string storageAccountString = Properties.Settings.Default.AzureStorageConnection;
 
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(storageAccountString);
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

We start by retrieving the Azure Storage Account’s connection string, and using that string to get a CloudStorageAccount object, with which we create a CloudBlobClient.

// make sure we have a gatekeeper in place
CloudBlobContainer gatekeeperContainer = blobClient.GetContainerReference("gatekeeper");
gatekeeperContainer.CreateIfNotExists(); // want to make sure the container is there... 
CloudBlockBlob gatekeeperBlob = gatekeeperContainer.GetBlockBlobReference("gatekeeper");
Gatekeeper mygatekeeper = new Gatekeeper(gatekeeperBlob, "blob/");

Using the CloudBlobClient, we create a container where our gatekeeper blob will go, then get a CloudBlockBlob that will be the gatekeeper blob (the blob we’ll put leases on). Finally, using that blob, we create the gatekeeper object which also initializes the contents.

Next, we initialize the LogScanner and tell it to start processing, calling the delegate we already defined.

LogScanner myScanner = new LogScanner(blobClient, mygatekeeper);
myScanner.ScanInterval = new TimeSpan(0, 5, 0);
myScanner.Start(LogFound);

After that, all that remains is to give myself a simple loop to run in while the LogScanner and LogParser do their work. I’ve put in one that will run for up to an hour. After the loop exits, it will stop the scanner, which will release the lease on the blob. If you stop the console app forcibly, just be aware that the gatekeeper lease will persist for up to 1 minute. So your initial scan upon launching the program likely won’t have any results unless you wait at least 1 minute before restarting.

With the sample program complete, all that remains is to set the Azure Storage Account Connection string in the program’s application settings (using a storage account that has the Blob write logging enabled), then compile and run the solution. As it runs, you can upload blobs into it (perhaps using the Publishing Console project also located in the FileDepot Codeplex project), and within 5-10 minutes, you should start seeing files show up in the BlobWatcher console app.

Magic, no longer

So with this, I hope I've shed a bit of light on the Azure Storage logs and how they can be used. As I look back on creating this sample, I find that I almost spent more time digging into how storage logs work than was spent actually working on the code. The final product could use some fine tuning, as well as enhancement for page blob scenarios. But as a starting point, I'm fairly happy with it.

Admittedly, if all you really want to do is monitor for new blobs and act on them, your best approach is to use Azure WebJobs. That team has far more time and resources than I do. And as such, they can give you a solution that will be far more robust than my simple code sample. But replacing WebJobs was never my objective, I just wanted to help highlight how Azure storage logging can be used to do more than just track errors and capacity utilization.

Please do check out BlobWatcher at the Azure File Depot on Codeplex. And more importantly, leave feedback either here or there. I want to make sure the project is fulfilling some common needs and to that end, one can never have enough feedback.

Until next time!

 

Azure Files – Share Management

Note: If you are going to be using Azure Files from the same VM regularly, be sure to follow the instructions in this blog post to ensure that the connection is persistent.

We recently announced a new preview feature, Azure Files. This feature allows you to mount an SMB based file share into your Azure hosted PaaS and IaaS VMs. As this feature is in preview, the various related bits are also in a preview state. And as with many previews, there's some risk when you mix early release bits with current production bits that can cause difficulties.

So with this in mind, I decided that I’d do something I haven’t done in some time and that’s write some code that goes directly against the Storage API, to create, delete, and list shares created in Azure Files. And do this in a way that takes no dependencies on any “preview” bits.

We'll start with the Create Share API. For this we'll need our account name, one of the keys, and the name of the share we're working on. We begin by creating the basic REST request. As described in the Create Share documentation, we need to use a "PUT" verb against the '2014-02-14' version of the API. I'm also going to set the content type to 'application/xml' and give it a content length of 0. We'll do this with a HttpWebRequest object as follows:

var request = (HttpWebRequest)HttpWebRequest.Create(string.Format("https://{0}.file.core.windows.net/{1}?restype=share", 
    creds.AccountName, shareName));
request.Method = "PUT";
request.Headers["x-ms-version"] = "2014-02-14";
request.ContentType = "application/xml";
request.ContentLength = 0;

The variables used in this are the account name (creds.AccountName), and the name of the share we want to create (shareName).

Once we have the request, we then have to sign it. Now you could do this manually, building the string and doing the HMAC-SHA hashing… But since I can take a dependency on the existing Azure Storage SDK (v4.0.3), we can just use the SharedKeyLiteAuthenticationHandler class to sign the request for us. Big thanks to my colleague Kevin Williamson for pointing me at this critical piece, which had changed since I last worked with the Storage REST API.

SharedKeyLiteAuthenticationHandler auth = new SharedKeyLiteAuthenticationHandler(SharedKeyLiteCanonicalizer.Instance, creds, creds.AccountName);
auth.SignRequest(request, null);

By leveraging this aspect of the Azure SDK, we save ourselves the hassle of having to manually generate the string to be signed (canonicalizing it), and then actually doing the signature. If you'd like to learn more about Azure Storage authentication, I'd recommend checking out the MSDN article on the subject.

With the signed request created, we only have to execute the request, and trap for any errors.

// sent the request
HttpWebResponse response = null;
try
{
    response = (HttpWebResponse)request.GetResponse();
    Console.WriteLine("Share successfully created!");
}
catch (WebException ex)
{
    Console.WriteLine(string.Format("Create failed, error message is: {0}", ex.Message));
}

And that's all there really is to it. If the request fails, it will throw a WebException, and we can look at the failure for additional details. Now if you want to learn more about the Azure Files REST API, you can find a slew of great information already out on MSDN. This includes one extremely helpful page related to naming and references.

Now what I did is take this and add in the Delete, and List commands. And roll them up into a simple little console app. So with this app, you can now run a command like…

AzureFileShareHelper -create -acct:<accountname> -key:<accountkey> -share:myshare

This will create the share for you and even return the URL used to mount the share into an Azure VM. Just change the verb to -delete or -list if you want to leverage another operation. 🙂

 

Meet Windows Azure–Christmas in June

Windows Azure became generally available in early 2010. In November of that year, we received the 1.3 SDK and our first major updates to the service since its launch. Over the next 18 months, there were numerous updates that added features. But we really didn't have a fundamental shift in the product. All that changed on June 7th 2012.

The BIG NEWS

June 7th marked the Meet Windows Azure virtual conference. This three hour event was broadcast on the internet from San Francisco in front of a small, live audience. And in its first hour it took the covers off of several HUGE new features:

  • Persistent Virtual Machines – IaaS style hosting of Windows or Linux based virtual machines
  • Windows Azure Web Sites – high density hosting
  • Dedicated Cache – a new distributed, in-memory dedicated cache feature
  • Windows Azure Virtual Network – create trust relationships with cloud hosted VM’s via your existing VPN gateway

Also announced were:

  • A new management portal – compatible with multiple browsers and devices (it’s a preview though, not 100% feature complete)
  • “Hosted Services” renamed to “cloud services”
  • new 1.7 SDK w/ Visual Studio 2012 support
  • updated Windows Azure Storage Pricing – transaction costs reduced by 90% and option to turn off geo-replication and save $0.032/gb
  • Media Services (already announced, but general preview now available)
  • Additional country support (89 total countries and 19 local currencies)

The reality is that bloggers all over the world are already working on posts about the new features. I have limited bandwidth these days (I'd love consulting if it wasn't for all those pesky clients – just kidding folks), so I figured I'd provide you with some links for you to explore until I'm able to spend some time exploring the new features on your behalf and diving into them in detail.

Virtual Machines, Web Sites, and a new Cache option

The first update came out a day before the event from Bill Laing, Corporate Vice President of Server and Cloud at Microsoft (aka the person that owns the datacenter side of Windows Azure). In his Announcing New Windows Azure Services to Deliver "Hybrid Cloud" post, Bill gave a quick intro to what was coming. But this wasn't much more than a teaser.

The next big post was from "the Gu" himself and was posted as he was giving his kick-off presentation. In Meet the new Windows Azure, Scott was kind enough to dive into some of the new features complete with pictures. So if you don't have a subscription you can see the preview of the new management portal (it's a preview because it's not yet 100% complete, so expect future updates). He also discussed the new Windows Azure Virtual Machines feature. Unlike the previous VM Role, Virtual Machines are persistent (the PaaS roles are all stateless) and MSFT is providing support not just for Windows Server 2008 R2 and Windows Server 2012 (RC) but also the Linux distros CentOS 6.2, OpenSUSE, and Ubuntu. You may also see a pre-defined SQL Server 2012 image. So this indicates we may see more Microsoft server products available as Windows Azure Virtual Machine images.

The real wow factor of the event seemed to be Windows Azure Web Sites. For lack of a better explanation, this is a high density hosting solution for web sites that features both inexpensive shared hosting and dedicated (non-multi-tenant) hosting. With this, in just a couple clicks you can deploy many common packages such as WordPress to Windows Azure Web Sites in just a few minutes. And to top it all off, this supports multiple publishing models.

The distributed cache feature was the one I was really waiting for. I was fortunate enough to get early access to this feature because of a project I was working on. And I think someone at MSFT might have taken a bit of pity on me when I posted a while back that I was going to build my own distributed cache system. This new feature allows you to set aside Windows Azure Cloud Services resources (memory from our deployed compute instances) and use them to create a “ring” that is an in-memory distributed cache. Some call this a “free” cache, but I don’t like that term because you are paying for it. You’re just able to leverage any left-over memory you might have in existing instances. If there isn’t any, you’re forced to spin up new instances (maybe even a specific role that does nothing) to host it. And hosting those VM’s still costs you per hour. So “free” isn’t the word I’d use to describe the distributed cache, I prefer “awesome”.

Windows Azure Storage Pricing Changes

Now the most confusing announcement yesterday was some changes to Windows Azure pricing. It was so confusing that the storage team has published two separate blog posts on the subject. The first post was simply announcing that the "per unit" pricing for Azure Storage transactions went from 10,000 to 100,000, all for the same $0.01 per unit. This is great news and takes away a pricing disparity between Windows Azure and Amazon Web Services.

The next big change is that the geo-replication feature that was announced last fall (I can't recall if it was at BUILD or the "Learn Windows Azure" event) can be turned off. Now Azure storage costs were already reduced to $0.125/GB back in March of 2012. Well, with this latest announcement, you can turn off geo-replication and save yourself an additional $0.032/GB.

Brad Calder if you read this, thanks for taking the time to help clarify these changes! I would have simply said “it’s a net win!”

Videos, Videos, Videos

Now as you can see, there's lots to cover. Fortunately, MSFT was prepared and posted a slew of new videos.

MeetWindowsAzure.com has a series of Chalk Talk videos covering many of the new features. These range from 10 to 30 minutes in length (with most being only just under 10 minutes) and are great “why should I care” introductions. And as if that weren’t enough, the WindowsAzure account/channel over on YouTube has posted over 20 “tech bite” sized videos of the new features ranging from 2 to 10 minutes in length. You can’t go wrong with these quick and simple intros.

Wrap-up

So it's still pretty exciting right now. I was present for most of yesterday's live broadcast. But I still spent a good portion of today sorting through the news to pull this post together. I think these new features merit an honest and open re-evaluation of Windows Azure for anyone that has dismissed it in the past. And for those of us that already like and use the platform, we have some great new tools to help us better deliver exciting solutions.

BTW, if you have a Windows Azure subscription and would like to test drive the preview of some of these new features, you can sign up for it here!

So until next time, I'm going to try and take some time to learn these new features and you can bet I'll be bringing you along for the ride! Safe travels.

PS – I wonder if there are any surprises left in store for next week at TechEd North America 2012.