Serverless Openhack – London 2018

Openhacks are awesome! At least that’s my most recent experience at the Serverless Openhack that was held in London in June of 2018.

For those unfamiliar with the concept of an open hack… I’ll warn that there appear to be multiple definitions. For the sake of the event I attended, I would describe it as a guided, team-oriented, gated, challenge-based learning event. You are assigned to a team and the team is given a coach. The objective is to work as a team to complete a series of increasingly complex challenges that help you learn a new technology. The coach is there to help your team if you get stuck, and to help you explore the alternative solutions for each challenge and their pros and cons.

I had volunteered to be a coach for this event and prepared by running through the challenges in two practice runs. As a hands-on learner, this was a great way for me to upskill on technologies I was already familiar with, and I learned a lot from my colleagues as we each added our own expertise. In my case, that was Service Bus, Event Hubs, and Event Grid.

Back on topic…

This event was held at the Microsoft Reactor space on the east end of London, and about 120 fellow geeks attended this free event. To help them out, we had 20 coaches with a few floaters. As if this wasn’t enough, the 2018 Integrate conference was in town and we had members of the Service Bus, Azure Functions, and Logic Apps teams stop by during the event. So I think it’s safe to say we had a lot of great minds there to help folks along.

The first day, the attendees showed up and settled in at their assigned tables. We got to know each other, familiarized ourselves with the format of the hack, and set to work on the first challenge. The hack started by laying a “what is an Azure Function” foundation. It ensured that folks had the proper tooling installed and had access to all the materials provided. This also gave each team an opportunity to determine how they wanted to collaborate, where/how to share code, and how to divide up the work on the challenges.

As the teams progress, the challenges get increasingly difficult. They build on the work that was already done to help build out real-world business patterns. For this hack, the business was a fictional organic ice cream company. What starts as a single API method turns into a robust API with scalability, resilient cloud storage, data visualization, and robust monitoring and management.

But the best part is that Openhack is not a contest. Teams don’t compete against each other (at least officially; there are always a few that are in it to finish first), but are encouraged to go at their own pace and explore the aspects of the technologies that they find most interesting. In some cases, teams will divide up the work by having multiple members each working on different ways to solve the same problem so they can compare learnings afterwards.

I’ve always enjoyed hands-on learning and problem solving. Openhack gives you both of these in a great, collaborative package. You don’t have to make up your own problems to solve which, at least for me, helps keep me engaged.

I also enjoy mentoring others. When you learn something, you tend to only learn from your perspective. When mentoring, or coaching others, you learn as much from them and their perspective as they learn from you. So it can be an incredibly rewarding experience.

If you get the opportunity to participate in an Openhack, either as a participant or a coach, I can’t recommend strongly enough that you seize that opportunity. I’ll definitely be on the lookout for more of these in the future since a career in software/information technologies is a never-ending learning experience. An Openhack is a great way to challenge yourself to learn new things.


Loading it on with Event Grid

Last week, I had one of those moments that makes me really appreciate my role at Microsoft. I was fortunate enough to spend two days working with Event Grid alongside colleagues and members of the Event Grid product group (PG). We played with some stuff that hasn’t been announced (sorry, not covering that here), as well as tested out a few scenarios for them and gave feedback on the service.

The scenario I picked was “Event Grid under load”, basically to try and DDOS the service. The objective of this scenario was to send load to a custom topic in Event Grid and see a) could I get throttling to kick in and b) what were the behaviors we saw.  The answer was a) yes and b) educational.

The Solution

Event Grid commits to handle the ingestion of up to 5,000 events per second, but we were encouraged to see how far we could push things. And while there are some solutions out there that can be used to generate simulated load, I went to what I knew… Service Fabric. I opted to create a simple micro-service that could generate traffic, then use a Service Fabric cluster to run multiple copies of that service. The service is pretty simple, about 20 lines of code inserted into the “run” method of a stateless service.

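// requires: using Microsoft.Azure.EventGrid; using Microsoft.Azure.EventGrid.Models;
List<EventGridEvent> eventList = new List<EventGridEvent>();

// build a batch of 1,000 events
for (int i = 1; i <= 1000; i++)
{
    EventGridEvent myEvent = new EventGridEvent()
    {
        Subject = $"Event {i}",
        EventType = "Type" + i.ToString()[0], // first digit of i, so Type1 through Type9
        EventTime = DateTime.UtcNow,
        Id = Guid.NewGuid().ToString(),
        Data = "sample data",
        DataVersion = "1.0"
    };
    eventList.Add(myEvent);
}

// publish the whole batch in a single call
TopicCredentials topicCredentials = new TopicCredentials(sasKey);
EventGridClient client = new EventGridClient(topicCredentials);
client.PublishEventsAsync(eventgridtopic, eventList);

// pause briefly before the next batch
await Task.Delay(TimeSpan.FromMilliseconds(100), cancellationToken);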

I started running a single copy of this service on my development machine using my local development Service Fabric cluster. I played around with settings, starting from a batch of 100 events once per second and tweaking things until it was finally running as you see above, with a batch of 1000 events about 10 times a second. At that point we stood up a 5-node Service Fabric cluster running Standard_D2_v2 instances so we could deploy the service there. I was able to batch up 1000 of my events because I didn’t exceed the 1 MB maximum payload of a single publish operation. So if you opt for this same type of bulk send, please keep that in mind when designing your batch size.
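If you want a quick sanity check on batch size before you send anything, something like the following can help. It’s a minimal sketch in Python rather than C#, it assumes the 1 MB limit means 1024 * 1024 bytes, and the lowercase JSON field names are my guess at the wire format rather than something taken from the SDK.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumption: "1 MB" is read here as 1024 * 1024 bytes.
MAX_PAYLOAD_BYTES = 1024 * 1024


def build_batch(count):
    """Build `count` events shaped like the ones in the C# loop above."""
    return [
        {
            "id": str(uuid.uuid4()),
            "subject": f"Event {i}",
            "eventType": "Type" + str(i)[0],  # first digit of the counter
            "eventTime": datetime.now(timezone.utc).isoformat(),
            "data": "sample data",
            "dataVersion": "1.0",
        }
        for i in range(1, count + 1)
    ]


def payload_size(events):
    """Size in bytes of the JSON body a single publish call would carry."""
    return len(json.dumps(events).encode("utf-8"))


batch = build_batch(1000)
size = payload_size(batch)
print(f"{len(batch)} events -> {size} bytes, fits: {size <= MAX_PAYLOAD_BYTES}")
```

With a payload as small as "sample data" there is plenty of headroom, but a larger Data property eats into it quickly, so it’s worth re-running the check against realistic events.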

Event Grid is pretty smart, so when an event comes in, it will persist it to storage, then look for subscribers that need to receive it. If there are none, it stops there. So to really finish this scenario, we needed a way to consume the load. This could prove much more difficult than generating the load, so we instead opted for an easy solution… Event Hub. We created a subscriber on our Custom Topic that would just send all the events that came in to an Event Hub that was configured to automatically scale up to the maximum limits.

With everything in place, it was time to start testing.

The Results

We started our first real test with one copy of the load generating service on each node in the cluster, generating approximately 50,000 events per second. Using the Azure Portal, we monitored the Custom Topic and its sole subscriber closely and saw that everything appeared to be handling the load admirably. The only real deviation we saw was a momentary degradation that we believe represented the Event Hub engaging its automatic scaling to handle the extra traffic. This is important because, unlike a typical load test where you’d slowly scale up to the 50,000, we basically went from 0 to 50k in a few seconds.

Raja, a lead architect on Event Grid, was sitting next to me monitoring the cluster in Azure that hosted my custom topic. After I got pretty excited about what I was seeing, he agreed that we could double the traffic. So after a quick redeploy of the solution, we doubled our load to 100k events per second. Things started off great, but after a few minutes we started seeing steadily increasing failure rates on sends to the topic. When we looked at the data, the cause became apparent. Event Hub was throttling Event Grid’s attempts to send events to it. This created back-pressure on the Event Grid cluster as it started executing its retry scenarios.

With this finding in mind, we realized that we needed to change the test a bit and filter the Event Hub subscriber. We went with a simple option again and had the Event Hub subscribe only to events where the Event Type was “Type1”. In the code above, we take the first digit of the loop counter (1-1000) and use that as the Event Type, thus generating 9 different event types. So with the revised filtering, we returned to sending 50k events per second, with the Event Hub only getting slightly less than 5k of them. This worked fine, so we doubled it again to 100k. This time… because we had eliminated the back-pressure, we were able to achieve 100k without any failures.
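To see how the first-digit trick carves up a batch, here is a small Python sketch that counts the event types a single 1000-event batch produces. It only mirrors the naming logic from the loop; it doesn’t talk to Event Grid.

```python
from collections import Counter


def event_type(i):
    """Mirror of the C# expression: "Type" plus the first digit of the counter."""
    return "Type" + str(i)[0]


# Count how many events of each type one batch of 1000 contains.
counts = Counter(event_type(i) for i in range(1, 1001))
type1_share = counts["Type1"] / 1000

print(sorted(counts.items()))
print(f"distinct types: {len(counts)}, Type1 share: {type1_share:.1%}")
```

The split is close to even across the nine types, so a subscriber filtered to a single type sees roughly a tenth of what gets published.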

With 100k successfully in place, Raja looked at the cluster and gave the thumbs up again for us to double traffic to 200k. We quickly saw that we’d hit our upper limit and we started getting publication errors on the custom topic. By our account, we were hitting the limit at just a nudge past 100k. But we let the load run for a while and continued to monitor what was going on. As we watched, we saw that our throughput would drop, at times to as low as 30k per second, but would never really get above that 100k limit. Raja explained that the cluster had capacity and was giving it to us up to a certain point. But as pressure from other tenants on the cluster came and went, our throughput would be throttled down to ensure that there were resources available for those tenants. In short, we had proven that the system was behaving exactly as intended.

So what did we learn…

The entire experience was great and I had a lot of fun. But more importantly, I learned a few things. And in keeping with my role… part of the job is to make sure I pass these learnings along to others. So here’s what we have…

First, while the service can deliver beyond the 5,000 events per second that the PG commits to, we shouldn’t count on that. This burned folks back in the early days of Azure when they would build solutions using Azure Blobs and base their design on tests that showed it was exceeding the throughput targets that were committed to at that time. When things would get throttled down to something closer to those commitments, it would break folks’ solutions. So when you design and build a solution, be sure to design based on the targets for the service, not observed behaviors.

Secondly, back-pressure from subscriber egress (this includes failed delivery retries) can affect ingestion throughput. So while it’s good to know that Event Grid will retry failures, you’ll want to take steps to keep the resource drain from retries to a minimum. This can mean ensuring that the subscriber(s) can handle the load, but also that when testing your solution, you properly simulate the impact of all the subscribers that you may have in your production environment.

And finally, and I would hope this goes without saying… pick the right solution for the right job. Our simulated load in this case represented an atypical use case for Event Grid. Grid, unlike Event Hub, is meant to deliver the items we need to act on… the actionable events. The nature of my simulated traffic was more akin to a logging solution where I’m just tossing everything at it. In reality, I’d probably have a log stream, and only send certain events to the grid. So this load test certainly didn’t use it in the manner that was intended.

So there you have it. I managed to throw enough load at both the Event Hub and Event Grid to cause both solutions to throttle my traffic. With a few tweaks, my simple service could also possibly be used to create a nice little (if admittedly overly expensive) HTTP DDOS engine. But that’s not what is important here (even if it was kind of fun). What is important is that we learned a few things about how this service works and how to design solutions that don’t end up creating more problems than they solve.

Until next time!

Azure Event Grid

On January 30th, Azure launched a new service that, at least in Azure terms, represents a bit of a paradigm shift. This new service, the Azure Event Grid, is unlike anything else that exists in Azure, for one simple reason… it’s part of the “fabric” that is Azure. In Azure, you create a database, a virtual machine, a queue, etc… but there’s no creating a “grid”. The Azure Event Grid just exists. For all purposes, it’s just part of Azure.

This caused me a bit of confusion at first. I’ve never claimed to be the smartest guy in the room, but I think I’m fairly savvy. So if I was confused by this at first, I figured there have to be others that might have similar challenges. So I wanted to put my own version of what this service is down in writing.


Azure Event Grid (we’ll just call it “the grid” for short, and because I’m a Tron fan) can conceptually be thought of as a service for publishing and consuming events using a pub/sub model. It has topics to which events are published, and subscriptions that let you receive those events, with filtering so you only get the events you want. Unlike queues and Azure’s Event Hub, grid subscriptions don’t need to be polled. Instead, they use a message pump approach to push events to a predefined endpoint (usually a web hook). And unlike Event Hub, it’s not a long-term buffer. Instead, it does its best to retry delivery periodically before the message is finally discarded. For a more detailed breakdown of the differences, be sure to look at the official documentation.

Topics come in two varieties, system topics and custom topics. System topics are created and managed by the various Azure services. Azure Storage, Azure Subscriptions, and Event Hubs already provide system topics, and more are on the way. System topics are “just part of Azure”, so there’s no need to create them and you won’t see them in the portal. They are “just there” and managed by Azure for you to use if you so desire. If you want to publish your own events, you can create a custom topic, and you’ll be able to see and manage custom topics as you would any other Azure resource. To loop back on the earlier point, the topic is an input endpoint and the subscriptions are output endpoints; the grid is in the middle doing the plumbing between the two so you don’t have to.

To get the messages, you create subscriptions on the topics. A subscription provides the filtering criteria describing the events you want to receive and the endpoint you want events that meet those criteria to be sent to. Where you “see” subscriptions as Azure resource objects depends on the type of topic they are associated with. If it’s a custom topic, you’ll see them listed as sub-resource objects of the custom topic they belong to. For system topics, you’ll see subscribers as sub-resources of the system resources you’re subscribing to (sort of like queues within a Service Bus namespace).

One last little catch here. System topics, even though they are part of Azure, are not system wide. For example, you currently can’t subscribe to all events from every storage account within a subscription. You instead have to subscribe to each storage account you want messages on. This means individual subscriptions on each storage account. Fortunately, you can have each of those subscriptions sending events to the same endpoint, giving you consolidated processing. But you do have to manage the individual subscriptions.

System topics do have another important difference from custom topics. A system topic will have an established list of event types and their schemas. These published schemas are there to help you better determine the proper filtering when creating subscriptions on system-defined topics. This doesn’t exist for custom topics because you, as you’d expect, have full control over the events that get published into them.

Publishing Events

Like I said, system topics publish their own events and you can’t publish to those same topics. So if you want to have your own events, you will need to create a custom topic and send the events to that. This is done using a simple HTTP operation. In C#, it would look as follows:

 HttpClient client = new HttpClient();
 client.DefaultRequestHeaders.Add("aeg-sas-key", sasKey); // add in our security header

 HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Post, topicEndpoint) // create our request object
 {
     Content = new StringContent(JsonConvert.SerializeObject(eventList), Encoding.UTF8, "application/json")
 };
 HttpResponseMessage response = client.SendAsync(request).GetAwaiter().GetResult(); // post the events

sasKey is the security token for the custom topic. eventList is a generic List object that contains a collection of our event objects for serialization. That custom object should look something like this.

 public class GridEvent<T> where T : class
 {
     public string Id { get; set; }
     public string Subject { get; set; }
     public string EventType { get; set; }
     public T Data { get; set; }
     public DateTime EventTime { get; set; }
 }
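
Since the publish is just an HTTP POST with one extra header, the same request is easy to assemble in other languages. Here is a minimal Python sketch using only the standard library. It builds the request without sending it; the endpoint and key shown are placeholders, not real values.

```python
import json
import urllib.request


def build_publish_request(topic_endpoint, sas_key, events):
    """Assemble (but do not send) the POST that publishes a batch of events."""
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        topic_endpoint,
        data=body,
        method="POST",
        headers={
            "aeg-sas-key": sas_key,  # same security header as the C# example
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only.
request = build_publish_request(
    "https://example.invalid/api/events",
    "not-a-real-key",
    [{"id": "1", "subject": "demo", "eventType": "Type1", "data": "sample data"}],
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would actually send it; I’ve left that out so the sketch runs without a real topic.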

Of course you can follow the approach above using the language of your choice. I just provided an example in C# as it’s my primary language. If you are using C#, you may opt to use the .NET SDK instead. If so, it looks a bit like this.

// create a list object for the events that will be sent
List<EventGridEvent> eventList = new List<EventGridEvent>();

// loop through adding events to the list
foreach (var message in eventHubMessages)
{
    EventGridEvent myEvent = new EventGridEvent()
    {
        Id = Guid.NewGuid().ToString(),
        EventTime = DateTime.UtcNow,
        EventType = $"{eventType}",
        Subject = $"{eventMessage}",
        Data = $"{eventData}",
        DataVersion = "1.0"
    };
    eventList.Add(myEvent);
}

// create the topic credential
TopicCredentials topicCredentials = new TopicCredentials(sasKey);

// Create the client object and publish/send to topic
EventGridClient client = new EventGridClient(topicCredentials);
client.PublishEventsAsync($"{gridname}.{regionprefix}", eventList).Wait();

In this case I’m using the SDK-defined EventGridEvent class (naming brought to you by the Department of Redundancy Department). But the general approach is still the same: create a collection, put the event objects into the collection, then send them to the topic endpoint. The collection batches things together; HTTP handshakes can be resource intensive, so batching reduces the number of handshakes when we’re sending events. You can also find this full SDK example on GitHub.
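
If you have more events than fit comfortably in one call, you can split them into several batches. Below is a minimal Python sketch of one way to do that. The function name and both default limits are my own assumptions for illustration (1000 events and 1024 * 1024 bytes per batch), so adjust them to whatever limits apply to you.

```python
import json


def chunk_events(events, max_events=1000, max_bytes=1024 * 1024):
    """Yield lists of events that stay within both limits.

    The byte count is a slightly conservative estimate of the JSON array size.
    A single event larger than max_bytes is still yielded, alone in its batch.
    """
    batch, batch_bytes = [], 2  # 2 bytes for the enclosing "[" and "]"
    for event in events:
        # Size of this event plus the ", " separator json.dumps would add.
        event_bytes = len(json.dumps(event).encode("utf-8")) + 2
        too_many = len(batch) >= max_events
        too_big = batch_bytes + event_bytes > max_bytes
        if batch and (too_many or too_big):
            yield batch
            batch, batch_bytes = [], 2
        batch.append(event)
        batch_bytes += event_bytes
    if batch:
        yield batch


events = [{"id": str(i), "data": "sample data"} for i in range(2500)]
print([len(b) for b in chunk_events(events)])
```

Each yielded list can then be handed to a single publish call, which keeps the number of HTTP round trips down without overshooting the payload limit.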

Consuming Events

As we mentioned, a subscription defines what events you want to receive from a topic and the endpoint they should be sent to. When you create the subscription, you define these settings using a pre-defined schema. The schema represents a JSON object that describes the properties of the subscription. Things like…

includedEventTypes – an array of the event types you want to receive. If not specified, it defaults to all types. If one or more types is specified and you’re creating a subscription on a system topic, the values must match registered event types for that topic.

subjectBeginsWith – the value of the event’s Subject property starts with the specified value. What is contained in the Subject property will depend on the topic you are subscribing to.

subjectEndsWith – the value of the event’s Subject property ends with the specified value

subjectIsCaseSensitive – whether the two subject matches should be case sensitive
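
To make the filter semantics concrete, here is a minimal Python sketch of how I picture a subscription filter being evaluated against an event. This is an illustration of the rules described above, not Event Grid’s actual implementation. The property names follow this post, and treating matches as case insensitive when subjectIsCaseSensitive is absent is an assumption on my part.

```python
def matches(event, subscription_filter):
    """Return True if the event passes the subscription's filter settings."""
    # An empty or missing includedEventTypes list means "all types".
    included = subscription_filter.get("includedEventTypes")
    if included and event["eventType"] not in included:
        return False

    subject = event.get("subject", "")
    prefix = subscription_filter.get("subjectBeginsWith", "")
    suffix = subscription_filter.get("subjectEndsWith", "")

    # Assumption: matching ignores case unless the filter says otherwise.
    if not subscription_filter.get("subjectIsCaseSensitive", False):
        subject, prefix, suffix = subject.lower(), prefix.lower(), suffix.lower()

    return subject.startswith(prefix) and subject.endswith(suffix)


event = {"eventType": "Type1", "subject": "Event 10"}
print(matches(event, {"includedEventTypes": ["Type1"], "subjectBeginsWith": "event"}))
```

Note that everything here is a plain prefix or suffix comparison, which is part of why these checks are cheap to run at scale.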

Now the obvious question is why doesn’t Event Grid just support something like REGEX for matching on the Subject property (which is just a string value). I won’t speak for the product team on this, but my theory is that when you’re building a solution like Azure Event Grid… one that commits to dealing with potentially tens of thousands of events and doing so with single digit millisecond latency… you have to make choices. REGEX, while simple to use, is a bit more complex under the covers and therefore more computationally intensive than a simple string match on the beginning or ending of a string. So I suspect they’ve done this to maximize throughput while minimizing latency.

When you create the subscription, assuming you’re sending to an https/webhook endpoint, you’ll need to make sure the receiver is prepared to respond to the validation event. Validation of endpoints is important, because without it, Event Grid turns into a giant DDOS engine. The validation event is Event Grid’s way of ensuring that events are only being sent to endpoints that are prepared to receive them. When the subscription is created, it sends a special event to the endpoint and expects a specific response back in return. If it’s not received, the subscriber will be invalidated and events won’t be published. This helps prevent abuse by ensuring that all endpoints agree to receive events. Another important point here: unless you want to write your own validation code, when creating an Azure Function to be used as an endpoint, be sure to use the Event Grid trigger and not the more generic HTTP trigger. If you want an example of using the HTTP trigger, then I’d recommend this post by Brandon Hurlburt.
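
If you do end up writing your own validation code, the handshake itself is small. Here is a minimal Python sketch of the receiving side, based on my understanding of the documented handshake: the validation event carries a validationCode in its data, and the endpoint echoes it back as validationResponse. The handle_request function and its return shape are hypothetical, standing in for whatever web framework you use.

```python
import json

VALIDATION_EVENT_TYPE = "Microsoft.EventGrid.SubscriptionValidationEvent"


def handle_request(body):
    """Handle a POST body (a JSON array of events); return (status, response body)."""
    events = json.loads(body)
    for event in events:
        if event.get("eventType") == VALIDATION_EVENT_TYPE:
            # Echo the code back so the subscription can be validated.
            code = event["data"]["validationCode"]
            return 200, json.dumps({"validationResponse": code})

    # Ordinary events would be processed here.
    return 200, ""


validation_body = json.dumps(
    [{"eventType": VALIDATION_EVENT_TYPE, "data": {"validationCode": "abc123"}}]
)
print(handle_request(validation_body))
```

The Event Grid trigger for Azure Functions takes care of this for you, which is why I’d only go down this road if you really need a plain HTTP endpoint.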

Now at this point I could show you code for processing the event once it’s been delivered. But fortunately the product team has already done a really good job of documenting how to do this. And given that we’ve already covered how to wire things up, I don’t know that I can add anything meaningful.

Event – End of Line

So that’s really it. Event Grid, because it’s part of Azure, is incredibly easy to use. It’s really just a matter of saying “I want to subscribe to these events, and here’s where to send them”. This makes it a great example of a serverless approach because the wiring is taken care of for you, so you can focus on just processing the event itself. That said, this service is still very new, so the number of Azure system topics is somewhat limited. But the team behind Event Grid is working to change this and is also great about listening to suggestions on how to improve the service. If you have problems using Event Grid and need some advice, post on StackOverflow with the tag “azure-eventgrid”. If you have a suggestion about the service, something you think is needed or could be improved, then make use of the official Azure Feedback forum.

Lastly, I’d like to thank Bahram, Dan R, and the rest of the Service Bus team for the great work they did bringing Event Grid to market. Also a shout out to Cecil Phillip for his post on sending to Event Grid with HTTP and C#.

Until next time!

PS – yes, this post was written while listening to the Tron Legacy soundtrack on repeat. Please don’t judge. 🙂

Azure Resource Manager Template Tips and Tricks

It’s interesting the places technology interests take you. A simple idea can send you down the rabbit hole and help you discover new things you never even imagined were possible. It’s one such journey that has led me to this particular posting.

This particular journey begins in mid-February when I was fortunate enough to be able to participate in a hackfest around the open source solution Nether. As part of this event, I was tasked with helping refactor the Azure Resource Manager deployment templates. They wanted something that had some consistency, as well as increased flexibility. The week following this, I was on-site with one of my ISV partners where we had a similar need. Both these projects helped drive my understanding and skill with ARM templates to an entirely new level. Along the way I learned a few tips/tricks that I figured I’d pass along to you.

JSON is “object” notation

The first learning is to realize that an ARM template isn’t just a bunch of strings; it’s defining objects that represent resources you want the Azure providers to create for you. An ARM template is a JSON (JavaScript Object Notation) file consisting (for the most part) of key/value pairs, object declarations (stuff inside curly brackets) and arrays (stuff inside square brackets). Furthermore, ARM templates provide us with various functions that can be used to create, manipulate, and insert things in the template.

Now, if you look at something like a simple Windows VM’s ip configuration, we can see this.

"ipConfigurations": [
    {
        "name": "ipconfig1",
        "properties": {
            "privateIPAllocationMethod": "Dynamic",
            "publicIPAddress": {
                "id": "[resourceId('Microsoft.Network/publicIPAddresses',variables('publicIPAddressName'))]"
            },
            "subnet": {
                "id": "[variables('subnetRef')]"
            }
        }
    }
]

This section is an array (square brackets) of objects (curly brackets). And this particular example is associating the VM (well, its NIC actually) with a public IP address and the subnet by setting the values for those particular properties of the IP configuration “object”. But… what about if you’re using a load balancer?

"ipConfigurations": [
    {
        "name": "ipconfig1",
        "properties": {
            "privateIPAllocationMethod": "Dynamic",
            "subnet": {
                "id": "[variables('subnetRef')]"
            },
            "loadBalancerBackendAddressPools": [
                { "id": "[concat(variables('lbID'), '/backendAddressPools/BackendPool1')]" }
            ],
            "loadBalancerInboundNatRules": [
                { "id": "[concat(variables('lbID'),'/inboundNatRules/RDP-VM', copyindex())]" }
            ]
        }
    }
]

Now the same “object” has a different set of properties. Gone is the publicIPAddress setting, and added are loadBalancerBackendAddressPools and loadBalancerInboundNatRules. Not a big deal, unless you’re trying to create a template for a VM that can be easily deployed in either configuration. But if we look at the properties sections above, we realize that our template can actually look more like this.

"ipConfigurations": [
    {
        "name": "ipconfig1",
        "properties": "[parameters('ipConfig')]"
    }
]

In this example, we still have an array with one object, but rather than defining the individual properties, we’ve instead said that the properties are contained in a parameter that was passed into the template itself. A parameter that looks as follows:

"ipConfig": {
    "value": {
        "privateIPAllocationMethod": "Dynamic",
        "privateIPAddress": "[parameters('privateIP')]",
        "subnet": {
            "id": "[parameters('subnetResourceId')]"
        },
        "publicIPAddress": {
            "id": "[resourceId('Microsoft.Network/publicIPAddresses', variables('publicIPName'))]"
        }
    }
}

We could also just as easily construct the object in a variable, which can be really helpful if we have a common set of settings we want to share across multiple objects in the same template.

This realization also opens up a whole new world of possibilities as we can now pass objects as parameters into a template,

"ipConfig": {
    "type": "object",
    "metadata": {
        "description": "The IP configuration for the VM"
    }
}

and receive them as output from a template

"outputs": {
    "subnetIDs" : {
        "type" : "object",
        "value": {
            "frontEnd" : "[variables('subnetFrontEndRef')]",
            "backEnd" : "[variables('subnetBackEndRef')]",
            "management" : "[variables('subnetManagementRef')]"
        }
    }
}

By using objects and not just simple data types (strings, integers, etc…), we make it a bit easier to group values together and pass them around.

Using variables to transform

I mentioned declaring objects in the variable section for reuse. But we can also use variables to transform things. Let’s say you’re creating a template for a SQL database. The database needs an edition and a requestedServiceObjectiveName (tier). You could have both values passed into the template and then set the properties using those values. But perhaps you want to simplify that for the template’s end user to avoid something like a request for a “Standard” edition with a “P4” service tier. So the template declares an input parameter that looks something like the following.

"databaseSKU": {
    "type": "string",
    "defaultValue": "Basic",
    "allowedValues": [
        "Basic",
        "Standard S1",
        "Standard S2",
        "Standard S3",
        "Premium P1",
        "Premium P2",
        "Premium P4",
        "Premium P6",
        "Premium P11",
        "Premium P15"
    ],
    "metadata": {
        "description": "Specifies the database pricing/performance."
    }
}

The user just declares that they want a “Basic”, or “Standard S2”… and the template transforms that into the appropriate settings. In the variables section of the template, we then create a collection of objects that we can access using the parameter value as a key. Each object in the collection holds the values that can be used to set the properties of the database.

"databasePricingTiers" : {
    "Basic" : {
        "edition": "Basic",
        "requestedServiceObjectiveName": "Basic"
    },
    "Standard" : {
        "edition": "Standard",
        "requestedServiceObjectiveName": "S0"
    }
}

Since each item in the collection is an object, we can even use it to set an entire section of the database configuration just like we did the IP configuration earlier. Something like…

"properties": "[variables('databasePricingTiers')[parameters('databaseSKU')]]"
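
For the lookup to work, every allowed parameter value needs a matching key in the variables object. Typing those entries by hand gets tedious, so here is a minimal Python sketch that generates them. It assumes each SKU string is either a bare edition (like “Basic”) or an edition followed by a space and the service objective (like “Premium P4”); the helper name is my own.

```python
import json

# The SKU strings a user may pass in, following the parameter shown earlier.
ALLOWED_SKUS = [
    "Basic",
    "Standard S1", "Standard S2", "Standard S3",
    "Premium P1", "Premium P2", "Premium P4",
    "Premium P6", "Premium P11", "Premium P15",
]


def pricing_tiers(skus):
    """Build one lookup entry per SKU string, keyed by the string itself."""
    tiers = {}
    for sku in skus:
        edition, _, objective = sku.partition(" ")
        tiers[sku] = {
            "edition": edition,
            # A bare edition such as "Basic" doubles as its own objective name.
            "requestedServiceObjectiveName": objective or edition,
        }
    return tiers


tiers = pricing_tiers(ALLOWED_SKUS)
# Print JSON that could be pasted into the template's variables section.
print(json.dumps({"databasePricingTiers": tiers}, indent=4))
```

Keeping the keys identical to the allowed values means the bracket lookup can use the parameter value directly, with no extra string handling inside the template.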

We can even take this a step further, and have more complex templates use simplified sizings such as “small”, “medium”, and “large”, which are used to control all kinds of individual settings across different resources.

Linked Templates

IMHO, there are two advantages to the techniques I just mentioned. Used properly, I feel they can make a template easier to maintain. But just as importantly, these allow for reuse. And reuse is most evident when we start talking about linked templates.

A linked template is one that’s called from another template. It’s accomplished by providing the URL for where the template is located. This means that the template has to be somewhere that it can be linked to. A web site or the raw GitHub source link works well. But sometimes you don’t want to expose your templates publicly.

This is where the PowerShell script I have in my repo comes in. Among other things, it creates a storage account and uploads all the templates to it.

Get-ChildItem -File $scriptRoot/* -Exclude *params.json -Filter deploy-*.json | Set-AzureStorageBlobContent `
    -Context $storageAccount.Context `
    -Container $containerName

This snippet has been designed to go with the naming conventions I’m using. So it will only get files that start with “deploy-“ and end in “.json”. It also ignores any files that end in “params.json”, so I can include parameter files locally for testing purposes and not have to worry about uploading them accidentally. My GitHub repo has taken this a step further and ignores any files that end in privateparams.json so I don’t accidentally check them in.
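
If you want to check which files a naming convention like this would pick up before uploading anything, the rule is easy to mirror. Here is a minimal Python sketch of the same include/exclude logic. The function name and the sample file names are made up for illustration.

```python
from fnmatch import fnmatchcase


def should_upload(filename):
    """Mirror the convention: deploy-*.json files, but never *params.json files."""
    name = filename.lower()  # ignore case, as PowerShell wildcards do
    return fnmatchcase(name, "deploy-*.json") and not fnmatchcase(name, "*params.json")


# Hypothetical file names, just to exercise the rule.
candidates = [
    "deploy-vm.json",
    "deploy-vm.params.json",
    "deploy-sql.privateparams.json",
    "readme.md",
]
print([f for f in candidates if should_upload(f)])
```

Because “privateparams.json” also ends in “params.json”, the single exclusion pattern covers both kinds of parameter file.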

I’d like to call out the work of Stuart Leeks on this. He did the up-front work as part of the Nether project I mentioned earlier. I just adapted it for my needs and added a few minor enhancements. I’ve worked with Stuart on a few things over the years and it’s always been a pleasure and a great learning experience. So I really appreciate what I learned from him as a result of some of the work on the Nether project. Back to the task at hand.

With the files uploaded, we then have to link to them. This is why some of my templates have you pass in a templateBaseURL and templateSaaSToken. Parameterizing these values allows me to construct the full URI for where the files will be located. Thus I could pass in the following for templateBaseURL if I just wanted to access them from my GitHub repository:

The templateSaaSToken is there in case you want to use a shared access signature for a blob container to access the files.
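
The URI construction itself is just string concatenation: base URL, then the template file name, then the token if there is one. Here is a minimal Python sketch of that logic. In a template you would more likely do this with the concat function; the helper name and the example.com URLs below are placeholders of my own.

```python
def linked_template_uri(base_url, template_file, sas_token=""):
    """Join the base URL, file name, and optional SAS token into one URI."""
    uri = base_url.rstrip("/") + "/" + template_file
    if sas_token:
        # Accept the token with or without its leading "?".
        uri += sas_token if sas_token.startswith("?") else "?" + sas_token
    return uri


# Placeholder values for illustration only.
print(linked_template_uri("https://example.com/templates/", "deploy-sql.json"))
print(linked_template_uri("https://example.com/templates", "deploy-sql.json", "sv=1&sig=abc"))
```

Leaving the token empty gives you a plain public link, so the same parameters work for both the public and the storage account scenarios.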

Any ARM template can pass values out. But when combined with linked templates, we can now take those outputs and pass them into subsequent templates. Something like…

"sqlServerFQDN": { "value": "[reference('SQLDatabaseTemplate').outputs.databaseServerFQDN.value]" }

In this case “reference” says we are referencing the runtime values of an object in the current template (in this case of a linked template). From there we want its outputs and specifically the one named databaseServerFQDN, and finally its value property. In the template that outputs these values they are declared like…

"outputs": {
    "databaseServerFQDN" : {
        "type" : "string",
        "value": "[reference(variables('sqlDBServerName')).fullyQualifiedDomainName]"
    },
    "databaseName" : {
        "type" : "string",
        "value": "[parameters('databaseName')]"
    }
}

Outputs is an object, that contains a collection of other objects. Note the property outputs.databaseServerFQDN.value. We could also get databaseServerFQDN.type if we wanted. Or access the database name properties.
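If it helps to picture the shape of that object, here is a small Python sketch that models a linked deployment’s outputs as nested dictionaries and walks the same path as the template expression. The server and database values are invented, and the helper name is hypothetical; this is only a mental model, not how Resource Manager stores anything.

```python
# Hypothetical outputs of the linked deployment named 'SQLDatabaseTemplate'.
deployment_outputs = {
    "SQLDatabaseTemplate": {
        "outputs": {
            "databaseServerFQDN": {
                "type": "string",
                "value": "myserver.database.windows.net",  # made-up value
            },
            "databaseName": {"type": "string", "value": "mydb"},  # made-up value
        }
    }
}


def output_value(deployment_name, output_name):
    """Mirror reference('<deployment>').outputs.<output>.value as dictionary lookups."""
    return deployment_outputs[deployment_name]["outputs"][output_name]["value"]


# Build the parameters object that would be handed to the next linked template.
next_template_parameters = {
    "sqlServerFQDN": {"value": output_value("SQLDatabaseTemplate", "databaseServerFQDN")}
}
print(next_template_parameters)
```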

What’s also important here is the reference function. You may have seen this used in other places and thought it was interchangeable with the resourceId function. But it’s when you work with linked templates that it really shines. The reference function is really telling the resource provider to wait until the item I’m getting a reference to has completed, then give me access to its run time properties. This means that you don’t even need a “dependsOn” for the other template, as the reference function will already wait for that template to complete. But me, I like putting it in anyway. Just to be safe.

The other big item here is that when we call a linked template, we have to give it a name. And here’s why… Each template is essentially run independently by Azure’s resource manager. So if you have a master template, that’s using 4 linked templates and then you check the resource group’s deployment history, you’d actually see 5 deployments.

[Image: multiple deployments resulting from a single master template with multiple linked templates.]

Now the reason these deployment names are important is because the resource manager will track them and won’t allow two deployments with the same name to run at the same time. This isn’t a big deal most of the time. But earlier in this post, I described creating a reusable virtual machine template. That template is used by a parent template to create a resource. And if I have 2-3 of those parents running, I need to make sure that names don’t collide.

Now the handy way to avoid this is to reference a run-time value with the ARM template… deployment(). This exposes properties about the deployment such as the name. So when calling a linked template, we can actually craft a unique name by doing something like…

concat(deployment().name, '-vm')

This allows each linked template deployment to take the parent’s name and add its own unique suffix. Thus (hopefully) helping avoid having to deal with non-unique nested names. If you look at the image to the right, you’ll see deployments like jumpboxTemplate and jumpboxTemplate-vm. The latter deployment is a reusable template that is linked from the former, and I’m using the value of deployment().name to set the name of the vm template deployment. The same is also present in loadbalancedvmTemplate-lbvms000 and 0001. In that case, two VMs are being deployed using the same linked template, invoked multiple times as part of a copy loop in the parent template.
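The naming scheme can be sketched in a few lines of Python. The function names and the exact index format are my own assumptions for illustration (the real templates may pad or format the copy index differently); the point is only that every child name is derived from its parent’s name plus a suffix.

```python
def child_deployment_name(parent_name, suffix):
    """Stand-in for concat(deployment().name, '-', suffix) in a parent template."""
    return f"{parent_name}-{suffix}"


def copy_loop_names(parent_name, prefix, count):
    """Stand-in for naming linked deployments inside a copy loop, one per index."""
    return [f"{parent_name}-{prefix}{index}" for index in range(count)]


jumpbox_vm = child_deployment_name("jumpboxTemplate", "vm")
lb_vms = copy_loop_names("loadbalancedvmTemplate", "lbvms", 2)

print(jumpbox_vm)
print(lb_vms)
```

Because the parent names differ, two parents that link to the same reusable VM template never ask Resource Manager for the same deployment name.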

Other Misc Learnings

As if all this wasn’t enough, there were a couple other tips I wanted to pass along.

When I was working with Stuart on the Nether project, we wanted a template that would add a consumer group to an existing Service Bus event hub. Unfortunately, all the Service Bus templates we could find only showed the creation of the consumer group as part of creating the event hub via an approach called nested resources. I was able to quickly figure out how to create the consumer group itself, but the challenge was how to then reference it.

When working with nested resources it is important to understand the paths present in both the resource type and its name. In the case of our consumer group, we were quickly able to determine that the proper resource type would be Microsoft.EventHub/Namespaces/EventHubs/ConsumerGroups.

You might assume that now that you have the type path, you’d just specify the resource name as something like “myconsumer”. But with nested resources, it gets more complicated. The above type represents 3 nested tiers. As such, the name needs to follow suit and have the same number of tiers. So I actually had to name it something more like namespace/eventhub/myconsumer, with one name segment for each tier of the type.

Stuart pointed me to a tip he learned on another project. Namely that these two values are combined like the teeth on a zipper to create the path to the resource:


Once I realized this, a light bulb went off. This full name actually reflects the same type of value you usually get back from a call to the resourceId function. This function accepts two parameters, a type, and a name, and essentially zips them together while also adding on the leading value based on the current resource group (subscription and the like). You can even see this full path when you look at the properties for an existing object in the Azure portal.
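Here is a rough Python sketch of that zipper idea. It is not the resourceId function itself, just an illustration of how the tiers of the type and the segments of the name interleave; the namespace, hub, and consumer group names are placeholders I made up, and the subscription and resource group prefix that resourceId adds is left out.

```python
def zip_resource_path(resource_type, resource_name):
    """Interleave the tiers of a resource type with the segments of its name."""
    provider, *tiers = resource_type.split("/")
    names = resource_name.split("/")
    if len(tiers) != len(names):
        raise ValueError(
            f"type has {len(tiers)} tiers but name has {len(names)} segments"
        )
    segments = ["providers", provider]
    for tier, name in zip(tiers, names):
        segments += [tier, name]
    return "/".join(segments)


path = zip_resource_path(
    "Microsoft.EventHub/Namespaces/EventHubs/ConsumerGroups",
    "myns/myhub/myconsumer",  # placeholder names, one per tier
)
print(path)
```

Passing just “myconsumer” as the name raises an error here, which is the same mismatch that tripped me up in the template.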

Now the second tip is about the provider API versions. I’m often asked why I put these values into a variable and what the right value should be. Well, I put them in a variable because it means there’s less I have to accidentally mess up when creating a template. It also means that if/when I want to update the version of an API I’m using, I only have to change it once.

But as for the big question about how we know what versions of the API exist… I got that tip from Michael Collier (former Azure MVP and currently one of my colleagues) who in turn got it from another old friend, Neil MacKenzie. They pointed out that you can get these pretty easily via PowerShell.

(Get-AzureRmResourceProvider -ProviderNamespace Microsoft.Compute).ResourceTypes | where {$_.ResourceTypeName -eq 'virtualMachines'} | select -ExpandProperty ApiVersions

This PowerShell command will spit out the available API versions for Microsoft.Compute/virtualMachines.

New versions are shipped all the time and it’s great to know I can be aware of them without having to wait for someone to publish a sample template with those values in them.
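Once you have that list, you still have to pick one. As a small sketch (in Python, with a made-up list standing in for the command’s output), this is one way to choose the newest version while skipping previews unless you ask for them. It assumes versions start with a yyyy-mm-dd date and that previews end in “-preview”.

```python
def latest_api_version(versions, allow_preview=False):
    """Return the newest API version, ignoring '-preview' entries by default."""
    candidates = [
        version
        for version in versions
        if allow_preview or not version.endswith("-preview")
    ]
    if not candidates:
        raise ValueError("no matching API versions")
    # The first ten characters are a yyyy-mm-dd date, which sorts correctly as text.
    return max(candidates, key=lambda version: version[:10])


# Illustrative list only; run the PowerShell command above for real values.
sample_versions = [
    "2015-06-15",
    "2016-03-30",
    "2016-04-30-preview",
    "2017-03-30",
    "2017-12-01-preview",
]

print(latest_api_version(sample_versions))
print(latest_api_version(sample_versions, allow_preview=True))
```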

My last item comes from another colleague, Greg Oliver. Greg has found that when you’re working on templates, you really get slowed down waiting for each deployment to finish, then get deleted, then start the deployment over again. So he’s taken to adding an ‘index’ parameter to his templates. Then, when he runs them, he simply increments the value (index++). Then, while the new deployment is running, you can go ahead and start deleting the old one. There can be several “old” iterations in the process of deleting while you continue to work on your template. Something like this could also be used as part of my suffix approach, but Greg has gone the extra mile to make the iteration its own parameter. Awesome time saving tip!

All in all, I think these are some great, if little known, ARM tips.

Deployment Complete

I wish I could say that these tips and tricks will make building ARM templates easier. Unfortunately, they won’t. Building templates requires lots of hands on practice, patience, and time. But I hope the tips I’ve discussed here might help you craft templates that are easier to maintain and reuse.

To help illustrate all these tricks (and a few less impressive ones), I’ve created a series of linked templates and put them in a single folder on GitHub. These include a PowerShell script to run the deployment as well as sample parameter files. I hope to continue to tweak these as I learn more, including adding some options to the PowerShell script to help prevent issues with DNS name collisions. Hopefully they’ll work without any issues, but if you run into something, please drop me a line and let me know.

Until next time!

Azure Administration with Certificates

You know those annoying “back in my day” stories your crazy uncle would tell at the holidays? Well I have one of those. Fortunately for all of us, I’m also here to tell you how to accomplish this using today’s tech. But before we get to the cool stuff, I want to hop into the wayback machine for a few. So come along Sherman. BTW, that’s an old cartoon joke for you “yung’uns”.

Once upon a time, we had the Azure Management Service

Before the world of the Azure Resource Manager (ARM), there was ASM, the “Service Management” interface. Much like ARM, it was a REST based API for interacting with Azure’s resources. But this predated Azure Active Directory, so the only way to interact with it outside of a username/password was via “management certificates”. These were just any old certificate that you happened to have around, but what made them special is that when you uploaded them to the Azure portal, you could then use them to interact with the management API via the command line. And best yet, these certificates gave you co-administrator authority. Essentially allowing you to do almost anything with the subscription that you wanted.

By the time ARM showed up about 2 years ago, we had Azure AD. And folks were starting to look at Role Based Access Control (RBAC) and the notion of “just enough security”. Basically they didn’t want to give a certificate that could be passed around between developers. So Azure stepped up its RBAC implementation, and more and more services started supporting role-based security.

Adding a new user to Azure AD and granting it permissions is easy enough. But what if you need to have a process execute something against the ARM API? You’d have to set up a user identity, complete with password, for the process. Then store the username/password somewhere secure. This is where the management certificates had a nice advantage. A certificate can be installed on a server and flagged as “not exportable”. So the certificate becomes a credential that a process can use. But how do we do this in the new Azure AD world?

Azure AD has the concept of Applications. These applications can be associated with a certificate, and together they form a service principal. A principal that gives us the best of both worlds.

Creating your certificate

The first step in setting this up is to create a certificate we can use. And not any old certificate will do; there are a couple of gotchas here. First off, to use a certificate for a service principal in Azure, it needs to have a common name. I don’t know why, but this seems to be something that’s enforced by Azure AD. The second is (and I hope this is considered a no-brainer), don’t generate a certificate using the SHA1 hash algorithm. If you missed the news, SHA1 is no longer considered secure enough and is being deprecated across the industry.

With these requirements in hand, the next step is to set about creating the certificate. On Windows, you may be tempted to reach for that old stand-by, MakeCert. Please don’t. Instead, open up PowerShell and find the New-SelfSignedCertificate cmdlet.

Using this cmdlet, we can generate a self-signed certificate with this snippet of PowerShell:

$certSubject = Read-Host -Prompt "Issue By/To for the certificate"

$cert = New-SelfSignedCertificate `
    -CertStoreLocation Cert:\CurrentUser\My `
    -Subject "CN=$($certSubject)" `
    -KeySpec KeyExchange `
    -HashAlgorithm SHA256

This creates the certificate using the SHA256 hash and the common name we specified, and places it into the current user’s Personal/Certificates store. You can then use the MMC (Microsoft Management Console, Windows key + R, then MMC) to locate the certificate and export it using the “save as file” option.

Creating the Application and Principal

The PowerShell above doesn’t just create the certificate, it also returns some of its attributes so we can continue to use them. So the next step is to start constructing our service principal. We’ll continue using PowerShell and start by capturing the details of the certificate that we’ll need later.

# Get certificate thumbprint
$certThumbprint = $cert.Thumbprint

# Get public key and properties from selected cert
$keyValue = [System.Convert]::ToBase64String($cert.GetRawCertData())
$keyId = [guid]::NewGuid()
$startDate = $cert.NotBefore
$endDate = $cert.NotAfter

These lines capture the certificate thumbprint and other data such as the certificate start/end dates. We also create a GUID that will uniquely identify the credential. Then, we use these values to create a PowerShell credential, a PSADKeyCredential object.

Import-Module `
-Name AzureRM.Resources

$keyCredential =
New-Object -TypeName Microsoft.Azure.Commands.Resources.Models.ActiveDirectory.PSADKeyCredential

$keyCredential.StartDate = $startDate
$keyCredential.EndDate = $endDate
$keyCredential.KeyId = $keyId
$keyCredential.CertValue = $keyValue

We start by importing the Azure Resource Manager module that contains what we need to create the Azure AD application/principal. We then create a blank credential object and populate it with the values we saved off earlier.

With this done, we need to log into Azure (when run, you’ll need to enter your Azure username/password credentials), and create the Azure AD Application using the PowerShell credential object that’s based on our newly created certificate.


# Define Azure AD App values for new Service Principal
$adAppName = Read-Host -Prompt "Enter unique Azure AD App name"

# These aren't needed for our uses, but need to be completed anyway
$adAppHomePage = "http://$($adAppName)"
$adAppIdentifierUri = "http://$($adAppName)"

# Create Azure AD App object for new Service Principal
$adApp = New-AzureRmADApplication `
    -DisplayName $adAppName `
    -HomePage $adAppHomePage `
    -IdentifierUris $adAppIdentifierUri `
    -KeyCredentials $keyCredential

Write-Output "New Azure AD App Id: $($adApp.ApplicationId)"

It’s important to keep in mind that the New-AzureRmADApplication cmdlet requires you to have permissions in the subscription’s Azure AD tenant to create the application. If your Azure AD tenant is tightly managed, or perhaps even bound to an on-prem Active Directory or Office 365 tenant, this isn’t likely going to be the case. So this step may need to be executed by someone with the proper Azure AD administrative permissions.

With the application created, we save the Application ID (save this for later, we’ll need it) and we’re then ready to create its associated service principal.

$principleID = New-AzureRmADServicePrincipal  `
    -ApplicationId $adApp.ApplicationId

Granting Permissions and Using the Credential

With the application and service principal created, we can now grant it permissions like we would a user. But instead of specifying the user, we specify the service principal via the Application ID that we saved earlier. You should only give the principal the minimum permissions it needs, but since I’m only doing an example here, I’ll give it full access as an owner of the subscription.

New-AzureRmRoleAssignment `
    -RoleDefinitionName Owner `
    -ServicePrincipalName $adApp.ApplicationId

Now a special note if you’re scripting all this. Between adding the principal and assigning it to a role, there’s some lag. This is because in the cloud the service that’s doing the add and the one that’s granting permissions may not be the same, so there’s some latency in getting the data from one service to the other. You may need to wait a few seconds (generally less than 10-15) after adding the principal before the role assignment succeeds. Not a big deal if you’re doing these commands by hand, but if you’re scripting the process, you’ll need to address this.
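One common way to address that lag in a script is a simple retry loop. The sketch below shows the pattern in Python rather than PowerShell; the function names are my own, and the “role assignment” here is a fake that fails twice before succeeding, not a real Azure call. In a real script you would wrap the role assignment command the same way.

```python
import time


def retry(operation, attempts=5, delay_seconds=3.0, sleep=time.sleep):
    """Call operation until it succeeds, pausing between failed attempts."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # a real script should catch a narrower error
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error


# Fake operation standing in for the role assignment: fails until the third call.
calls = {"count": 0}


def flaky_role_assignment():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("principal not found yet")
    return "assigned"


# Record the waits instead of really sleeping so the example runs instantly.
waits = []
result = retry(flaky_role_assignment, attempts=5, delay_seconds=5, sleep=waits.append)
print(result, waits)
```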

With the principal created and given permission to access the subscription, we can then use it to log into Azure.

Login-AzureRmAccount `
    -ServicePrincipal `
    -TenantId $tenantId `
    -ApplicationId $appId `
    -CertificateThumbprint $certThumbprint

The $tenantId is the Azure AD tenant we’re logging in to, the $appId is the application we created (I told you to save that), and the $certThumbprint is the thumbprint of the certificate that’s installed on the machine that will be used to log in. Under the covers the certificate is retrieved and then presented to Azure AD to authenticate our login request.

With this done, you’re now able to continue on, doing whatever other PowerShell commands you need.

Session Terminated

So that’s about it. Hopefully my explanation is straightforward enough. I’ve put two scripts up on GitHub that show what we’ve just covered end to end. But before I end this completely I need to give some credit where it’s due. This post is based on the great work of Keith Mayer. You can even find his original version on GitHub as well. What he pulled together is SOOO much more elegant than anything I could have done. So major kudos to him.

Until next time!

Azure Logic Apps, Functions, and Service Bus

Here we are yet again. Me writing something on this blog if for no other reason than to document something I learned. There’s no real narrative behind this one other than I built another POC for a partner and in the process found some things I wanted to pull together.

The story here is about digging beyond the Logic App designer and interacting with Service Bus queues, topics, and event hubs. Accessing and manipulating the message properties as we start chaining Logic App workflows together with functions and custom code.

Since we’re going beyond what the designer currently supports, we’ll look exclusively at the “code view” for everything.

Sending to a Queue

The first step was to create a workflow that would accept an HTTP request and use that to create a message in an Azure Service Bus queue. In doing this ‘simple’ task I learned two things, how to compose an object and how to set a custom message property.

The compose action allows you to construct a JSON object from various inputs. I wanted to be able to send a message to a queue for further processing as well as to an event hub for logging. So being able to compose the object once and reuse it was VERY handy.

"ComposeJobMsg": {
   "inputs": {
       "JobID": "@{body('SaveJobtoDatabase')?['OutputParameters']['JobID']}",
       "customer": "@{triggerBody()?['customer']}",
       "job_payload": "@triggerBody()?['job_payload']",
       "job_type": "@{triggerBody()?['job_type']}"
   },
   "runAfter": {
       "SaveJobtoDatabase": [
           "Succeeded"
       ]
   },
   "type": "Compose"
}

This action takes input from the workflow trigger and the result of a previous stored procedure action, SaveJobtoDatabase, and constructs a simple JSON object with four properties (with horribly inconsistent naming conventions, I know).

With the message object created, I can now send it to a queue, specifying the output of the compose operation as the ContentData for my queue message. The SendToQueue action’s body looks like this:

"body": {
   "ContentData": "@{encodeBase64(string(outputs('ComposeJobMsg')))}",
   "ContentType": "JSON",
   "Properties": {
       "job_type": "@{triggerBody()?['job_type']}"
   }
}

There are a few things going on here I want to point out. We’re taking the output of the compose action, outputs('ComposeJobMsg'), and converting it from a JSON object to a string. We then base64 encode that string to ensure it will survive transport through the queue. We’re also starting the ContentData value with ‘@{‘ to designate that we’re using a parameter value and we want to treat it as a string. Using ‘{‘ to inform the Logic App that it’s a string is unnecessary, but sometimes it’s nice to err on the side of caution. You can learn more about the use of expressions like ‘@‘ and ‘{‘ in the Workflow Definition Language documentation.

Next up, we make sure to set the ContentType as “JSON”. And finally I add a custom property, “job_type”, and set its value to the parameter that was on the workflow trigger (again treating that value as a string).
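To show what those expressions amount to outside of a Logic App, here is a minimal Python sketch that builds the same kind of message body: serialize the object to a string, base64 encode it, and attach the custom property. The job values are invented and the function name is my own; it is an analogy for the workflow expressions, not Logic Apps code.

```python
import base64
import json


def build_queue_message(job, job_type):
    """Serialize a job to JSON, base64 encode it, and attach a custom property."""
    content = base64.b64encode(json.dumps(job).encode("utf-8")).decode("ascii")
    return {
        "ContentData": content,
        "ContentType": "JSON",
        "Properties": {"job_type": job_type},
    }


# Made-up job, shaped like the composed object above.
job = {
    "JobID": "42",
    "customer": "contoso",
    "job_payload": {"size": 3},
    "job_type": "resize",
}

message = build_queue_message(job, job["job_type"])
print(message["Properties"], message["ContentData"][:16])
```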

Queue as a Trigger

This is where things started to get interesting. I created a second workflow that is triggered “when a message is received” and set it to run at 30 second intervals. But this created a problem when trying to update the workflow. Currently (this is something that’s being worked on), the Logic Apps connector takes advantage of Service Bus Queues’ long polling capabilities. Long polling is great because it helps reduce the latency between when a message arrives and when it can be processed. So even though the workflow was set to check every 30 seconds… when a message was sent to the queue, it triggered the workflow almost immediately.

The reason for this is that the workflow is not actually running at 30 second intervals, but instead starts polling the queue and waits for that to time out, then it’ll wait 30 seconds and poll again. Where this creates an issue is that if you are in active development, you’re likely going to be changing the workflow every few minutes. Tweak this… run a test… adjust that, run a test. When the workflow starts polling, it’s going to wait about 10 minutes for that operation to time out. So even though the “save” works fine, any changes you’re making won’t take effect until the next time the workflow is triggered (after the long poll times out).

The recommendation I was given, which worked really well (thanks Jeff), is to change the interval to something like once a day. Then via the portal, we can use the “run trigger” feature to kick off a one-time run of the workflow. So what I would do is modify the workflow, submit a test message to the queue, then manually trigger it. Admittedly, it’s not as smooth as I’d like, but it gets the job done. The product team seems aware of this so I’m hopeful this workaround won’t be needed for the long term. Once development is complete, the “production” version of the workflow can use a normal timing setting, as long as we’re aware of and OK with the long polling behavior.

Accessing Queue Message Properties

I wanted to be able to consume events both via a Logic App workflow, as well as from some C# code. The workflow portion would look at the job_type property I set above, and use that in a condition to control routing of the message to another queue. If you’re using the drag/drop workflow designer, it’s pretty easy to get at the queue message ContentData. If you click on it in the designer, the code behind will insert something like this:


Something was pointed out to me (thanks again Jeff!) and the lightbulb went off. Note that ContentData is the exact same property we set when we sent the message to the queue up above. So if we wanted to access the job_type value we set, we simply access the Properties collection like so:


You can’t currently do this via the designer, so you’ll need to flip over to code view if you want to access the individual properties within the Properties collection.

But what if we want at the actual payload of ContentData? The JSON object is there, we just have to reverse what we did when we put it into the message. We’ll use a couple of Workflow Definition Language functions to undo the base64 encoding and get the string content. That string is JSON, so we use the json function to convert it to an object. Once it’s an object again, we can then access any properties within it, such as the JobID we set when we composed the original object.
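The same unwrapping steps can be sketched in Python: base64 decode ContentData, parse the resulting string as JSON, then read a property from the object. The sample message below is fabricated for the example and the function name is hypothetical; it mirrors the order of operations, not the Logic Apps syntax.

```python
import base64
import json


def read_job(message):
    """Reverse the send side: base64 decode ContentData, then parse the JSON text."""
    raw_text = base64.b64decode(message["ContentData"]).decode("utf-8")
    return json.loads(raw_text)


# Fabricated message, shaped like what the sending workflow puts on the queue.
sample_message = {
    "ContentData": base64.b64encode(
        json.dumps({"JobID": "42", "customer": "contoso"}).encode("utf-8")
    ).decode("ascii"),
    "ContentType": "JSON",
    "Properties": {"job_type": "resize"},
}

received_job = read_job(sample_message)
print(received_job["JobID"], sample_message["Properties"]["job_type"])
```

Note that the custom property is read straight from Properties, while JobID only becomes available after the decode and parse steps.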


In C#, if we want to get at the contents of our object, we do a similar process to get the body of the BrokeredMessage object, and transform that JSON payload into an object.

// get the message body
var body = message.GetBody<Stream>();
string jsonJob = new StreamReader(body, true).ReadToEnd();

// convert message body to object
dynamic job = JsonConvert.DeserializeObject(jsonJob);

What about Event Hub?

There isn’t a connector for Event Hub (at least as of the authoring of this post). So I created an Azure function (code on GitHub) to do this for me. It accepts a few parameters and puts them into the event hub so I can later process them via Stream Analytics. Calling it from the workflow was then pretty straightforward.

"LogToEventHub": {
   "inputs": {
       "body": {
           "JobID": "@{json(base64toString(triggerBody()['ContentData']))['JobID']}",
           "customer": "@{json(base64toString(triggerBody()['ContentData']))['customer']}",
           "job_payload": "@string(outputs('ComposeLogMsg'))",
           "status": "routing"
       },
       "function": {
           "id": "<insert your function reference here>"
       }
   },
   "runAfter": {
       "ComposeLogMsg": [
           "Succeeded"
       ]
   },
   "type": "Function"
}

Just like when we were sending content to the queue, make sure you know what format the objects going into the event hub should be in. My function is called via HTTP, so the parameters in the body need to be strings. I opted to use compose to create the payload, then convert that to a string to be output to the event hub. Make sure you know what you’re passing and how it needs to be done, as forgetting the proper ‘{‘ or ‘@’ can cause a real headache.

One final Gotcha, WebJobs

Now this one is a truly personal note. When you add an Azure Function to a resource group, it’s currently a valid target for a VSTS web publish. In fact, if you look at the Function in the portal, you can click an option to view the hosting Web App. Once in that web app, you could see any web jobs you may or may not have accidentally deployed to the wrong location (yeah, it happened). I’ve been told that this will eventually be disabled. But in the interim, I wanted to share this little tidbit so nobody else wastes a late night hour trying to figure out why event messages are being consumed when the web jobs that were processing them are all stopped (or so you thought).

Lesson learned. 🙂

Until next time!

Placement Constraints with Service Fabric

I call this blog my notepad for a reason. It’s where I can write things down if for no other reason than to help me remember. There’s the added benefit that over the years a few of you have found it and actually come here to read these scribbles and, from what I hear, you sometimes even learn something.

That said, I wanted to share my latest discovery, not just for myself, but for the sake of anyone else that has this need.

Placement Constraint Challenges

As I mentioned in my post on Network Isolation with Service Fabric, you can put placement constraints on services to ensure that they are placed on specific node types. However, this can be challenging when you’re moving your application packages between clusters/environments.

The first one you usually run into is that the moment you put a placement constraint on the service, you can’t deploy it to the local development cluster. When this happens, folks start trying to alter the local cluster’s manifest. Yes, the local cluster does have a manifest and no, you really don’t want to start changing that unless you have to.

So I set about finding a better way. In the process, I found a couple of simple, but IMHO poorly documented, features that make this fairly easy to manage. And manage in a way where the “config is code” aspects of the service and fabric manifests are maintained.

Application vs Service Manifest

There are a couple of ways to configure placement constraints. One of the ones you’ll find most readily is the use of code at publication. I’ll be honest, I hate this approach. And I suspect many developers do. This approach, while entirely valid, requires you to write even more code for something that can easily be managed declaratively via the service and package manifests.

If you dig a bit more, you’ll eventually run across the declaration of constraints in the service manifest.

  <StatelessServiceType ServiceTypeName="Stateless1">
    <PlacementConstraints>(isDMZ==true)</PlacementConstraints>
  </StatelessServiceType>

This, IMHO, is perfectly valid. However, it has the unfortunate side effect I mentioned earlier. Namely that, despite hours of trying, I found no easy way to alter this constraint via any type of configuration setting at deployment time. But what I recently discovered was that this can also be done via the application manifest!

  <Service Name="Stateless1">
    <StatelessService ServiceTypeName="Stateless1Type" InstanceCount="[Stateless1_InstanceCount]">
      <SingletonPartition />
      <PlacementConstraints>(isDMZ==true)</PlacementConstraints>
    </StatelessService>
  </Service>

I never knew this was even possible until a few days ago. And admittedly, I can’t find this documented ANYWHERE. It was by simple luck that I tried this to see if it would even work. Not only did it work, but I found that any placement constraint declared in the application manifest would override whatever was in the service manifest.

But this still doesn’t solve the root problem. But it does put us a step closer to addressing it.

Environment Specific Application Parameters

As I was trying to solve this problem, Aman Bhardwaj of the Service Fabric team sent me this link. That link discusses options you have for managing environment specific settings for Service Fabric applications. Most of the article centers around the use of configuration overrides to change the values of service configuration settings (that are in the PackageRoot/Config/Settings.xml) at publication. We don’t need most of that; what we need is actually much simpler.

The article discusses how you can parameterize the application manifest. This will substitute values defined in the ApplicationParameter files (you get two by default when you create a new Service Fabric application). In fact, the template generated by Visual Studio already has an example of this, as it sets the service instance count. And since, as I just mentioned, we can put the placement constraints into the application manifest… well, I think you can see where this is going.

We start by going into the application manifest and adding a new parameter.

  <Parameter Name="Stateless1_InstanceCount" DefaultValue="-1" />
  <Parameter Name="Stateless1_PlacementConstraints" DefaultValue="" />

Note that I’m leaving the value blank. This is completely acceptable as it tells Service Fabric that you have no constraints. In fact, during my testing any value that could not be properly evaluated as a placement constraint was ignored. So you could put in “azureisawesome” and it will have the same effect as leaving this blank. However, we’ll just keep it meaningful and leave it blank.

With the parameter declared, we’re next going to update the constraint declaration in the application manifest to use it.

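  <Service Name="Stateless1">
    <StatelessService ServiceTypeName="Stateless1Type" InstanceCount="[Stateless1_InstanceCount]">
      <SingletonPartition />
      <PlacementConstraints>[Stateless1_PlacementConstraints]</PlacementConstraints>
    </StatelessService>
  </Service>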

So “(isDMZ==true)” (btw, this is a sample custom placement property that was defined when my cluster was created) has become “[Stateless1_PlacementConstraints]”. With this done, now all we need to do is go define environment specific values in the ApplicationParameter files.

You’re most likely going to leave the Local.xml parameter blank. But Cloud.xml (or any other environment specific file you provide) is where you’ll specify the environment specific setting.

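<?xml version="1.0" encoding="utf-8"?>
<Application xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Name="fabric:/SampleApp1" xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <Parameters>
    <Parameter Name="Stateless1_InstanceCount" Value="-1" />
    <Parameter Name="Stateless1_PlacementConstraints" Value="(isDMZ==false)" />
  </Parameters>
</Application>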

With these in place, when you deploy the service, you simply have to ensure you specify the proper configuration file when you deploy.


This is of course the publication dialog for Visual Studio, but that parameter file is also a parameter of the Deploy-FabricApplication.ps1 script that is part of the project and used to do publications via the command line. So these same parameter files can be used for unattended or automated deployments.
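To make the substitution idea concrete, here is a toy model of it in Python. This is only my own illustration of “defaults from the manifest, overridden by the chosen parameter file”, not how Service Fabric implements it; the function name and the trimmed manifest fragment are assumptions for the example.

```python
import re


def resolve_parameters(manifest_text, defaults, overrides):
    """Replace [Name] placeholders using defaults overlaid with per-environment values."""
    values = {**defaults, **overrides}

    def replace(match):
        name = match.group(1)
        # Leave unknown placeholders untouched rather than guessing.
        return values.get(name, match.group(0))

    return re.sub(r"\[([A-Za-z0-9_]+)\]", replace, manifest_text)


# Trimmed-down manifest fragment using the same parameter names as above.
manifest = (
    '<StatelessService ServiceTypeName="Stateless1Type" '
    'InstanceCount="[Stateless1_InstanceCount]">'
    "<PlacementConstraints>[Stateless1_PlacementConstraints]</PlacementConstraints>"
    "</StatelessService>"
)

# Defaults as declared in the application manifest.
defaults = {"Stateless1_InstanceCount": "-1", "Stateless1_PlacementConstraints": ""}

local_result = resolve_parameters(manifest, defaults, {})
cloud_result = resolve_parameters(
    manifest, defaults, {"Stateless1_PlacementConstraints": "(isDMZ==false)"}
)

print(local_result)
print(cloud_result)
```

The local result ends up with an empty constraint, while the cloud result carries the environment specific one, which is exactly the behavior we want from the two parameter files.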

Off to the festival

And there we have it! Which is good because that’s all the time I have for today. My son is heading off to school this week which means my wife and I will officially be “empty nesters”. Today we’re taking him and his girlfriend to the local Renaissance Festival for one last family outing before he leaves. So I hope this helps you all and I bid you a fond “huzzah” until next time!

PS – I have placed a sample app with many of these settings already in place on GitHub.