Serverless Openhack – London 2018

Openhacks are awesome! At least that’s my most recent experience at the Serverless Openhack that was held in London in June of 2018.

For those unfamiliar with the concept of an open hack… I’ll warn that there appears to be multiple definitions. For the sake of the event I attended, I would describe it a guided, team oriented, gated challenge-based learning event. You are assigned to a team and the team is given a coach. The objective is to work as a team to complete a series of increasingly complex challenges that help you learn a new technology. The coach is there to help your team if you get stuck and help you explore the alternative solutions for the challenge and the pros/cons of them.

I had volunteered to be a coach for this event and prepared to by running through the challenges in two practice runs. As a hands-on learner this was a great way for me to help upskill on technologies I was already familiar with. Learning a lot from my colleagues as we each added our own expertise. In my case, Service Bus Event Hubs and Event Grid.

Back on topic…

This event was held at the Microsoft Reactor space on the east end of London and about 120 fellow geeks attended this free event. To help them out, we had 20 coaches with a few floaters. As if this wasn’t enough, the 2018 Integrate conference was in town and we had members of the Service Bus, Azure Functions, and Logic Apps teams stop by during the event. So I think its safe to say we had a lot of great minds there to help folks along.

The first day, the attendees showed up and settled in at their assigned tables. We got to know each other, familiarized ourselves with the format of the hack, and set to work on the first challenge. The hack started by laying a “what is an Azure Function” foundation. It ensured that folks had the proper tooling installed and had access to all the materials provided. This also provides the team with an opportunity to determine how they want to collaborate, where/how to share code, and divide up the work on the challenges.

As the teams progress, the challenges get increasingly difficult. They build on the work that was already done to help build out real-world business patterns, for this hack, it was a fictional organic ice cream company. What starts as a single api method, turns into a robust API with scalability, resilient cloud storage, data visualization, and robust monitoring and management.

But the best part is that Openhack is not a contest. Teams don’t compete against each other (at least officially, there’s always a few that are in it to finish first), but are encouraged to go at their own pace and explore the aspects of the technologies that they find most interesting. In some cases, teams will divide up the work by having multiple members each working on different ways to solve the same problem so they can compare learnings afterwards.

I’ve always enjoyed hands on learning and problem solving. Openhack gives you both of these in a great, collaborative package. You don’t have to make up your own problems to solve which, at least for me, helps keep me engaged.

I also enjoy mentoring others. When you learn something, you tend to only learn from your perspective. When mentoring, or coaching others, you learn as much from them and their perspective as they learn from you. So it can be an incredibly rewarding experience.

If you get the opportunity to participate in an Openhack, either as a participant or a coach, I can’t recommend strongly enough that you seize that opportunity. I’ll definitely be on the lookout for more of these in the future since a career in software/information technologies is a never-ending learning experience. An Openhack is a great way to challenge yourself to learn new things.


Loading it on with Event Grid

Last week, I had one of those moments that makes me really appreciate my role at Microsoft. I was fortunate enough to spend 2 days working along side colleagues and members of the Event Grid product group (PG) with the Event Grid. We played with some stuff that hasn’t been announced (sorry, not covering that here), as well as tested out a few scenarios for them and gave feedback on the service.

The scenario I picked was “Event Grid under load”, basically to try and DDOS the service. The objective of this scenario was to send load to a custom topic in Event Grid and see a) could I get throttling to kick in and b) what were the behaviors we saw.  The answer was a) yes and b) educational.

The Solution

Event Grid commits to handle the ingestion of up to 5,000 events per second, but we were encouraged to see how far we could push things. And while there are some solutions out there that can be used to generate simulated load, I went to what I knew… Service Fabric. I opted to create a simple micro-service that could generate traffic, then use a Service Fabric cluster to run multiple copies of that service. The service is pretty simple, about 20 lines of code inserted into the “run” method of a stateless service.

List<EventGridEvent> eventList = new List<EventGridEvent>();

for(int i=1; i <= 1000; i++)
    EventGridEvent myEvent = new EventGridEvent()
        Subject = $"Event {i}",
        EventType = "Type" + i.ToString()[0],
        EventTime = DateTime.UtcNow,
        Id = Guid.NewGuid().ToString(),
        Data = "sample data",
        DataVersion = "1.0"

TopicCredentials topicCredentials = new TopicCredentials(sasKey);
EventGridClient client = new EventGridClient(topicCredentials);
client.PublishEventsAsync(eventgridtopic, eventList);

await Task.Delay(TimeSpan.FromMilliseconds(100), cancellationToken);

I started running a single copy of this service on my development machine using my local development Service Fabric cluster. I played around with settings going from a batch of 100 events once per second and tweaking things until it was finally running as you see above with a batch of 1000 events about 10 times a second. At that point we stood up a 5 node Service Fabric cluster running Standard_D2_v2 instances so we could deploy the service there. I was able to batch up 1000 of my events because I didn’t exceed the 1mb maximum payload of a single publish operation. So if you opt for this same type of bulk send, please keep that in mind when designing your batch size.

Event Grid is pretty smart, so when an event comes in, it will persist it to storage, than look for subscribers that need to receive it. If there are none, it stops there. So to really help finish this scenario, we needed to have a way to consume the load. This could prove much more difficult then generating the load, so we instead opted for an easy solution… Event Hub. We created a subscriber on our Custom Topic that would just send all the event that came in to an Event Hub that was configured to automatically scale up to the maximum limits.

With everything in place, it was time to start testing.

The Results

We started our first real test with one copy of the load generating service on each node in the cluster, generating approximately 50,000 events per second. Using the Azure Portal, we monitored the Custom Topic and its sole subscriber closely and saw that everything appeared to be handling the load admirably. The only real deviation we saw was momentary degradation that we believe presented the Event Hub engaging its automatic scaling to handle the extra traffic. This is important because unlike a typical load test where you’d slowly scale up to the 50,000, we basically went from 0 to 50k in a few seconds.

Raja, a lead architect on Event Grid, was sitting next to me monitoring the cluster in Azure that hosted my custom topic. After getting pretty excited about what I was seeing, he agreed that we could double the traffic. So after a quick redeploy of the solution, we doubled our load to 100k events per second. Things started off great, but after a few minutes we noticed we started seeing steadily increasing failure rates on sends to the topic. When we looked at the data, the cause became apparent. Event Hub was throttling the Event Grid’s attempts to send events to it. This created back-pressure on the Event Grid cluster as it started executing its retry scenarios.

With this finding in mind, we realized that we need to change the test a bit and filter the Event Hub subscriber. We went with a simple option again and opted to have the Event Hub only subscribe to events where the Event Type was “Type 1”. In the code above, we take the first number of the loop (1-1000) and use that as the Event Type. Thus generating 9 different events types. So with the revised filtering, we returned to sending 50k events per second, with the Event Hub only getting slightly less than 5k of them. This worked fine, so we double it again to 100k. This time… because we have eliminated the back-pressure, we were able to achieve 100k without any failures.

With 100k successfully in place, Raja looked at the cluster and gave the thumbs up again for us to double traffic to 200k. We quickly saw that we’d hit our upper limit and we started getting publication errors on the custom topic. By our account, we were hitting the limit at just a nudge past 100k. But we let the load run for awhile and continued to monitor what was going on. As we watched, we saw that our throughput would drop, at times to as low as 30k per second but would never really get above that 100k limit. Raja explained that what was happening was that the cluster had capacity and was giving it to us up to a certain point. But as pressure from other tenants on the cluster came and went, our throughput would be throttled down to ensure that there were resources available for those tenants. In short, we had proven that the system was behaving exactly as intended.

So what did we learn…

The entire experience was great and I had alot of fun. But more importantly, I learned a few things. And in keeping with my role… part of the job is to make sure I pass these learnings along to others. So here’s what we have…

First, while the service can deliver beyond the 5,000 events per second that the PG commits too, we shouldn’t count on that. This burned folks back in the early days of Azure when they would build solutions using Azure Blobs and base their design on tests that showed it was exceeding the throughput targets that were committed to at that time. When things would get throttled down to something closer to those commitments, it would break folks solutions. So when you design and build a solution, be sure to design based on the targets for the service, not observed behaviors.

Secondly, back-pressure from subscriber egress (this include failed deliver retries) can affect ingestion throughput. So while its good to know that Event Grid will retry failures, you’ll want to take steps to keep the amount of resource drain from retries to the minimum. This can mean ensuring that the subscriber(s) can handle the load, but also that when testing your solution, you properly simulate the impact of all the subscribers that you may have in your production environment.

And finally, and I would hope this goes without saying… pick the right solution for the right job. Our simulated load in this case represented an atypical use case for Event Grid. Grid, unlike Event Hub, is meant to deliver the items we need to act on.. the actionable events. The nature of my simulated traffic was more akin to a logging solution where I’m just tossing everything at it. In reality, I’d probably have a log stream, and only send certain events to the grid. So this load test certainly didn’t use it in the matter that was intended.

So there you have it. I managed to throw enough load at both the Event Hub and Event Grid to cause both solutions to throttle my traffic. With a few tweaks, my simple service could also possibly be used to create a nice little (if admittedly overly expensive) HTTP DDOS engine. But that’s not what is important here (even if it was kind of fun). What is important is that we learned a few things about how this service works and how to design solutions that don’t end up creating more problems then they solve.

Until next time!

Azure Event Grid

On January 30th, Azure launched a new service that, at least in Azure terms, represents a bit of a paradigm shift. This new service, the Azure Event Grid, is unlike anything else that exists in Azure, for one simple reason… its part of the “fabric” that is Azure. In Azure, you create a database, a virtual machine, a queue, etc… but there’s no creating a “grid”. The Azure Event Grid just exists. For all purposes, its just part of Azure.

This caused me a bit of confusion at first. I’ve never claimed to be the smartest guy in the room, but I think I’m fairly savy. So if I was confused by this at first, I figured there have to be others that might have similar challenges. So I wanted to put my own version of what this service is down in writing.


Azure Event Grid (we’ll just call it “the grid” for short, and because I’m a Tron fan), conceptually, can be thought of as a model for publishing, and consuming events using a pub/sub model. It has topics to which events are published, and subscribers that allow you to get those events with filtering so you only receive the events you want. Unlike queues and Azure’s Event Hub, grid subscriptions don’t need to be polled, instead they use a message pump approach to push events to a predefined endpoint (usually a web hook). And unlike Event Hub, its not a long term buffer. Instead it does it’s best to retry delivery periodically before the message is finally discarded. For a more detailed breakdown of the differences, be sure to look at the official documentation.

Topics come in two varieties, system topics and custom topics. System topics are created and managed by the various Azure Services. Azure storage, Azure Subscriptions, Events Grid already provides system topics and more are on the way. System topics are “just part of Azure”, so there’s no need to create them and you won’t see them in the portal. They are “just there” and managed by Azure for you to use if you so desire.  If you want to publish your own events, you can create a custom topic and you’ll be able to see and manage the custom topics as you would any other Azure resource. To loop back on the early message, the topic is an input endpoint, and the subscription are output endpoints, the grid is in the middle doing the plumbing between the two so you don’t have too.

To get the messages, you create subscriptions on the topics. A subscription provides the filtering criteria describing the events you want to receive and the endpoint you want events that meet those criteria to be sent too. Where you “see” subscriptions as Azure resource objects, depends on the type of topic they are associated with. If its a custom topic, you’ll see them listed as sub-resource objects of the custom topic they belong too. For system topics, you’ll see subscribers as sub-resources of the system resources you’re subscribing too (sort of like queues within a Service Bus namespace).

One last little catch here. System topics, even though they are part of Azure, are not system wide. For example, you currently can’t subscribe to all events from every storage account within a subscription. You instead have to subscribe to each storage account you want messages on. This means individual subscriptions on each storage account. Fortunately, you can have each of those subscriptions sending events to the same endpoint, giving you consolidated processing. But you do have to manage the individual subscriptions.

System topics do have another important difference from custom topics. A system topic will have an established list of events types, and their schema. These published schemas are there to help you better determine the proper filtering when creating subscriptions on system defined topics. This doesn’t exist for custom topics because you, as you’d expect, have full control over the events that get published into it.

Publishing Events

Like I said, system topics publish their own events and you can’t publish to those same topics. So if you want to have your own events, you will need to create a custom topic and send the events to that. This is done using a simple HTTP operation. In c#, it xould look as follows:

 HttpClient client = new HttpClient();
 client.DefaultRequestHeaders.Add("aeg-sas-key", sasKey); // add in our security header

 HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Post, topicEndpoint) // create our request object
     Content = new StringContent(JsonConvert.SerializeObject(eventList), Encoding.UTF8, "application/json")
 HttpResponseMessage response = client.SendAsync(request).GetAwaiter().GetResult(); // post the events

sasKey is the security token for the custom topic. eventList is a generic List object that contains a collection of our event objects for serialization. That custom object should look something like this.

 public class GridEvent where T : class
     public string Id { get; set; }
     public string Subject { get; set; }
     public string EventType { get; set; }
     public T Data { get; set; }
     public DateTime EventTime { get; set; }

Of course you can do follow the approach above using the language of your choice. I just provided an example in C# as its my primary language. If you are using C#, you may opt to use the .NET SDK instead. If so, it looks a bit like this.

<pre>// create a list object for the events that will be send
List eventList = new List();

// loop through adding events to the list
foreach (var message in eventHubMessages)
    EventGridEvent myEvent = new EventGridEvent()
        Id = Guid.NewGuid().ToString(),
        EventTime = DateTime.UtcNow,
        EventType = $"{eventType}",
        Subject = $"{eventMessage}",
        Data = $"{eventData}",
        DataVersion = "1.0"

// create the topic credential
TopicCredentials topicCredentials = new TopicCredentials(sasKey);

// Create the client object and publish/send to topic
EventGridClient client = new EventGridClient(topicCredentials);
client.PublishEventsAsync($"{gridname}.{regionprefix}", eventList).Wait();</pre>

In this case I’m using the SDK defined EventGridEvent class (naming brought to you by the Department of Redundancy Department). But the general approach is still the same, create a collection, put the event objects into the collection, then send them to the topic endpoint. The collection is used to help batch things together because http handshakes can be resource intensive so this helps reduce the number of handshakes when we’re sending events. You can also find this full SDK example on github.

Consuming Events

As we mentioned, a subscription defines what events you want to receive from a topic and the endpoint they should be sent too. When you create the subscription, you define these settings using a pre-defined schema. The schema represents a JSON object that describes the properties of the subscription. Things like…

includedEventTypes – an array of the event types you want to receive. If not specified, it defaults to all types. If one or more types is specified and you’re creating a subscription on a system topic, the values must match registered event types for that topic.

subjectBeginsWith – the value of the event’s Subject property starts with the specified value. What is contained in the Subject property will depend on the topic you are subscribing too.

subjectEndsWith – the value of the event’s Subject property ends with the specified value

subjectIsCaseSensitive – if the two subject matches should be case sensitive or not

Now the obvious question is why doesn’t Event Grid just support something like REGEX for matching on the Subject property (which is just a string value). I won’t speak for the product team on this but, but my theory is that when you’re building a solution like Azure Event Grid… one that commits to dealing with potentially tens of thousands of events and doing so with single digit millisecond latency… you have to make choices. REGEX, while simple to use, is a bit more complex under the covers and therefore more computationally intensive then a simple string match on the beginning or ending of a string. So I suspect they’ve done this  to maximize throughput while minimizing latency.

When you create the subscription, assuming you’re sending to an https/webhook endpoint, you’ll need to make sure the receiver is prepared to respond to the validation event. Validation of endpoints is important, because without it, Event Grid turns into a giant DDOS engine. The validation event is Event Grid’s way of ensuring that events are only being sent to endpoints that are prepared to receive them. When the subscription is created, it sends a special event to the endpoint and expects a specific response back in return. If its not received, the subscriber will be invalidated and events won’t be published. This helps prevent abuse by ensuring that all endpoints agree to receive events. Another important point here, unless you want to write your own validation code, when creating an Azure Function to be used as an endpoint, be sure to use the Event Grid trigger and not the more generic HTTP trigger. If you want an example of using the HTTP trigger, then I’d recommend this post by Brandon Hurlburt.

Now at this point I could show you code for processing the event once its been delivered. But fortunately the product team has already done a really good job of documenting how to do this. And given that we’ve already covered how to wire things up, I don’t know that I can add anything meaningful.

Event – End of Line

So that’s really it. Event Grid, because its part of Azure, is incredibly easy to use. Its really just a matter of saying “I want to subscribe to these events, and here’s where to send them”.  This makes it a great example of serverless approach because the wiring is taken care of for you so you can focus on just processing the event itself. That said, this service is still very new, so the number of Azure system topics is somewhat limited. But the team behind the Event Grid is working to change this and is also great about listening to suggestions on how to improve the service. If you have problems using the event grid and need some advice, post on StackOverflow with the tag “azure-eventgrid”. If you have a suggestion about the service, something you think is needed or could be improved, then make use of the official Azure Feedback forum.

Lastly, I’d like to thank Bahram, Dan R, and the rest of the Service Bus team for the great work they did bringing Event Grid to market. Also a shout out to Cicil Phillip for his post on sending to event grid with http and C#.

Until next time!

PS – yes, this post was written while listening the Tron Legacy soundtrack on repeat. Please don’t judge. 🙂

Azure Resource Manager Template Tips and Tricks

Its interesting the places technology interests take you. A simple idea that can send you down the rabbit hole and helps you discover new things you never even imagined were possible. Its one such journey that has led me to this particular posting.

This particular journey begins in mid February when I was fortunate enough to be able to participate in a hackfest around the open source solution Nether. As part of this event, I was tasked with helping refactoring the Azure Resource Manager deployment templates. They wanted something that had some consistency, as well as increased flexibility. The week following this, I was on-site with one of my ISV partners where we had a similar need. Both these projects helped drive my understanding and skill with ARM templates to an entirely new level. Along the way I learned a few tips/tricks that I figured I’d pass along to you.

JSON is “object” notation

The first learning is to realize that an ARM template isn’t just a bunch of strings, its defining objects that represent resources you want the Azure providers to create for you. An ARM template is a JSON (javascript object notation) file consisting (for the most part) of key/value pairs, object declarations (stuff inside curly brackets) and arrays (stuff inside square brackets). Furthermore, ARM templates provide us with various functions that can be use to create, manipulate, and insert things in the template.

Now, if you look at something like a simple Windows VM’s ip configuration, we can see this.

"ipConfigurations": [
        "name": "ipconfig1",
        "properties": {
             "privateIPAllocationMethod": "Dynamic",
             "publicIPAddress": {
                  "id": "[resourceId('Microsoft.Network/publicIPAddresses',variables('publicIPAddressName'))]"
             "subnet": {
                 "id": "[variables('subnetRef')]"

This section is an array (square brackets) of objects (curly brackets). And this particular example is associating the VM (well, its NIC actually) with a public IP address and the subnet by setting the values for those particular properties of the IP configuration “object”. But… what about if you’re using a load balancer?

"ipConfigurations": [
        "name": "ipconfig1",
        "properties": {
            "privateIPAllocationMethod": "Dynamic",
            "subnet": {
                 "id": "[variables('subnetRef')]"
            "loadBalancerBackendAddressPools": [
                     "id": "[concat(variables('lbID'), '/backendAddressPools/BackendPool1')]"
            "loadBalancerInboundNatRules": [
                     "id": "[concat(variables('lbID'),'/inboundNatRules/RDP-VM', copyindex())]"

Now the same “object” has a different set of properties. Gone is the publicIPAddress setting, and added is the loadBalancerBackendAddressPools and loadBalancerInboundNatRules. Not a big deal, unless you’re trying to create a template for a VM that can be easily deployed in either configuration. But if we look at the sections I’ve selected above, we realize that our template can actually look more like this.

"ipConfigurations": [
        "name": "ipconfig1",
        "properties": "[parameters('ipConfig')]"

In this example, we still have an array with one object, but rather then defining the individual properties, we’ve instead said that the properties are contained in a parameter that was passed into the template itself. A parameter that looks as follows:

"ipConfig": {
    "value": {
        "privateIPAllocationMethod": "Dynamic",
        "privateIPAddress": "[parameters('privateIP')]",
        "subnet": {
            "id": "[parameters('subnetResourceId')]"
        "publicIPAddress": {
            "id": "[resourceId('Microsoft.Network/publicIPAddresses', variables('publicIPName'))]"

We could also just as easily construct the object in a variable. Which can be really helpful if we have a common set of settings we want to share across multiple objects in the same template.

This realization also opens up a whole new world of possibilities as we can now pass objects as parameters into a template,

"ipConfig": {
    "type": "object",
    "metadata": {
        "description": "The IP configuration for the VM"

and receive as output from a template

"outputs": {
    "subnetIDs" : {
        "type" : "object",
        "value": {
            "frontEnd" : "[variables('subnetFrontEndRef')]",
            "backEnd" : "[variables('subnetBackEndRef')]",
            "management" : "[variables('subnetManagementRef')]"

By using objects and not just simple data types (strings, integers, etc…), we make it a big easier to group values together and pass them around.

Using variables to transform

I mentioned declaring objects in the variable section for reuse. But we can also use variables to transform things. Lets say you’re creating a template for a SQL database. The database needs an edition and a requestedserviceObjectiveName (tier). You could have both values passed into template and then set the properties using those values. But perhaps you want to simplify that for the template’s end user to avoid something like a request for a “Standard” edition with a “P4” service tier. So the template declares an input parameter that looks something like the following.

"databaseSKU": {
    "type": "string",
    "defaultValue": "Basic",
    "allowedValues": [
        "Standard S1",
        "Standard S2",
        "Standard S3",
        "Premium P1",
        "Premium P2",
        "Premium P4",
        "Premium P6",
        "Premium P11",
        "Premium P15"
    "metadata": {
        "description": "Specifies the database pricing/performance."

The user just declares that they want a “Basic”, or “Standard S2”… and the template transforms that into the appropriate settings. In the variables section of the template, we then create a a collection of objects that we can access using the parameter value as a key. Each object in the collection sets the values that can be used to set the properties of the database.

"databasePricingTiers" : {
    "Basic" : {
        "edition": "Basic",
        "requestedServiceObjectiveName": "Basic"
    "Standard" : {
        "edition": "Standard",
        "requestedServiceObjectiveName": "S0"

Since each item in the collection is an object, we can even use it set an entire section of the database configuration just like we did the IP configuration earlier.  Something like..

“properties”: “[variables(‘databasePricingTiers’)[parameters(‘databaseSKU’)]]”

We can even take this a step further, and have more complex templates use simplified sizings such as “small”, “medium”, and “large”, which are used to control all kinds of individual settings across different resources.

Linked Templates

IMHO, there are two advantages to the techniques I just mentioned. Used properly, I feel they can make a template easier to maintain. But just as importantly, these allow for reuse. And reuse is most evident when we start talking linked templates.

A linked template is one that’s called from another template. Its accomplished by providing the URL for where the template is located. This means that the template has to be somewhere that it can be linked to. A web site, or the raw github source link works well.  But sometimes you don’t want to expose your templates publicly.

This is where the powershell script I have in my repo comes in. Among other things, it creates a storage account and uploads all the templates to it.

Get-ChildItem -File $scriptRoot/* -Exclude *params.json -filter deploy-*.json | Set-AzureStorageBlobContent `
    -Context $storageAccount.Context `
    -Container $containerName `

This snippet has been designed to go with the naming conventions I’m using. So it will only get files that start with “deploy-“ and end in “.json”. It also ignores any files that end in “params.json”, so I can include parameter files locally for testing purposes and not have to worry about uploading them accidentally. My GitHub repo has taken this a step further and ignores any files that end in privateparams.json so I don’t accidentally check them in.

I’d like to call out the work of Stuart Leeks on this. He did the up front work as part of the Nether project I mentioned earlier. I just adapted it for my needs and added a few minor enhancements in. I’ve worked with Stuart on a few things over and years and its always been a pleasure and great learning experience. So I really appreciate what I learned from him and for him as a result of some of the work on the Nether project. Back to the task at hand.

With the files uploaded, we then have to link to them. This is why some of my templates have you pass in a templateBaseURL and templateSaaSToken.  I’ve parameterized these values allow me to construct the full URI for where the files will be located. Thus I could pass in the following for templateBaseURL if I just wanted to access them from my GitHub repository:

the templateSaaSToken is there in case you want to use a shared access signature for a blob container to access the files.

Any ARM template can pass values out. But when combined with linked templates, we can now take those outputs and pass them into subsequent templates.  Something like…

"sqlServerFQDN": { "value": "[<strong>reference('SQLDatabaseTemplate').outputs.</strong>databaseServerFQDN.value]" }

In this case “reference” says we are referencing the runtime values of an object in the current template (in this case of a linked template). From there we want its outputs and specifically the one named databaseServerFQDN, and finally its value property. In the template that outputs these values they are declared like…

"outputs": {
    "databaseServerFQDN" : {
        "type" : "string",
        "value": "[reference(variables('sqlDBServerName')).fullyQualifiedDomainName]"
    "databaseName" : {
        "type" : "string",
        "value": "[parameters('databaseName')]"

Outputs is an object, that contains a collection of other objects. Note the property outputs.databaseServerFQDN.value. We could also get databaseServerFQDN.type if we wanted. Or access the database name properties.

What’s also important here is the reference function. You may have seen this used in other places and thought it was interchangeable with the resourceID function. But its when you work with linked templates that it really shines. The reference function is really telling the resource provider to wait until the item I’m getting a reference to has completed, then give me access to its run time properties. This means that you don’t even need a “dependson” for the other template as the resource function will wait for that template to complete already. But me, I like putting it in anyways. Just to be safe.

The other big item here is that when we call a linked template, we have to give it a name. And here’s why… Each template is essentially run independently by Azure’s resource manager. So if you have a master template, that’s using 4 linked templates and then you check the resource group’s deployment history, you’d actually see 5 deployments.

multiple deployments resulting from a single master template with multiple linked templates.

Now the reason these deployment names are important is because the resource manager will track them and won’t allow two deployments with the same name to run at the same time. This isn’t a big deal most of the time. But earlier in this post, I described creating a reusable virtual machine template. That template is used by a parent template to create a resource. And if I have 2-3 of those parents running, I need to make sure that names don’t collide.

Now the handy way to avoid this is to reference a run-time value with the ARM template… deployment(). This exposes properties about the deployment such as the name. So when calling a linked template, we can actually craft a unique name by doing something like…

concat(deployment().name, ‘-vm’)

This allows each deployment template to take the parent’s name and add its own unique suffix on. Thus (hopefully) helping avoid having to deal with non-unique nested names. If you look at the image to the right, you’ll see deployments like jumpboxTemplate and jumpboxTemplate-vm. The later deployment is a reusable template that is linked from the former. And I’m using the value of deployment to set name of the vm template deployment. The same is also present in loadbalancedvmTemplate-lbvms000 and 0001. In that case, this is two VMs being deployed using the same linked template, but in this case being done multiple times as part of a copy loop in the parent template.

Other Misc Learnings

As if all this wasn’t enough, there were a couple other tips I wanted to pass along.

When I was working with Stuart on the Nether project, we wanted a template that would add a consumer group to an existing Service Bus event hub. Unfortunately, all the Service Bus templates we could find only showed the creation of the consumer group as part of creating the event hub via an approach called nested resources. I was able to quickly figure out how to create the consumer itself, but the challenge was how to then reference it.

When working with nested resources it is important to understand the paths present in both the resource type and its name. In the case of our consumer group, we were quickly able to determine that the proper resource type would be Microsoft.EventHub/Namespaces/EventHubs/ConsumerGroups.

You might assume that now that you have the type path, you’d just specify the resource name as something like “myconsumer”. But with nested resources, its gets more complicated The above type represents 3 nested tiers. As such, the name needs to follow suit and have the same number of tiers. So I had to actually name to something more like //.

Stuart pointed me to a tip he learned on another project. Namely that these two values are combined like the teeth on a zipper to create the path to the resource:


Once I realized this, a light bulb went off. This full name actually reflects the same type of value you usually get back from a call to the resourceId function. This function accepts two parameters, a type, and a name, and essentially zips them together while also adding on the leading value based on the current resource group (subscription and the like). You can even see this full path when you look at the properties for an existing object in the Azure portal.

Now the second tip is about the provider API versions. I often asked why I put these values into a variable and what should be the right value. Well, I put them in a variable because it means there’s less I have to accidentally mess up when creating a template. It also means that if/when I want to update the version of an API I’m using, I only have to change it once.

But as for the big question about how do we know what versions of the API exist… I got that tip from Michael Collier (former Azure MVP and currently one of my colleagues) who in turn got it from another old friend, Neil MacKenzie. They pointed out that you can get these pretty easily via Powershell.

(Get-AzureRmResourceProvider -ProviderNamespace Microsoft.Compute).ResourceTypes | where {$_.ResourceTypeName -eq 'virtualMachines'} | select -ExpandProperty ApiVersions

This powershell command will spit out the available API versions for Microsoft.Compute/VirtualMachines.

New versions are shipped all the time and its great to know I can be aware of them without having wait for someone to publish a sample template with those values in them.

My last item comes from another colleague, Greg Oliver. Greg has found what when you’re working on templates, you really get slowed down waiting for each deployment to finish, then get deleted, then start the deployment over again. So he’s taken to adding an ‘index’ parameter to his templates. Then, when he runs them, he simply increments the value (index++). Then, while the new deployment is running, you can go ahead and start deleting the old one. There can be several “old” iterations in the process of deleting while you continue to work on your template. Something like this could also be used  as part of my suffix approach, but Greg has gone the extra mile to make the iteration its own parameter. Awesome time saving tip!

All in all, I think these are some great, if little known, ARM tips.

Deployment Complete

I wish I could say that these tips and tricks will make building ARM templates easier. Unfortunately, they won’t. Building templates requires lots of hands on practice, patience, and time. But I hope the tips I’ve discussed here might help you craft templates that are easier to maintain and reuse.

To help illustrate all these tricks (and a few less impressive ones), I’ve created a series of linked templates and put them in a single folder on GitHub. These include a PowerShell script to run the deployment as well as sample parameter files. I hope to continue to tweak these as I learn more including adding into the PowerShell script some options to help prevent issues with dns name collisions. Hopefully they’ll work without any issues, but if you run into something, please drop me a line and let me know.

Until next time!

Azure Administration with Certificates

You know those annoying “back in my day” stories your crazy uncle would tell at the holidays? Well I have one of those. Fortunately for all of us, I’m also here to tell you how to accomplish this using today’s tech. But before we get to the cool stuff, I want to hop into the wayback machine for a few. So come along Sherman. BTW, that’s an old cartoon joke for you “yung’uns”.

Once upon a time, we had the Azure Management Service

Before the world of the Azure Resource Manager (ARM), there was ASM, the “Service Management” interface. Much like ARM, it was a REST based API for interacting with Azure’s resources. But this predated Azure Active Directory, so the only way to interact with it outside of a username/password was via “management certificates”. These were just any old certificate that you happened to have around, but what made them special is that when you uploaded them to the Azure portal, you could then use them to interact with the management API via the command line. And best yet, these certificates gave you co-administrator authority. Essentially allowing you to do almost anything with the subscription that you wanted.

By the time ARM showed up about 2 years ago, we had Azure AD. And folks were starting to look at Role Based Access Control (RBAC) and the notion of “just enough security”. Basically they didn’t want to give a certificate that could be passed around between developers. So Azure stepped up its RBAC implementation, and more and more services started supporting role-based security.

Adding a new user to Azure AD and granting it permissions is easy enough. But what if you need to have a process execute something against the ARM api? You’d have to set up a user identity, complete with password for the process. Then store the username/password somewhere secure. This is where the management certificates had a nice advantage. A certificate can be installed on a server and flagged as “not exportable”. So the certificate becomes a credential that a process can use. But how do we do this in the new Azure AD world?

Azure AD has the concept of Applications. And these applications could be associated with a certificate. And together, they form a service principle. A principle that gives us the best of both worlds.

Creating your certificate

The first step in setting this up is to create a certificate we can use. And not any old certificate will do, there’s a couple gotcha’s here. First off, to use a certificate as a service principle in Azure, it needs to have a common name. I don’t know why, but this seems to be something that’s enforced by Azure AD. The second is (and I hope this is considered a no-brainer), don’t generate a certificate using a SHA1 hash algorithm. If you missed the news, SHA1 is no longer considered secure enough and is being deprecated across the industry.

With these requirements in hand, the next step is to set about creating the certificate. On Windows, you may be tempted to reach for that old stand-by, MakeCert. Please don’t. Instead, open up PowerShell and find the New-SelfSignedCertificate cmdlet.

Using this cmdlet, we can use this snippet of PowerShell to generate a self-signed certificate

$certSubject =
Read-Host -Prompt “Issue By/To for the certificate”

$cert = New-SelfSignedCertificate `
    -CertStoreLocation Cert:\CurrentUser\My `
    -Subject "CN=$($certSubject)" `
    -KeySpec KeyExchange `
    -HashAlgorithm SHA256

This creates the certificate using the SHA256 hash, the common name we specified, and places it into the current user’s Personal/Certificate store. You can then use the MMC (Microsoft Management Console, windows key + R, then MMC) to locate the certificate and export it using the “save as file” option.

Creating the Application and Principle

The PowerShell above doesn’t just create the certificate, it also returns some of its attributes so we can continue to use them. So the next step is to start constructing our service principle. We’ll continue using PowerShell and start by capturing the details of the certificate that we’ll need later.

# Get certificate thumbprint
$certThumbprint = $cert.Thumbprint

# Get public key and properties from selected cert
$keyValue = [System.Convert]::ToBase64String($cert.GetRawCertData())
$keyId = [guid]::NewGuid()
$startDate = $cert.NotBefore
$endDate = $cert.NotAfter

These lines capture the certificate thumbprint and other data such as the certificate start/end dates. We also create a GUID that will uniquely identify the credential. Then, we use these values to create a PowerShell credential, a PSAADKeyCredential object.

Import-Module `
-Name AzureRM.Resources

$keyCredential =
New-Object -TypeName Microsoft.Azure.Commands.Resources.Models.ActiveDirectory.PSADKeyCredential

$keyCredential.StartDate = $startDate
$keyCredential.EndDate = $endDate
$keyCredential.KeyId = $keyId
$keyCredential.CertValue = $keyValue

We start by importing the Azure Resource Manager module that contains what we need to create the Azure AD application/principle. We then create a blank credential object and populate it with the values we saved off earlier.

With this done, we need to log into Azure (when run, you’ll need to enter your Azure username/password credentials), and create the Azure AD Application using the PowerShell credential object that’s based on our newly created certificate.


# Define Azure AD App values for new Service Principal
$adAppName =
Read-Host -Prompt “Enter unique Azure AD App name”

#these aren't needed for our uses, but need to be completed anyways
$adAppHomePage = "<a href="http://$(">http://$(</a>$adAppName)"
$adAppIdentifierUri = "<a href="http://$(">http://$(</a>$adAppName)"

# Create Azure AD App object for new Service Principal
$adApp =
New-AzureRmADApplication `
    -DisplayName $adAppName `
    -HomePage $adAppHomePage `
    -IdentifierUris $adAppIdentifierUri `
    -KeyCredentials $keyCredential

Write-Output “New Azure AD App Id: $($adApp.ApplicationId)”

Its important to keep in mind, that the New-AzureRMADApplication cmd requires you to have permissions in the subscription’s Azure AD tenant to create the application. If your Azure AD tenant is tightly managed, or perhaps even bound to an on-prem Active Directory or Office365 tenant, this isn’t likely going to be the case. So this step may need to be executed by someone with the proper Azure AD administrative permissions.

With the application created,  we save the Application ID (save this for later, we’ll need it) and we’re then ready to create its associated service principle.

$principleID = New-AzureRmADServicePrincipal  `
    -ApplicationId $adApp.ApplicationId

Granting Permissions and Using the Credential

With the application and service principle created, we can now grant it permissions like we would a user. But instead of specifying the user, we specify the Service Principle via the Application ID that we saved earlier. . You should only give the principle the minimum permissions, but since I’m only doing an example here, I’ll give you full access as an owner of the subscription.

New-AzureRmRoleAssignment `
    -RoleDefinitionName Owner `
    -ServicePrincipalName $adApp.ApplicationId

Now a special note if you’re scripting all this. Between adding the principle and assigning it to a role, there’s some lag. This is because in the cloud the service that’s doing the add and the one that’s granting permissions may not be the same. So there’s some latency regarding the data between the two services. So you may see a few seconds (generally less than 10-15) before the addition of the principle and being able to add the permission succeeds. Not a big deal if you’re doing these commands by hand, but if you’re scripting the process, you’ll need to address this.

With the principle created and given permission to access the subscription, we can then use it to log into Azure.

Login-AzureRmAccount `
    -ServicePrincipal `
    -TenantId $tenantId `
    -ApplicationId $appId `
    -CertificateThumbprint $certThumbprint

The tenantId is the Azure AD tenant we’re logging in to, the $appId is the application we created (I told you to save that), and the $certThumbprint is the thumbprint of the certificate that’s installed on the machine that will be used to log in. Under the covers the certificate is retrieved and then presented to Azure AD to authenticate our login request.

With this done, you’re now able to continue on, doing whatever other PowerShell commands you need.

Session Terminated

So that’s about it. Hopefully my explanation is straight forward enough. I’ve put two scripts up on github that show what we’ve just covered end to end. But before I end this completely I need to give some credit where its due. This post is based on the great work of Keith Mayer. You can even find his original version on github as well. What he pulled together is SOOO much more elegant than anything I could have done. So major kudos to him.

Until next time!

Azure Logic Apps, Functions, and Service Bus

Here we are yet again. Me writing something on this blog if for no other reason then to document something I learned. There’s no real narrative behind this one other than I built another POC for a partner and in the process found some things I wanted to pull together.

The story here is about digging beyond the Logic App designer and interacting with Service Bus queues, topics, and event hubs. Access and manipulating the message properties as we start chaining Logic App workflows together with functions and custom code.

Since we’re going beyond what the designer currently supports, we’ll look exclusively at the “code view” for everything.

Sending to a Queue

The first step was to create a workflow that would accept an HTTP request and use that to create a message in an Azure Service Bus queue. In doing this ‘simple’ task I learned two things, how to compose an object and how to set a custom message property.

The compose action allows you to construct a JSON object from various inputs. I wanted to be able to send a message to a queue for further processing as well as to event hub for logging. So being able to compose the object once and reuse it was VERY handy.

"ComposeJobMsg": {
   "inputs": {
       "JobID": "@{body('SaveJobtoDatabase')?['OutputParameters']['JobID']}",
       "customer": "@{triggerBody()?['customer']}",
       "job_payload": "@triggerBody()?['job_payload']",
       "job_type": "@{triggerBody()?['job_type']}"
   "runAfter": {
       "SaveJobtoDatabase": [
   "type": "Compose"

This action takes input from the workflow trigger and the result of a previous stored procedure, SaveJobToDatabase, and constructs a simple JSON object with four properties (with horribly inconsistent naming conventions I know).

With the message object created, I can now send it to a queue, specifying the output of the compose operation as the ContentData for my queue message. The SendToQueue action’s body looks like this:

"body": {
   "ContentData": "@{encodeBase64(string(outputs('ComposeJobMsg')))}",
   "ContentType": "JSON",
   "Properties": {
       "job_type": "@{triggerBody()?['job_type']}"

There are a few things going on here I want to point out. We’re taking the output of the compose action, outputs(‘ComposeJobMsg’), and converting it from a JSON object to a string. We then base64 encode that string to ensure it will survive transport through the queue. We’re also starting the ContentData value with ‘@{‘ to designate that we’re using a parameter value and we want to treat it as a string. Using ‘{‘ to inform the Logic App that its a string is unnecessary, but sometimes its nice to err on the side of caution. You can learn more about the use of expressions like ‘@‘ and ‘{‘ in the Workflow Definition Language documentation.

Next up, we make sure to set the ContentType as “JSON”.  And finally I add a custom property, “job_type” and set its value to parameter that was on the workflow trigger (again treating that value as a string).

Queue as a Trigger

This is where things started to get interesting. I created a second workflow that is triggered “when a message is received’ and set it to run at 30 second intervals. But this created a problem with trying to update the workflow. Currently (this is something that’s being worked on), the Logic Apps connector takes advantage of Service Bus Queues’ long polling capabilities. Long polling is great because it helps reduce the latency between when a message arrives and it can be processed. So even though the workflow was set to check every 30 seconds… when a message want sent to the queue, it triggered the workflow almost immediately.

The reason for this is that the workflow is not actually running at 30 second intervals, but instead starts polling the queue and waits for that to time out, then it’ll wait 30 seconds and poll again. Where this creates an issue is that if you are in active development, you’re likely going to be changing the workflow every few minutes. Tweak this… run a test… adjust that, run a test. When the workflow start’s polling, its going to wait about 10 minutes for that operation to time out. So even though the “save” works fine, any changes you’re making won’t take affect until the next time the workflow is triggered (after the long-polling times out).

The recommendation I was given, that worked really great (thanks Jeff), is to change the  interval to something like once a day. Then via the portal, we can use the “run trigger” feature to kick off a one-time run of the workflow. So what I would do is I’d modify the workflow, submit a test message to the queue, then manually trigger it. Admittedly, its not as smooth as I’d like, but it gets the job done. The product team seems aware of this so I’m hopeful this workaround won’t be needed for the long term. Once development is complete, the “production” version of the workflow can use a normal timing setting, as long as we’re aware of and OK with the long polling behavior.

Accessing Queue Message Properties

I wanted to be able to consume events both via a Logic App workflow, as well as from some C# code. The workflow portion would look at the job_type property I set above, and use that in a condition to control routing of the message to another queue. If you’re using the drag/drop workflow designer, its pretty easy to get at the queue message ContentData. If you click on it in the designer, the code behind will insert something like this:


Something was pointed out to me (thanks again Jeff!) and the lighbulb went off.  Note that ContentData is the exact same property we set when we sent the message to the queue up above. So if we wanted to access the job_type value we set, we simply access the Properties collection like so:


You can’t currently do this via the designer, so you’ll need to flip over to code view if you want to access the individual properties within the Properties collection.

But what if we want at the actual payload of ContentData? The JSON object is there, we just have to reverse what we did when we put it into the message. We’ll use a couple Workflow Definition Language functions undo the base64 encoding and get the string content. That string is JSON, so we use the json method to convert it to an object. Once its an object again, we can then access any properties within it, such as the JobID we set when we composed the original object.


In C#, if we want to get at the contents of our object, we do a similar process to get the body of the BrokeredMessage object, and transform that JSON payload into an object.

// get the message body
var body = message.GetBody<Stream>();
string jsonJob = new StreamReader(body, true).ReadToEnd();

// convert message body to object
dynamic job = JsonConvert.DeserializeObject(jsonJob);

What about Event Hub?

There isn’t a connector for Event Hub (at least as of the authoring of this post). So I created an Azure function (code on github) to do this for me. It accepted a few parameters and put it into the event hub so I could later process them via stream analytics. Calling it from the workflow was then pretty straight forward.

"LogToEventHub": {
   "inputs": {
       "body": {
           "JobID": "@{json(base64toString(triggerBody()['ContentData']))['JobID']}",
           "customer": "@{json(base64toString(triggerBody()['ContentData']))['customer']}",
           "job_payload": "@string(outputs('ComposeLogMsg'))",
           "status": "routing"
       "function": {
           "id": "<insert your function reference here>"
   "runAfter": {
       "ComposeLogMsg": [
   "type": "Function"

Just like when we were sending content to the queue, make sure you know what format the objects should go into event hub should be in. My function is called via HTTP, so parameters of the body need to be strings. I opted to use compose to create the payload, then convert that to a string to be output to the event hub. Make sure you know what you’re passing and how it needs to be done as forgetting the proper ‘{‘ or ‘@’ can cause a real headache.

One final Gotcha, WebJobs

Now this one is a truly personal note. When you add Azure Function to a resource group, its currently a valid target for a VSTS web publish. In fact, if you look at the Function in the portal, you can click an option to view the hosting Web App. Once in that web app, you could see any web jobs you may or may not have accidentally deployed to the wrong location (yeah, it happened). I’ve been told that this will eventually be disabled. But in the interim, I wanted to share this little tidbit so nobody else wastes a late night hour trying to figure out why event messages are being consumed when the web jobs that were processing it all are all stopped (or so you thought).

Lesson learned. 🙂

Until next time!

Placement Constraints with Service Fabric

I call this blog my notepad for a reason. Its where I can write things down if for no other reason then to help me remember. There’s the added benefit that over the years a few of you have found it and actually come here to read these scribbles and from what I heard, you sometimes even learn something.

That said, I wanted to share my latest discovery, not just for myself, but for the sake of anyone else that has this need.

Placement Constraint Challenges

As I mentioned in my post on Network Isolation with Service Fabric, you can put placement constraints on services to ensure that they are placed on specific node types. However, this can be challenging when you’re moving your application packages between clusters/environments.

The first one you usually run into is that the moment you put a placement constraint on the service, you can’t deploy it to the local development cluster. When this happens, folks starting trying to alter the local cluster’s manifest. Yes, the local cluster does have a manifest and no, you really don’t want to start changing that unless you have to.

So I set about to find a better way. In the process, I found a couple of simple, but IMHO poorly documented features that make this fairy easy to manage. And manage in a way where the “config is code” aspects of the service and fabric manifests are maintained.

Application vs Service Manifest

There are a couple ways to configure placement constraints. One of the ones you’ll find most readily is the use of code at publication. I’ll be honest, i hate this approach. And I suspect many developers do. This approach while entirely valid, requires you to write even more code for something that can easily be managed declarative via the service and package manifests.

If you dig a bit more, you’ll eventually run across the declaration of constraints in the service manifest.

  <StatelessServiceType ServiceTypeName="Stateless1">

This IMHO, is perfectly valid. However, it has the unfortunate side affect I mentioned earlier. Namely that, despite hours of trying, there’s no easy way to alter this constraint via any type of configuration setting at deployment time. But what I recently discovered was that this can also be done via the application manifest!

  <Service Name="Stateless1">
    <StatelessService ServiceTypeName="Stateless1Type" InstanceCount="[Stateless1_InstanceCount]">
      <SingletonPartition />

I never knew this was even possible until a few days ago. And admittedly, I can’t find this documented ANYWHERE. It was by simple lucky I tried this to see if it would even work. Not only did it work, but I found that any placement constraint declared in the application manifest would override whatever was in the service manifest.

But this still doesn’t solve the root problem. But it does put us a step closer to addressing it.

Environment Specific Application Parameters

As I was trying to solve this problem, Aman Bhardwaj of the service fabric team sent me this link. That link discusses options you have for managing environment specific settings for service fabric applications. Most of the article centers around the use of configuration overrides to change the values of service configuration settings (that are in the PackageRoot/Config/Settings.xml) at publication. We don’t need most of that, what we need is actually much simpler.

The article discusses that you can parameterize the application manifest. This will substitute values defined in the ApplicationParameter files (you get two by default when you create a new service fabric application), In fact, the template generated by visual studio already has an example of this as it sets the service instance count. And since, as I just mentioned we can put the placement constraints into the application manifest… well I think you can see where this is going.

We start by going into the application manifest and adding a new parameter.

  <Parameter Name="Stateless1_InstanceCount" DefaultValue="-1" />
  <Parameter Name="Stateless1_PlacementConstraints" DefaultValue="" />

Note that I’m leaving the value blank. This is completely acceptable as it tells the the service fabric that you have no constraints. In fact, during my testing any value that could not be properly evaluated as a placement constraints will be ignored. So you could put in “azureisawesome” and it will have the same affect as leaving this blank. However, we’ll just keep it meaningful and leave it blank.

With the parameter declared, we’re next going to update the constraint declaration in the application manifest to use it.

  <Service Name="Stateless1">
    <StatelessService ServiceTypeName="Stateless1Type" InstanceCount="[Stateless1_InstanceCount]">
      <SingletonPartition />

So “(isDMZ==true)” (btw, this is a sample custom placement property that was defined when my cluster was created) has become “[Stateless1_PlacementConstraints]”. With this done, now all we need to do is go define environment specific values in the ApplicationParameter files.

You’re most likely going to leave the Local.xml parameter as blank. But the Cloud.xml (or any other environment specific file you provide, is where you’ll specify the environment specific setting.

<?xml version="1.0" encoding="utf-8"?>
<Application xmlns:xsd="" xmlns:xsi="" Name="fabric:/SampleApp1" xmlns="">
    <Parameter Name="Stateless1_InstanceCount" Value="-1" />
    <Parameter Name="Stateless1_PlacementConstraints" Value="(isDMZ==false)" />

With these in place, when you deploy the service, you simply have to ensure you specify the proper configuration file when you deploy.


This is of course the publication dialog for Visual Studio, but that parameter file is also a parameter of the Deploy-FabricAppliation.ps1 file that is part of the project and use to do publications via the command line. So these same parameter files can be used for unattended or automated deployments.

Off to the festival

And there we have it! Which is good because that’s all the time I have for today. My son is heading off to school this week which means my wife and I will officially be “empty nesters”. Today we’re taking him and his girlfriend to the local Renaissance Festival for one last family outing before he leaves. So I hope this helps you all and I bid you a fond “huzzah” until next time!

PS – I have place a sample app with many of these settings already in place on github.

Containers for Windows Developers (a reference)

So in about 36 hours I’ll be presenting at the 2016 Cloud Develop conference in Columbus, OH on “Containers for Windows Developers”. I know on the surface this may seem like a departure from my normal Azure focus. But I firmly believe (and did prior to our Windows Containers for ACS announcement) that containers are going to be a real driving force in the cloud.

I started exploring this in the fall of 2015 shortly after we announced containers for Windows Server 2016 and have continued to follow it. I’ve had to set it aside for other priorities but continued to return to this from time to time. And while the story is still not complete (Windows Server 2016 is still in preview), I’m pleased to see the story is now mature enough that I’m comfortable talking about it. Pretty handy since we’re this close to the conference with a session I proposed several months ago.

What I found as I dug into this that while there’s alot of documentation out there, but I don’t feel it speaks to a key audience. Namely Windows developers. Most of the materials I found are focused on folks that are either already familiar with containers (from Linux) or are from the operations side of the house. So I wanted to come up with materials from the viewpoint of the Windows developer.

Unfortunately, the entire story is something for another day. The purpose of this post is to provide some links that I found useful as I set about learning containers on Windows server. There’s simply to much to list on a slide so this post is a placeholder to those links that I can reference from the presentation

  • Enjoy!


Windows Containers: What, Why and How – Build 2015

Setting the Stage: The Application Platform in Windows Server 2016 – Build 2016

Windows Server containers, Docker, and an introduction to Azure Container Service – Azure Con 2015


Windows Containers Overview – MSDN Documentation

Windows Containers on Windows 10 – MSDN Documentation

What is Docker? – Tim Butler

Docker Basics: A practical starters guide – Tim Butler – unknown

The presentation that is based on my journey/experience is available on github. Please note that the content can (and likely will) continue to be edited right up until just minutes before I present it. Maybe I’ll see you there!

Network Isolation/Security with Azure Service Fabric

There are times you really need to take things beyond the “file new” experience and implement a more advanced scenario. And with these opportunities, there are times you realize that what you need likely isn’t a “one off” kind of thing. There are larger implications to what you need that can help solve a myriad of problems. This is the story of one these scenarios.

I was recently working with a partner as they explored Service Fabric. They liked what they saw, but there was a “but” (there almost always is). This partner is in the government space, and one of the requirements they had is that all public facing services are isolated and secured from any “back end” services (in a DMZ). If you’ve been doing IT for any length of time, this shouldn’t come as news. But the question they had for me was how to do this with Service Fabric.

There were a couple ways to address this that immediately came to mind. We could deploy the front end web application as an Azure Web App, hosted in an App Service Environment that was joined to the same VNet as the Service Fabric Cluster. We could also set up two Service Fabric clusters, again joined by a single VNet. The issue with both of these is that the front and back ends of the solution would need to be deployed and managed separately. Not a huge deal admittedly. But this did complicate the provisioning and deployment processes a bit, as well as seemed to run counter to the idea of a Service Fabric “application”, composed of multiple services as a single entity. I was fortunate that I had previously engaged my friend and colleague Kal to bring his considerable Service Fabric experience into play with this partner, and he suggested a third option, one we all found fairly intriguing.

A Service Fabric cluster has Node Types which are directly related to VM Scale Sets. Taking advantage of this, we could place different node types into different subnets and place Network Security Groups (NSGs) on the subnets to provide the level of isolation the partner required. We would then use Placement Constraints to ensure that the services within an application are only hosted in the proper subnet by using constraints specific to the node type, or types, in that subnet.

We ran the idea by Mark Fussell,  the lead Project Manager of the Service Fabric team. As we talked, we realized that folks had secured a cluster from all external access, but there didn’t appear to be a public, previously documented version of what we were proposing. Mark was supportive of the idea, and even offered up that in some of the “larger” Service Fabric clusters, the placement constraint approach has been used to ensure that the services that make up the Service Fabric Cluster remain isolated from those that comprise the applications deployed within it.

Our mission clear, I set to work! We were going to create a Azure Resource Manager template to create our “DMZ’d Service Fabric Cluster”.

Network Topology 

The first step was to create the overall network topology.


We have the front end subnet, which has a public load balancer that would handle traffic from the internet via a load balancer. There is a back end subnet with an internal load balancer that does not allow any connections from outside of the virtual network (using a private IP). Finally, we have a management subnet that contains the cluster services, including the web portal (on port 19080) and TCP client API (19000). For good measure, we’re also going to toss an RDP jump box into this subnet so if something goes wrong with any of the nodes in the cluster, we can remote in and troubleshoot (something that I used the heck out of while crafting this template).

With this in place, we then define the VM Scale Sets, and bind their network configurations to the proper subnets as follows:

"networkInterfaceConfigurations": [ 
    "name": "[variables('nodesMgmnt')['nicName']]", 
    "properties": { 
      "ipConfigurations": [ 
          "name": "[concat(variables('nodesMgmnt')['nicName'],'-',0)]", 
          "properties": { 
            "loadBalancerBackendAddressPools": [ 
                "id": "[variables('lbMgmnt')['PoolID']]"
            "subnet": { 
              "id": "[variables('subnetManagement')['Ref']]"
      "primary": true

With the VM Scale Sets in place, then we moved on to the Service Fabric Cluster to define each Node Type. Here’s the cluster node type definition for the management subnet node type.

  "name": "[variables('nodesMgmnt')['TypeName']]", 
  "applicationPorts": { 
    "endPort": "[variables('svcFabCluster')['applicationEndPort']]", 
    "startPort": "[variables('svcFabCluster')['applicationStartPort']]"
  "clientConnectionEndpointPort": "[variables('svcFabCluster')['tcpGatewayPort']]", 
  "durabilityLevel": "Bronze", 
  "ephemeralPorts": { 
    "endPort": "[variables('svcFabCluster')['ephemeralEndPort']]", 
    "startPort": "[variables('svcFabCluster')['ephemeralStartPort']]"
  "httpGatewayEndpointPort": "[variables('svcFabCluster')['httpGatewayPort']]", 
  "isPrimary": true, 
  "placementProperties": { 
    "isDMZ": "false"
  "vmInstanceCount": "[variables('nodesMgmnt')['capacity']]"

The “name” of this Node Type, must match the name of a VM Scale Set, that’s how the two get wired together. Since this sample is for our “management” node type, it would also be the only one with the isPrimary property set too true.

At This point, we debugged the template and made sure the cluster to ensure it was valid and the cluster would come up “green”. The next (and harder step) is to start securing the cluster.

Note: If you create a cluster via the Azure portal with multiple node types, each node type will get its own subnet. However, we were after a reusable ARM template so we had to configure things ourselves.

Network Security

Unfortunately, when we set out to create this, there wasn’t much publicly available on the ports that were needed within a fabric cluster. So we had to do some guesswork, some heavy digging, as well as make a wishes for some good luck. So in this section I’m hoping to lay out some of what we learned to save others the effort.

First off, we started by blocking all inbound connections on the three subnets. I then opened ports 19080 (used by the Service Fabric web portal) and 19000 (used by the Fabric Client and Powershell) for the “management” subnet so I could interact with the cluster remotely. This was all done via the Azure Portal, interactively so we could test the rules out then use the Resource Explorer to export them to our template. We assumed that with these rules in place, we would see some of the nodes in the cluster go “red” or unhealthy. But we didn’t!

It took a day or so, but we eventually figured out that we were seeing two separate systems collide. Firstly, when a VM is brought up, the Service Fabric extension is inserted into it. This extension then registers the node with the cluster. As part of that process there’s a series of connections that are established. These connections are not ephemeral, remaining up for the life of the node. Our mistake was in assuming these connections, like we encourage most of our partners to do when building applications, were only temporary and established when they were needed.

Since these are established, persistent connections, they are not impacted when new NSG rules are applied. This makes sense since the NSG rules are there to interrogate any new connection requests, not look over everything that’s already been established. So the nodes would remain green until we rebooted them (tearing down their connections) and they tried (and failed) to re-establish their connection to the cluster.

This sorted out, we set about trying to place the remainder of the rules in place for the subnets. We knew we wanted internet connectivity to any application/service ports in the front end, as well as application/service ports in the backend from within the VNet. But what we were missing was the ports that Service Fabric needed. We found most of these in the cluster manifest:

  <ClientConnectionEndpoint Port="19000" /> 
  <LeaseDriverEndpoint Port="1026" /> 
  <ClusterConnectionEndpoint Port="1025" /> 
  <HttpGatewayEndpoint Port="19080" Protocol="http" /> 
  <ServiceConnectionEndpoint Port="1027" /> 
  <ApplicationEndpoints StartPort="20000" EndPort="30000" /> 
  <EphemeralEndpoints StartPort="49152" EndPort="65534" /> 

This worked fine at first. We stood up the cluster with these rules properly in place and the nodes were all green. However, when we’d tried to deploy an app to the cluster, it would always time out during the copy step. I spent a couple hours troubleshooting this one to eventually realize that it was something inside the cluster that was still blocked. I spent a bit of time trying to look at WireShark and Netstat runs inside of the nodes to determine what could still be the blocker. This could have carried on for some time had it not been for Vaishnav Kidambi pointing out that Service Fabric uses SMB to copy the application/service packages around to the nodes in the cluster. We added on a rule for that, and things started to work!

Note: As a result of this work, the Service Fabric product team has acknowledged that there’s a need for better documentation on the ports used by Service Fabric. So keep an eye out for additions to the official documentation.

Here’s what the final set of inbound rules for the Network Security Group (NSG) associated with the management subnet looked like.


A quick rundown… I’ll start at the highest priority (at the bottom) and work my way up since that’s how the NSG applies the rules. Rule 4000 blocks all traffic into the subnet. Rule 3950 and 3960 enable RDP connections within the VNet, and to the RDP jumpbox (at internal IP from the internet. The next three rules (3920-3940) allow the connections needed by Service Fabric within the VNet only (thus allowing all the service fabric agents on the nodes to communicate). And finally, the first two rules (3900 and 3910) open up external connections for ports 19080 and 19000. Rules 3960, 3900, and 3910 are unique to the management subnet. I’ll get to why 19000 and 19080 are unique to this subnet in a moment.

Dynamic vs Static Ports

One sidebar for a moment. Connectivity between the front and back end is restricted to a set of ports you set when you run the template (it defaults to 80 and 443). In Service Fabric terms, this is called a static port. When you build services you also have the option of asking the Fabric for a port to use, a dynamic port. As of the writing of this article, the Azure load balancer does not support these dynamic ports. So to leverage them via the load balancer and our network isolation, we’d have to have a way to update both each time a port is allocated or released. Not ideal.

My thought is that most of the use of dynamic ports is likely going to be between services that have a trusted relationship. This relationship would likely results in the services being places inside the same subnet. If you needed to expose something they were doing to the “outside world”, you will likely set up a gateway/façade service that in turn might be load balanced. Its this gateway service that would be exposed on a static port so that it can easily be reached via a load balancer and secured with NSG rules.

Restricting Service Placement

With the network topology set, and the security rules for each of the subnets sorted, next up was ensuring that application services get placed into the proper locations. Service Fabric services can be given placement constraints. These constraints, defined in the Service Manifest, are checked against Placement Properties for each node type to determine which nodes types should host a service instance. These are commonly used for things like restricting services that require more memory to nodes that have more memory available or situations where specific types of hardware are required (a GPU for example).

Each node type gets a default placement property, NodeTypeName, which you can reference in a service manifest like so.

  <!-- This is the name of your ServiceType. 
       This name must match the string used in RegisterServiceType call in Program.cs. -->
  <StatelessServiceType ServiceTypeName="Web2Type"> 

Now we may want to have other constraints beyond just NodeTypeName. Placement Properties can be assigned to the various Node Types in the cluster Manifest. Or, if you’re doing this via an ARM template such as I was, you can declare them directly in the template via a property within the NodeType definition/declaration.

"placementProperties": { 
  "isDMZ": "true" 

If you look at the node type definition I used earlier, you’ll where this property collection goes. In that template “isDMZ” is false.

Combined, the placement properties, as well as the placement constraints will help ensure that each of the services will go into the subnet that has already been configured to secure host it. But this does pose a challenge. If we declare the placement constraint in the service manifest as I show above, this does restrict which clusters we can deploy the service too. If a cluster doesn’t have our placement properties declared, the service will fail to deploy. We could address this by removing and then added the placement constraints later (not ideal) or altering the cluster manifests (again not ideal). But there are two other options. First, we could craft our own definition of the application/service types and register them with the cluster, then copy the packages to the cluster.

Note: Fore more on Placement Constraints, please check out my new blog post.

This article contains a section that talks about doing this via C# or Powershell. Another option, and one I think I actually prefer (but admittedly haven’t tried), is to use a build event to alter the manifest. You can then trigger this event based on various parameters to control if it happens when you’re doing a local build, vs a cloud build. Perhaps even going so far as reading a value from the Application Parameters or Publication Profile files. But for now, I’ll need to set these aside. There’s also a third option I’m investing but I’m not confident enough to bring it up yet. I hope to eventually circle back on these.

There is one other placement constraint (I mentioned I’d get to this). There are two things unique to the management node type/subnet. The first is that it’s the only subnet I would open ports 19000 and 19080 on. The reason for this is because this is the only node type in the cluster manifest that is marked as “isPrimary”. A service fabric cluster can only have one “primary” node type. This node type is the one where all the “system” services will be placed (Naming, FileStore, Cluster Manager, etc…). So setting “isPrimary” ensures that these services will be placed into this subnet, allowing me to keep them separate from any application services. I previously mentioned that this approach was proposed by Mark Fussell of the Service Fabric team. It’s a pattern that’s used by some larger clusters to help ensure that fabric management resource demands can be scaled independently of application needs.

Between placement of the management services on the primary node type, and restricting application placement via constraints, we can now put each of our services only where we want them to be.

Using the JumpBox

A common technique in cloud solutions is to leverage a “jump box”. Allowing direct, remote access to a virtual machine is sensitive and risky. To help manage this risk, there’s usually one or more, restricted access points that are used as gatekeepers. You access one of these gatekeepers as a leaping off point to access resources inside the security boundary. We’ve set up this approach, allowing you to RDP into a jump box from which you would then  RDP into the other boxes within the VNet.

Using this template, you’ll need to address all your VM instances via IP. Since we’re using dynamic IPs within the VNet, you can RDP into a box using a fairly simple address scheme. The third area of the IP address represents the subnet you want to access (1=front end, 2=back end, 3=management) and the final area is the specific machine. Azure reserves the first three address in a subnet rate for its own use, so you can start at 4 for the VMs in the front end or management subnets. For the back end subjet, I’ve used as the private IP for the internal load balancer. So the nodes in that subnet start at 5.

The next step would be to adapt the “allowJumpBoxRDP”security rule on the management subnet so that it only allows connections from trusted sources (say your on-prem network).

Many diet colas died to bring you this information

So there you have it. I’ll admit that on the surface it may not seem like much. But if you’ve ever built an ARM template, you know how much effort it requires. Add into this all the stuff I had to learn/discover to get it to a functional state and validate it by deploying apps to it (which required more debugging and bug fixes) and..well… we’re talking quite a bit of effort. So I’m hoping that this article and the template will help a few folks avoid what I had to go through.

The entire template (complete with jump box), can be found in my github repo. I also have a version that is a secure cluster using certificates and Azure AD. I’m going to continue to try and polish it, and I’m also looking at getting it published (with additional guidance on usage) in the Azure QuickStart Templates repository. So be sure to let me know of any suggestions or bugs you find. I’ll do my best to get them worked in.

Until next time!

PS – thank you to everyone that helped contribute to this effort: Kal, Jason, Corey, Patrick, Mike, Shenlong, Vaishnav, Chacko, and Mikkel

Extending Logic Apps with Azure Functions

Note: This article is based on services that were in preview at the time it was written. The user experience and any issues/challenges mentioned below may differ from what was eventually released for general availability.

I remember when Azure was only three services. Technically, it was four, but let’s not mince details. Today, there are dozens of services, each with a multitude of features. So it’s impossible to be an expert in them all. As a technology specialist, an architect, I try to understand the basics of most of them, but only really go deep in areas where I have a specific interest. But from time to time, the partners I work with have asks that require me to go into areas I would have otherwise skimmed over.

One such request was to help a partner solve a challenge they were facing with correlating transactions being sent to them by an upstream solution provider they were working with. A simple enough request on the surface, but in this case my partner also had a short run-way to implement this. As such, they wanted to avoid having to do a large amount of coding. They wanted to embrace the speed and flexibility that PaaS gives you.

They looked at Logic Apps, a spiritual successor to BizTalk orchestrations. Unfortunately, Logic Apps doesn’t currently have “message box” type of functionality that would allow for the correlation of multiple messages. However, Logic Apps do have the ability to integrate with another service, Azure Functions, which allows us to extend the features of Logic Apps through custom code.

So I set about creating a simple proof of concept to prove out how something like this could work.

Overview of the solution

The workflow would happen in two parts. The first step would be correlate the two inbound transactions we’d receive that together comprise a “complete order”. A second workflow would then be triggered to monitor an installation process that would occur.

For the first workflow, I wanted to keep it simple. My POC used two transactions that were the same format, each containing a customer name, and a product type. The workflow would receive them via a REST API call. Each transaction would then be written to an Azure SQL DB and the workflow would respond back to the requestor with a “success” response code (200). After this, the workflow would call out to an Azure function to determine if the transaction was complete. If it was, we would drop a message into a queue that would trigger the second half of the workflow. If it wasn’t, we’d just end there and wait for the next transaction to come in. Here’s what the first workflow would look like in Azure Logic Apps.


The second part we’ll discuss later. But these simple steps would give me my “message box”.

Now this article won’t cover getting started with Logic Apps or Functions. There’s enough IMHO written on these subjects already that I don’t feel I have anything new to add. So if you haven’t work with these before, I highly recommend at least covering their “getting started” materials.

Triggering the workflow

Logic apps gives you a multitude of “triggers” that can be used to start a workflow. For this proof of concept, I opted to trigger the workflow manually when an HTTP request is received. Since it’s just a POC I don’t bother to secure this endpoint. This also means I can easily test my workflow using a tool like Fiddler or Postman.

I do this using a “manual trigger”, specifically the “Request – When an HTTP request is received”. You’ll find this under the “Show Microsoft Managed APIs” list. I also keep the request body simple, just a string that is the “customer” identifier, and another that is the “product” identifier. In the Logic App, the request body schema looks like this:

    "properties": {
        "customer": {
            "type": "string"
        "product": {
            "type": "string"
    "title": "Product Order",
    "type": "object"

So its a simple JSON object called “Product Order” that contains two sting properties, “customer” and “product”. I don’t think it could get much simpler.

Saving the transaction to our “message box”

So the first step in re-creating BizTalk’s functionality is having a place to store the messages. Logic Apps has the ability to perform actions on Azure SQL DB, so I opted to leverage this connector. So I created a database and put a table in it that looked like this:

RowID – a GUID that has a default value of “newid()”

Customer – a string

Product – a string

Complete – boolean value, defaults too false

There are several SQL Connectors available in Logic Apps, including an “insert row” action. However, I struggled to get it to work. I wanted to use the defaults I had set up in the database for RowID and Complete fields. And the insert row action told me I had to specify the values for all columns in the table. I opted instead to create a stored procedure in the database, and use the “Execute stored procedure” connector. The stored procedure accepts two parameters, the Customer and Product strings from the HTTP request body.

With the message safely saved in the database, we now respond back to the request letting them know we have received and processed it. This is done via the “Response” action. I could have done this immediately after the request was received, but I wanted to make sure to wait until after I had saved the message. This way I’m really indicating to the requestor that I’ve received AND processed their request.

Correlating the orders

With the orders now saved, I can begin the process of adding my custom business logic to correlate the orders. I created an Azure Function app, and defined a new function named “CheckOrderComplete”. This will be a C# based function triggered (again) by an HTTP trigger. I choose C# because the partner I’m working with does much of their work in C#, so it was a good fit. The HTTP trigger made sense since we’re already using HTTP operations for the workflow trigger. Why not remain consistent. The objective of the function would be to query the database and see of I had the two transactions I needed to have a “complete” transaction on my end.

Azure Functions provides a C# developer reference that was really helpful. However, it still took a bit of trial and error my first time. So I’m going to try and break down my full down my full csx file into the various changes I made. The first change was that I needed to make sure I referenced the assembly that would allow me to interact with the SQL DB where the requests had been stored.

#r "Newtonsoft.Json"
#r "System.Data"   

The Newtonsoft line was already there since I’m dealing with an HTTP request that would need to have its JSON payload translated into C#. But I had to reference the external System.Data assembly, so I added the second line so I’d have the SQL Client. The #r is like adding a reference in a Visual Studio project. This assembly is already available in the Azure Functions hosting environment (think of it as the Nuget package already being installed), so no other action was necessary to make it available for my use.

Next up, I had to add a “using” clause to the code so I could leverage the assembly I just referenced. This works just like it would in a Visual Studio project.

using System.Data.SqlClient;   

This function is called via an unsecured (it is only a POC after all) web hook. So the parameters are going to come to use as a json object that I need to deserialize and validated…

string jsonContent = await req.Content.ReadAsStringAsync();
dynamic data = JsonConvert.DeserializeObject(jsonContent);
if (data.customer == null || data.product == null) {
    return req.CreateResponse(HttpStatusCode.BadRequest, new {
        error = "Please pass customer/product properties in the input object"

And now that I know my customer and product parameters are both here, I can use those to check the database. This looks like any other SQL command I’d execute.

SqlConnection sqlConnection1 = new SqlConnection("<em>{your connection string}</em>");
SqlCommand cmd = new SqlCommand();

cmd.CommandText = "select count(Distinct Product) from Orders where Customer = '" + data.customer + "' AND complete = 0";  
cmd.CommandType = System.Data.CommandType.Text;
cmd.Connection = sqlConnection1;

int productCnt;  
int.TryParse(cmd.ExecuteScalar().ToString(), out productCnt);
log.Info("Product Count for Customer '" + data.customer + "' is " + productCnt.ToString());

I took the easy route and hard-coded my connection string since this is a POC. The function will run in an App Service, so I can set application environment variables and store it there. This would definitely be the preferred approach for production code.

With the sql check complete, I have a count of the number of orders for the customer that have not been completed. I’m expecting at least 2. So we can now check the count and respond to the caller with a succeed or fail.

if (productCnt > 1) {
     return req.CreateResponse(HttpStatusCode.OK, new {
         greeting = $"Customer Order is complete!"
} else {
     return req.CreateResponse(HttpStatusCode.BadRequest, new {
         greeting = $"Order Incomplete!"

This check is pretty basic I’ll admit. But for my POC is does the job. I retrieves a list of orders (yeah, imbedded SQL… injection worries… I know) where I have at least two orders that are not complete for a given customer. The real example would be far more robust and also (I hope) more secure. You could also let the function perform additional operations such as updating both items as complete. Its up to you.

With the function created (and tested), we can then connect it to the Logic App workflow. The two product teams have made this really simple. With that action complete, we then add a condition check using the Status Code that was returned from my function. If its equal to 200, we drop a message into a queue that another workflow will pick up to complete processing of the order.


Long Running Workflows

I mentioned above that there would be a second workflow. This second part may take minutes, hours, or even days/weeks to complete. Having a single workflow run that long it problematic. It could loose state in mid process and a host of other problems. So I wanted to avoid that.

The plan here is that when you send the final event message in our first workflow, in this case a Service Bus queue, you can set optional parameters. Items like ScheduledEnqueueTimeUTC which allows you to send the message now, but have it not be visible until perhaps an hour in the future. Then the second workflow would be triggered by the receipt of this event message. That workflow can then check to see if the process has been completed (perhaps using another Azure Function), and when complete dropping a message into yet another queue to signal completion. If its not yet complete, it drops the message back into the queue again with a scheduled enqueue time again set in the future.

This allows that workflow, which could take weeks, ensure that its “state” is maintained, even if the workflow itself needs to restart.


So it may not seem like much. But I was pretty excited to find a way to accomplish what my partner was after in only 34 lines of code (once you remove all the wrapper stuff). And they were pleased with the end product.

Admittedly, as I write this, Logic Apps and Functions are both still in preview. And there have some rough edges. But there’s a significant amount of potential to be leveraged here so I have little doubt that they will be cleaning those edges up. And should you want to build this yourself, I’ve added some of the code to my personal github repository.

Enjoy, and until next time!