Placement Constraints with Service Fabric

I call this blog my notepad for a reason. It’s where I can write things down, if for no other reason than to help me remember. There’s the added benefit that over the years a few of you have found it and actually come here to read these scribbles and, from what I hear, you sometimes even learn something.

That said, I wanted to share my latest discovery, not just for myself, but for the sake of anyone else that has this need.

Placement Constraint Challenges

As I mentioned in my post on Network Isolation with Service Fabric, you can put placement constraints on services to ensure that they are placed on specific node types. However, this can be challenging when you’re moving your application packages between clusters/environments.

The first one you usually run into is that the moment you put a placement constraint on the service, you can’t deploy it to the local development cluster. When this happens, folks start trying to alter the local cluster’s manifest. Yes, the local cluster does have a manifest, and no, you really don’t want to start changing that unless you have to.

So I set about to find a better way. In the process, I found a couple of simple, but IMHO poorly documented, features that make this fairly easy to manage. And manage in a way where the “config is code” aspects of the service and fabric manifests are maintained.

Application vs Service Manifest

There are a couple of ways to configure placement constraints. The one you’ll find most readily is the use of code at publication. I’ll be honest, I hate this approach. And I suspect many developers do. This approach, while entirely valid, requires you to write even more code for something that can easily be managed declaratively via the service and application manifests.

If you dig a bit more, you’ll eventually run across the declaration of constraints in the service manifest.

 
<ServiceTypes> 
  <StatelessServiceType ServiceTypeName="Stateless1"> 
    <PlacementConstraints>(NodeTypeName==BackEnd)</PlacementConstraints> 
  </StatelessServiceType> 
</ServiceTypes> 

This, IMHO, is perfectly valid. However, it has the unfortunate side effect I mentioned earlier. Namely that, despite hours of trying, there’s no easy way to alter this constraint via any type of configuration setting at deployment time. But what I recently discovered was that this can also be done via the application manifest!

 
<DefaultServices>
  <Service Name="Stateless1">
    <StatelessService ServiceTypeName="Stateless1Type" InstanceCount="[Stateless1_InstanceCount]">
      <SingletonPartition />
      <PlacementConstraints>(isDMZ==true)</PlacementConstraints>
    </StatelessService>
  </Service>
</DefaultServices>

I never knew this was even possible until a few days ago. And admittedly, I can’t find this documented ANYWHERE. It was by simple luck that I tried this to see if it would even work. Not only did it work, but I found that any placement constraint declared in the application manifest would override whatever was in the service manifest.

This still doesn’t solve the root problem, but it does put us a step closer to addressing it.

Environment Specific Application Parameters

As I was trying to solve this problem, Aman Bhardwaj of the Service Fabric team sent me this link. That link discusses the options you have for managing environment-specific settings for Service Fabric applications. Most of the article centers around the use of configuration overrides to change the values of service configuration settings (the ones in PackageRoot/Config/Settings.xml) at publication. We don’t need most of that; what we need is actually much simpler.

The article discusses that you can parameterize the application manifest. This will substitute values defined in the ApplicationParameters files (you get two by default when you create a new Service Fabric application). In fact, the template generated by Visual Studio already has an example of this, as it sets the service instance count. And since, as I just mentioned, we can put the placement constraints into the application manifest… well, I think you can see where this is going.

We start by going into the application manifest and adding a new parameter.

 
<Parameters>
  <Parameter Name="Stateless1_InstanceCount" DefaultValue="-1" />
  <Parameter Name="Stateless1_PlacementConstraints" DefaultValue="" />
</Parameters>

Note that I’m leaving the value blank. This is completely acceptable, as it tells Service Fabric that you have no constraints. In fact, during my testing, any value that could not be properly evaluated as a placement constraint was simply ignored. So you could put in “azureisawesome” and it would have the same effect as leaving this blank. However, we’ll just keep it meaningful and leave it blank.

With the parameter declared, we’re next going to update the constraint declaration in the application manifest to use it.

 
<DefaultServices>
  <Service Name="Stateless1">
    <StatelessService ServiceTypeName="Stateless1Type" InstanceCount="[Stateless1_InstanceCount]">
      <SingletonPartition />
      <PlacementConstraints>[Stateless1_PlacementConstraints]</PlacementConstraints>
    </StatelessService>
  </Service>
</DefaultServices>

So “(isDMZ==true)” has become “[Stateless1_PlacementConstraints]”. With this done, now all we need to do is go define environment specific values in the ApplicationParameter files.

You’re most likely going to leave the Local.xml parameter blank. But the Cloud.xml (or any other environment-specific file you provide) is where you’ll specify the environment-specific setting.

 
<?xml version="1.0" encoding="utf-8"?>
<Application xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Name="fabric:/SampleApp1" xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <Parameters>
    <Parameter Name="Stateless1_InstanceCount" Value="-1" />
    <Parameter Name="Stateless1_PlacementConstraints" Value="(isDMZ==false)" />
  </Parameters>
</Application>

With these in place, you simply have to ensure you specify the proper parameter file when you deploy the service.

[Image: the Visual Studio publish dialog]

This is of course the publication dialog for Visual Studio, but that parameter file is also a parameter of the Deploy-FabricApplication.ps1 file that is part of the project and used to do publications via the command line. So these same parameter files can be used for unattended or automated deployments.
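For what it’s worth, here’s roughly what a command line publication might look like. Treat this as a sketch rather than gospel: the cluster endpoint and paths are placeholders, and the parameter names come from the Visual Studio generated Deploy-FabricApplication.ps1, which can vary between SDK versions.

# Connect to the target cluster first (endpoint is a placeholder)
Connect-ServiceFabricCluster -ConnectionEndpoint 'mycluster.eastus.cloudapp.azure.com:19000'

# Publish using the Cloud profile; the publish profile is what points at the
# matching ApplicationParameters file (e.g. ApplicationParameters\Cloud.xml)
.\Scripts\Deploy-FabricApplication.ps1 `
    -ApplicationPackagePath '.\SampleApp1\pkg\Release' `
    -PublishProfileFile '.\SampleApp1\PublishProfiles\Cloud.xml' `
    -UseExistingClusterConnection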

Off to the festival

And there we have it! Which is good because that’s all the time I have for today. My son is heading off to school this week which means my wife and I will officially be “empty nesters”. Today we’re taking him and his girlfriend to the local Renaissance Festival for one last family outing before he leaves. So I hope this helps you all and I bid you a fond “huzzah” until next time!

PS – I have placed a sample app with many of these settings already in place on GitHub.

Containers for Windows Developers (a reference)

So in about 36 hours I’ll be presenting at the 2016 Cloud Develop conference in Columbus, OH on “Containers for Windows Developers”. I know on the surface this may seem like a departure from my normal Azure focus. But I firmly believe (and did prior to our Windows Containers for ACS announcement) that containers are going to be a real driving force in the cloud.

I started exploring this in the fall of 2015 shortly after we announced containers for Windows Server 2016 and have continued to follow it. I’ve had to set it aside for other priorities but continued to return to this from time to time. And while the story is still not complete (Windows Server 2016 is still in preview), I’m pleased to see the story is now mature enough that I’m comfortable talking about it. Pretty handy since we’re this close to the conference with a session I proposed several months ago.

What I found as I dug into this is that while there’s a lot of documentation out there, I don’t feel it speaks to a key audience: namely, Windows developers. Most of the materials I found are focused on folks that are either already familiar with containers (from Linux) or are from the operations side of the house. So I wanted to come up with materials from the viewpoint of the Windows developer.

Unfortunately, the entire story is something for another day. The purpose of this post is to provide some links that I found useful as I set about learning containers on Windows Server. There’s simply too much to list on a slide, so this post is a placeholder for those links that I can reference from the presentation.

Enjoy!

Videos

Windows Containers: What, Why and How – Build 2015

Setting the Stage: The Application Platform in Windows Server 2016 – Build 2016

Windows Server containers, Docker, and an introduction to Azure Container Service – Azure Con 2015

Documentation

Windows Containers Overview – MSDN Documentation

Windows Containers on Windows 10 – MSDN Documentation

What is Docker? – Tim Butler

Docker Basics: A practical starters guide – Tim Butler

http://learningdocker.com/ – unknown

The presentation that is based on my journey/experience is available on github. Please note that the content can (and likely will) continue to be edited right up until just minutes before I present it. Maybe I’ll see you there!

Network Isolation/Security with Azure Service Fabric

There are times you really need to take things beyond the “file new” experience and implement a more advanced scenario. And with these opportunities, there are times you realize that what you need likely isn’t a “one off” kind of thing. There are larger implications to what you need that can help solve a myriad of problems. This is the story of one these scenarios.

I was recently working with a partner as they explored Service Fabric. They liked what they saw, but there was a “but” (there almost always is). This partner is in the government space, and one of the requirements they had is that all public facing services are isolated and secured from any “back end” services (in a DMZ). If you’ve been doing IT for any length of time, this shouldn’t come as news. But the question they had for me was how to do this with Service Fabric.

There were a couple ways to address this that immediately came to mind. We could deploy the front end web application as an Azure Web App, hosted in an App Service Environment that was joined to the same VNet as the Service Fabric Cluster. We could also set up two Service Fabric clusters, again joined by a single VNet. The issue with both of these is that the front and back ends of the solution would need to be deployed and managed separately. Not a huge deal admittedly. But this did complicate the provisioning and deployment processes a bit, as well as seemed to run counter to the idea of a Service Fabric “application”, composed of multiple services as a single entity. I was fortunate that I had previously engaged my friend and colleague Kal to bring his considerable Service Fabric experience into play with this partner, and he suggested a third option, one we all found fairly intriguing.

A Service Fabric cluster has Node Types which are directly related to VM Scale Sets. Taking advantage of this, we could place different node types into different subnets and place Network Security Groups (NSGs) on the subnets to provide the level of isolation the partner required. We would then use Placement Constraints to ensure that the services within an application are only hosted in the proper subnet by using constraints specific to the node type, or types, in that subnet.

We ran the idea by Mark Fussell,  the lead Project Manager of the Service Fabric team. As we talked, we realized that folks had secured a cluster from all external access, but there didn’t appear to be a public, previously documented version of what we were proposing. Mark was supportive of the idea, and even offered up that in some of the “larger” Service Fabric clusters, the placement constraint approach has been used to ensure that the services that make up the Service Fabric Cluster remain isolated from those that comprise the applications deployed within it.

Our mission clear, I set to work! We were going to create an Azure Resource Manager template to create our “DMZ’d Service Fabric Cluster”.

Network Topology 

The first step was to create the overall network topology.

[Image: the network topology, with front end, back end, and management subnets]

We have the front end subnet, which has a public load balancer that handles traffic from the internet. There is a back end subnet with an internal load balancer that does not allow any connections from outside of the virtual network (using a private IP). Finally, we have a management subnet that contains the cluster services, including the web portal (on port 19080) and the TCP client API (port 19000). For good measure, we’re also going to toss an RDP jump box into this subnet so if something goes wrong with any of the nodes in the cluster, we can remote in and troubleshoot (something that I used the heck out of while crafting this template).

With this in place, we then define the VM Scale Sets, and bind their network configurations to the proper subnets as follows:

"networkInterfaceConfigurations": [ 
  { 
    "name": "[variables('nodesMgmnt')['nicName']]", 
    "properties": { 
      "ipConfigurations": [ 
        { 
          "name": "[concat(variables('nodesMgmnt')['nicName'],'-',0)]", 
          "properties": { 
            "loadBalancerBackendAddressPools": [ 
              { 
                "id": "[variables('lbMgmnt')['PoolID']]"
              } 
            ], 
            "subnet": { 
              "id": "[variables('subnetManagement')['Ref']]"
            } 
          } 
        } 
      ], 
      "primary": true
    } 
  } 
]

With the VM Scale Sets in place, then we moved on to the Service Fabric Cluster to define each Node Type. Here’s the cluster node type definition for the management subnet node type.

{ 
  "name": "[variables('nodesMgmnt')['TypeName']]", 
  "applicationPorts": { 
    "endPort": "[variables('svcFabCluster')['applicationEndPort']]", 
    "startPort": "[variables('svcFabCluster')['applicationStartPort']]"
  }, 
  "clientConnectionEndpointPort": "[variables('svcFabCluster')['tcpGatewayPort']]", 
  "durabilityLevel": "Bronze", 
  "ephemeralPorts": { 
    "endPort": "[variables('svcFabCluster')['ephemeralEndPort']]", 
    "startPort": "[variables('svcFabCluster')['ephemeralStartPort']]"
  }, 
  "httpGatewayEndpointPort": "[variables('svcFabCluster')['httpGatewayPort']]", 
  "isPrimary": true, 
  "placementProperties": { 
    "isDMZ": "false"
  },            
  "vmInstanceCount": "[variables('nodesMgmnt')['capacity']]"
} 

The “name” of this Node Type must match the name of a VM Scale Set; that’s how the two get wired together. Since this sample is for our “management” node type, it’s also the only one with the isPrimary property set to true.

At this point, we debugged the template to make sure it was valid and that the cluster would come up “green”. The next (and harder) step is to start securing the cluster.

Note: If you create a cluster via the Azure portal with multiple node types, each node type will get its own subnet. However, we were after a reusable ARM template so we had to configure things ourselves.

Network Security

Unfortunately, when we set out to create this, there wasn’t much publicly available on the ports that were needed within a fabric cluster. So we had to do some guesswork, some heavy digging, as well as make a few wishes for some good luck. So in this section I’m hoping to lay out some of what we learned to save others the effort.

First off, we started by blocking all inbound connections on the three subnets. I then opened ports 19080 (used by the Service Fabric web portal) and 19000 (used by the Fabric Client and PowerShell) for the “management” subnet so I could interact with the cluster remotely. This was all done via the Azure Portal, interactively, so we could test the rules out and then use the Resource Explorer to export them to our template. We assumed that with these rules in place, we would see some of the nodes in the cluster go “red” or unhealthy. But we didn’t!

It took a day or so, but we eventually figured out that we were seeing two separate systems collide. Firstly, when a VM is brought up, the Service Fabric extension is inserted into it. This extension then registers the node with the cluster. As part of that process there’s a series of connections that are established. These connections are not ephemeral, remaining up for the life of the node. Our mistake was in assuming these connections, like we encourage most of our partners to do when building applications, were only temporary and established when they were needed.

Since these are established, persistent connections, they are not impacted when new NSG rules are applied. This makes sense since the NSG rules are there to interrogate any new connection requests, not look over everything that’s already been established. So the nodes would remain green until we rebooted them (tearing down their connections) and they tried (and failed) to re-establish their connection to the cluster.

This sorted out, we set about trying to place the remainder of the rules in place for the subnets. We knew we wanted internet connectivity to any application/service ports in the front end, as well as application/service ports in the backend from within the VNet. But what we were missing was the ports that Service Fabric needed. We found most of these in the cluster manifest:

 
<Endpoints> 
  <ClientConnectionEndpoint Port="19000" /> 
  <LeaseDriverEndpoint Port="1026" /> 
  <ClusterConnectionEndpoint Port="1025" /> 
  <HttpGatewayEndpoint Port="19080" Protocol="http" /> 
  <ServiceConnectionEndpoint Port="1027" /> 
  <ApplicationEndpoints StartPort="20000" EndPort="30000" /> 
  <EphemeralEndpoints StartPort="49152" EndPort="65534" /> 
</Endpoints> 

This worked fine at first. We stood up the cluster with these rules properly in place and the nodes were all green. However, when we’d tried to deploy an app to the cluster, it would always time out during the copy step. I spent a couple hours troubleshooting this one to eventually realize that it was something inside the cluster that was still blocked. I spent a bit of time trying to look at WireShark and Netstat runs inside of the nodes to determine what could still be the blocker. This could have carried on for some time had it not been for Vaishnav Kidambi pointing out that Service Fabric uses SMB to copy the application/service packages around to the nodes in the cluster. We added on a rule for that, and things started to work!

Note: As a result of this work, the Service Fabric product team has acknowledged that there’s a need for better documentation on the ports used by Service Fabric. So keep an eye out for additions to the official documentation.

Here’s what the final set of inbound rules for the Network Security Group (NSG) associated with the management subnet looked like.

[Image: the inbound NSG rules for the management subnet]

A quick rundown… I’ll start at the highest priority (at the bottom) and work my way up, since that’s how the NSG applies the rules. Rule 4000 blocks all traffic into the subnet. Rules 3950 and 3960 enable RDP connections within the VNet, and to the RDP jump box (at internal IP 10.0.3.4) from the internet. The next three rules (3920-3940) allow the connections needed by Service Fabric within the VNet only (thus allowing all the Service Fabric agents on the nodes to communicate). And finally, the first two rules (3900 and 3910) open up external connections for ports 19080 and 19000. Rules 3960, 3900, and 3910 are unique to the management subnet. I’ll get to why 19000 and 19080 are unique to this subnet in a moment.
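If you’d rather script these rules than click through the portal, the AzureRM networking cmdlets can build the same thing. Here’s a minimal sketch, not the exact rules from the template; the resource group, location, address ranges, and rule names are placeholders based on the rundown above.

# Allow the TCP client API (19000) into the management subnet from the internet
$clientApi = New-AzureRmNetworkSecurityRuleConfig -Name 'AllowFabricClientApi' `
    -Access Allow -Protocol Tcp -Direction Inbound -Priority 3910 `
    -SourceAddressPrefix Internet -SourcePortRange '*' `
    -DestinationAddressPrefix '10.0.3.0/24' -DestinationPortRange '19000'

# Allow SMB (445) within the VNet so application package copies between nodes succeed
$smb = New-AzureRmNetworkSecurityRuleConfig -Name 'AllowSmbWithinVNet' `
    -Access Allow -Protocol Tcp -Direction Inbound -Priority 3930 `
    -SourceAddressPrefix VirtualNetwork -SourcePortRange '*' `
    -DestinationAddressPrefix VirtualNetwork -DestinationPortRange '445'

# Block everything else inbound
$denyAll = New-AzureRmNetworkSecurityRuleConfig -Name 'DenyAllInbound' `
    -Access Deny -Protocol '*' -Direction Inbound -Priority 4000 `
    -SourceAddressPrefix '*' -SourcePortRange '*' `
    -DestinationAddressPrefix '*' -DestinationPortRange '*'

# Create the NSG for the management subnet with those rules attached
New-AzureRmNetworkSecurityGroup -Name 'nsgManagement' -ResourceGroupName 'myResourceGroup' `
    -Location 'eastus' -SecurityRules $clientApi, $smb, $denyAll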

Dynamic vs Static Ports

One sidebar for a moment. Connectivity between the front and back end is restricted to a set of ports you set when you run the template (it defaults to 80 and 443). In Service Fabric terms, this is called a static port. When you build services, you also have the option of asking the fabric for a port to use, a dynamic port. As of the writing of this article, the Azure load balancer does not support these dynamic ports. So to leverage them via the load balancer and our network isolation, we’d have to have a way to update both the load balancer and the NSG rules each time a port is allocated or released. Not ideal.

My thought is that most of the use of dynamic ports is likely going to be between services that have a trusted relationship. This relationship would likely result in the services being placed inside the same subnet. If you needed to expose something they were doing to the “outside world”, you would likely set up a gateway/façade service that in turn might be load balanced. It’s this gateway service that would be exposed on a static port so that it can easily be reached via a load balancer and secured with NSG rules.

Restricting Service Placement

With the network topology set, and the security rules for each of the subnets sorted, next up was ensuring that application services get placed into the proper locations. Service Fabric services can be given placement constraints. These constraints, defined in the Service Manifest, are checked against Placement Properties for each node type to determine which nodes types should host a service instance. These are commonly used for things like restricting services that require more memory to nodes that have more memory available or situations where specific types of hardware are required (a GPU for example).

Each node type gets a default placement property, NodeTypeName, which you can reference in a service manifest like so.

 
<ServiceTypes> 
  <!-- This is the name of your ServiceType. 
       This name must match the string used in RegisterServiceType call in Program.cs. -->
  <StatelessServiceType ServiceTypeName="Web2Type"> 
    <PlacementConstraints>(NodeTypeName==BackEnd)</PlacementConstraints> 
  </StatelessServiceType> 
</ServiceTypes> 

Now we may want to have other constraints beyond just NodeTypeName. Placement Properties can be assigned to the various Node Types in the cluster Manifest. Or, if you’re doing this via an ARM template such as I was, you can declare them directly in the template via a property within the NodeType definition/declaration.

 
"placementProperties": { 
  "isDMZ": "true" 
},

If you look at the node type definition I used earlier, you’ll see where this property collection goes. In that template, “isDMZ” is false.

Combined, the placement properties and the placement constraints will help ensure that each of the services goes into the subnet that has already been configured to securely host it. But this does pose a challenge. If we declare the placement constraint in the service manifest as I show above, this restricts which clusters we can deploy the service to. If a cluster doesn’t have our placement properties declared, the service will fail to deploy. We could address this by removing and then adding the placement constraints later (not ideal) or by altering the cluster manifests (again, not ideal). But there are two other options. First, we could craft our own definition of the application/service types and register them with the cluster, then copy the packages to the cluster.

Note: For more on placement constraints, please check out my new blog post.

This article contains a section that talks about doing this via C# or PowerShell. Another option, and one I think I actually prefer (but admittedly haven’t tried), is to use a build event to alter the manifest. You can then trigger this event based on various parameters to control whether it happens when you’re doing a local build vs. a cloud build. Perhaps even going so far as reading a value from the Application Parameters or Publication Profile files. But for now, I’ll need to set these aside. There’s also a third option I’m investigating, but I’m not confident enough to bring it up yet. I hope to eventually circle back on these.
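To make the build event idea a little more concrete, here’s the kind of pre-build script I have in mind. Again, I haven’t battle tested this, so treat it as a sketch: the manifest path is a placeholder and the namespace is the standard fabric schema you can see in the manifests above. It strips the placement constraint for local builds, or swaps in whatever expression you pass it.

param(
    [string]$ManifestPath = '.\MyService\PackageRoot\ServiceManifest.xml',  # placeholder path
    [string]$Constraint = ''  # empty string = strip the constraint (local builds)
)

# Load the service manifest and set up the fabric namespace for XPath queries
[xml]$manifest = Get-Content $ManifestPath
$ns = New-Object System.Xml.XmlNamespaceManager($manifest.NameTable)
$ns.AddNamespace('sf', 'http://schemas.microsoft.com/2011/01/fabric')

$node = $manifest.SelectSingleNode('//sf:PlacementConstraints', $ns)
if ($node -ne $null) {
    if ($Constraint -eq '') {
        # No constraint wanted, so drop the element entirely
        $node.ParentNode.RemoveChild($node) | Out-Null
    }
    else {
        $node.InnerText = $Constraint
    }
    $manifest.Save((Resolve-Path $ManifestPath).Path)
}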

There is one other placement constraint (I mentioned I’d get to this). There are two things unique to the management node type/subnet. The first is that it’s the only subnet I would open ports 19000 and 19080 on. The reason is that this is the only node type in the cluster manifest marked as “isPrimary”. A Service Fabric cluster can only have one “primary” node type. This node type is the one where all the “system” services will be placed (Naming, FileStore, Cluster Manager, etc…). So setting “isPrimary” ensures that these services will be placed into this subnet, allowing me to keep them separate from any application services. I previously mentioned that this approach was proposed by Mark Fussell of the Service Fabric team. It’s a pattern that’s used by some larger clusters to help ensure that fabric management resource demands can be scaled independently of application needs.

Between placement of the management services on the primary node type, and restricting application placement via constraints, we can now put each of our services only where we want them to be.

Using the JumpBox

A common technique in cloud solutions is to leverage a “jump box”. Allowing direct, remote access to a virtual machine is sensitive and risky. To help manage this risk, there’s usually one or more restricted access points that are used as gatekeepers. You access one of these gatekeepers as a leaping off point to access resources inside the security boundary. We’ve set up this approach, allowing you to RDP into a jump box from which you would then RDP into the other boxes within the VNet.

Using this template, you’ll need to address all your VM instances via IP. Since we’re using dynamic IPs within the VNet, you can RDP into a box using a fairly simple address scheme. The third octet of the IP address represents the subnet you want to access (1=front end, 2=back end, 3=management) and the final octet is the specific machine. Azure reserves the first three addresses in each subnet for its own use, so you can start at 4 for the VMs in the front end or management subnets. For the back end subnet, I’ve used 10.0.2.4 as the private IP for the internal load balancer, so the nodes in that subnet start at 5.
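So, to make that scheme concrete (the addresses here are just illustrative), from the jump box you could reach the first node in each subnet like so:

mstsc /v:10.0.1.4   # first VM in the front end subnet
mstsc /v:10.0.2.5   # first VM in the back end subnet (10.0.2.4 is the internal load balancer)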

The next step would be to adapt the “allowJumpBoxRDP” security rule on the management subnet so that it only allows connections from trusted sources (say, your on-prem network).

Many diet colas died to bring you this information

So there you have it. I’ll admit that on the surface it may not seem like much. But if you’ve ever built an ARM template, you know how much effort it requires. Add into this all the stuff I had to learn/discover to get it to a functional state and validate it by deploying apps to it (which required more debugging and bug fixes) and..well… we’re talking quite a bit of effort. So I’m hoping that this article and the template will help a few folks avoid what I had to go through.

The entire template (complete with jump box), can be found in my github repo. I’m going to continue to try and polish it, and I’m also looking at getting it published (with additional guidance on usage) in the Azure QuickStart Templates repository. So be sure to let me know of any suggestions or bugs you find. I’ll do my best to get them worked in.

Until next time!

PS – thank you to everyone that helped contribute to this effort: Kal, Jason, Corey, Patrick, Mike, Shenlong, Vaishnav, Chacko, and Mikkel

Extending Logic Apps with Azure Functions

Note: This article is based on services that were in preview at the time it was written. The user experience and any issues/challenges mentioned below may differ from what was eventually released for general availability.

I remember when Azure was only three services. Technically, it was four, but let’s not mince details. Today, there are dozens of services, each with a multitude of features. So it’s impossible to be an expert in them all. As a technology specialist, an architect, I try to understand the basics of most of them, but only really go deep in areas where I have a specific interest. But from time to time, the partners I work with have asks that require me to go into areas I would have otherwise skimmed over.

One such request was to help a partner solve a challenge they were facing with correlating transactions being sent to them by an upstream solution provider they were working with. A simple enough request on the surface, but in this case my partner also had a short runway to implement this. As such, they wanted to avoid having to do a large amount of coding. They wanted to embrace the speed and flexibility that PaaS gives you.

They looked at Logic Apps, a spiritual successor to BizTalk orchestrations. Unfortunately, Logic Apps doesn’t currently have a “message box” type of functionality that would allow for the correlation of multiple messages. However, Logic Apps does have the ability to integrate with another service, Azure Functions, which allows us to extend the features of Logic Apps through custom code.

So I set about creating a simple proof of concept to prove out how something like this could work.

Overview of the solution

The workflow would happen in two parts. The first step would be to correlate the two inbound transactions we’d receive that together comprise a “complete order”. A second workflow would then be triggered to monitor an installation process that would occur.

For the first workflow, I wanted to keep it simple. My POC used two transactions that were the same format, each containing a customer name, and a product type. The workflow would receive them via a REST API call. Each transaction would then be written to an Azure SQL DB and the workflow would respond back to the requestor with a “success” response code (200). After this, the workflow would call out to an Azure function to determine if the transaction was complete. If it was, we would drop a message into a queue that would trigger the second half of the workflow. If it wasn’t, we’d just end there and wait for the next transaction to come in. Here’s what the first workflow would look like in Azure Logic Apps.

[Image: the first workflow in the Logic Apps designer]

The second part we’ll discuss later. But these simple steps would give me my “message box”.

Now this article won’t cover getting started with Logic Apps or Functions. There’s enough IMHO written on these subjects already that I don’t feel I have anything new to add. So if you haven’t worked with these before, I highly recommend at least covering their “getting started” materials.

Triggering the workflow

Logic Apps gives you a multitude of “triggers” that can be used to start a workflow. For this proof of concept, I opted to trigger the workflow manually when an HTTP request is received. Since it’s just a POC, I don’t bother to secure this endpoint. This also means I can easily test my workflow using a tool like Fiddler or Postman.

I do this using a “manual trigger”, specifically the “Request – When an HTTP request is received”. You’ll find this under the “Show Microsoft Managed APIs” list. I also keep the request body simple, just a string that is the “customer” identifier, and another that is the “product” identifier. In the Logic App, the request body schema looks like this:

{
    "properties": {
        "customer": {
            "type": "string"
        },
        "product": {
            "type": "string"
        }
    },
    "title": "Product Order",
    "type": "object"
}

So it’s a simple JSON object called “Product Order” that contains two string properties, “customer” and “product”. I don’t think it could get much simpler.
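And if you’d rather test from a console than from Fiddler or Postman, a couple lines of PowerShell will do the trick. The URI below is a placeholder for the callback URL the Logic App generates for its HTTP trigger.

# Build the request body the trigger expects
$body = @{ customer = 'Contoso'; product = 'Widget A' } | ConvertTo-Json

# Post it to the Logic App's HTTP trigger endpoint (placeholder URL)
Invoke-RestMethod -Method Post `
    -Uri 'https://<your-logic-app-trigger-url>' `
    -ContentType 'application/json' `
    -Body $body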

Saving the transaction to our “message box”

So the first step in re-creating BizTalk’s functionality is having a place to store the messages. Logic Apps has the ability to perform actions on Azure SQL DB, so I opted to leverage this connector. So I created a database and put a table in it that looked like this:

RowID – a GUID that has a default value of “newid()”

Customer – a string

Product – a string

Complete – a boolean value, defaults to false

There are several SQL Connectors available in Logic Apps, including an “insert row” action. However, I struggled to get it to work. I wanted to use the defaults I had set up in the database for RowID and Complete fields. And the insert row action told me I had to specify the values for all columns in the table. I opted instead to create a stored procedure in the database, and use the “Execute stored procedure” connector. The stored procedure accepts two parameters, the Customer and Product strings from the HTTP request body.

With the message safely saved in the database, we now respond back to the request letting them know we have received and processed it. This is done via the “Response” action. I could have done this immediately after the request was received, but I wanted to make sure to wait until after I had saved the message. This way I’m really indicating to the requestor that I’ve received AND processed their request.

Correlating the orders

With the orders now saved, I can begin the process of adding my custom business logic to correlate the orders. I created an Azure Function app and defined a new function named “CheckOrderComplete”. This will be a C# based function triggered (again) by an HTTP trigger. I chose C# because the partner I’m working with does much of their work in C#, so it was a good fit. The HTTP trigger made sense since we’re already using HTTP operations for the workflow trigger; why not remain consistent. The objective of the function would be to query the database and see if I had the two transactions I needed to have a “complete” transaction on my end.

Azure Functions provides a C# developer reference that was really helpful. However, it still took a bit of trial and error my first time. So I’m going to try and break down my full csx file into the various changes I made. The first change was that I needed to make sure I referenced the assembly that would allow me to interact with the SQL DB where the requests had been stored.

   
#r "Newtonsoft.Json"
#r "System.Data"   

The Newtonsoft line was already there since I’m dealing with an HTTP request that would need to have its JSON payload translated into C#. But I had to reference the external System.Data assembly, so I added the second line so I’d have the SQL Client. The #r is like adding a reference in a Visual Studio project. This assembly is already available in the Azure Functions hosting environment (think of it as the Nuget package already being installed), so no other action was necessary to make it available for my use.

Next up, I had to add a “using” clause to the code so I could leverage the assembly I just referenced. This works just like it would in a Visual Studio project.

using System.Data.SqlClient;   

This function is called via an unsecured (it is only a POC after all) web hook. So the parameters are going to come to us as a JSON object that I need to deserialize and validate…

string jsonContent = await req.Content.ReadAsStringAsync();
dynamic data = JsonConvert.DeserializeObject(jsonContent);
  
if (data.customer == null || data.product == null) {
    return req.CreateResponse(HttpStatusCode.BadRequest, new {
        error = "Please pass customer/product properties in the input object"
    });
} 

And now that I know my customer and product parameters are both here, I can use those to check the database. This looks like any other SQL command I’d execute.

  
SqlConnection sqlConnection1 = new SqlConnection("{your connection string}");
SqlCommand cmd = new SqlCommand();

cmd.CommandText = "select count(Distinct Product) from Orders where Customer = '" + data.customer + "' AND complete = 0";  
cmd.CommandType = System.Data.CommandType.Text;
cmd.Connection = sqlConnection1;

sqlConnection1.Open();
  
int productCnt;  
int.TryParse(cmd.ExecuteScalar().ToString(), out productCnt);
log.Info("Product Count for Customer '" + data.customer + "' is " + productCnt.ToString());
  
sqlConnection1.Close();    

I took the easy route and hard-coded my connection string since this is a POC. The function will run in an App Service, so I can set application environment variables and store it there. This would definitely be the preferred approach for production code.

With the SQL check complete, I have a count of the number of orders for the customer that have not been completed. I’m expecting at least 2. So we can now check the count and respond to the caller with a success or failure.

if (productCnt > 1) {
     return req.CreateResponse(HttpStatusCode.OK, new {
         greeting = $"Customer Order is complete!"
     }); 
} else {
     return req.CreateResponse(HttpStatusCode.BadRequest, new {
         greeting = $"Order Incomplete!"
     }); 
}

This check is pretty basic, I’ll admit. But for my POC it does the job. It retrieves a count of orders (yeah, embedded SQL… injection worries… I know) and checks that there are at least two incomplete orders for a given customer. The real example would be far more robust and also (I hope) more secure. You could also let the function perform additional operations, such as updating both items as complete. It’s up to you.

With the function created (and tested), we can then connect it to the Logic App workflow. The two product teams have made this really simple. With that action complete, we then add a condition check using the status code that was returned from my function. If it’s equal to 200, we drop a message into a queue that another workflow will pick up to complete processing of the order.

[Image: the condition check and queue action in the Logic App designer]

Long Running Workflows

I mentioned above that there would be a second workflow. This second part may take minutes, hours, or even days/weeks to complete. Having a single workflow run that long is problematic. It could lose state mid-process, among a host of other problems. So I wanted to avoid that.

The plan here is that when you send the final event message in our first workflow, in this case to a Service Bus queue, you can set optional properties. Items like ScheduledEnqueueTimeUTC, which allows you to send the message now, but have it not be visible until perhaps an hour in the future. The second workflow would be triggered by the receipt of this event message. That workflow can then check to see if the process has been completed (perhaps using another Azure Function), and when complete, drop a message into yet another queue to signal completion. If it’s not yet complete, it drops the message back into the queue with the scheduled enqueue time again set in the future.

This allows that workflow, which could take weeks, to ensure that its “state” is maintained, even if the workflow itself needs to restart.

Summary

So it may not seem like much. But I was pretty excited to find a way to accomplish what my partner was after in only 34 lines of code (once you remove all the wrapper stuff). And they were pleased with the end product.

Admittedly, as I write this, Logic Apps and Functions are both still in preview. And there are some rough edges. But there’s a significant amount of potential to be leveraged here, so I have little doubt that they will be cleaning those edges up. And should you want to build this yourself, I’ve added some of the code to my personal GitHub repository.

Enjoy, and until next time!

Multiple Windows with Windows 10 and JavaScript

So a little known fact about Universal Windows Platform (UWP) apps is that they can have more than a single window. At least it was little known to me until I was asked by a partner if there was a way to do it. As it turns out, there are actually two options. There’s the multiple window approach used by applications like the Mail app. There’s also the projection manager approach, where you intentionally want to display a window on a separate screen from the main window. These approaches have actually been around since Windows 8 and center on the use of the ApplicationView object.

In poking around, I found that there’s a solid collection of Windows 8.1 and Visual Studio 2013 samples for C++, C#, and even JavaScript. There was even an excellent blog post about implementing multiple windows/views with JavaScript, but it was again focused on Windows 8.1. But I needed an example of this approach with Windows 10 and Visual Studio 2015. And unfortunately, the Windows 8 example wasn’t working. So I set about fixing it.

WinJS 4 and the missing MSAppView

It turns out that when moving from Windows 8 to Windows 10, you’re also moving from WinJS 2 to WinJS 4. In WinJS 4, some of the more “Microsoft-ish” ways of doing things were dropped in favor of allowing JavaScript developers to leverage the way they’ve always done things. The primary change was the deprecation of the MSAppView object and its various functions. This object represented a Windows Store app’s window (app view) and gave us methods like close and postMessage, as well as a property that was the viewId of the window.

With this object removed, some of the functionality of the aforementioned Windows 8.1 sample was broken. So to get it working, I had to find a replacement, or better yet, build one.

myAppView

I started by crafting a replacement for the deprecated MSAppView, which I called simply enough myAppView, giving this object some of the same functions/properties as its predecessor.

    function myAppView(window) {
        this.viewId = MSApp.getViewId(window),
        this.window = window,
        this.postMessage = function (message, domain) {
            this.window.postMessage(message, domain);
        }
    };

With this in place, I also needed to implement two more missing functions, starting with MSApp.createNewView (the WinJS implementation of CoreApplication.CreateNewView). Since these would be “global” functions, I opted to wrap them in a WinJS namespace called CustomAppView. Here’s the createNewView implementation.

    createNewView: function(page) {
        var newWindow = window.open(page, null, "msHideView=yes");
        //var newWindow = window.open(page);

        return new myAppView(newWindow);
    }

This would instantiate a new window using the common JavaScript function window.open. However, what we’ve done is add the value msHideView=yes in the optional window features parameter. This directive means the window exists, but isn’t yet visible. I searched the internet and for the life of me couldn’t find a single reference for this. Thankfully, I was able to track someone down inside of the team responsible for WinJS and they shared this little gem with me. Then, using the handle for this new and invisible window, I create an instance of the myAppView object. This object exposes the same functions/properties as its predecessor, allowing much of the sample code to remain in place.

We also had to craft a replacement for MSApp.getViewOpener:

    getViewOpener: function () {
        var openerWindow = window.opener;
        var parentWindow = window.parent;
        return new myAppView(parentWindow);
    }

You can see the full implementation of the myAppView and the two functions on GitHub.

 Out with the old, in with the new

In the existing sample, I was focused on getting “Scenario 1” working. This centered around the ViewManager object and functions for managing the view. The first of those was one to create a new view from a URL. In here, I had to replace the existing MSApp.createNewView with my new implementation.

    //var newView = MSApp.createNewView(page);
    //BMS
    var newView = CustomAppView.createNewView(page);

So we call the method we crafted above to create the window and return a copy of the myAppView object.

Next up, I had to replace MSApp.getViewOpener, which will be used to help me keep track of who “owns” the current window to assist in sending messages between the windows.

    //this.opener = MSApp.getViewOpener();
    //BMS
    this.opener = CustomAppView.getViewOpener();

I played a bit to see which works better, window.opener or window.parent, and found that parent gave me the best results. However, it’s important to note that according to the official documentation, this could return a window that “may not necessarily be the same window reported by the window.parent or window.opener properties”. But I found one that worked for my needs, so hopefully it will work for you also.

I’m still no expert

I’d like to stress, I know just enough JavaScript to continue to find it frustrating. The challenges I have with it are well known by my colleagues. However, I do appreciate opportunities like this that force me to dig into it and learn more. And this is one of those times where I hope my frustration will help save you some. I’ve posted the complete project up on GitHub so you can pull it and try it yourself. Just be forewarned that I’ve only really focused on Scenario 1, so the others may still have issues. I’ll try to keep working on this from time to time to make the sample a bit better (it really needs some inline comments).

Until next time!

Azure Automation And SQL DB

Author’s Note: The version of this script on GitHub was updated in May of 2016 to include the Azure AD tenant ID where the credential exists, as well as the Subscription ID, to make sure the proper subscription is selected. This should help situations where there are multiple subscriptions. You will also need to include these variable values when setting up your job schedule.

What a difference two months makes. Two months ago, I helped an old friend out with a customer by working up a sample of using Azure Automation to resize an Azure SQL DB. At the time, importing modules wasn’t as seamless an experience as anyone would have liked. So I intended to blog about it, but simply didn’t get around to it.

Until today! With our recent addition of the Azure Automation module gallery, this has just gotten easier. Given that someone invested in helping build that functionality out, I figured I couldn’t slack off and not write about this any longer. So here we go…

Resizing Azure SQL DB’s

First off, we need to address a bit of a misconception. While Azure SQL DB prices are stated in terms of “per month”, the reality is that you are billed “per hour” (see the FAQ). You can adjust the service tier, performance level, or eDTUs as often as you want. Within a given hour, the price you pay will be the highest option you exercised during that hour.

This means that if you either have predictable performance needs (highs or lows), or are actively monitoring your database utilization (see my previous blog post), you can tweak your database capacity accordingly.

To take advantage of this, we can use Azure Automation and some PowerShell.

Preparing the Database for resizing

Before I dive into setting up the automation itself, let’s discuss a few details. My approach will leverage Azure’s Role Based Access Control (RBAC) features. So manipulating the database will be done by an identity that I’m setting up specifically for Automation, and it will have access to the database(s) I want to manipulate.

Presuming you have access as an Administrator to the Azure AD instance that’s associated with your subscription’s management, locate Active Directory in the portal (search or browse) and select it. This should re-direct you to the old portal (at the time of this writing, Azure AD is not currently available in the new portal). Once there, add a new user.

[Image: adding a new user in the classic portal’s Active Directory section]

You can use your own identity for this, but I’m much more a fan of setting up “service” identities for these things. If you don’t have administrative access to AD in your subscription (not uncommon if it’s federated with your on-premises domain), then you’ll need to work with whomever manages your Active Directory domain to get the service identity created.

With the identity created,  we can now go to the Azure SQL Database server that’s hosting our database, and add the service identity as a user with the “Owner” role.

[Image: adding the service identity to the SQL Database server with the Owner role]

This grants the service identity the ability to resize (among other things) all the databases on this server.
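As an aside, if you’d rather script this than click through the portal, the RBAC assignment can be made with something along these lines. Consider it a sketch; the sign-in name, subscription ID, resource group, and server name are all placeholders.

# Grant the service identity the Owner role on the SQL Database server
New-AzureRmRoleAssignment -SignInName 'svc-automation@contoso.onmicrosoft.com' `
    -RoleDefinitionName 'Owner' `
    -Scope '/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Sql/servers/<server-name>'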

Setting up our Automation

Now we have to set up an Automation account in our Azure subscription. It’s this account that will be the container for all our automation tasks, their assets, and any schedules we have them operate under. In the new Azure portal, click on “+ New”, then select “Management”. Alternatively, you can enter “Automation” into the search box and follow that path.

[Image: creating the Automation account in the Azure portal]

Give your Automation account a globally unique name, and make sure it’s associated with the proper subscription, resource group, and region (Automation is only available in a handful of regions currently).

With the account created, we’ll first add the service identity we created earlier to the Automation account as a Credential asset. A Credential asset will be used by our automation script to execute the commands to resize the database. Other assets include PowerShell Modules, certificates, pre-set variables, etc… This also allows you to have multiple scripts that reuse the same Credential asset without each having to be updated each time the credential changes (say for periodic password changes).

When I first did this a couple months ago, I had to manually add in another PowerShell module. But they’ve started importing more of them by default, so that’s no longer necessary (unless you are tied to specific modules that are not available by default or you need to tag a newer/older version of something that’s already included).

With the credential in place, we can now add the Runbook. I’ve created one for you already, so just download that, and then in the Azure Automation account, import an existing runbook.

[Image: importing an existing runbook]

Set the type to “PowerShell Workflow”, give it a meaningful name (one will likely default), and give it a description. The import should only take a few seconds, then you can select the new Runbook. Once in, you can go into “edit” mode to either make changes or test it. I highly recommend a test run this way first. To execute a test, you’ll provide the parameters called out in the script (there’s a sketch of the script’s core after this list)…

CredentialAssetName – the name we gave the Credential asset we created

ResourceGroupName – My script works on new Resource Manager based Azure SQL DB’s. So we want to know what resource group the database is in.

ServerName – the name of the database server (just the first part, leave out the ‘.database.windows.net’). This will need to be in all lower case.

DatabaseName – the name of the database to be resized

NewEdition – what is the new edition we’re resizing to (Basic, Standard, etc…)

NewPricingTier – what tier are we moving to, e.g. S1 or S2 (leave blank if the Edition is Basic)
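For reference, these parameters feed a pretty small script. Here’s a rough sketch of the core of a runbook like this, not the exact script from GitHub, assuming the AzureRM SQL cmdlets are available to the Automation account (the runbook name is a placeholder):

workflow Resize-AzureSqlDb
{
    param(
        [string]$CredentialAssetName,
        [string]$ResourceGroupName,
        [string]$ServerName,
        [string]$DatabaseName,
        [string]$NewEdition,
        [string]$NewPricingTier
    )

    # Pull the Credential asset we created earlier and log in with it
    $cred = Get-AutomationPSCredential -Name $CredentialAssetName
    Add-AzureRmAccount -Credential $cred | Out-Null

    # Resize the database to the requested edition/tier
    Set-AzureRmSqlDatabase -ResourceGroupName $ResourceGroupName `
        -ServerName $ServerName `
        -DatabaseName $DatabaseName `
        -Edition $NewEdition `
        -RequestedServiceObjectiveName $NewPricingTier
}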

Fill in each appropriately and “Start” the test. Depending on the size of your database, this could take some time, so go get something to drink, check email, etc…

Done? Good! Hopefully the test ran well, so let’s now publish our runbook. Publishing is important because it allows us to edit and test one version of the runbook while another is used by any scheduled occurrences.

[Image: publishing the runbook]

With the publish complete, our final step is to schedule this to run. In my case, I know each afternoon I want to set the database back down to “Basic”, because I won’t be using it in the evening (it’s just a dev database). I’ve got the option of creating a new schedule, or associating the job with an existing one. This allows multiple runbooks to be executed on a single schedule entry.

[Image: linking the runbook to a schedule]

With the schedule set, we just need to set the parameters, much like we did when we tested the script.

[Image: setting the runbook parameters for the schedule]

Click “Ok” a couple times, and our runbook will be all set to run on our schedule. We can also execute the runbook “on demand” by selecting it in the portal, and clicking on “Start”.

Note: Once created, the portal does not currently let us edit the schedule. You’ll need to remove and re-add it. But you can turn schedules on/off.

Monitoring our Runbook

So each time the runbook is executed by the schedule, this is referred to as a job. Either in the Automation account, or in the Runbook itself, we can select “Jobs” and view any jobs that have run recently. This will include both ones that were done “on demand” and those run via the scheduler. In this view you can look at the output, as well as the logs that were written.

[Image: viewing the runbook’s job history]

And there we have it!

So what’s next?

For me, next up is lunch. For you, hopefully this will work out of the box for you and help you save a few bucks here and there. I would also recommend you check out some of the scripts in the gallery and think of other interesting uses for Azure Automation.

Until next time!

PS – HUGE thanks to the team that has open sourced Live Writer. Works like a dream!

Azure SQL DB Usage Visualization with Power BI

It’s taken a week longer than I had hoped, but I’m finally ready for my next installment of my “blog more” effort. As many of you are aware, I’m not a data person. However, there’s no ignoring the fact that applications are either a conduit for, or generator of, data. So it’s inevitable that I have to step out of my comfort zone from time to time. What’s important is to grab these opportunities and learn something from them.

The most recent example of this was when I had a partner that was faced with the deprecation of the Azure SQL DB Web/Business pricing tiers. Their solution is cost sensitive and when they looked at the Azure portal’s recommendations for the new tiers for their many databases, they had a bit of sticker shock.

With some help from folks like Guy Haycock of the Azure SQL team and a good starting blog post on SQL DB usage monitoring, the partner was able to determine that the high tier recommendations were due to their daily database backup processes. Since the new SQL tiers offer point-in-time restore, my partner didn’t need to do their own backups, and as a result their actual needs were much lower than what the portal suggested. Saving them A LOT of money.

To identify what they really needed, what the partner did was pull the SQL diagnostic view data down into Excel and then visualize the data there. But I thought there had to be an even easier way to not just render it once, but produce an ongoing dashboard. And to that end, I started looking at Power BI.

Which is what this post is about.

The Diagnostic Management Views

The first step is to understand the management views and how they can help us. The blog post I mentioned earlier refers to two separate management views: the “classic” sys.resource_stats view, which displays data for up to 14 days with 15 minute averages, and the new sys.dm_db_resource_stats view, which has 1 hour of data with 15 second averages.

For the “classic” view, there are two schemas: the web/business version and the new schema. The new schema is very similar to the new sys.dm_db_resource_stats schema, which is what I’ll focus on for the examples in this blog post. The key difference is that resource_stats is located in the Azure SQL DB “master” database, while dm_db_resource_stats can be found in the individual databases.

There are four values available to us: cpu, data, log writes, and memory. And each is expressed as a percentage of what’s available given the database’s current tier. This allows us to query the data directly to get details:

SELECT * FROM dm_db_resource_stats

Or if we want to get a bit more complex, we can bundle values together per minute and chart out the min, max, and average per minute:

SELECT
    distinct convert(datetime,convert(char,end_time,100)) as Clock,
    max(avg_cpu_percent) as MaxCPU,
    min(avg_cpu_percent) as MinCPU,
    avg(avg_cpu_percent) as AvgCPU,
    max(avg_log_write_percent) as MaxLogWrite,
    min(avg_log_write_percent) as MinLogWrite,
    avg(avg_log_write_percent) as AvgLogWrite,
    max(avg_data_io_percent) as MaxIOPercent,
    min(avg_data_io_percent) as MinIOPercent,
    avg(avg_data_io_percent) as AvgIOPercent
FROM sys.dm_db_resource_stats
GROUP BY convert(datetime,convert(char,end_time,100))

Using the new dm_db_resource_stats view, my experiments consistently got back 256 rows (64 minutes), which is a nice small result set. This gives you a good “point in time” measure of a database’s resource usage. If you use the classic resource_stats view (which you query at the master db level), you can get back a few more rows and will likely want to filter the result set on the appropriate database_name.

Since the values represent a percentage of the available resources, this lets us determine how close to the database tier cap our database is operating. Thus allowing us to determine when we may want to make adjustments.

Power BI visualization

With a result set figured out, we’re ready to start wiring up the Power BI visualizations. For simplicity, I’m going to use the online version of Power BI. It’s an online database, so why not an online reporting tool? Power BI online is available for free, so just head over and sign up. And while this is a sort of “getting started with” Power BI article, I’m not going to dive into everything about the portal. So if you’re new to it, you may need to poke around a bit or watch a couple of the tutorials to get started.

After logging into Power BI, start by getting connected to your database.

[Image: the Power BI workspace, getting connected to a database]

Note: You’ll want to make sure that your SQL Database’s firewall is allowing connections from Azure services.

On the next page, select Azure SQL Database, and click connect. Then fill in the server name (complete with database.windows.net part) and the database name. But before you click on next, select “Enable Advanced Options”.

[Image: connecting Power BI to an Azure SQL Database, with advanced options enabled]

This is important because Power BI online does not see the system/diagnostic views. So we’re going to use the queries we were working with above, to get our data. Just paste the query into the “Custom Filters” box.

Another important item to point out before we continue is that since we’re using an Azure SQL DB, we can’t have a refresh interval of less than 15 minutes. If you need more frequent updates, you may need to use Power BI Desktop.

With this form completed, click on “Next” and provide the Username and Password for your database and click on “Sign in”. Since our query is fairly small, the import of the dataset should only take a few seconds and once done, we’ll arrive at a dashboard with our SQL database pinned to it. Click on the database tile and we get a new, blank report that we can start to populate.

Along the right side of our report, select the “Line Chart” visualization and then check AvgCPU, MaxCPU, and MinCPU in the field list. Then drag Clock field to the Axis field. This setup should look something like this:

[Image: setting up the line chart visualization’s fields]

If done properly, we should have a visualization that looks something like…

[Image: the resulting CPU usage line chart]

From there you can change/resize the visualization, and change various properties of it (colors, labels, title, background, etc…) by clicking on the paint brush icon below the visualizations. You can even add other visualizations to this same report by clicking away in the whitespace and adding them like so…

[Image: a sample report with multiple visualizations]

It’s entirely up to you what values you want to display and how.

So there you have it

Not really much to it. The hardest part was figuring out how to connect Power BI online to the management view. I like that the Power BI desktop tool calls it a “query” and not a “filter”. Less confusion there.

If I had more time, it would be neat to look at creating a content sample pack (creation of content packs is not self-service for the moment) for Azure SQL DB. But I have a couple other irons in the fire, so perhaps I can circle back on that topic another day. So in the interim, perhaps have a look at the SQL Sentry content pack (Note: This requires SQL Sentry’s Performance Advisor product, which has a free trial).

Until next time!