Network Isolation/Security with Azure Service Fabric

There are times you really need to take things beyond the “file new” experience and implement a more advanced scenario. And with these opportunities, you sometimes realize that what you need isn’t a “one off” kind of thing; there are larger implications that can help solve a myriad of problems. This is the story of one of these scenarios.

I was recently working with a partner as they explored Service Fabric. They liked what they saw, but there was a “but” (there almost always is). This partner is in the government space, and one of their requirements was that all public facing services be isolated and secured from any “back end” services (in a DMZ). If you’ve been doing IT for any length of time, this shouldn’t come as news. But the question they had for me was how to do this with Service Fabric.

There were a couple of ways to address this that immediately came to mind. We could deploy the front end web application as an Azure Web App, hosted in an App Service Environment that was joined to the same VNet as the Service Fabric cluster. We could also set up two Service Fabric clusters, again joined by a single VNet. The issue with both of these is that the front and back ends of the solution would need to be deployed and managed separately. Not a huge deal, admittedly. But this did complicate the provisioning and deployment processes a bit, and seemed to run counter to the idea of a Service Fabric “application” composed of multiple services as a single entity. I was fortunate that I had previously engaged my friend and colleague Kal to bring his considerable Service Fabric experience into play with this partner, and he suggested a third option, one we all found fairly intriguing.

A Service Fabric cluster has Node Types which are directly related to VM Scale Sets. Taking advantage of this, we could place different node types into different subnets and place Network Security Groups (NSGs) on the subnets to provide the level of isolation the partner required. We would then use Placement Constraints to ensure that the services within an application are only hosted in the proper subnet by using constraints specific to the node type, or types, in that subnet.

We ran the idea by Mark Fussell, the lead Program Manager of the Service Fabric team. As we talked, we realized that folks had secured a cluster from all external access, but there didn’t appear to be a public, previously documented version of what we were proposing. Mark was supportive of the idea, and even offered up that in some of the “larger” Service Fabric clusters, the placement constraint approach has been used to ensure that the services that make up the Service Fabric cluster itself remain isolated from those that comprise the applications deployed within it.

Our mission clear, I set to work! We were going to build an Azure Resource Manager template that creates our “DMZ’d Service Fabric cluster”.

Network Topology 

The first step was to create the overall network topology.

[Diagram: the cluster’s virtual network, with front end, back end, and management subnets]

We have the front end subnet, with a public load balancer that handles traffic from the internet. There is a back end subnet with an internal load balancer that does not allow any connections from outside of the virtual network (it uses a private IP). Finally, we have a management subnet that contains the cluster services, including the web portal (on port 19080) and the TCP client API (port 19000). For good measure, we also toss an RDP jump box into this subnet so that if something goes wrong with any of the nodes in the cluster, we can remote in and troubleshoot (something I used the heck out of while crafting this template).
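
To support this, the template declares the three subnets inside a single virtual network and associates each one with its own Network Security Group. Below is a trimmed sketch of what that association looks like; the variable names and address prefixes here are illustrative rather than lifted verbatim from the final template:

"subnets": [
  {
    "name": "[variables('subnetFrontEnd')['Name']]",
    "properties": {
      "addressPrefix": "10.0.1.0/24",
      "networkSecurityGroup": {
        "id": "[resourceId('Microsoft.Network/networkSecurityGroups', variables('nsgFrontEnd')['Name'])]"
      }
    }
  },
  {
    "name": "[variables('subnetBackEnd')['Name']]",
    "properties": {
      "addressPrefix": "10.0.2.0/24",
      "networkSecurityGroup": {
        "id": "[resourceId('Microsoft.Network/networkSecurityGroups', variables('nsgBackEnd')['Name'])]"
      }
    }
  },
  {
    "name": "[variables('subnetManagement')['Name']]",
    "properties": {
      "addressPrefix": "10.0.3.0/24",
      "networkSecurityGroup": {
        "id": "[resourceId('Microsoft.Network/networkSecurityGroups', variables('nsgManagement')['Name'])]"
      }
    }
  }
]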

With this in place, we then define the VM Scale Sets, and bind their network configurations to the proper subnets as follows:

"networkInterfaceConfigurations": [ 
  { 
    "name": "[variables('nodesMgmnt')['nicName']]", 
    "properties": { 
      "ipConfigurations": [ 
        { 
          "name": "[concat(variables('nodesMgmnt')['nicName'],'-',0)]", 
          "properties": { 
            "loadBalancerBackendAddressPools": [ 
              { 
                "id": "[variables('lbMgmnt')['PoolID']]"
              } 
            ], 
            "subnet": { 
              "id": "[variables('subnetManagement')['Ref']]"
            } 
          } 
        } 
      ], 
      "primary": true
    } 
  } 
]

With the VM Scale Sets in place, we moved on to the Service Fabric cluster and defined each Node Type. Here’s the node type definition for the management subnet’s node type.

{ 
  "name": "[variables('nodesMgmnt')['TypeName']]", 
  "applicationPorts": { 
    "endPort": "[variables('svcFabCluster')['applicationEndPort']]", 
    "startPort": "[variables('svcFabCluster')['applicationStartPort']]"
  }, 
  "clientConnectionEndpointPort": "[variables('svcFabCluster')['tcpGatewayPort']]", 
  "durabilityLevel": "Bronze", 
  "ephemeralPorts": { 
    "endPort": "[variables('svcFabCluster')['ephemeralEndPort']]", 
    "startPort": "[variables('svcFabCluster')['ephemeralStartPort']]"
  }, 
  "httpGatewayEndpointPort": "[variables('svcFabCluster')['httpGatewayPort']]", 
  "isPrimary": true, 
  "placementProperties": { 
    "isDMZ": "false"
  },            
  "vmInstanceCount": "[variables('nodesMgmnt')['capacity']]"
} 

The “name” of this Node Type must match the name of a VM Scale Set; that’s how the two get wired together. Since this sample is for our “management” node type, it’s also the only one with the isPrimary property set to true.

At this point, we deployed and debugged the template to make sure it was valid and that the cluster would come up “green”. The next (and harder) step was to start securing the cluster.

Note: If you create a cluster via the Azure portal with multiple node types, each node type will get its own subnet. However, we were after a reusable ARM template so we had to configure things ourselves.

Network Security

Unfortunately, when we set out to create this, there wasn’t much publicly available on the ports that were needed within a fabric cluster. So we had to do some guesswork, some heavy digging, and make a wish for some good luck. In this section I’m hoping to lay out some of what we learned to save others the effort.

First off, we started by blocking all inbound connections on the three subnets. I then opened ports 19080 (used by the Service Fabric web portal) and 19000 (used by the Fabric Client and PowerShell) on the “management” subnet so I could interact with the cluster remotely. This was all done interactively via the Azure portal so we could test the rules out, then use the Resource Explorer to export them to our template. We assumed that with these rules in place, we would see some of the nodes in the cluster go “red” or unhealthy. But we didn’t!

It took a day or so, but we eventually figured out that we were seeing two separate systems collide. First, when a VM is brought up, the Service Fabric extension is inserted into it. This extension then registers the node with the cluster, and as part of that process a series of connections is established. These connections are not ephemeral; they remain up for the life of the node. Our mistake was in assuming these connections were only temporary and established when needed, the way we encourage most of our partners to build their applications.

Since these are established, persistent connections, they are not impacted when new NSG rules are applied. This makes sense since the NSG rules are there to interrogate any new connection requests, not look over everything that’s already been established. So the nodes would remain green until we rebooted them (tearing down their connections) and they tried (and failed) to re-establish their connection to the cluster.

With this sorted out, we set about putting the remainder of the rules in place for the subnets. We knew we wanted internet connectivity to any application/service ports in the front end, as well as VNet-only connectivity to the application/service ports in the back end. But what we were missing was the ports that Service Fabric needed. We found most of these in the cluster manifest:

 
<Endpoints> 
  <ClientConnectionEndpoint Port="19000" /> 
  <LeaseDriverEndpoint Port="1026" /> 
  <ClusterConnectionEndpoint Port="1025" /> 
  <HttpGatewayEndpoint Port="19080" Protocol="http" /> 
  <ServiceConnectionEndpoint Port="1027" /> 
  <ApplicationEndpoints StartPort="20000" EndPort="30000" /> 
  <EphemeralEndpoints StartPort="49152" EndPort="65534" /> 
</Endpoints> 

This worked fine at first. We stood up the cluster with these rules properly in place and the nodes were all green. However, when we tried to deploy an app to the cluster, it would always time out during the copy step. I spent a couple hours troubleshooting this one, eventually realizing that something inside the cluster was still blocked. I spent a bit of time looking at Wireshark and netstat runs inside of the nodes to determine what could still be the blocker. This could have carried on for some time had it not been for Vaishnav Kidambi pointing out that Service Fabric uses SMB to copy the application/service packages around to the nodes in the cluster. We added a rule for that, and things started to work!

Note: As a result of this work, the Service Fabric product team has acknowledged that there’s a need for better documentation on the ports used by Service Fabric. So keep an eye out for additions to the official documentation.

Here’s what the final set of inbound rules for the Network Security Group (NSG) associated with the management subnet looked like.

[Table: the final inbound NSG rules for the management subnet]

A quick rundown… I’ll start at the bottom (the lowest priority rule, which the NSG evaluates last) and work my way up. Rule 4000 blocks all traffic into the subnet. Rules 3950 and 3960 enable RDP connections within the VNet, and to the RDP jump box (at internal IP 10.0.3.4) from the internet. The next three rules (3920-3940) allow the connections needed by Service Fabric within the VNet only (thus allowing all the Service Fabric agents on the nodes to communicate). And finally, the first two rules (3900 and 3910) open up external connections for ports 19080 and 19000. Rules 3960, 3900, and 3910 are unique to the management subnet. I’ll get to why 19000 and 19080 are unique to this subnet in a moment.
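
For reference, here’s roughly the shape of one of those rules in the ARM template. This sketch shows the VNet-only SMB rule that lets Service Fabric copy application packages between nodes; the rule name and priority are illustrative rather than copied from the final template:

{
  "name": "AllowSvcFabSmbVnetInbound",
  "properties": {
    "description": "Allow SMB within the VNet so Service Fabric can copy application packages between nodes",
    "protocol": "Tcp",
    "sourcePortRange": "*",
    "destinationPortRange": "445",
    "sourceAddressPrefix": "VirtualNetwork",
    "destinationAddressPrefix": "VirtualNetwork",
    "access": "Allow",
    "priority": 3930,
    "direction": "Inbound"
  }
}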

Dynamic vs Static Ports

One sidebar for a moment. Connectivity between the front and back end is restricted to a set of ports you set when you run the template (it defaults to 80 and 443). In Service Fabric terms, this is called a static port. When you build services, you also have the option of asking the fabric for a port to use: a dynamic port. As of the writing of this article, the Azure load balancer does not support these dynamic ports. So to leverage them through the load balancer and our network isolation, we’d need a way to update both the load balancer rules and the NSG rules each time a port is allocated or released. Not ideal.

My thought is that most use of dynamic ports is likely going to be between services that have a trusted relationship. This relationship would likely result in the services being placed inside the same subnet. If you needed to expose something they were doing to the “outside world”, you would likely set up a gateway/façade service that in turn might be load balanced. It’s this gateway service that would be exposed on a static port so that it can easily be reached via a load balancer and secured with NSG rules.

Restricting Service Placement

With the network topology set, and the security rules for each of the subnets sorted, next up was ensuring that application services get placed into the proper locations. Service Fabric services can be given placement constraints. These constraints, defined in the Service Manifest, are checked against Placement Properties for each node type to determine which nodes types should host a service instance. These are commonly used for things like restricting services that require more memory to nodes that have more memory available or situations where specific types of hardware are required (a GPU for example).

Each node type gets a default placement property, NodeTypeName, which you can reference in a service manifest like so.

 
<ServiceTypes> 
  <!-- This is the name of your ServiceType. 
       This name must match the string used in RegisterServiceType call in Program.cs. -->
  <StatelessServiceType ServiceTypeName="Web2Type"> 
    <PlacementConstraints>(NodeTypeName==BackEnd)</PlacementConstraints> 
  </StatelessServiceType> 
</ServiceTypes> 

Now we may want to have other constraints beyond just NodeTypeName. Placement Properties can be assigned to the various Node Types in the cluster manifest. Or, if you’re doing this via an ARM template as I was, you can declare them directly in the template via a property within the NodeType definition/declaration.

 
"placementProperties": { 
  "isDMZ": "true" 
},

If you look at the node type definition I used earlier, you’ll see where this property collection goes. In that definition, “isDMZ” is false.

Combined, the placement properties and placement constraints help ensure that each of the services goes into the subnet that has already been configured to securely host it. But this does pose a challenge. If we declare the placement constraint in the service manifest as I show above, it restricts which clusters we can deploy the service to. If a cluster doesn’t have our placement properties declared, the service will fail to deploy. We could address this by removing the placement constraints and adding them back later (not ideal) or by altering the cluster manifests (again not ideal). But there are two other options. First, we could craft our own definition of the application/service types, register them with the cluster, then copy the packages to the cluster.

Note: For more on Placement Constraints, please check out my new blog post.

This article contains a section that talks about doing this via C# or PowerShell. Another option, and one I think I actually prefer (but admittedly haven’t tried), is to use a build event to alter the manifest. You can then trigger this event based on various parameters to control whether it happens when you’re doing a local build vs. a cloud build. Perhaps even going so far as reading a value from the Application Parameters or Publication Profile files. But for now, I’ll need to set these aside. There’s also a third option I’m investigating, but I’m not confident enough to bring it up yet. I hope to eventually circle back on these.

There is one other placement consideration (I mentioned I’d get to this). There are two things unique to the management node type/subnet. The first is that it’s the only subnet I open ports 19000 and 19080 on. The reason is that this is the only node type in the cluster manifest marked as “isPrimary” (the second unique thing). A Service Fabric cluster can only have one “primary” node type, and it’s the one where all the “system” services will be placed (Naming, FileStore, Cluster Manager, etc.). So setting “isPrimary” ensures that these services will be placed into this subnet, allowing me to keep them separate from any application services. I previously mentioned that this approach was proposed by Mark Fussell of the Service Fabric team. It’s a pattern used by some larger clusters to help ensure that fabric management resource demands can be scaled independently of application needs.

Between placement of the management services on the primary node type, and restricting application placement via constraints, we can now put each of our services only where we want them to be.

Using the JumpBox

A common technique in cloud solutions is to leverage a “jump box”. Allowing direct, remote access to a virtual machine is sensitive and risky. To help manage this risk, there are usually one or more restricted access points that are used as gatekeepers. You access one of these gatekeepers as a jumping off point to reach resources inside the security boundary. We’ve set up this approach, allowing you to RDP into a jump box from which you would then RDP into the other boxes within the VNet.

Using this template, you’ll need to address all your VM instances via IP. Since we’re using dynamic IPs within the VNet, you can RDP into a box using a fairly simple address scheme. The third octet of the IP address represents the subnet you want to access (1 = front end, 2 = back end, 3 = management) and the final octet is the specific machine. Azure reserves the first three addresses in each subnet range for its own use, so you can start at .4 for the VMs in the front end or management subnets. For the back end subnet, I’ve used 10.0.2.4 as the private IP for the internal load balancer, so the nodes in that subnet start at .5.

The next step would be to adapt the “allowJumpBoxRDP” security rule on the management subnet so that it only allows connections from trusted sources (say, your on-premises network).

Many diet colas died to bring you this information

So there you have it. I’ll admit that on the surface it may not seem like much. But if you’ve ever built an ARM template, you know how much effort it requires. Add in all the stuff I had to learn/discover to get it to a functional state and validate it by deploying apps to it (which required more debugging and bug fixes) and… well… we’re talking quite a bit of effort. So I’m hoping that this article and the template will help a few folks avoid what I had to go through.

The entire template (complete with jump box) can be found in my GitHub repo. I also have a version that is a secure cluster using certificates and Azure AD. I’m going to continue to try and polish it, and I’m also looking at getting it published (with additional guidance on usage) in the Azure Quickstart Templates repository. So be sure to let me know of any suggestions or bugs you find. I’ll do my best to get them worked in.

Until next time!

PS – thank you to everyone that helped contribute to this effort: Kal, Jason, Corey, Patrick, Mike, Shenlong, Vaishnav, Chacko, and Mikkel


SimpleFileFetcher for Azure Start-up Tasks

I know some of you are waiting for more news on my “Simple Cloud Manager” project. It’s coming, I swear! Meanwhile, I had to tackle one fairly common Azure task today and wanted to share my solution with you all.

I’m working on a project where we needed an Azure PaaS start-up task that would download a remote file and unzip it for me. Now there are several approaches out there of varying degrees of complexity: PowerShell, native code, etc. But I wanted something even simpler, a black box that I could pass parameters to and that required no additional assemblies/files to work. To that end I sat down and spent about 2 hours crafting the “Simple File Fetcher”.

The usage is fairly simple: simplefilefetcher -targeturi:<where to download the file from> -targetfile:<where to save it to>

You can also optionally specify a parameter that tells it to unzip the file, ‘-unzip’, and another that will make sure the downloaded file is deleted afterwards, ‘-deleteoriginal’.

I spent most of the 2 hours looking at and trying various options. The final product was < 100 lines of code and, now that I know how to do it, would likely take me only 10-20 minutes to rebuild (most of that spent debugging the argument parsing). So instead of boring you all with an explanation of the code, I’ll just share it along with a release build of the console app so you can just use it. 🙂
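
For the curious, the core of the tool boils down to something like the sketch below. This is not the actual source, just an approximation with the argument parsing replaced by hard-coded values, but it shows the basic download-then-unzip flow:

using System.IO;
using System.IO.Compression; // requires a reference to System.IO.Compression.FileSystem
using System.Net;

class SimpleFileFetcherSketch
{
    static void Main()
    {
        // In the real tool these values come from the -targeturi, -targetfile,
        // -unzip, and -deleteoriginal command line arguments.
        string targetUri = "http://example.com/mypackage.zip";
        string targetFile = @"C:\temp\mypackage.zip";
        bool unzip = true;
        bool deleteOriginal = true;

        // download the remote file to the local path
        using (var client = new WebClient())
            client.DownloadFile(targetUri, targetFile);

        if (unzip)
        {
            // extract alongside the downloaded file
            ZipFile.ExtractToDirectory(targetFile, Path.GetDirectoryName(targetFile));

            if (deleteOriginal)
                File.Delete(targetFile);
        }
    }
}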

Until next time!

Azure Tenant Management

Hey everyone. I’m writing this post from Redmond, WA. I’m here on the Microsoft campus this week to meet with some of my colleagues and explore options for my first significant open source project effort. We’re here to spend three days trying to find ways to manage the allocation of, and monitor the billable usage of, Azure resources. There are two high level objectives for this effort:

  • Allow non-technical users to provision and access Azure resources without the need to understand Azure’s terminology or to be granted direct access to the underlying Azure subscription
  • Monitor the billable usage of the allocated resources, and track against a “cap”

On the surface this seems pretty simple. That is, until you realize that not all the Azure services expose individual resource usage, and of the ones that do, they all do it separately. It gets even more complicated when you realize that you may need to do things like have users “share” cloud services and even storage accounts.

A couple of examples

So let’s dive into this a little more and explore a couple of the possible use cases I’ve heard from customers.

Journalism Students

Scenario: We have a series of journalism students that will be learning to provision and maintain an online content management system. They need someplace to host their website, upload content, and they need Git integration so they can version any code changes they make to their site’s code. The instructor for the course starts by creating an individual web site for each of them with a basic CMS system. The student will then access this workspace as they perform their coursework.

Challenges: Depending on the languages they are using, we may be able to get by with Azure Web Sites. But that only allows us up to 10 “free” web sites, so what happens to the other students? Additionally, students don’t know anything about the different SKUs available and just want things that work, so do we need to provide “warm-up” and auto-scaling? And since the instructor is setting up the web sites for them, we need a simple way for the instructor to get the resources provisioned and give the students access to them without the instructor needing to even be aware of the Azure subscriptions. We also need to track compute and bandwidth usage on the individual web sites.

Software Testers

Scenario: A small company has remote testers that perform quality assurance work on software. These workers are distributed but need to remote into Windows VMs to run tests. Ideally, these VMs will be hosted “in the cloud”, and the company wants a simple façade whereby the workers can select which software they need to test and then provision a virtual machine for them. The projects being tested should be “billed back” for the resources used by the testers and the testers work on multiple projects. Additionally, the testers should be able to focus on the work they have to do, not how to manage and provision Azure resources.

Challenges: This one will likely be Azure Virtual Machines. But we need to juggle not only compute/bandwidth, but also track the impact on storage (transactions and total amount stored). We also need to be able to provision VMs from a selection of customer gallery images and get them running for the testers, sometimes across subscriptions. Finally, we need to be aware of challenges with regard to VM endpoints and cloud services if we want to maximize the density of these VMs.

Business Intelligence

Scenario: Students are learning to use Oracle databases to analyze trends. The instructor is using the base Oracle database images from the Azure gallery but has added various tools and sample datasets to them for the students to use. The students will use the virtual machines for various labs over the duration of the course and each lab should only take a few hours.

Challenges: If these VMs were kept running 24×7, it would cost thousands of dollars per month per student. So we need to make sure we can automate the start and stop of the VMs to help control these costs. And since the Oracle licensing fees appear as a separate charge, we need to be able to predict these as well based on current rates and the amount of time the VM was active.

So what are we going to do about it?

In short, my plan is to create a framework that we will release via open source to help fill some of these gaps: a simple, web based user interface for accessing your allocated resources; some back end services that monitor resource usage and track it against quotas set by solution administrators; and underneath all that, a system that allows you to “tag” resources as associated with users or specific projects. If all goes well, I hope to have the first version of this framework published and available by the end of March 2015, focusing on Azure Web Sites, IaaS VMs, and Azure Storage.

However, and this is the point of my little announcement post, we’re not going to make you wait until this is done. As this project progresses, I plan to regularly post here and in January we’ll hopefully have a GIT repository where you’ll be able to check out the work as we progress. Furthermore, I plan to actively work with organizations that want to use this solution so that our initial version will not be the only one.

So look for more on this in the next couple weeks as we share our learnings and plans. But also, let me know via the comments if this is something you see value in and what scenario you may have. And oh, we still don’t have a name for the framework yet. So please post a comment or tweet @brentcodemonkey with your ideas. 🙂

Until next time!

ARR as a highly available reverse proxy in Windows Azure

With the general availability of Windows Azure’s IaaS solution last year, we’ve seen a significant uptake in migration of legacy solutions to the Windows Azure platform. And with the even more recent announcement of our agreement with Oracle for them to support their products on Microsoft’s hypervisor technology, Hyper-V, we have a whole new category of apps we are being asked to help move to Windows Azure. One common pattern that’s been emerging is the need for Linux/Apache/Java solutions to run in Azure at the same level of “density” that is available via traditional hosting providers. If you were an ISV (Independent Software Vendor) hosting solutions for individual customers, you might accomplish this by giving each customer a unique URI and binding that to a specific Apache module, sometimes based on a unique IP address that is associated with a customer specific URL and a unique SSL certificate. This results in a scenario that requires multiple IPs per server.

As you may have heard, the internet is starting to run a bit short on IP addresses. So supporting multiple public IPs per server is a difficult proposition for a cloud, as well as for some traditional hosting providers. To that end we’ve seen new technologies emerge, such as SNI (Server Name Indication), and greater use of proxy and request routing solutions like HAProxy, FreeBSD, and Microsoft’s Application Request Routing (ARR). This is also complicated by the need to deliver highly available, fault tolerant solutions that can load balance client traffic. This isn’t always an easy problem to solve, especially using just application centric approaches. They require intelligent, configurable proxies and/or load balancers: precisely the kind of low level management the cloud is supposed to help us get away from.

But today, I’m here to share one solution I created for a customer that I think addresses some of this need. Using Microsoft’s ARR modules for IIS, hosted in Windows Azure’s IaaS service, as a reverse proxy for a high-density application hosting solution.

Disclaimer: This article assumes you are familiar with creating/provisioning virtual machines in Windows Azure and then remoting into them to further alter their configurations. Additionally, you will need a basic understanding of IIS and how to make changes to it via the IIS Manager console. I’m also aware of there being a myriad of ways to accomplish what we’re trying to do with this solution. This is simply one possible solution.

Overview of the Scenario and proposed solution

Here’s the outline of a potential customer’s scenario:

  • We have two or more virtual machines hosted in Windows Azure that are configured for high availability. Each of these virtual machines is identical, and hosts several web applications.
  • The web applications consist of two types:
    • Stateful web sites, accessed by end users via a web browser
    • Stateless APIs accessed by a “rich client” running natively on a mobile device
  • The “state” of the web sites is stored in an in-memory user session store that is specific to the machine on which the session was started. So all subsequent requests made during that session must be routed to the same server. This is referred to as ‘session affinity’ or ‘sticky sessions’.
  • All client requests will be over SSL (on port 443), to a unique URL specific to a given application/customer.
  • Each site/URL has its own unique SSL certificate
  • SSL Offloading (decryption of HTTPS traffic prior to its receipt by the web application) should be enabled to reduce the load on the web servers.

As you can guess based on the title of this article my intent is to solve this problem using Application Request Routing (aka ARR), a free plug-in for Windows Server IIS. ARR is an incredibly powerful utility that can be used to do many things, including acting as a reverse proxy to help route requests in a way that is completely transparent to the requestor. Combined with other features of IIS 8.0, it is able to meet the needs of the scenario we just outlined.

For my POC, I use four virtual machines within a single Windows Azure cloud service (a cloud service is simply a container that virtual machines can be placed into that provides a level of network isolation). On-premises we had the availability provided by the “titanium eggshell” that is robust hardware, but in the cloud we need to protect ourselves from potential outages by running multiple instances configured to help minimize downtimes. To be covered by Windows Azure’s 99.95% uptime SLA, I am required to run multiple virtual machine instances placed into an availability set. But since the Windows Azure Load Balancer doesn’t support sticky sessions, I need something in the mix to deliver this functionality.

The POC will consist of two layers, the ARR based reverse proxy layer and the web servers. To get the Windows Azure SLA, each layer will have two virtual machines: two running ARR with public endpoints for SSL traffic (port 443) and two set up as our web servers, but since these will sit behind our reverse proxy, they will not have any public endpoints (outside of remote desktop to help with initial setup). Requests will come in from various clients (web browsers or devices) and arrive at the Windows Azure Load Balancer. The load balancer will then distribute the traffic equally across our two reverse proxy virtual machines, where the requests are processed by IIS and ARR and routed, based on the rules we will configure, to the proper applications on the web servers, each running on a unique port. Optionally, ARR will also handle the routing of requests to a specific web server, ensuring that “session affinity” is maintained. The following diagram illustrates the solution.

The focus of this article is on how we can leverage ARR to fulfill the scenario in a way that’s “cloud friendly”. So while the original customer scenario called for Linux/Apache servers, I’m going to use Windows Server/IIS for this POC. This is purely a decision of convenience since it has been a LONG time since I set up a Linux/Apache web server. Additionally, while the original scenario called for multiple customers, each with their own web applications/modules (as shown in the diagram), I just need to demonstrate the URI to specific application routing. So as you’ll see later in the article, I’m just going to set up a couple of web applications.

Note: While we can have more than two web servers, I’ve limited the POC to two for the sake of simplicity. If you want to run, 3, 10, or 25, it’s just a matter of creating the additional servers and adding them to the ARR web farms as we’ll be doing later in this article.

Setting up the Servers in Windows Azure

If you’re used to setting up virtual machines in Windows Azure, this is fairly straightforward. We start by creating a cloud service and two storage accounts. The reason for the two is that I really want to try and maximize the uptime of the solution. If all the VMs had their hard drives in a single storage account and that account experienced a sustained service interruption, my entire solution could be taken offline.

NOTE: The approach to use multiple storage accounts does not guarantee availability. This is a personal preference to help, even if in some small part, mitigate potential risk.

You can also go so far as to define a virtual network for the machines with separate subnets for the front and back end. This should not be required for the solution to work, as the cloud service container gives us DNS resolution within its boundaries. However, the virtual network can be used to help manage visibility and security of the different virtual machine instances.

Once the storage accounts are created, I create the first of our two “front end” ARR servers by provisioning a new Windows Server 2012 virtual machine instance. I give it a meaningful name like “ARRFrontEnd01” and make sure that I also create an availability set and define an HTTPS endpoint on port 443. If you’re using the management portal, be sure to select the “from gallery” option as opposed to ‘quick create’, as it will give you additional options when provisioning the VM instance and allow you to more easily set the cloud service, availability set, and storage account. After the first virtual machine is created, create a second, perhaps “ARRFrontEnd02”, and “attach” it to the first instance by associating it with the endpoint we created while provisioning the previous instance.

Once our “front end” machines are provisioned, we set up two more Windows Server 2012 instances for our web servers, “WebServer01” and “WebServer02”. However, since these machines will be behind our front end servers, we won’t declare any public endpoints for ports 80 or 443, just leave the defaults.

When complete, we should have four virtual machine instances: two that are load balanced via Windows Azure on port 443 and will act as our ARR front end servers, and two that will act as our web servers.

Now before we can really start setting things up, we’ll need to remote desktop into each of these servers and add a few roles. When we log on, we should see the Server Manager dashboard. Select “Add roles and features” from the “configure this local server” box.

In the “Add Roles and Features” wizard, skip over the “Before you Begin” (if you get it), and select the role-based installation type.

On the next page, we’ll select the current server from the server pool (the default) and proceed to adding the “Web Server (IIS)” server role.

This will pop-up another dialog confirming the features we want added. Namely the Management Tools and IIS Management Console. So take the defaults here and click “Add Features” to proceed.

The next page in the wizard is “Select Features”. We’ve already selected what we needed when we added the role, so click on “Next” until you arrive at “Select Role Services”. There are two optional role services here I’d recommend you consider adding. Health and Diagnostic Tracing will be helpful if we have to troubleshoot our ARR configuration later, and the IIS Management Scripts and Tools will be essential if we want to automate the setup of any of this at a later date (but that’s another blog post for another day). Below is a composite image that shows these options selected.

It’s also a good idea to double-check here and make sure that the IIS Management Console is selected. It should be by default since it was part of the role features we included earlier. But it doesn’t hurt to be safe. 🙂

With all this complete, go ahead and create several sites on the two web servers. We can leave the default site on port 80, but create two more HTTP sites. I used 8080 and 8090 for the two sites, but feel free to pick available ports that meet your needs. Just be sure to go into the firewall settings of each server and enable inbound connections on these ports. I also went into the sites and changed the HTML so I could tell which server and which app I was getting results back from (something like “Web1 – SiteA” works fine).

Lastly, test the web sites from our two front end servers to make sure they can connect by logging into those servers, opening a web browser, and entering the proper address. This will be something like http://<servername>:8080/iisstart.htm. The ‘servername’ parameter is simply the name we gave the virtual machine when it was provisioned. Make sure that you can hit both servers and all three apps from both of our proxy servers before proceeding. If these fail to connect, the most likely cause is an issue in the way the IIS site was defined, or an issue with the firewall configuration on the web server preventing the requests from being received.

Install ARR and setting up for HA

With our server environment now configured, and some basic web sites we can balance traffic against, it’s time to define our proxy servers. We start by installing ARR 3.0 (the latest version as of this writing, and compatible with IIS 8.0). You can download it from here, or install it via the Web Platform Installer (WebPI). I would recommend this option, as WebPI will also install any dependencies and can be scripted. Fortunately, when you open up the IIS Manager for the first time and select the server, it will ask if you want to install the “Microsoft Web Platform” and open up a browser to allow you to download it. After adding a few web sites to the ‘trusted zone’ (and enabling file downloads when in the ‘internet’ zone), you’ll be able to download and install this helpful tool. Once installed, run it and enter “Application Request” into the search bar. We want to select version 3.0.

Now that ARR is installed (which we have to do on both of our proxy servers), let’s talk about setting this up for high availability. We hopefully placed both of our proxy servers into an availability set and load balanced the 443 endpoint as mentioned above. This allows both servers to act as our proxy. But we have two possible challenges yet:

  1. How to maintain the ARR setup across two servers
  2. Ensure that session affinity (aka sticky sessions) works with multiple, load balanced ARR servers

Fortunately, there are a couple of decent blog posts on IIS.NET about this subject. Unfortunately, these appear to have been written by folks that are familiar with IIS, networking, pings and pipes, and a host of other items. But as always, I’m here to try and help cut through all that and put this stuff in terms that we can all relate to. And hopefully in such a way that we don’t lose any important details.

To leverage Windows Azure’s compute SLA, we will need to run two instances of our ARR machines and place them into an availability set. We set up both of these servers earlier, and hopefully properly placed them into an availability set with a load balanced endpoint on port 443. This allows the Windows Azure fabric to load balance traffic between the two instances. Also, should updates to the host server (where our VMs run) or the fabric components be necessary, we can minimize the risk of both ARR servers being taken offline at the same time.

This configuration leads us to the options highlighted in the blog post I linked previously, “Using Multiple Instances of Application Request Routing (AAR) Servers“. The article discusses using Shared Configuration and External Cache. A Shared Configuration allows two ARR servers to share their configurations. By leveraging a shared configuration, changes made to one ARR server will automatically be leveraged by the other because both servers will share a single applicationHost.config file. The External Cache is used to allow both ARR servers to share affinity settings. So if a client’s first request is sent to a given back end web server, then all subsequent requests will be sent to that same back end server regardless of which ARR server receives the request.

For this POC, I decided not to use either option. Both require a shared network location. I could put this on either ARR server, but that creates a single point of failure. And since our objective is to ensure the solution remains as available as possible, I didn’t want to take a dependency that would ultimately reduce the potential availability of the overall solution. As for the external cache, for this POC I only wanted server affinity for one of the two web sites, since the POC mocks up round-robin load balancing for requests that are more like an API. For requests that come from a web browser, instead of using a shared cache we’ll use “client affinity”. This option returns a browser cookie that contains all the routing information needed by ARR to ensure that subsequent requests are sent to the same back end server. This is the same approach used by the Windows Azure Java SDK and Windows Azure Web Sites.

So to make a long story short, if we’ve properly set up our two ARR servers in an availability set with load balanced endpoints, there’s no additional high level configuration necessary to set up the options highlighted in the “multiple instances” article. We can get what we need within ARR itself.

Configure our ARR Web Farms

I realize I’ve been fairly high level with my setup instructions so far. But many of these steps have been fairly well documented and up until this point we’ve been painting with a fairly broad brush. But going forward I’m going to get more detailed since it’s important that we properly set this all up. Just remember, that each of the steps going forward will need to be executed on each of our ARR servers since we opted not to leverage the Shared Configuration.

The first step after our servers have been set up is to configure the web farms. Open the IIS Manager on one of our ARR servers and (provided our ARR 3.0 install completed successfully) we should see the “Server Farm” node. Right-click on that node and select “Create Server Farm” from the pop-up menu as shown in the image at the right. A Server Farm is a collection of servers that ARR will route traffic to. It’s the definition of this farm that controls aspects like request affinity and load balancing behaviors, as well as which servers will receive traffic.

The first step in setting up the farm is to add our web servers. Now in building my initial POC, this is the piece that caused me the most difficulty. Not because creating the server farm was difficult, but because there’s one thing that’s not apparent to those of us that aren’t intimately familiar with web hosting and server farms: namely, that we need to consider a server farm to be specific to one of our applications. It’s this understanding that helps us realize that the server farm definition is what routes requests coming into the ARR server on one port to the proper port(s) on the destination back end servers. We’ll do this as we add each server to the farm using the following steps…

After clicking on “Create Server Farm”, provide a name for the farm. Something suitable of course…

After entering the farm name and clicking on the “Next” button, we’ll be presented with the “Add Server” dialog. In this box, we’ll enter the name of each of our back end servers, but more importantly we need to make sure we expand the “Advanced Settings” options so we can also specify the port on that server we want to target. In my case, I’m going to add ‘Web1’, the name of the server I want to add, and set its ‘httpPort’ to 8080.

We’re able to do this because Windows Azure handles DNS resolution for the servers I added to the cloud service. And since they’re all in the same cloud service, we can address each server on any ports those servers will allow. There’s no need to define endpoints for connections between servers in the same cloud service. So we’ll complete the process by clicking on the ‘Add’ button and then doing the same for my second web server, ‘Web2’. We’ll receive a prompt about the creation of a default rewrite rule; click on the “No” button to close the dialog.

It’s important to set the ‘httpPort’ when we add the servers. I’ve been unable to find a way to change this port via the IIS Manager UI once the server has been added. Yes, you can change it via appcmd, PowerShell, or even by directly editing applicationHost.config, but that’s a topic for another day. 🙂
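
That said, for reference, the farm ends up being recorded in applicationHost.config looking roughly like the fragment below. This is a sketch from memory rather than an exact dump, using the server names from above and a made-up farm name:

<webFarms>
  <webFarm name="AppAFarm" enabled="true">
    <server address="Web1" enabled="true">
      <applicationRequestRouting httpPort="8080" />
    </server>
    <server address="Web2" enabled="true">
      <applicationRequestRouting httpPort="8080" />
    </server>
  </webFarm>
</webFarms>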

Now to set the load balancing behavior and affinity we talked about earlier, we select the newly created server farm from the tree and we’ll see the icons presented below:

If we double-click on the Load Balance icon, it will open a dialog box that allows us to select from the available load balancing algorithms. For the needs of this POC, Least Recent Request and Weighted Round Robin would both work suitably. Select the algorithm you prefer and click on “Apply”. To set the cookie based client affinity I mentioned earlier, you can double click on the “Server Affinity” option and then check the box for “Client Affinity”.

The final item that we will make sure is enabled here is SSL offloading. We can verify this by double-clicking on “Routing Rules” and verifying that “Enable SSL offloading” is checked, which it should be by default.

Now it’s a matter of repeating this process for our second application (I put it on port 8090) as well as setting up the same two farms on the other ARR server.

Setting up the URL Rewrite Rule

The next step is to set up the URL rewrite rule that will tell ARR how to route requests for each of our applications to the proper web farm. But before we can do that, we need to make sure we have two unique URIs, one for each of our applications. If you scroll up and refer to the diagram that provides the overview of our solution, you’ll see that end user requests to the solution are directed at custaweb.somedomain.com and device API calls are directed to custbweb.somedomain.com. So we will need to create aliasing DNS entries for these names and alias them to the *.cloudapp.net URI that is the entry point of the cloud service where this solution resides. We can’t just use a forwarding address for this; we need a true CNAME alias.

Presuming that has already been set up, we’re ready to create the URL rule for our rewrite behavior.

We’ll start by selecting the web server itself in the IIS server manager and double clicking the URL Rewrite icon as shown below.

This will open the list of URL rewrite rules, and we’ll select “Add Rules…” from the action menu on the right. Choose to create a blank inbound rule. Give the rule an appropriate name, and complete the sections as shown in the following images.

Matching URL

This section details which incoming request URIs this rule should be applied to. I have set it up so that all inbound requests will be evaluated.

Conditions

Now as it stands, this rule would route nearly any request. So we have to add a condition to the rule to associate it with a specific request URL. We expand the “Conditions” section and click on “Add…”. We specify “{HTTP_HOST}” as the input condition (what to check) and set the condition’s type to a simple pattern match. For the pattern itself, I opted to use a regular expression that looks at the first part of the domain name and makes sure it matches the value “^custAweb.*” (as we highlighted in the diagram at the top). In this way we ensure that the rule will only be applied to one of the two URIs in our sample.

Action

The final piece of the rule is to define the action. For our type, we’ll select “Route to Server Farm”, keep HTTP as the scheme, and specify the appropriate server farm. And for the path, we’ll leave the default value of “/{R:0}”. The final piece of this tells ARR to add any paths or parameters that were in the request URL to the forwarded request.

Lastly, we have the option of telling ARR that if we execute this rule, we should not process any subsequent rules. This can be checked or unchecked depending on your needs. You may desire to set up a “default” page for requests that don’t meet any of our other rules. In which case just make sure you don’t “stop processing of subsequent rules” and place that default rule at the bottom of the list.
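
Put together, the resulting server-level rule ends up in applicationHost.config looking something like the sketch below. The rule and farm names are just the examples used in this article, not anything ARR generates for you:

<rewrite>
  <globalRules>
    <rule name="Route custAweb traffic" stopProcessing="true">
      <match url=".*" />
      <conditions>
        <add input="{HTTP_HOST}" pattern="^custAweb.*" />
      </conditions>
      <!-- "AppAFarm" is the server farm we created earlier -->
      <action type="Rewrite" url="http://AppAFarm/{R:0}" />
    </rule>
  </globalRules>
</rewrite>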

This completes the basics of setting up of our ARR based reverse proxy. Only one more step remains.

Setting up SNI and SSL Offload

Now that we have the ARR URL Rewrite rules in place, we need to get all the messiness with the certificates out of the way. We’ll assume, for the sake of argument, that we’ve already created a certificate and added it to the proper local machine certificate store. If you’re unsure how to do this, you can find some instructions in this article.

We start by creating a web site for the inbound URL. Select the server in the IIS Manager and right-click it to get the pop-up menu. This opens the “Add Website” dialog, which we will complete to set up the site.

Below you’ll find some settings I used. The site name is just a descriptive name that will appear in the IIS manager. For the physical path, I specified the same path as the “default” site that was created when we installed IIS. We could specify our own site, but that’s really not necessary unless you want to have a placeholder page in case something goes wrong with the ARR URL Rewrite rules. And since we’re doing SSL for this site, be sure to set the binding type to ‘https’ and specify the host name that matches the inbound URL that external clients will use (aka our CNAME). Finally, be sure to check “Require Server Name Indication” to make sure we support Server Name Indication (SNI).

And that’s really all there is to it. SSL offloading was already configured for us by default when we created the server farm (feel free to go back and look for the checkbox). So all we had to do was make sure we had a site defined in IIS that could be used to resolve the certificate. This site handles the encryption duties, then ARR picks up the request for processing against our rules.

Debugging ARR

So if we’ve done everything correctly, it should just work. But if it doesn’t, debugging ARR can be a bit of a challenge. You may recall that back when we installed ARR, I suggested also installing the tracing and logging features. If you did, these can be used to help troubleshoot some issues, as outlined in this article from IIS.NET. While this is helpful, I also wanted to leave you with one other tip I ran across. If possible, use a browser on the server where we’ve configured ARR to access the various web sites locally. While this won’t do any routing unless you set up some local DNS entries to help with resolving to the local machine, it will show you more than a stock “500” error. By accessing the local IIS server from within, we can get more detailed error messages that help us understand what may be wrong with our rules. It won’t allow you to fix everything, but it can sometimes be helpful.

I wish I had more for you on this, but ARR is admittedly a HUGE topic, especially for something that’s a ‘free’ add-on to IIS. This blog post is the result of several days of experimentation and self-learning. And even with this time invested, I would never presume to call myself an expert on this subject. So please forgive me if I didn’t get into enough depth.

With this, I’ll call this article to a close. I hope you find this information useful and I hope to revisit this topic again soon. One item I’m still keenly interested in is how to automate these tasks. Something that will be extremely useful for anyone that has to provision new ‘apps’ into our server farm on a regular basis. Until next time then!

Postscript

I started this post in October 2013 and apologize for the delay in getting it out. We were hoping to get it published as a full-fledged magazine article but it just didn’t work out. So I’m really happy to finally get this out “in the wild”. I’d also like to give props to Greg, Gil, David, and Ryan for helping do technical reviews. They were a great help, but I’m solely responsible for any grammar or spelling issues contained herein. If you see something, please call it out in the comments or email me and I’m happy to make corrections.

This will also hopefully be the first of a few ARR related posts/project I plan to share over the next few weeks/months. Enjoy!

Local File Cache in Windows Azure

 

When creating a traditional on-premises application, it’s not uncommon to leverage the local file system as a place to store temporary files and thus increase system performance. But with Windows Azure Cloud Services, we’ve been taught that we shouldn’t write things to disk because the virtual machines that host our services aren’t durable. So we start going to remote durable storage for everything. This slows down our applications, so we need to add back in some type of cache solution.

Previously, I discussed using the Windows Azure Caching Preview to create a distributed, in-memory cache. I love that we finally have a simple way to do this. But there are times when I think that caching something, for example an image file that doesn’t change often, within a single instance would be fine, especially if I don’t have to use up precious RAM on my virtual machines.

Well there is an option! Windows Azure Cloud Services all include, at no additional cost, an allocation of non-durable local disk space called, surprisingly enough, “Local Storage”. For each core you get 250 GB of essentially temporary disk space. And with a bit of investment, we can leverage that space as a local, file backed cache.

Extending System.Runtime.Caching

So .NET 4.0 introduced the System.Runtime.Caching namespace along with a template base class, ObjectCache, that can be extended to provide caching functionality with whatever storage system we want to use. Now this namespace also provides a concrete implementation called MemoryCache, but we want to use the file system. So we’ll create our own implementation called FileCache.

Note: There’s already a CodePlex project that provides a file based implementation of ObjectCache. But I still wanted to roll my own for the sake of explaining some of the challenges that will arise.

So I create a class library and add a reference to System.Runtime.Caching. Next up, let’s rename the default class “Class1.cs” to “FileCache.cs”. Lastly, inside of the FileCache class, I’ll add a using statement for the Caching namespace and make sure my new class inherits from ObjectCache.

Now if we try to build the class library, things won’t go very well because there are 18 different abstract members we need to implement. Fortunately I’m running the Visual Studio Power Tools, so it’s just a matter of right-clicking on ObjectCache where I indicated I’m inheriting from it and selecting “Implement Abstract Class”. This gives us shells for all 18 abstract members, but until we add some real implementation, our FileCache class won’t even be minimally useful.

I’ll start by fleshing out the Get method and adding a public property, CacheRootPath, to the class that designates where our file cache will be kept.

public string CacheRootPath
{
    get { return cacheRoot.FullName; }
    set
    {
        cacheRoot = new DirectoryInfo(value);
        if (!cacheRoot.Exists) // create if it doesn't exist
            cacheRoot.Create();
    }
}

public override bool Contains(string key, string regionName = null)
{
    string fullFileName = GetItemFileName(key,regionName);
    FileInfo fileInfo = null;

    if (File.Exists(fullFileName))
    {
        fileInfo = new FileInfo(fullFileName);

        // if item has expired, don't return it
        //TODO: 
        return true;
    }
    else
        return false;
}

// return type is an object, but we'll always return a stream
public override object Get(string key, string regionName = null)
{
    if (Contains(key, regionName))
    {
        //TODO: wrap this in some exception handling
        MemoryStream memStream = new MemoryStream();
        FileStream fileStream = new FileStream(GetItemFileName(key, regionName), FileMode.Open);
        fileStream.CopyTo(memStream);
        fileStream.Close();

        return memStream;
    }
    else
        return null;
}

CacheRootPath is just a way for us to set the path to where our cache will be stored. The Contains method is a way to check and see if the file exists in the cache (and ideally should also be where we check to make sure the object isn’t expired), and the Get method leverages Contains to see if the item exists in the cache and retrieves it if it exists.
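
Both Contains and Get also lean on a private cacheRoot field and a GetItemFileName helper that I haven’t shown here. How you map keys to file names is up to you, so treat the following as an assumption-laden sketch rather than the one true way (it also assumes a using statement for System.IO, which the class already needs):

private DirectoryInfo cacheRoot;

// Map a cache key (and optional region) to a file path under CacheRootPath.
// A production version would sanitize or hash the key so it is always a legal file name.
private string GetItemFileName(string key, string regionName = null)
{
    return (regionName == null)
        ? Path.Combine(cacheRoot.FullName, key)
        : Path.Combine(cacheRoot.FullName, regionName, key);
}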

Now this is where I had my first real decision to make. Get must return an object, but what type of object should I return? In my case I opted for a memory stream. I could have returned a file stream attached to the file on disk, but because that could lock access to the file, I wanted explicit control of that stream. Hence I opted to copy the file stream into a memory stream and return that to the caller.

You may also note that I left the expiration check alone. I did this for the demo because your needs for file expiration may differ. You could base it on FileInfo.CreationTimeUtc or FileInfo.LastAccessTimeUtc; both are valid, as is any other metadata you need to base it on. I do recommend one thing: make a separate method that does the expiration check. We will use it later.

Note: I’m specifically calling out the use of UTC. When in Windows Azure, UTC is your friend. Try to use it whenever possible.
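
For what it’s worth, here’s a sketch of what that separate expiration method might look like, assuming an absolute lifetime measured from the file’s creation time. The helper name and the one-hour lifetime are mine, not part of the original sample.

// Hypothetical expiration check: absolute lifetime measured from creation time (UTC).
// Swap in LastAccessTimeUtc, or whatever metadata your policy actually needs.
private static readonly TimeSpan defaultLifetime = TimeSpan.FromHours(1);

private bool IsExpired(FileInfo fileInfo)
{
    return fileInfo.CreationTimeUtc.Add(defaultLifetime) < DateTime.UtcNow;
}

Contains could then call this helper and treat an expired file as a miss, ideally deleting it at the same time.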

Next up, we have to flesh out the three overloaded versions of AddOrGetExisting. These methods are important because, even though I won’t be directly accessing them in my implementation, they are leveraged by the base class’s Add methods. Thus, these overloads are how items get added into the cache. The first two simply call the lowest-level implementation.

public override object AddOrGetExisting(string key, object value, CacheItemPolicy policy, string regionName = null)
{
    if (!(value is Stream))
        throw new ArgumentException("value parameter is not of type Stream");

    return this.AddOrGetExisting(key, value, policy.AbsoluteExpiration, regionName);
}

public override CacheItem AddOrGetExisting(CacheItem value, CacheItemPolicy policy)
{
    var tmpValue = this.AddOrGetExisting(value.Key, value.Value, policy.AbsoluteExpiration, value.RegionName);
    if (tmpValue != null)
        return new CacheItem(value.Key, (Stream)tmpValue);
    else
        return null;
}

The key item to note here is that in the first method, I do a check on the object to make sure I’m receiving a stream. Again, that was my design choice since I want to deal with the streams.

The final overload is where all the heavy work is…

public override object AddOrGetExisting(string key, object value, DateTimeOffset absoluteExpiration, string regionName = null)
{
    if (!(value is Stream))
        throw new ArgumentException("value parameter is not of type Stream");

    // if object exists, get it
    object tmpValue = this.Get(key, regionName);
    if (tmpValue != null)
        return tmpValue;
    else
    {
        //TODO: wrap this in some exception handling

        // create subfolder for region if it was specified
        if (regionName != null)
            cacheRoot.CreateSubdirectory(regionName);

        // add object to cache
        FileStream fileStream = File.Open(GetItemFileName(key, regionName), FileMode.Create);

        ((Stream)value).CopyTo(fileStream);
        fileStream.Flush();
        fileStream.Close();

        return null; // successfully added
    }
}

We start by checking to see if the object already exists and return it if found in the cache. Then we create a subdirectory if we have a region (region implementation isn’t required). Finally, we copy the value passed in into our file and save it. There really should be some exception handling in here to make sure we’re handling things in a way that’s a little more thread safe (what if the file gets created between when we check for it and when we start the write?). And the Get should be checking to make sure the file isn’t already open when doing its read. But I’m sure you can finish that out.
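
As an illustration of what that hardening might look like, the write portion could use FileMode.CreateNew and catch the IOException thrown when another caller beats us to the file, falling back to returning their copy; on the read side, opening with FileShare.Read keeps concurrent readers from blocking each other. This is a sketch, not a complete concurrency story.

// Sketch: tolerate the file being created between the existence check and the write.
try
{
    using (FileStream fileStream = new FileStream(
        GetItemFileName(key, regionName),
        FileMode.CreateNew,      // fail rather than overwrite if someone beat us to it
        FileAccess.Write,
        FileShare.None))
    {
        ((Stream)value).CopyTo(fileStream);
        fileStream.Flush();
    }
    return null; // successfully added
}
catch (IOException)
{
    // another caller created the file first; hand back the cached copy instead
    return this.Get(key, regionName);
}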

Now there’s still about a dozen other methods that need to be fleshed out eventually. But these give us our basic get and add functions. What’s still missing is handling evictions from the cache. For that we’re going to use a timer.

private System.Threading.Timer timerItem; // keeps the eviction timer alive for the cache's lifetime

public FileCache() : base()
{
    System.Threading.TimerCallback TimerDelegate = new System.Threading.TimerCallback(TimerTask);

    // time values should be based on a configurable polling interval
    timerItem = new System.Threading.Timer(TimerDelegate, null, 2000, 2000);
}

private void TimerTask(object StateObj)
{
    // check file system for size and, if over, remove older objects

    //TODO: check polling interval and update timer if it's changed
}

We’ll update the FileCache constructor to create a delegate from our new TimerTask method and pass that into a Timer object. This will execute the TimerTask method at regular intervals on a separate thread. I’m using a hard-coded value, but we really should check to see whether a specific polling interval has been set. Of course, we should also put some code into this method so it actually does things like check how much room we have in the cache and evict expired items (by calling the private expiration method I suggested earlier), etc…
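
To give a feel for it, here’s a sketch of what a fleshed-out TimerTask might do: delete anything expired (using the IsExpired helper sketched earlier) and then trim the oldest files if we’re over a size cap. The 100 MB cap, the “*.cache” filter, and the LINQ usage (System.Linq) are all my own additions for illustration.

private const long maxCacheSizeBytes = 100 * 1024 * 1024; // illustrative 100 MB cap

private void TimerTask(object StateObj)
{
    try
    {
        // evict anything past its expiration
        foreach (FileInfo file in cacheRoot.GetFiles("*.cache", SearchOption.AllDirectories))
        {
            if (IsExpired(file))
                file.Delete();
        }

        // if we're still over the size cap, remove the oldest items first
        FileInfo[] remaining = cacheRoot.GetFiles("*.cache", SearchOption.AllDirectories);
        long totalSize = remaining.Sum(f => f.Length);

        foreach (FileInfo file in remaining.OrderBy(f => f.LastAccessTimeUtc))
        {
            if (totalSize <= maxCacheSizeBytes)
                break;

            totalSize -= file.Length;
            file.Delete();
        }
    }
    catch (IOException)
    {
        // a file may be in use or already gone; give up on this pass and try again next interval
    }
}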

The Implementation

With our custom caching class done (well, not done, but at least to a point where it’s minimally functional), it’s time to implement it. For this, I opted to set up an MVC Web Role that allows folks to upload an image file to Windows Azure Blob storage. Then, via a WCF/REST based service, it would retrieve the images twice. The first retrieval would be without using caching, the second would be with caching. I won’t bore you with all the details of this setup, so we’ll focus on just the wiring up of our custom FileCache.

We start appropriately enough with the role’s Global.asax.cs file, where we add a public static field that represents our cache (so it’s available anywhere in the web application):

public static Caching.FileCache globalFileCache = new Caching.FileCache();

And then I update the Application_Start method to retrieve our LocalResource setting and use it to set the CacheRootPath property of our caching object.

protected void Application_Start()
{
    AreaRegistration.RegisterAllAreas();

    RegisterGlobalFilters(GlobalFilters.Filters);
    RegisterRoutes(RouteTable.Routes);

    Microsoft.WindowsAzure.CloudStorageAccount.SetConfigurationSettingPublisher(
        (configName, configSetter) =>
            configSetter(RoleEnvironment.GetConfigurationSettingValue(configName))
    );

    globalFileCache.CacheRootPath = RoleEnvironment.GetLocalResource("filecache").RootPath;
}

Now ideally we could make it so that the CacheRootPath instead accepted the LocalResource object returned by GetLocalResource. This would then also mean that our FileCache could easily manage against the maximum size of the local storage resource. But I figured we’d keep any Windows Azure specific dependencies out of this base class and maybe later look at creating a WindowsAzureLocalResourceCache object. But that’s a task for another day.
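
If you did want to go that route someday, the wrapper could be as thin as this sketch; the class is hypothetical and assumes the FileCache from above is reachable from the web role project (LocalResource comes from Microsoft.WindowsAzure.ServiceRuntime).

// Hypothetical Azure-aware wrapper: points the cache at the role's local storage.
public class WindowsAzureLocalResourceCache : Caching.FileCache
{
    public WindowsAzureLocalResourceCache(LocalResource resource)
    {
        CacheRootPath = resource.RootPath;

        // resource.MaximumSizeInMegabytes could also feed the eviction logic
        // once FileCache exposes a size cap (it doesn't in this sample)
    }
}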

Ok, now to wire up the cache into the service that will retrieve the blobs. Let’s start with the basic implementation:

public Stream GetImage(string Name, string container, bool useCache)
{
    Stream tmpStream = null; // could end up being a filestream or a memory stream

    var account = CloudStorageAccount.FromConfigurationSetting("ImageStorage"); 
    CloudBlobClient blobStorage = account.CreateCloudBlobClient();
    CloudBlob blob = blobStorage.GetBlobReference(string.Format(@"{0}/{1}", container, Name));
    tmpStream = new MemoryStream();
    blob.DownloadToStream(tmpStream);

    WebOperationContext.Current.OutgoingResponse.ContentType = "image/jpeg";
    tmpStream.Seek(0, 0); // make sure we start at the beginning
    return tmpStream;
}

This method takes the name of a blob and its container, as well as a useCache parameter (which we’ll implement in a moment). It uses the first two values to get the blob and download it to a stream which is then returned to the caller with a content type of “image/jpeg” so it can be rendered by the browser properly.

To implement our cache we just need to add a few things. Before we try to set up the CloudStorageAccount, we’ll add these lines:

// if we're using the cache, lets try to get the file from there
if (useCache)
    tmpStream = (Stream)MvcApplication.globalFileCache.Get(Name);

if (tmpStream == null)
{

This code uses the globalFileCache object we defined in the Global.asax.cs file to retrieve the blob from the cache if it exists, provided we told the method useCache=true. If we couldn’t find the file (tmpStream == null), we fall into the block we had previously that retrieves the blob image and returns it.

But we still have to add in the code that puts the blob into the cache. We’ll do this right after the DownloadToStream call:

    // "fork off" the adding of the object to the cache so we don't have to wait for this
    Task tsk = Task.Factory.StartNew(() =>
    {
        Stream saveStream = new MemoryStream();
        blob.DownloadToStream(saveStream);
        saveStream.Seek(0, 0); // make sure we start at the beginning
        MvcApplication.globalFileCache.Add(Name, saveStream, new DateTimeOffset(DateTime.Now.AddHours(1)));
    });
}

This uses an async task to add the blob to the cache. We do this asynchronously so that we don’t block returning the blob to the requestor while the write to disk completes. We want this service to return the file as quickly as possible.
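
For reference, here’s the whole method with the caching pieces folded in; this is just the fragments above assembled in one place so the flow is easier to follow.

public Stream GetImage(string Name, string container, bool useCache)
{
    Stream tmpStream = null;

    // if we're using the cache, try to get the file from there first
    if (useCache)
        tmpStream = (Stream)MvcApplication.globalFileCache.Get(Name);

    if (tmpStream == null)
    {
        // cache miss (or cache bypassed): pull the blob straight from storage
        var account = CloudStorageAccount.FromConfigurationSetting("ImageStorage");
        CloudBlobClient blobStorage = account.CreateCloudBlobClient();
        CloudBlob blob = blobStorage.GetBlobReference(string.Format(@"{0}/{1}", container, Name));
        tmpStream = new MemoryStream();
        blob.DownloadToStream(tmpStream);

        // "fork off" the add so we don't block the response on the disk write
        Task tsk = Task.Factory.StartNew(() =>
        {
            Stream saveStream = new MemoryStream();
            blob.DownloadToStream(saveStream);
            saveStream.Seek(0, 0); // make sure we start at the beginning
            MvcApplication.globalFileCache.Add(Name, saveStream, new DateTimeOffset(DateTime.Now.AddHours(1)));
        });
    }

    WebOperationContext.Current.OutgoingResponse.ContentType = "image/jpeg";
    tmpStream.Seek(0, 0); // make sure we start at the beginning
    return tmpStream;
}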

And that does it for our implementation. Now to test it.

Fiddler is your friend

Earlier, you may have found yourself saying “self, why did he use a service for his implementation?” I did this because I wanted to use Fiddler to measure the performance of calls to retrieve the blob with and without caching. And by putting it in a service and letting Fiddler monitor the response times, I didn’t have to write my own client and put timings around it.

To test my implementation, I fired up Fiddler and then launched the service. We should see calls in Fiddler to SimpleService.svc/GetImage, one with cache=false and one with cache=true. If we select those items and then the Statistics tab, we should see some significant differences in the “Overall Elapsed” times of each call. In my little tests, I was seeing anywhere from a 50–90% reduction in elapsed time.

image

In fact, if you run the tests several times by hitting refresh on the page, you may even notice that the first time you hit Windows Azure storage for a particular blob, there is additional delay compared to subsequent calls. It’s only a guess, but we may be seeing Windows Azure storage doing some of its own internal caching there.

So hopefully I’ve described things well enough here that you can follow what we’ve done. But if not, I’m posting the code for you to reuse. Just make sure you update the storage account settings, and please, please, please finish the half-started implementation I’m providing you.

Here’s to speedy responses thanks to caching. Until next time.

Partial Service Upgrades

So I was working on an answer for a Stack Overflow question yesterday and realized it was a topic that I hadn’t put down in my blog yet. So rather than just answer the question, I figured I’d blog about it here so I could include some screen shots and further explanation. The question was essentially: how can I control the deployment of individual parts of my service?

So for this, I created a simple Windows Azure service with a Web Role and a Worker Role. It’s already up and hosted when we start this.

NOTE: this post only details doing this via the windows.azure.com portal. We’ll leave doing it programmatically via the management API for another day.

Upgrading a Single Role

image

This is actually surprisingly simple. I open up the service and select the role (not its instances) I want to upgrade. Then we can right-click and select upgrade, or click the “upgrade” button on the toolbar.

Either option will launch the “Upgrade Deployment” dialog box. If you look at this box (presuming you have the role selected), you’ll notice that the “Role to Upgrade” option lists the role you had selected. If you didn’t properly select the role, this may list “All”.

Take a peek at the highlighted section of the following screen shot for an example of how this should look.

image

Note: while creating this post, I did receive an unhandled exception message from the Silverlight portal. This has been reported to MSFT and I’ll update this when I get a response.

Manual Upgrades

I’ve run out of time today, but next time I’d like to cover doing a manual upgrade. Of course, I still have two posts in my PHP series I need to finish. So we’ll see which of these I eventually get back around to first.

Until next time!

Windows Azure & PHP (for nubs)–Part 1 of 4

PHP was my first web development language. I got into web apps just prior to the “Dot Com” boom more than 10 years ago, when I was doing “social networking” (we didn’t call it that back then) sites for online games. Unfortunately, as a consultant, the skills I get to exercise are often “subject to the demands of the service”, and we don’t really get much call for PHP work these days. I did break those skills back out about a year ago for a project involving a more recent love, Windows Azure, for a short yet very sweet reunion. But since then I’ve only gone back to it once or twice for a quick visit.

So when the call for speakers for CodeMash came up, I pitched a session on PHP and Windows Azure. The topic is a good fit for the conference and I’m really looking forward to it. Fortunately, I have some time between engagements right now, so I’m using it to brush up on my PHP+Azure skills (last used on a project almost a year ago) and to make sure the session is aligned with the latest developments.

My how things have changed in the last year.

Change in tooling

So when I worked with PHP last year, I relied on the Windows Azure Tools for Eclipse. It’s still a great toolset that allows for the simple creation and deployment of Windows Azure apps. I loved the level of IDE integration it provided and the “built in” support for deployments to the development emulator.

Part of the problem though is that in the last year, it appears that the PHP for Eclipse toolset has lost a bit of steam. Communication isn’t as good as it once was and updates aren’t as frequent. Still a good tool, but it really didn’t seem to be keeping pace with the changes in Azure.

So I pinged an expert to find out what the latest and greatest was. Turns out things are going command line in a big way with the Windows Azure SDK for PHP. While we do lose the pretty GUI, I can’t say I’m really heartbroken. So let’s start by walking through what you need.

Needed Tools

First up, we need to make sure we have the Web Platform Installer because we’re going to use it to snag some of the components we need. The platform installer is nice because it will make sure we have necessary pre-requisites installed and even download them for us if it can.

If you aren’t already a .NET developer, you’ll want to start by getting SQL Server Express. Just launch the platform installer and type “SQL Server Express” into the search box in the top right. Look for “SQL Server Express 2008 R2” and select “Install” if it’s not already installed.

image

Do the same thing, except search for “Azure”, and get the “Windows Azure SDK” and “Windows Azure Libraries”. Then search for PHP and get the latest version of PHP for WebMatrix.

Lastly, we’ll need to download the PHP SDK for Azure and install it manually by unzipping the file to “C:\Program Files\Windows Azure SDK for PHP”.

Now there’s a lot more to this than what I’ve covered here. For additional, more detailed information, I would direct you to this link on setting up PHP on Windows and this link on setting up the PHP SDK.

Our first PHP app

image

With all the bits installed, we want to do a quick test locally to make sure we have PHP installed and running properly. So fire up the Internet Information Services (IIS) Manager (just type “IIS” into the Windows 7 search box) and in there, we’re going to drill down to the default web site and add some stuff in. Open up the branches like you see in the picture below, right-click on “Default Web Site”, and select “Add Virtual Directory…” from the pop-up menu.

I entered “phpsample” as the Alias of my test web site and set the physical path to a location just below “C:\inetpub\wwwroot” (the default root location for IIS web sites). I then created a new file named “index.php” and placed it into that location. This file had only a single line of code…

<?php  phpinfo(); ?>

Now if you’re not familiar with PHP, this code will give us a dump of all the PHP settings in use by our system. And if we browse to the new web application (you can click the Browse link on the right in IIS Manager), we hopefully get output like this:

image

Next time on our show…

So that’s it for part 1 of this series. Next time (and hopefully later this week), we’ll create a Windows Azure web application and show how to deploy and test it locally. We’ve only scratched the surface here, so stay tuned! But if you can’t wait, check out Brian Swan’s PHP on Windows Azure learning path post.

Until next time.

Enhanced Visual Studio Publishing (Year of Azure–Week 19)

With the latest 1.6 SDK (ok, now it’s actually called Azure Authoring Tools), Scott Guthrie’s promise of a better developer publishing experience has landed. Building upon the multiple cloud configuration options that were delivered back in September with the Visual Studio tools update, we have an even richer experience.

Now the first thing you’ll notice is that the publish dialog has changed. The first time you run it, you’ll need to sign in and get things set up.

image

Clicking the “Sign in to download credentials” link will send you to the windows.azure.com website, where a publish-settings file will be generated for you to download. Following the instructions, you’ll download the file, then import it into the publishing window shown above. Then you can choose a subscription from the populated drop-down and proceed.

A wee bit of warning on this though. If you have access to multiple subscriptions (ones you own or are a co-admin on), the creation of a publish-settings file will install the new certificate in each subscription. Additionally, if you click the “Sign in to download” link more than once, you will end up with multiple certs. These aren’t things to be horrified about; I just wanted to make sure I gave you a heads up.

Publish Settings

Next up is the publication settings. Here we can select a service to deploy to or create a new one (YEAH!). You can also easily set the environment (production or staging), the build configuration, and the service configuration file to be used. Setting up remote desktop is also as easy as a checkbox.

image

In the end, these settings get captured into a ‘profile’ that is saved and can then be reused. Upon completion, the cloud service will get a new folder, “Profiles”. In this folder you will find an XML file with the extension .azurePubxml that contains the publication settings.

<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <AzureCredentials>my subscription</AzureCredentials>
    <AzureHostedServiceName>service name</AzureHostedServiceName>
    <AzureHostedServiceLabel>service label</AzureHostedServiceLabel>
    <AzureSlot>Production</AzureSlot>
    <AzureEnableIntelliTrace>False</AzureEnableIntelliTrace>
    <AzureEnableProfiling>False</AzureEnableProfiling>
    <AzureEnableWebDeploy>False</AzureEnableWebDeploy>
    <AzureStorageAccountName>bmspublic</AzureStorageAccountName>
    <AzureStorageAccountLabel>bmspublic</AzureStorageAccountLabel>
    <AzureDeploymentLabel>newsdk</AzureDeploymentLabel>
    <AzureSolutionConfiguration>Release</AzureSolutionConfiguration>
    <AzureServiceConfiguration>Cloud</AzureServiceConfiguration>
    <AzureAppendTimestampToDeploymentLabel>True</AzureAppendTimestampToDeploymentLabel>
    <AzureAllowUpgrade>True</AzureAllowUpgrade>
    <AzureEnableRemoteDesktop>False</AzureEnableRemoteDesktop>
  </PropertyGroup>
</Project>

This file contains a reference to a storage account, and when I looked at the account I noticed that there was a new container in there called “vsdeploy”. The container was empty, but I’m betting this is where the cspkg is sent before being deployed and subsequently deleted. I only wish there was an option to leave the package there after deployment. I love having old packages in the cloud to easily reference.

If we go back into the publish settings (you may have to click “Previous” a few times to get back to the “Settings” section) and select “Advanced”, you can set some of the other options in this file. Here we can set the storage account to be used, as well as enable IntelliTrace and profiling.

The new experience does this using a management certificate that was created for us at the beginning of this process. If you open up the publish-settings file we downloaded at the beginning, you’ll find it’s an XML document with an encoded string representing the management certificate to be used. Hopefully in a future edition, I’ll be able to poke around at these new features a bit more. It appears we may have one or more new APIs at work, as well as some new options to help with service management and build automation.

What next?

There’s additional poking around I need to do with these new features, but there’s some great promise here. Out of the box, developers managing one or two accounts are going to see HUGE benefits. Devs in large, highly structured, security-restricted shops are more likely to stick to the existing mechanisms or look at leveraging this to enhance their existing automated processes.

Meanwhile, I’ll keep poking at this a little bit as well as the other new features of this SDK and report back when I have more.

But that will have to wait until next time.

SQL Azure Error Codes (Year of Azure–Week 17)

Ok, I’m not counting last week as a ‘Year of Azure’ post. I could, but I feel it was too much of a softball for me to bear. Unfortunately, I was even less productive this week in getting a new post out. I started with a new client, and the first weeks, especially when traveling, are horrible for getting anything done except going back to the hotel and sleeping.

However, I have spent time over the last few weeks working over a most difficult question: the challenges of SQL Azure throttling behaviors and error reporting.

Reference Materials

Now, on the surface SQL Azure is a perfectly wonderful relational database solution. However, when you begin subjecting it to a significant load, its limitations start becoming apparent. And when this happens, you’ll find you get back various error codes that you have to decode.

Now, I could dive into an hours-long discussion regarding architectural approaches for creating scalable SQL Azure data stores. A discussion, mind you, which would be completely enjoyable, very thought provoking, and for which I’m less well equipped than many folks (databases just aren’t my key focus; I leave those to better…. er…. more interested people *grin*). For a nice video on this, be sure to check out the TechEd 2011 session on the subject.

Anyways…

Deciphering the code

So if you read the link on error codes, you’ll find that there are several values that need to be decoded. Fortunately for me, while I have been fairly busy, I have access to a resource that wasn’t: one fairly brilliant Andrew Espenes. Andrew was kind enough to take on the task of deciphering the codes and, in a show of skill that demonstrates I’m becoming far older than I would like to believe, pulled together some code that I wanted to share. It’s code that leverages a technique I haven’t used since my college days of writing basic assembly (BAL) code. Yes, I am that old.

So let’s fast-forward down the extensive article I linked earlier to the “Decoding Reason Codes” section. Our first stop will actually be adjusting the reason code into something usable. The MSDN article says to apply modulo 4 to the reason code to get the throttling mode:

ThrottlingMode = (AzureReasonDecoder.ThrottlingMode)(codeValue % 4);

Next determine the resource type (data space, CPU, Worker Threads, etc…):

int resourceCode = (codeValue / 256);

And finally, we’ll want to know the throttling type (hard vs. soft):

int adjustedResourceCode = resourceCode & 0x30000;
adjustedResourceCode = adjustedResourceCode >> 4;
adjustedResourceCode = adjustedResourceCode | resourceCode;
resourceCode = adjustedResourceCode & 0xFFFF;
ResourcesThrottled = (ResourceThrottled)resourceCode;
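
Pulled together, a tiny helper class along these lines wraps up the three steps. The method names and the enum members below are placeholders of my own invention, since the authoritative values live in the MSDN “Decoding Reason Codes” tables (and Andrew’s real class shows up next week).

// Sketch only: the enum members are illustrative placeholders. The authoritative
// values and bit layout come from the MSDN "Decoding Reason Codes" documentation.
public static class AzureReasonDecoder
{
    public enum ThrottlingMode
    {
        NoThrottling = 0,
        RejectUpdateInsert = 1,
        RejectAllWrites = 2,
        RejectAll = 3
    }

    [Flags]
    public enum ResourceThrottled
    {
        None = 0,
        PhysicalDatabaseSpace = 1,
        PhysicalLogSpace = 2,
        LogWriteIoDelay = 4,
        DataReadIoDelay = 8,
        Cpu = 16,
        DatabaseSize = 32
        // ... remaining resource types per the documentation
    }

    public static ThrottlingMode DecodeThrottlingMode(int codeValue)
    {
        // the low bits carry the throttling mode
        return (ThrottlingMode)(codeValue % 4);
    }

    public static ResourceThrottled DecodeResourcesThrottled(int codeValue)
    {
        // shift past the mode bits, then fold the hard/soft flags back together
        int resourceCode = (codeValue / 256);
        int adjustedResourceCode = resourceCode & 0x30000;
        adjustedResourceCode = adjustedResourceCode >> 4;
        adjustedResourceCode = adjustedResourceCode | resourceCode;
        resourceCode = adjustedResourceCode & 0xFFFF;
        return (ResourceThrottled)resourceCode;
    }
}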

Next Time

Now I warned you that I was short on time, and while I have some items I’m working on for future updates, I do want to spend some time this weekend with family. So I need to hold some of this until next week, when I’ll post a class Andrew created for using these values and some samples for leveraging them.

Until next time!

Windows Azure In-place Upgrades (Year of Azure – Week16)

On Wednesday, Windows Azure unveiled yet another substantial service improvement: enhancements to in-place upgrades. Before I dive into these enhancements and why they’re important, I want to talk first about where we came from.

PS – I say “in-place upgrade” because the button on the Windows Azure portal is labeled “upgrade”, but the latest blog post calls this an “update”. As far as I’m concerned, these are synonymous.

Inside Windows Azure

If you haven’t already, I encourage you to set aside an hour, turn off your phone, email, and yes, even Twitter, so you can watch Mark Russinovich’s “Inside Windows Azure” presentation. Mark does an excellent job of explaining that within the Windows Azure datacenter, we have multiple clusters. When you select an affinity group, this tells the Azure Fabric Controller to try to put all resources aligned with that affinity group into the same cluster. Within a cluster, you have multiple server racks, each rack with multiple servers, and each server in turn with multiple cores.

Now these resources are divided up essentially into slots, with each slot being the space necessary for a small Windows Azure instance (one 1.6 GHz core and 1.75 GB of RAM). When you deploy your service, the Azure Fabric will allocate these slots (1 for a small, 2 for a medium, etc…) and provision a guest virtual machine that takes those resources. It also sets up the VHD that will be mounted into that virtual machine for any local storage you’ve requested, and configures the firewall and load balancers for any endpoints you’ve defined.

These parameters (the instance size, endpoints, local storage, and so on) are what I’ve taken to calling the Windows Azure service signature.

Now if this signature wasn’t changing, you had the option of deploying new bits to your cloud service using the “upgrade” option. This allowed you to take advantage of the upgrade domains to do a rolling update and deploy functional changes to your service. The advantage of the in-place upgrade was that you didn’t “restart the clock” on your hosting costs (the hourly billing for Azure works like cell phone minutes), and it was also faster, since the provisioning of resources was a bit more streamlined. I’ve seen a single developer deploying a simple service eat through a couple hundred compute hours in a day just by deleting and redeploying. So this was an important feature to take advantage of whenever possible.

If we needed to change this service signature, we were forced to either stop/delete/redeploy our services, or deploy to another slot (staging or a separate service) and perform either a VIP or DNS swap. This was because, in the case of a change in size, the instance might have to move to a new set of “slots” to get the resources you wanted. For the firewall/load balancer changes, I’m not quite sure what the limitation was. But this was life as we’ve known it in Azure for the last (dang, has it really been this long?)… 2+ years now. With this week’s update, many of these limitations have been removed.

What’s new?

With the new enhancements we can basically forget about the service signature. The gloves are officially off! We will need the 1.5 SDK to take advantage of changes to size, local storage, or endpoints, but that’s a small price to pay, especially since the management API already supports these changes.

The downside is that the Visual Studio tools do not currently take advantage of this feature. However, with Scott “the Gu” Guthrie at the helm of the Azure tools, I expect this won’t be the case for long.

I’d dive more into exactly how to use this new feature, but honestly the team blog has done a great job, and I can’t see myself wanting to add anything (aside from the backstory I already have). So that’s all for this week.

Until next time!