Introduction to the Service Fabric

I’ve spent most of the last 7 years focused on the cloud. During that time I’ve worked with customers of all sizes in a myriad of industries to adopt cloud platforms and build solutions they can then provide to their customers. If I’ve learned nothing else, it’s that continuous innovation is necessary and EVERY cloud solution could be improved.

By now, you're hopefully familiar with the concept of DevOps. This blending of developer and IT pro responsibilities is at the center of many cloud infrastructures. Unfortunately, most approaches still don't offer a good way to separate these two fairly disparate viewpoints. They either require the developer to learn too much about infrastructure, or the IT pro to know too much about building deployment scripts. The challenge is to satisfy the unique needs of both audiences, while still allowing them to stay focused on their passions.

What we need is a platform the developer can build applications for that can scale, provide resiliency, and be used to easily deliver performant, stateful, and highly available services. This same platform also needs to be something that can be deployed either in the cloud as a managed service, or set up on-premises using existing compute resources. It also needs to be something that can allow the IT Pro to worry less about the applications/services that are on it, and more about keeping the underlying infrastructure healthy.

To that end, I’m pleased to talk about our newest innovation, Microsoft Azure Service Fabric. But before I dive into exactly what Service Fabric is, let’s talk about the road that’s led us here.

Infrastructure as a Service

A natural evolution from managed infrastructure, IaaS arguably just built upon the hypervisor technologies we'd already been using for years. The only real difference was the level of automation taking care of common tasks (such as relocating virtual machines in case of hardware failure). This solution also had the benefit that, to a degree, you could stand up your own private IaaS cloud.

The issue, however, was that many of the old problems remained. You still had to manage the virtual machines, patching and upgrading them. And if you deployed an application to this type of infrastructure, there was still a lot of work to be done to make it resilient in case of a failure, especially in scenarios where the application had state information you didn't want to lose.

When you add in common tasks like rolling upgrades, application version management, etc… IaaS really starts to look like its predecessor. IaaS did reduce the need to manage hardware, but still didn't address how we build highly scalable, resilient solutions.

Containerization

Linux (and you could easily argue that more specifically, Docker) brought us containerization. Containers are deployed into machines (either physical or virtual) as self-contained, mostly isolated components. This allowed individual services to be quickly and easily combined into complex solutions. The level of automation available pushed the underlying machines further back on the stack. So while the underlying shortcomings of IaaS still exist, the automation lets us worry about them even less.

In addition to the composable nature of container based solutions, this technology also offered the advantage of less bloat. If you used virtual machines to isolate your services, you have the overhead of a full operating system for each instance. Containers are much lighter weight, sacrificing a degree of resource isolation to save resources overall.

Unfortunately, this approach is still about putting components on infrastructure, not about the applications those components comprise. So many of the application lifecycle management (ALM) problems we saw with IaaS still exist. And while there are solutions that can be layered on top of containerization to help manage some of this, they add even more complexity.

Platform as a Service

PaaS, be it Microsoft Azure Cloud Services or solutions like Heroku, tried to solve some of this by pushing the underlying OS so far into the background that it nearly vanished. At least until something went wrong. And unfortunately, in the cloud there's only one absolute: failure is inevitable.

Don't get me wrong. I still love the promise of platform as a service. It is supposed to give us a place to deploy applications where we can depend on the platform to take care of common tasks like scalability, failover, and rolling upgrades. Unfortunately, in most cases PaaS solutions fell a bit short of their goals. Version management was mostly non-existent, and if you wanted things to be truly resilient, you needed to externalize any state information, which meant taking a dependency on external caches or data stores.

Another key challenge is that PaaS adoption was often done using traditional n-tier architecture patterns. The result was that you would design a system comprised of components for the different layers, then deploy them as individual pieces. While you could scale the number of copies/instances of the solution components based on needed capacity, this pattern still leads to wasted resources, as each instance is often under-utilized.

Enter the Service Fabric

We (Microsoft) watched customers, and oftentimes ourselves, struggle with these problems. And more importantly, we watched what was being done to overcome them. At the same time, we were also learning from the solutions we were building. Over time, common patterns became visible and the solutions for them were industrialized. It's out of these learnings that we began to craft the Service Fabric.

The goal of Service Fabric, as the name implies, is to provide a framework that allows services to be easily stitched together into complex solutions. It will manage the services by helping them discover and communicate with each other while also helping them maintain a healthy state. The services could be a web application, a traditional service, or even some executable.

The fabric intends to specifically solve the following challenges:

– Provide a platform into which services are deployed, reducing the need to worry about the underlying virtual machine(s)

– Increase the density of deployed services to decrease latency without increasing complexity

– Manage deployed services, providing failover, version management, and discoverability

– Give services a way to coordinate actions such as data replication, increasing resiliency when state needs to be maintained

And while we're just now announcing this new offering, we've already tested the waters with this approach ourselves. Existing Azure services such as Service Bus Event Hubs, Document DB, and the latest version of Azure SQL Database are being delivered on Service Fabric.

What is the Service Fabric

Service Fabric is a distributed systems platform that allows developers building services and applications to avoid complex distributed infrastructure problems and focus instead on implementing their workloads and business logic while adding scalability and reliability. It also greatly reduces the burden on application administrators and IT operators by implementing simple workflows for provisioning, deploying, patching, and monitoring services, providing first-class support for the full lifecycle of cloud applications, from initial deployment to eventual decommissioning.

At the heart of these buzzwords is the concept of a Service Fabric cluster. A cluster is a collection of servers (physical or virtual) that are stitched together via the Service Fabric software (did you see what I did there? *grin*). Individually, these servers are referred to as "nodes". The Service Fabric software keeps track of each node and the applications that have been deployed to the cluster. It's this monitoring that allows the fabric to detect when a node fails and move the instances of an application to another node, thus providing resiliency for your application.

One of the interesting things about the Service Fabric is that it's headless. The Service Fabric software on each node synchronizes with the instances running on other nodes in the cluster, in essence keeping track of the state of its own node while also sharing that state with the other nodes. Those nodes in turn share their state back out. This is important because when you want to know anything about the cluster, you can simply query the Service Fabric management API on any node and you'll get the same answer.
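
Since every node can answer, a management client can point at any of them. A rough sketch of such a query, assuming the System.Fabric client library that ships with the SDK (endpoint and names here are just placeholders, not the only way to do this):

using System;
using System.Fabric;              // Service Fabric client library

class ClusterQuery
{
    static void Main()
    {
        // The parameterless constructor connects to the local cluster; you could also
        // pass any node's client endpoint (e.g. "somenode:19000") -- the answer is the
        // same no matter which node you ask.
        using (var fabricClient = new FabricClient())
        {
            var nodes = fabricClient.QueryManager.GetNodeListAsync().GetAwaiter().GetResult();
            foreach (var node in nodes)
            {
                Console.WriteLine("{0} ({1}) - {2}", node.NodeName, node.IpAddressOrFQDN, node.HealthState);
            }
        }
    }
}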

When you deploy an application to the cluster, a notification is sent to one of the nodes in the cluster. This node in turn looks at what it knows of the other nodes, and determines where to place the various parts of the application. Notifications are then in turn sent to the selected nodes so they can take the appropriate actions.

But before we go too deep into this, let's look at a Service Fabric application.

Service Fabric Application Model

Service Fabric operates on the notion that applications are composed of services. But the term ‘service’ is used fairly loosely. A service could be an application listening for a request, or a console application that runs in a loop, performing some action. A service is really just a process that you want to run.

Services in turn, are composed of three parts: code, config, and data. Just as you would expect applications and services to be versionable, Service Fabric allows you to version these components. This is extremely powerful because now you can deploy updates of any of these components independently of the others.

The service acts as a containerized, functional unit. It can be developed and revised individually, start/run independently, and scaled in isolation from other services. But most importantly, the services, when acting together, form the higher level applications that are needed for today’s cloud scale solutions.

[Figure: the Service Fabric application model (svcFabAppModel)]

This separation provides several advantages. First off, we can easily upgrade and scale individual instances of the services as needed. Another key advantage for Service Fabric is that we can deploy as many applications into the cluster as its resources (CPU, memory, etc.) can support. This can even include multiple, different versions of the same application or service, each managed separately.

But this isn't the end of the story. This example was fairly simple because the services in question are not stateful. And a solution doesn't have to get very complex before the need to store state arises.

Stateless vs Stateful Services

As mentioned earlier, a key failure with most cloud solutions is that they attempt to follow a traditional “n-tier” architecture. The result is that you have clear separation of concerns and externalized data stores. This in turn introduces both complexity and latency since the solution traverses various artificial boundaries such as remote calls to read/write data, or introducing more moving pieces such as a caching layer.

To address this issue, Service Fabric has two types of services: stateless and stateful. A stateless service is entirely self-contained and keeps no durable state of its own. But the other type of service, a stateful service, has something that stateless services lack. A stateful service, as its name implies, can store data and, more importantly, leverage the Service Fabric to replicate that data across multiple instances of the service. This speeds up access because we can now durably store information within the fabric cluster, and since the data is replicated, we don't sacrifice resiliency. We also simplify the solution by reducing its dependency on external services, any one of which could impact the availability of our application.

Stateful Service Fabric services (say that 5 times fast), work because they leverage the cluster for discoverability and communication. Each Stateful Service should run at least three instances. The cluster will select one of these copies to be the primary, and the others will be secondary replicas. The instances of the service then leverage the Service Fabric to keep the replicas in quorum. As transactions occur against the primary, they are sent over and applied to each of the replicas.

This works because when you request an endpoint for the service, the fabric will automatically route the request to the elected primary. When changes occur (inserts, updates, deletes), these transactions are then replayed on the secondary replicas. Each replica will store its current state either in memory, locally on disk, or optionally remotely (although this approach should be used with caution for the reasons we mentioned earlier).

Internally, the stateful service leverages another Service Fabric concept, a distributed collection. It's these collections that actually do the task of storing any data and working with the Service Fabric to ensure replication is performed. The service container provides an activation point for the collection, while simultaneously providing a process that can host any service endpoints that allow for interaction with the collection. What's important to note here is that the only part of any application in the cluster that can interact with a given collection is its hosting service.
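
To give a feel for what this looks like from the service's side, here's a minimal sketch assuming the Reliable Services programming model and the GA-era API shapes (the service and collection names are invented): the service asks its state manager for a collection and does all reads/writes inside a transaction that the fabric replicates.

using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data.Collections;   // reliable (distributed) collections
using Microsoft.ServiceFabric.Services.Runtime;   // StatefulService base class

internal sealed class VisitCounterService : StatefulService
{
    public VisitCounterService(StatefulServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        // The collection lives inside this service; other services can't touch it directly.
        var counts = await this.StateManager
            .GetOrAddAsync<IReliableDictionary<string, long>>("visitCounts");

        while (!cancellationToken.IsCancellationRequested)
        {
            using (var tx = this.StateManager.CreateTransaction())
            {
                // The write is applied on the primary and replicated to the secondaries
                // before the commit completes.
                await counts.AddOrUpdateAsync(tx, "home", 1, (key, value) => value + 1);
                await tx.CommitAsync();
            }

            await Task.Delay(TimeSpan.FromSeconds(30), cancellationToken);
        }
    }
}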

Now some stateful services may operate just fine using a single set of primary/secondary instances. But as demand increases, you may need to scale out the service to allow for even greater throughput. This is where “partitions” come into play. When the service registers the distributed collection with the Service Fabric, it defines the number of partitions and the key ranges for those partitions.
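
In the released SDK, the partition count and key range show up when the service instance is created. A rough sketch using the System.Fabric management client (the application, service names, and counts below are all examples, not prescriptions):

using System;
using System.Fabric;
using System.Fabric.Description;

class CreatePartitionedService
{
    static void Main()
    {
        using (var fabricClient = new FabricClient())
        {
            var description = new StatefulServiceDescription
            {
                ApplicationName = new Uri("fabric:/VisitTracker"),
                ServiceName = new Uri("fabric:/VisitTracker/VisitCounterService"),
                ServiceTypeName = "VisitCounterServiceType",
                HasPersistedState = true,
                TargetReplicaSetSize = 3,   // one primary plus two secondary replicas
                MinReplicaSetSize = 2,
                // Ten partitions spread across the Int64 key space; keys are mapped into
                // this range to pick a partition.
                PartitionSchemeDescription = new UniformInt64RangePartitionSchemeDescription
                {
                    PartitionCount = 10,
                    LowKey = long.MinValue,
                    HighKey = long.MaxValue
                }
            };

            fabricClient.ServiceManager.CreateServiceAsync(description).GetAwaiter().GetResult();
        }
    }
}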

It should be stressed that the partition is a logical construct only. You could have a distributed collection that’s running on a single primary that has 10,000 partitions. That same collection could be spread across 10, 20, or more primary instances. It’s all based on your demand. How this is done is a bit more of an advanced topic, so we’ll leave that for another time.

PaaS for the new generation of cloud solutions

So that is it for this introduction. We've barely scratched the surface of what this framework can help you accomplish. So we're hoping you'll join us for some of the other sessions we have planned that explore various aspects of Service Fabric more deeply.

Thank you, and until next time!

An Anniversary, and a restart

This month marks my 3rd anniversary at Microsoft (start date was October 15th, 2012). Three years working for a company I believe in, focused on a topic I'm passionate about (cloud), and working with many great partners and fellow geeks along the way. It's been a great experience and one I hope will continue for some time to come. I've been able to explore new opportunities and stretch myself a bit. This has sometimes proven successful, sometimes not. But as with all things, I learned a lot.

One thing that has suffered is this blog. It's been almost 8 months since my last post. Part of this has been due to "the demands of the service" and part has been a lack of me feeling I really had anything to share. I was learning, but it was mostly focused on skills I hadn't previously picked up (JavaScript, AngularJS, DocDB, etc…). In these, I wasn't sure I really had much to contribute. So I focused on the partners/projects in front of me and let the "evangelism" side of my job slide a bit. I haven't even been doing much speaking.

This month, I intend to start changing this and get back to my desire to help others and "spread the word", all while I help the partners I'm assigned to and my colleagues. Back on July 1st, I moved to the Commercial ISV team. This means that I have a portfolio of partners I'm focused on (4 large software vendors that are focused on providing solutions that in some way serve local, state, or federal government). I'm also focused on Windows 10 UWP, Office 365, and of course Microsoft Azure. What you can expect to see from me over the course of the next few months are topics like the Azure Resource Monitor, Windows 10 UWP with Cortana and Azure services integration (really keen to play with the new Azure AD B2C stuff), Windows containers, and media streaming. IoT may also come up but, despite being such a key industry focus, isn't high on my list. If things go really well, I may even have some HoloLens stuff to share by next summer. *crossing fingers*

That's really all I have to say for the moment. But look for more soon (I'm already working on posts around DocDB and Azure SQL DB performance analysis w/ Power BI). Meanwhile, look for me on the twitters and let me know if there's something you're interested in hearing more about. Otherwise, I'll be up-skilling and prepping to pass the Azure Architecture exam.

Until next time!

SAS, it’s just another token

Note: Please bear with me, I authored this post in Markdown.🙂

I've been trying to finish this post since September of 2014. But I kept getting distracted away from it. Well this lovely (it's 5 degrees Fahrenheit here in Minnesota) Saturday morning, it IS my distraction. I've been focused the last few weeks on my new love, the Simple Cloud Manager Project, as well as some internal stuff I can't talk about just yet. Digging into things like Ember, Jekyll, Broccoli, GitHub Pages, git workflows, etc… has been great. But it's made me keenly aware of how much development has leapfrogged my skills as a Visual Studio/.NET centric cloud architect. With all that learning, I needed to take a moment and get back to something I was a little more comfortable with. Namely, Azure services and specifically Service Bus and Shared Access Signatures (SAS).

I continue to see emails, forum posts, etc… regarding SAS vs ACS for various scenarios. First off, I'd like to state that both approaches have their merit. But something we all need to come to terms with is that at their heart, both approaches are based around a security token. So as the name of this blog article points out, SAS is just a token.

What is in a SAS token?

For the Azure Service Bus, the token is simply a string that looks like the following

SharedAccessSignature sr=https%3a%2f%2fmynamespace.servicebus.windows.net%2fvendor-&sig=AQGQJjSzXxECxcz%2bbT2rasdfasdfasdfa%2bkBq%2bdJZVabU%3d&se=64953734126&skn=PolicyName

Within this string, you see a set of URL encoded parameters. Let’s break them down a bit…

SharedAccessSignature – used to identify the type of authorization token being provided (ACS tokens start with "WRAP")

sr – this is the resource string we’re sharing access to. In the example above, the signature is for anything at or under the path “https://mynamespace.servicebus.windows.net/vendor-“

sig – this is a generated, HMAC-SHA256 hash of the resource string and expiry that was created using a private access key.

se – the expiry date/time for the signature expressed in the number of seconds since epoch (00:00:00 UTC on January 1st, 1970)

skn – the policy/authorization rule whose key was used to generate the signature and whose permissions determine what can be done

The token, or signature, is created by using the resource path (the URL that we want to access) and an expiry date/time. An HMAC-SHA256 hash is then generated from those parameters using the key of a specific authorization policy/access rule. In its own way, using the policy name and its key is not that different than using an identity and password. And like an ACS token, we have an expiry value that helps ensure the token we receive can only be used for a given period of time.

Generating our own Token

So the next logical question is how to generate our own token. If we opt to use .NET and have the ability to leverage the Azure Service Bus SDK, we can pull together a simple console application to do this for us.

Start by creating a Console application, and adding some prompts for the parameters so that the main method looks like this…

static void Main(string[] args)
{
    Console.WriteLine("What is your service bus namespace?");
    string sbNamespace = Console.ReadLine();

    Console.WriteLine("What is the path?");
    string sbPath = Console.ReadLine();

    Console.WriteLine("What existing policy would you like to use to generate your signature?");
    string sbPolicy = Console.ReadLine();

    Console.WriteLine("What is the policy's secret key?");
    string sbKey = Console.ReadLine();

    Console.WriteLine("When should this expire (MM/DD/YY HH, GMT)?");
    string sbExpiry = Console.ReadLine();

    Console.WriteLine("Press any key to exit...");
    Console.ReadKey();
}

The first parameter we're going to capture and save is the namespace we're wanting to access, without the "servicebus.windows.net" part. Next is the path to the Service Bus entities we want to provide access to. This can be a specific entity such as a queue name or, as I mentioned last time, a partial path to grant access to multiple resources. Then we need to provide a named policy (which you can set up via the portal), and one of its secret keys. Finally, you will specify when this signature will need to expire.

Next, we need to transform the expiration time that was entered as a string into a TimeSpan (how long the SAS needs to 'stay alive'). We'll insert this right after we read the expiration value…

// convert the string into a TimeSpan... (requires using System.Globalization;)
CultureInfo enUS = new CultureInfo("en-US");
DateTime tmpDT;
bool gotDate = DateTime.TryParseExact(sbExpiry, "M/dd/yy HH", enUS, DateTimeStyles.None, out tmpDT);
if (!gotDate)
    Console.WriteLine("'{0}' is not in an acceptable format.", sbExpiry);

// how long from now until the signature should expire
TimeSpan expiry = tmpDT.ToUniversalTime().Subtract(DateTime.UtcNow);

Now we have all the variables we need to create a signature, so it’s time to generate it. For that, we’ll use a couple classes contained in the .NET Azure Service Bus SDK. We’ll start by adding it to our project using the instructions available on MSDN (so I don’t have to retype them all here).
With the proper references added to the project, we add a simple using clause at the top..

using Microsoft.ServiceBus;

Then add the code that will create the SAS token for us right after the code that created our TimeSpan.

var serviceUri = ServiceBusEnvironment.CreateServiceUri("https", sbNamespace, sbPath).ToString().Trim('/');
string generatedSaS = SharedAccessSignatureTokenProvider.GetSharedAccessSignature(sbPolicy, sbKey, serviceUri, expiry);

And there we have it. A usable SAS token that will automatically expire after a given period of time, or that we can revoke immediately by removing the policy on which it's based.

But what about doing this without the SDK?

Let's start by looking at what the Service Bus SDK is doing for us. Fortunately, Sreeram Garlapati has already written some code to generate a signature.

// requires: System, System.Globalization, System.Security.Cryptography, System.Text, System.Web
static string CreateSasToken(string uri, string keyName, string key)
{
    // Set token lifetime to 20 minutes. When supplying a device with a token, you might want to use a longer expiration time.
    DateTime origin = new DateTime(1970, 1, 1, 0, 0, 0, 0);
    TimeSpan diff = DateTime.Now.ToUniversalTime() - origin;
    uint tokenExpirationTime = Convert.ToUInt32(diff.TotalSeconds) + 20 * 60;

    string stringToSign = HttpUtility.UrlEncode(uri) + "\n" + tokenExpirationTime;
    HMACSHA256 hmac = new HMACSHA256(Encoding.UTF8.GetBytes(key));

    string signature = Convert.ToBase64String(hmac.ComputeHash(Encoding.UTF8.GetBytes(stringToSign)));
    string token = String.Format(CultureInfo.InvariantCulture, "SharedAccessSignature sr={0}&sig={1}&se={2}&skn={3}",
        HttpUtility.UrlEncode(uri), HttpUtility.UrlEncode(signature), tokenExpirationTime, keyName);
    return token;
}

This example follows the steps available at this SAS Authentication with Service Bus article. Namely:
– use the time offset from UTC time January 1st, 1970 in seconds to set when the SAS should expire
– create the string to be signed using the URI and the expiry time
– sign that string via HMACSHA256 and the key for the policy we’re basing our signature on
– base64 encode the signature
– create the fully signed URL with the appropriate parameters

With the steps clearly laid out, it's just a matter of converting this into the language of your choice. Be it JavaScript, Objective-C, PHP, Ruby… it doesn't really matter as long as you can perform these same steps.

In the future, it's my sincere hope that we'll actually see something in the Azure Service Bus portal that will make this even easier. Perhaps even a callable API that could be leveraged.

But what about "Connection Strings"?

This is something I've had debates about. If you look at most of the Service Bus examples out there, they all use a connection string. I'm not sure why this is, except that it seems simpler because you don't have to generate the SAS. The reality is that the connection string you get from the portal works much like a SAS, except that it lacks an expiry. The only way to revoke a connection string is by revoking the policy on which it's based. This seems fine, until you realize you only get a handful of policies per entity, so creating hundreds of policies to be used by customers is a tricky proposition.

So what are you to do when you want to use a SAS, but all the examples use a connection string? Let's start by looking at a connection string example:

Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<SharedAccessRuleName>;SharedAccessKey=<SharedAccessKey>

This string has several parameters:

sb – the protocol to be used. In this case it's 'sb', which is Service Bus shorthand for "use AMQP".

namespace – the URL we’re trying to access

SharedAccessRuleName – the policy we’re using

SharedAccessKey – the policy’s secret key

The common approach is to put this string into a configuration setting and with the .NET SDK, load it as follows…

EventHubClient client = EventHubClient.Create(ConfigurationManager.AppSettings["eventHubName"]);

or

string connectionString = CloudConfigurationManager.GetSetting("Microsoft.ServiceBus.ConnectionString");
var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);

These are common examples you’ll see using the connection string. But what if you want to use the SAS instead… For that, we go up a level to the Service Bus MessagingFactory. Both of the examples above abstract away this factory. But we can still reach back and use it.

We start by creating the URI we want to access, and a SAS token object:

Uri runtimeUri = ServiceBusEnvironment.CreateServiceUri("sb", "<namespace>", string.Empty);
TokenProvider runtimeToken = TokenProvider.CreateSharedAccessSignatureTokenProvider("<SAS token we generated earlier>");

Alternatively, we can create the token provider using a policy and its secret key. But here we want to use a SAS token.

Now it's just a matter of using the messaging factory to create a client object…

MessagingFactory mf = MessagingFactory.Create(runtimeUri, runtimeToken);
QueueClient sendClient = mf.CreateQueueClient(qPath);

Simple as that. A couple quick line changes and we’ve gone from a dependency on connection strings (ick!), to using SAS tokens.

So back to SAS vs ACS

So even with all this in mind, there’s one argument that gets brought up. The SAS tokens expire.

Yes, they do. But so do ACS tokens and nearly all other claims-based tokens. The only real difference is that most of the "common" security mechanisms support a way of renewing the tokens. Be this asking the user to log in again, or some underlying bits which store the identity/key and use them to request new tokens when the old one is about to expire. The reality is that this same pattern needs to be implemented when you're using a SAS token.

Given what I’ve shown you above, you can now stand up your own simple token service that accepts a couple parameters (identity/key), authenticates them, selects the appropriate policy and URL for the provided identity, and then creates and returns the appropriate SAS.
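
A minimal sketch of what such a broker's core could look like, assuming the .NET Service Bus SDK (the identities, paths, and policy names below are invented for illustration, and a real broker would look these up in its own identity store):

using System;
using System.Collections.Generic;
using Microsoft.ServiceBus;   // WindowsAzure.ServiceBus NuGet package

public static class SasBroker
{
    private sealed class GrantInfo
    {
        public string Namespace, Path, PolicyName, PolicyKey;
    }

    // In a real broker this lookup would hit your identity store, not a hard-coded dictionary.
    private static readonly Dictionary<string, GrantInfo> Registry =
        new Dictionary<string, GrantInfo>
        {
            ["vendorA"] = new GrantInfo
            {
                Namespace = "mynamespace",
                Path = "vendor-",                 // partial path: everything starting with "vendor-"
                PolicyName = "Publish",
                PolicyKey = "<policy secret key>"
            }
        };

    public static string IssueToken(string identity, TimeSpan lifetime)
    {
        GrantInfo grant;
        if (!Registry.TryGetValue(identity, out grant))
            throw new UnauthorizedAccessException("Unknown identity.");

        string serviceUri = ServiceBusEnvironment
            .CreateServiceUri("https", grant.Namespace, grant.Path)
            .ToString().Trim('/');

        // Short expiry forces the client back to the broker periodically ("key rotation" for free).
        return SharedAccessSignatureTokenProvider.GetSharedAccessSignature(
            grant.PolicyName, grant.PolicyKey, serviceUri, lifetime);
    }
}

A web endpoint (or whatever front door you prefer) would simply authenticate the caller and hand back the string this method returns.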

The client then implements the same types of patterns we can already see for things like mobile notification services. Namely, to store tokens locally until they're about to expire. When they are approaching their expiry, reach back out using our credentials and ask for an updated token. Finally, use the token we've been provided to perform the required operations.

All that’s really left up to you is to determine the expiry for the token. You can have one set value for all tokens. You may also opt to have tokens for more sensitive operations expire faster. You have that flexibility.

So until next time, enjoy SAS’s. Just please…. don’t be afraid of them.

SimpleFileFetcher for Azure Start up tasks

I know some of you are waiting for more news on my "Simple Cloud Manager" Project. It's coming, I swear! Meanwhile, I had to tackle one fairly common Azure task today and wanted to share my solution with you all.

I'm working on a project where we needed an Azure PaaS start-up task that would download a remote file and unzip it for me. Now there are several approaches out there of varying degrees of complexity. PowerShell, native code, etc… But I wanted something even simpler: a black box that I could pass parameters to and that required no additional assemblies/files to work. To that end I sat down and spent about 2 hours crafting the "Simple File Fetcher".

The usage is fairly simple: simplefilefetcher -targeturi:<where to download the file from> -targetfile:<where to save it to>

You can also optionally specify a parameter that tells it to unzip the file, '-unzip', and another that will make sure the downloaded file is deleted, '-deleteoriginal'.

I spent most of the 2 hours looking at and trying various options. The final product was < 100 lines of code and, now that I know how to do it, would only likely take me 10-20 minutes to rebuild (most of that spent debugging the argument parsing). So instead of boring you all with an explanation of the code, I'll just share it along with a release build of the console app so you can just use it.🙂
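
If you just want the general shape of what it does, a minimal sketch (not the released code; argument parsing and error handling stripped out, and it assumes WebClient plus System.IO.Compression) looks something like this:

using System;
using System.IO;
using System.IO.Compression;   // reference System.IO.Compression.FileSystem for ZipFile
using System.Net;

class SimpleFileFetcherSketch
{
    static void Fetch(string targetUri, string targetFile, bool unzip, bool deleteOriginal)
    {
        // pull the remote file down
        using (var client = new WebClient())
        {
            client.DownloadFile(targetUri, targetFile);
        }

        if (unzip)
        {
            // extract alongside the downloaded file
            ZipFile.ExtractToDirectory(targetFile, Path.GetDirectoryName(Path.GetFullPath(targetFile)));

            // optionally clean up the original archive once it's been expanded
            if (deleteOriginal)
                File.Delete(targetFile);
        }
    }

    static void Main(string[] args)
    {
        Fetch("https://example.com/tools/package.zip", "package.zip", unzip: true, deleteOriginal: true);
    }
}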

Until next time!

Azure Tenant Management

Hey everyone. I'm writing this post from Redmond, WA. I'm here on the Microsoft campus this week to meet with some of my colleagues and explore options for my first significant open source project effort. We're here to spend three days trying to find ways to manage the allocation of, and monitor the billable usage of, Azure resources. There are two high level objectives for this effort:

  • Allow non-technical users to provision and access Azure resources without the need to understand Azure's terminology or be granted direct access to the underlying Azure subscription
  • Monitor the billable usage of the allocated resources, and track against a “cap”

On the surface this seems pretty simple. That is, until you realize that not all the Azure services expose individual resource usage, and of the ones that do, they all do it separately. It gets even more complicated when you realize that you may need to do things like have users “share” cloud services and even storage accounts.

Couple examples

So let’s dive into this a little more and explore a couple of the possible use cases I’ve heard from customers.

Journalism Students

Scenario: We have a series of journalism students that will be learning to provision and maintain an online content management system. They need someplace to host their website, upload content, and need Git integration so they can version any code changes they make to their site's code. The instructor for the course starts by creating an individual web site for them with a basic CMS system. The student will then access this workspace as they perform their coursework.

Challenges: Depending on the languages they are using, we may be able to get by with Azure Web Sites. But this only allows us up to 10 "free" web sites, so what happens to the other students? Additionally, students don't know anything about the different SKUs available and just want things that work, so do we need to provide "warm-up" and auto-scaling? And since the instructor is setting up the web sites for them, we need a simple way for the instructor to get the resources provisioned and give the students access to them without the instructor needing to even be aware of the Azure subscriptions. We also need to track compute and bandwidth usage on the individual web sites.

Software Testers

Scenario: A small company has remote testers that perform quality assurance work on software. These workers are distributed but need to remote into Windows VMs to run tests. Ideally, these VMs will be hosted “in the cloud”, and the company wants a simple façade whereby the workers can select which software they need to test and then provision a virtual machine for them. The projects being tested should be “billed back” for the resources used by the testers and the testers work on multiple projects. Additionally, the testers should be able to focus on the work they have to do, not how to manage and provision Azure resources.

Challenges: This one will likely be Azure Virtual Machines. But we need to juggle not only compute/bandwidth, but also track impact on storage (transactions and total amount stored). We also need to be able to provision VMs from a selection of customer gallery images and get them running for the testers, sometimes across subscriptions. Finally, we need to be aware of challenges with regards to VM endpoints and cloud services if we want to maximize the density of these VMs.

Business Intelligence

Scenario: Students are learning to use Oracle databases to analyze trends. The instructor is using the base Oracle database images from the Azure gallery but has added various tools and sample datasets to them for the students to use. The students will use the virtual machines for various labs over the duration of the course and each lab should only take a few hours.

Challenges: If these VMs were kept running 24×7, it would cost thousands of dollars per month per student. So we need to make sure we can automate the start and stop of the VMs to help control these costs. And since the Oracle licensing fees appear as a separate charge, we need to be able to predict these as well, based on current rates and the amount of time the VM was active.

So what are we going to do about it?

In short, my plan is to create a framework that we will release via open source to help fill some of these gaps. A simple, web based user interface for accessing your allocated resources, some back end services that monitor resource usage and track that against quotas set by solution administrators. Underneath all that, a system that allows you to “tag” resources as associated with users or specific projects. If all goes well, I hope to have the first version of this framework published and available by the end of March, 2015 that will focus on Azure Web Sites, IaaS VMs, and Azure Storage.

However, and this is the point of my little announcement post, we’re not going to make you wait until this is done. As this project progresses, I plan to regularly post here and in January we’ll hopefully have a GIT repository where you’ll be able to check out the work as we progress. Furthermore, I plan to actively work with organizations that want to use this solution so that our initial version will not be the only one.

So look for more on this in the next couple weeks as we share our learnings and plans. But also, let me know via the comments if this is something you see value in and what scenario you may have. And oh, we still don't have a name for the framework yet. So please post a comment or tweet @brentcodemonkey with your ideas.

Until next time!

Azure’s new Event Hub

Over the past few months, I had the good fortune to be accepted to present at ThatConference in Wisconsin and CloudDevelop in Ohio. I count myself even more fortunate because at the time I submitted my session for both these events, it was about a new Azure solution that hadn’t even been announced yet, the Event Hub.

Whenever possible, I like to put demos into a real world context. For this one, I reached out to two colleagues that were presenting at ThatConference and collectively we came up with the idea to do a conference attendee tracking solution. For my part of this series, I was going to cover using Event Hub to ingest event messages from various sources (social media, mobile apps, and proximity sensors) and feeding those into the hub. I also wrote some code so that the other sessions could consume the messages.

Event Hub vs. Topics/Queues

The first question to get out of the way is that Event Hub is NOT just a new variation on Topics/Queues. For this, I’ve found a simple visual example works best.

This is topics/queues: [image of a courier picking up and delivering individual packages]

This is Event Hub: [image of a large distribution center]

The key differentiator between the two is scale. A courier can pick up a selection of packages, and ensure they are delivered. But if you need to move hundreds of thousands of packages, you can do that with a lot of couriers, or you could build a distribution center capable of handling that kind of volume more quickly. Event Hub is that distribution center. But it’s been built as a managed service so you don’t have to build your own expensive facility. You can just leverage the one we’ve built for you.

In Service Bus, topics and queues are about the transportation and delivery of a specific payload (the brokered message) from point A to point B. These come with specific features (guaranteed delivery, visibility controls, etc…) that also limit the scale at which a single queue can operate. Event Hub, by contrast, was built to solve the challenges of scaled ingestion of messages, but did so by trading away these types of atomic operations. The easiest way to think of Event Hub is as a giant buffer into which you place messages, and they are automatically retained for a given period of time. You then have the ability to read those messages much as you would read a file stream from disk. You can even rewind all the way back to the beginning of the stream and process everything again.

And as you might expect given the different focus of the two solutions, the programming models are also different. So it’s also important to understand that switching from one to the other isn’t simply a matter of switching the SDK.

What is the Event Hub?

If you think back to Topics/Queues, you had the option of enabling partitions via the EnablePartitioning property. This would cause the topic or queue to switch from a single broker (the service-side edge compute node) to 16 brokers, increasing the total throughput of the queue by 16 times. We call this approach partitioning, and this is exactly what Event Hub does.

When you create an Event Hub, you determine the number of partitions that you want (from 16, the default, up to 1024). This allows you to scale out the processes that need to consume events. Partitions are also used to help direct messages. When you send an event to the hub, you can assign a partition key, which is in turn hashed by the Event Hub brokers so that it lands in a given partition. This hash ensures that as long as the same partition key is used, the events will be placed into the same partition and in turn will be picked up by the same receiver. If you fail to specify a partition key, the events will be distributed randomly.

When it comes to throughput, this isn't the end of the story. Event Hubs also have "throughput units". By default you start with a single throughput unit that allows 1 MB/s in and 2 MB/s out through your hubs. You can request to have this scaled up to 1,000 throughput units. When you purchase a throughput unit, you do so at the namespace level, since it applies to all your event hubs in that namespace.

So what we have is a service that can scale to handle massive ingestion of events, combined with a huge buffer just in case the back end, which also features scalable consumption, can’t keep up with the rate in which messages are being sent. This gives us scale on multiple facets, as a managed, hosted service.

So about that presentation…

So the next obvious question is, "how does it work?" This is where my demos came in. I wanted to show using Event Hub to ingest events from multiple sources: a social media feed, a mobile app used by vendors to scan attendee badges, and proximity sensors scattered around the conference to help measure session attendance.

I started by realizing that when I consume events, I need to know what type they are (aka how to deserialize them). To make this easy, I started by defining my own custom .NET message types. I selected Twitter for the media feed, and for those messages the type class declaration looks like this:
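
Roughly along these lines (a reconstruction; the exact property names and attributes in the demo may differ):

using System;
using System.Runtime.Serialization;

// who tweeted, what they said, and when; the data attributes help the serializer along
[DataContract]
public class TweetEvent
{
    [DataMember]
    public string UserName { get; set; }     // who tweeted

    [DataMember]
    public string Text { get; set; }         // the text of the message

    [DataMember]
    public DateTime CreatedAt { get; set; }  // when it was created
}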

So we have who tweeted, the text of the message, and when it was created. I decorated the class with various data attributes to aid in serialization.

When a tweet is found, we’ll need a client to send the event…
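
Something like this (the configuration setting names are assumptions):

// requires Microsoft.ServiceBus.Messaging (WindowsAzure.ServiceBus NuGet) and System.Configuration
EventHubClient client = EventHubClient.CreateFromConnectionString(
    ConfigurationManager.AppSettings["Microsoft.ServiceBus.ConnectionString"],
    ConfigurationManager.AppSettings["EventHubName"]);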

This creates an EventHubClient object, using a connection string from the configuration file, and a configuration setting that defines the name of the hub I want to send to.

Next up, we need to create my event object, and serialize it.
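
Along these lines (the values are obviously just samples):

// build the event and serialize it to JSON (Newtonsoft.Json, pulled in by the Event Hub package)
TweetEvent tweetEvent = new TweetEvent
{
    UserName = "@SomeAttendee",
    Text = "Great session on Event Hub! #ThatConference",
    CreatedAt = DateTime.UtcNow
};

string serializedEvent = JsonConvert.SerializeObject(tweetEvent);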

I opted to use Newtonsoft's .NET JSON serializer. It was already brought in by the Event Hub nuget package. JSON is lighter weight than XML, and since Event Hub is based on total throughput, not the number of messages, it made sense to keep the payloads as small as was convenient.

Finally, I have to actually send the message:
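
A sketch of that send (the "Type" property name is simply the convention the consumers will look for):

// wrap the JSON payload, pin it to a partition, and tag it so consumers know how to decode it
EventData eventData = new EventData(Encoding.Unicode.GetBytes(serializedEvent))
{
    PartitionKey = "twitter"
};
eventData.Properties["Type"] = "Tweet";

client.Send(eventData);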

I create an instance of the EventData object using the serialized event, and assign a partition key to it. Furthermore, I also add a custom property to it that my event processors can then use to determine how to deserialize the event. Finally, I call the EventHubClient method, Send, passing my event as a parameter. The default way for the .NET client to do all this is to use AMQP 1.0 under the covers. But you can also do this via REST from just about any language you want. Here's an example using JavaScript…

This example comes from another part of the presentation's demo, where I use a web page with embedded JavaScript to simulate the vendor mobile device app for scanning attendee badges. In this case, the partition key is in the URI and is set to "vendor". While I'm still sending a JSON payload, this one uses a UTF-8 encoding instead of Unicode, which is another reason it's important to have an event type available when we're consuming events.

Now you'll recall I explained that the partition key is used to help distribute the events so that we end up with a fairly even distribution among the consuming processes. So why would I choose to bind each of my examples to a single partition? In my case, I knew that volumes would be low, so there wasn't much of an issue with overloading my consuming processes. But you can also use this approach if you want to ensure that the same consuming process always gets the events from the same source. Something that can be really handy if the consuming process is using the events to maintain an in-memory state model of some type.

So what about consuming the events?

Events are consumed via “consumer groups”. Each group can track its position within the overall event hub ‘stream’ separately, allowing for parallel consumption of events. A default group is created when the event hub is created, but we can create our own. Consuming processes in turn create receivers, which connect to the various partitions to actually consume the events. This would normally require you to code up some rather complicated logic to ensure that if the process that owns a given set of receivers becomes unavailable, another process can pick up the slack. Fortunately, the event hub team thought of this already and created another nuget package called the EventProcessorHost.

Simply put, this is a handy, .NET based approach to handle resiliency and fault tolerance of event consumers/receivers. It uses Azure Storage blobs to track which receivers are attached to a given partition in an event hub. If you add or remove consuming processes, it will redistribute the receivers accordingly. I used this approach for my presentation to create a simple console app that displays the events coming into the hub. There are really just three parts of the solution: the program itself, a receiver class, and an event processor class.

The console program is the simplest bit of code…
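
Roughly this shape (the setting names and consumer group name are assumptions, and the Receiver type is the sample class discussed below, so its constructor may differ):

// requires Microsoft.ServiceBus and System.Configuration
string eventHubName = "defaulthub";
string consumerGroupName = "myconsumergroup";

// create the consumer group if it isn't already there
NamespaceManager nsManager = NamespaceManager.CreateFromConnectionString(
    ConfigurationManager.AppSettings["Microsoft.ServiceBus.ConnectionString"]);
nsManager.CreateConsumerGroupIfNotExists(eventHubName, consumerGroupName);

// the Receiver (from the nuget package's sample) spreads receivers across the partitions
Receiver receiver = new Receiver(eventHubName, consumerGroupName);
receiver.MessageProcessingWithPartitionDistribution();

Console.WriteLine("Receiving. Press enter to stop.");
Console.ReadLine();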

We use the namespace manager to create a consumer group if the one we want doesn’t already exist. Then we instantiate a Receiver object, and tell that object to start processing events, distributing threads across the various partitions in the event hub. The nuget package comes with its own Receiver class, so there’s not much you really need to do. The core of the receiver is in the MessageProcessingWithPartitionDistribution method.
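
The core of that method comes down to something like this (the storage setting name is an assumption; note the explicit consumer group name, which the next paragraph touches on):

string storageConnectionString = ConfigurationManager.AppSettings["AzureStorageConnectionString"];

EventProcessorHost host = new EventProcessorHost(
    "singleworker",                  // host name -- it shows up as the "Owner" in the lease blobs
    this.eventHubName,
    this.consumerGroupName,          // instead of EventHubConsumerGroup.DefaultGroupName
    ConfigurationManager.AppSettings["Microsoft.ServiceBus.ConnectionString"],
    storageConnectionString);

// register our IEventProcessor implementation; the host spins up receivers per partition
host.RegisterEventProcessorAsync<SimpleEventProcessor>().Wait();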

You'll note that this may actually be a bit different than the version that arrives with the nuget package. This is because I've modified it to use a consumer group name I specify, instead of just the default name. Otherwise, it's the same example. I get the Azure Storage connection string (where the blobs that will control our leases will go), and then use that to create an EventProcessorHost object. We then tell the host to start doing asynchronous event processing (via RegisterEventProcessorAsync). This registration actually points to our third class, which implements the IEventProcessor interface. Again, a template is provided as part of the nuget package, so we don't have to write the entire thing ourselves. But if you look at this ProcessEventsAsync method, we see the heart of it…
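
A sketch of that method (lastCheckpoint is assumed to be a DateTime field on the SimpleEventProcessor class, and the "Tweet" branch mirrors the sender shown earlier):

async Task IEventProcessor.ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
{
    foreach (EventData eventData in messages)
    {
        // the sender's custom "Type" property tells us how to decode the payload
        string eventType = (string)eventData.Properties["Type"];

        if (eventType == "Tweet")
        {
            string json = Encoding.Unicode.GetString(eventData.GetBytes());
            TweetEvent tweet = JsonConvert.DeserializeObject<TweetEvent>(json);
            Console.WriteLine("{0} tweeted: {1}", tweet.UserName, tweet.Text);
        }

        // record our position in the stream roughly once a minute
        if (DateTime.UtcNow > this.lastCheckpoint.AddMinutes(1))
        {
            await context.CheckpointAsync();
            this.lastCheckpoint = DateTime.UtcNow;
        }
    }
}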

What's happening behind the scenes is that a thread is being spun up for each partition on the Event Hub. This thread then uses a blob lease to take ownership of a partition, then attaches a receiver and begins consuming the events. Each time it pulls events (by default, it will pull 10 at a time), the method I show above is called. This method just loops through the collection of events, and every minute will tell the EventProcessorHost to checkpoint (record where we're at in the stream) our progress. Inside of the foreach loop is the code that looks at the events, deserializes them appropriately, and displays them on the program's console.

You can see we're checking the event's "Type" property, and then deserializing it back into an object with the proper type of encoding. It's a simple example, but I think it drives the point home.

We can see some of what's going on under the covers of the processor by looking at the blob storage account we've associated with our processor. First up, the EventProcessor creates a container in the storage account that is named the same as our event hub (so if you have multiple hubs with the same name in different namespaces, be sure to use different storage accounts). Within that container is a blob named "eventhub.info" which contains a JSON payload that describes the container and the hub.

{"EventHubPath":"defaulthub","CreatedAt":"2014-10-16T20:45:16.36","PartitionCount":16}

This shows the location of the hub, when this container/file was created, and the number of partitions in the hub. Getting the number of partitions is why you must use a connection string or SAS for the hub that has manage permissions. Also within this container is one blob (zero indexed) for each partition in the hub. These blobs also contain JSON payloads.

{"PartitionId":"0","Owner":"singleworker","Token":"642cbd55-6723-47ca-85ee-401bf9b2bbea","Epoch":1,"Offset":"-1"}

We have the partition this file is for, the owner (aka the EventProcessorHost name we gave this), a token (presumably for the lease), an Epoch (not sure what this is for YET), and an Offset. This last value is the position we're at in the stream. When you call the CheckpointAsync method of our SimpleEventProcessor, this will update the value of the offset so we don't read old messages again.

Now if we spin up two copies of this application, after a minute or so you'd see the two change ownership of various partitions. Messages start appearing in both, and provided you're spreading your messages over enough partitions, you'll even be able to see the partition keys at work as different clients will get messages from specific partitions.

Ok, but what about the presentations?

Now when I started this post, I mentioned that there was a presentation and set of demos to go along with this. I've uploaded both for you to take away and use as you see fit. So enjoy!

Annotated Event Hub PowerPoint Presentation Deck && Event Hub Visual Studio 2013 Demo Solution (contains 3 demos and 5 projects)

Until next time!

Shared Access Signatures with Azure Service Bus

Sorry for the silence folks. Microsoft's fiscal year ended June 30th and I started on a new team on July 1st. While July and August are usually great periods for folks like me to get some extra blogging done, I've had a few distractions that kept me from writing. Namely learning, learning, learning and trying to find my "voice" on the new team.

Going forward, I’m going to start having more of a focus on this “internet of things” fad that everyone’s talking about. And within that, I’m going to be sticking fairly close to home and working on the services side of things. Even more tightly focused, I’m going to go deeper on “build” vs “buy” scenarios and focus a goodly amount of my available time on one of the Azure product collections I’ve often felt didn’t get enough respect… Service Bus.

So in the coming weeks expect to see blog posts on Event Hub, possibly Hybrid Connections, and for starters, setting a few things straight about Service Bus in general.

SAS vs. ACS

So my first starting point is to call out a 'stealth update' that happened recently. As of sometime on/after August 21st, 2014, if you create a new Service Bus root namespace via the Azure Management Portal, it will no longer include the associated Access Control Service namespace. This change is in keeping with guidance the product team has been giving for some time: namely, to use SAS over ACS whenever possible.

Note: You can still generate the associated ACS namespace automatically by using the new-azuresbnamespace powershell cmdlet. Just be aware that a new version of this will be coming soon that will match the default behavior of the management portal. When the new cmdlet is available, you will need to append the “-useAcs true” parameter to the command if you still want to create the ACS namespace.

There are a few reasons for this guidance. The first is that according to the data the team had available to them, many folks doing ACS authentication were using the "owner" credential. This identity had FULL control over the namespace. So using it was akin to having an app use the SA (system administrator) identity for a SQL database. Secondly, ACS requires two calls for the first time operation: one to get the ACS token, one to perform the requested service bus operation. Now the token had a default time to live of 3 hours, but some SDKs didn't cache the token, and as a result all operations against the SB were generating two calls, which increases the latency of the operation significantly. As if these weren't enough, ACS only supports about 40 authentications per second. So if your SDK didn't cache the token, your possible throughput on a single queue goes from somewhere near 2,000 messages a second down to 40 at best.

Now ACS has some benefits to be sure. In general, folks are much more familiar with username/password models than shared access signatures. You could create identities for specific publishers/consumers (within reason), as well as scope those identities and their permissions to specific paths, allowing a single identity to send/receive from multiple queues for example. It also had the ACS management namespace with a GUI to help manage things. And to shut down access, all one has to do is revoke the identity and access is cut off. But many of these needs can also be met using Shared Access Signatures if one knows how. And that is what I'd like to start helping you with in this post.

Shared Access Policies/Rules & Connection Information

Ok, first issue… If you use the management portal, you’ll see the ability to create/manage Shared Access Policies, but in the SDK and API, these are referred to as a SharedAccessAuthorizationRule. For the sake of simplicity, and consistency with Azure storage, I’ll refer to this from now on simply as “policies” (which matches the Azure Storage naming).

In Service Bus terms, a policy (aka SharedAccessAuthorizationRule) is a named set of permissions associated with an entity. The entity can be the Service Bus’ root namespace (the name you gave it when it was created), a queue, a topic, or an event hub. Basically an addressable endpoint that has a name assigned to it. For each entity you can have up to twelve policies and each policy is allowed a mix of the same three permissions: manage, send, and listen. Each policy also has two access keys much like Azure Storage and for the same reason. So you can do key swaps periodically and ensure you always have at least one active, valid key available to your applications when an old one is regenerated.
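
For what it's worth, policies can also be managed from code. A sketch of adding a send-only rule to a queue with the .NET SDK (the queue and rule names here are just examples, and management operations like this do need a manage-level connection string):

using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

NamespaceManager nsManager = NamespaceManager.CreateFromConnectionString(
    "Endpoint=sb://<namespace>.servicebus.windows.net;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<key>");

QueueDescription queue = nsManager.GetQueue("vendor-queue2");

// each rule gets a generated key and any mix of Manage/Send/Listen rights (max twelve rules per entity)
queue.Authorization.Add(new SharedAccessAuthorizationRule(
    "VendorSendOnly",
    SharedAccessAuthorizationRule.GenerateRandomKey(),
    new[] { AccessRights.Send }));

nsManager.UpdateQueue(queue);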

It’s these policies that are the “Connection Information” you access via the portal and see available as SAS connection strings. And it’s the connection strings that lead me to a bit of an issue I have with how many service bus demos are done.

Service Bus Clients

When you create your first service bus project using the .NET SDK and one of the tutorials, you’ll likely be asked at some point to add code that looks like the following:

// Create the QueueClient
QueueClient client = QueueClient.Create("vendor-queue2");

// insert the message body into the request
BrokeredMessage message = new BrokeredMessage("hello world!");

// execute the request and get the response
client.Send(message);

Notice that the sample specifies a queue name that we want a client for, but no credentials. That’s because within the SDK, this method is overloaded to look for an application configuration setting by the name of “Microsoft.ServiceBus.ConnectionString”. The value of this string is the SAS connection string you can get from the portal. It gives the application access to the entity until such time as the policy/rule is removed. In other words, you can re-write this code to look like this:

// Create the QueueClient from a connection string
QueueClient client = QueueClient.CreateFromConnectionString("Endpoint=sb://<namespace>/;SharedAccessKeyName=<SharedAccessRuleName>;SharedAccessKey=<RuleKey>", "vendor-queue2");

// insert the message body into the request
BrokeredMessage message = new BrokeredMessage("hello world!");

// execute the request and get the response
client.Send(message);

By using CreateFromConnectionString in place of the simple Create, we can specify our own connection string. But again, this is permanent access until the policy/rule is removed. It also highlights the issue I have with the way the available samples/demos work that I mentioned earlier. I bemoaned the use of the "owner" credential when doing ACS. Let's look at the default connection string that the Service Bus Nuget package puts into the application configuration file:

    <add key="Microsoft.ServiceBus.ConnectionString" value="Endpoint=sb://[your namespace].servicebus.windows.net;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=[your secret]" />

This sample refers to a policy/rule named “RootManageSharedAccessKey”. When you create a new namespace, this default policy has been created for you automatically with permission to listen, send, AND MANAGE the entire namespace. Please! For the love of all that is digital, NEVER… EVER use this credential for anything other than an application that needs to manage all aspects of a given namespace! Instead, configure your own policies with just the permissions that are needed. If you put the default policy into all your apps, we’re right back to the “owner” credential situation we had with ACS.

There’s another issue with this approach. Rules/policies must be associated with a specific service bus entity. This is where SAS comes into play.

Our First SAS

From the root namespace, entities in Service Bus are accessible by a path that lies directly under that namespace. Together with the root namespace, each entity can be represented as a full path, like so…

brentsample-ns.servicebus.windows.net/my-queue
brentsample-ns.servicebus.windows.net/my-eventhub
brentsample-ns.servicebus.windows.net/your-topic

Now I could create a policy at the root namespace that has “send” permission. But using it as a connection string would give the sender access to send to everything in the namespace. Alternatively, I could create individual policies/rules on each of these entities. But then I need to juggle all those different connection strings.

If we opt to use a SAS, we have a simpler way to help restrict access, but we also make management a touch easier by creating signatures that allow access to a partial path, namely something like entities that begin with "my-". Unfortunately, the management portal does not yet provide the ability to create a SAS. So we either need to write a bit of .NET code, or leverage 3rd party utilities. Fortunately the code is pretty simple. Using Visual Studio, you can create a new Console Application and then add the Nuget package for Azure Service Bus. Then all that remains is to populate some variables and use these two lines of code to generate our signature.

var serviceUri = ServiceBusEnvironment.CreateServiceUri("https", sbNamespace, sbPath).ToString().Trim('/');
string generatedSaS = SharedAccessSignatureTokenProvider.GetSharedAccessSignature(sbPolicy, sbKey, serviceUri, expiry);

The important variables in here are:

sbNamespace – the name of the service bus namespace. Don’t include the “.servicebus.windows.net” stuff. Just the name when we created it.
sbPath – the name, or partial name, of the entities we want to access. For this example, let's pretend it's "my-"
sbPolicy – this is the rule/policy that has the permissions we want to the signature to include
sbKey – one of the two secret keys of the rule/policy we’re referencing
expiry – a date/time of when the signature should expire.

If we fill these in, we get a signature that looks something like:

SharedAccessSignature sr=https%3a%2f%2fbmssample-ns.servicebus.windows.net%2fmy-&sig=B9cy8OEuxum2UN5VjsC4JPhbVU7jwJi%2bq20qiaXk24s%3d&se=64953912194&skn=Publish

Now that we have this signature, we want to be able to use it to interact with one of our entities. There’s no “CreateFromSAS” option, but fortunately in the .NET SDK we can use this signature together with a MessagingFactory to create our entity client.

MessagingFactorySettings mfSettings = new MessagingFactorySettings();
mfSettings.TransportType = TransportType.NetMessaging;
mfSettings.TokenProvider = TokenProvider.CreateSharedAccessSignatureTokenProvider("<signature we just created>");
MessagingFactory mf = MessagingFactory.Create("sb://thatconference-ns.servicebus.windows.net", mfSettings);

// Create Client
QueueClient client = mf.CreateQueueClient(queueName);

And in this case, the same signature will work for the queue 'my-queue', or the event hub 'my-eventhub' (although for the latter the TransportType needs to be Amqp). We can even take this same signature and put it into JavaScript for use perhaps in a NodeJS application…

var xmlHttpRequest = new XMLHttpRequest();
xmlHttpRequest.open('POST', 'https://thatconference-ns.servicebus.windows.net/my-eventhub/publishers/vendorA-DeviceIDnnn/messages', true);
xmlHttpRequest.setRequestHeader('Content-Type', 'application/atom+xml;type=entry;charset=utf-8');
xmlHttpRequest.setRequestHeader('Authorization', '<SharedAccessSignature>');

In the case of event hub, where we'll have various publishers, we can do exactly the same thing, but using a rule/policy from the event hub and generating a signature that goes deeper, like "my-eventhub/publishers/vendorA-".

Policy Expiry and Renewal

So here's the $50,000 question. With ACS I could (at least to a certain scale) have a single identity for each client and remove them at any time. With SAS, I can remove a signature by removing its underlying policy at any time. But since I'm limited to twelve policies, how do I recreate the ability to revoke on demand? Simply put, you don't.

Now before you throw your hands up and state with exasperation that this approach "simply won't work", I do ask you to take a long hard look at this requirement. In my experience, I find that the vast majority of the time, allowing someone to publish to an entity is a matter of trust. You've entered into an agreement with them and aren't likely to revoke that access on a whim. This trust is rarely volatile or short-term in nature. If it's a customer, they are usually engaged for the term of their subscription (monthly or annual are common). So you know when your trust will expire and need to be renewed.

In this situation, we are planning for an eventuality that rarely comes to pass, and one that has an alternative requiring just a small amount of work: implementing your own "credential broker".

If we look back at what the ACS did, you presented it with credentials, and it issued you a token. You then used that token to access the appropriate service bus resources. That's exactly what our credential broker would do. Periodically, the requesting client would call a service you are hosting and present some credentials (username/pass, certificate, PSK, etc…). Your broker service validates this request (and likely logs it), then issues back a SAS 'token' with an appropriate expiry. The client then (hopefully) caches the SAS 'token', and uses it on subsequent requests until it expires, at which point it goes back to the broker to get a new SAS 'token'.

Now if this isn't enough, we still have the ability to remove/disable the queue (or event hub). So in a way we get the best of both worlds. We have automatic expiry of tokens to ensure "key rotation" while also having the ability to revoke access immediately.

This is just one possible pattern. So instead of offering up alternatives, I’d love to hear from any of you via the comments on the patterns you have used to help manage shared access signatures.

Expiration Reached

In the coming weeks/months I'm going to generate a series of posts that will dive into various Service Bus related topics more deeply. If there's something specific you'd like to see, please let me know. Otherwise you can look forward to posts on accessing Service Bus from other languages/SDKs, programmatic management of namespaces/rules, and resilient architectures around Service Bus.

I hope this article has helped clear up some of the fog around the Azure Service Bus. So until next time!
