October 19, 2015
I’ve spent most of the last 7 years focused on the cloud. During that time I’ve worked with customers of all sizes in a myriad of industries to adopt cloud platforms and build solutions they can then provide to their customers. If I’ve learned nothing else, it’s that continuous innovation is necessary and EVERY cloud solution could be improved.
By now, you’re hopefully familiar with the concept of DevOps. This blending of developer and IT Pro responsibilities is at the center of many cloud infrastructures. Unfortunately, most approaches still don’t offer a good way to separate these two fairly disparate viewpoints. They either require the developer to learn too much about infrastructure, or require the IT Pro to know too much about building deployment scripts. The challenge is to satisfy the unique needs of both audiences while still allowing each to stay focused on their passions.
What we need is a platform for which developers can build applications that scale, are resilient, and can easily deliver performant, stateful, highly available services. That same platform needs to be deployable either in the cloud as a managed service or on-premises using existing compute resources. It also needs to let the IT Pro worry less about the applications and services running on it, and more about keeping the underlying infrastructure healthy.
To that end, I’m pleased to talk about our newest innovation, Microsoft Azure Service Fabric. But before I dive into exactly what Service Fabric is, let’s talk about the road that’s led us here.
Infrastructure as a Service
It’s easy to argue that IaaS, a natural evolution from managed infrastructure, simply built upon the hypervisor technologies we’d already been using for years. The only real difference was the level of automation taking care of common tasks (such as relocating virtual machines in case of hardware failure). This approach also had the benefit that, to a degree, you could stand up your own private IaaS cloud.
The issue, however, was that many of the old problems remained. You still had to manage the virtual machines, patching and upgrading them. And if you deployed an application to this type of infrastructure, there was still a lot of work to be done to make it resilient in case of a failure, especially when the application held any state information you didn’t want to lose.
When you add in common tasks like rolling upgrades and application version management, IaaS really starts to look like its predecessor. IaaS did reduce the need to manage hardware, but it still didn’t address how we build highly scalable, resilient solutions.
Linux (and, you could easily argue, more specifically Docker) brought us containerization. Containers are deployed onto machines (either physical or virtual) as self-contained, mostly isolated components. This allowed individual services to be quickly and easily combined into complex solutions. The level of automation available pushed the underlying machines further down the stack. So while the underlying shortcomings of IaaS still exist, the automation lets us worry about them even less.
In addition to the composable nature of container-based solutions, this technology also offered the advantage of less bloat. If you use virtual machines to isolate your services, you carry the overhead of a full operating system for each instance. Containers are much lighter weight, sacrificing a degree of resource isolation to save resources overall.
Unfortunately, this approach is still about putting components on infrastructure, not really about the applications those components make up. So many of the application lifecycle management (ALM) problems we saw with IaaS still exist. And while there are solutions that can be layered on top of containerization to help manage some of this, they add even more complexity.
Platform as a Service
PaaS, be it Microsoft Azure Cloud Services or solutions like Heroku, tried to solve some of this by pushing the underlying OS so far into the background that it nearly vanished. At least until something went wrong. And unfortunately, in the cloud there’s only one absolute: failure is inevitable.
Don’t get me wrong. I still love the promise of platform as a service. It is supposed to give us a place to deploy applications where we can depend on the platform to take care of common tasks like scalability, failover, and rolling upgrades. Unfortunately, in most cases PaaS solutions fell a bit short of those goals. Version management was mostly non-existent, and if you wanted things to be truly resilient, you needed to externalize any state information, which meant taking a dependency on external caches or data stores.
Another key challenge is that PaaS adoption often followed traditional n-tier architecture patterns. You would design a system made up of components for the different layers, then deploy them as individual pieces. While you could scale the number of instances of each component based on needed capacity, this pattern still tends to waste resources, because each instance is often under-utilized.
Enter the Service Fabric
We (Microsoft) watched customers, and oftentimes ourselves, struggle with these problems. More importantly, we watched what was being done to overcome them. At the same time, we were learning from the solutions we were building. Over time, common patterns became visible and the solutions for them were industrialized. It’s out of these learnings that we began to craft the Service Fabric.
The goal of Service Fabric, as the name implies, is to provide a framework that allows services to be easily stitched together into complex solutions. It manages the services by helping them discover and communicate with each other while also helping them maintain a healthy state. A service could be a web application, a traditional service, or even just an executable.
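To make “helping them discover each other” concrete, here is a minimal sketch of a name-to-endpoint registry, assuming services register under a logical name and callers resolve that name instead of hard-coding machine addresses. The class, its methods, and the example names are hypothetical; this is not the Service Fabric API.

```python
from collections import defaultdict


class NamingRegistry:
    """Toy name -> endpoints registry (hypothetical, not the Service Fabric API)."""

    def __init__(self):
        self._endpoints = defaultdict(list)

    def register(self, service_name, endpoint):
        # A service instance announces where it can currently be reached.
        if endpoint not in self._endpoints[service_name]:
            self._endpoints[service_name].append(endpoint)

    def unregister(self, service_name, endpoint):
        # Called when an instance stops or the node hosting it fails.
        if endpoint in self._endpoints.get(service_name, []):
            self._endpoints[service_name].remove(endpoint)

    def resolve(self, service_name):
        # Callers ask by logical name; they never need to know which machine.
        return list(self._endpoints.get(service_name, []))
```

A front end would call something like `registry.resolve("shop/cart")` each time it needs the (hypothetical) cart service, so instances can move between machines without the caller changing.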
The fabric specifically intends to solve the following challenges:
– Provide a platform into which services are deployed, reducing the need to worry about the underlying virtual machine(s)
– Increase the density of deployed services to decrease latency without increasing complexity
– Manage deployed services, providing failover, version management, and discoverability
– Give services a way to coordinate actions such as data replication, increasing resiliency when state needs to be maintained
And while we’re just now announcing this new offering, we’ve already tested the waters with this approach ourselves. Existing Azure services such as Service Bus Event Hubs, DocumentDB, and the latest version of Azure SQL Database are being delivered on Service Fabric.
What is the Service Fabric
Service Fabric is a distributed systems platform that lets developers building services and applications avoid complex distributed infrastructure problems and focus instead on implementing their workloads and business logic, while gaining scalability and reliability. It also greatly reduces the burden on application administrators and IT operators by implementing simple workflows for provisioning, deploying, patching, and monitoring services. It provides first-class support for the full lifecycle of cloud applications, from initial deployment to eventual decommissioning.
At the heart of these buzzwords is the concept of a Service Fabric cluster. A cluster is a collection of servers (physical or virtual) that are stitched together via the Service Fabric software. (Did you see what I did there? grin) Individually, these servers are referred to as “nodes”. The Service Fabric software keeps track of each node and the applications that have been deployed to the cluster. It’s this monitoring that allows the fabric to detect when a node fails and move the instances of an application to another node, thus providing resiliency for your application.
One of the interesting things about the Service Fabric is that it’s headless: there’s no dedicated head node. The Service Fabric software on each node synchronizes with the instances running on the other nodes in the cluster. In essence, each node keeps track of its own state while also sharing that state with the other nodes, and those nodes in turn share their state back out. This is important because when you want to know anything about the cluster, you can simply query the Service Fabric management API on any node and you’ll get the same answer.
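As a rough illustration of “detect when a node fails and move the instances,” here is a toy model that treats a missed heartbeat as a failure and re-homes the orphaned instances on the least-loaded surviving nodes. The heartbeat timeout, the least-loaded rule, and every name here are assumptions for illustration; the post doesn’t describe how Service Fabric actually detects failures or picks new homes.

```python
class Cluster:
    """Toy cluster: nodes host instances; a node silent too long is failed over.

    Hypothetical model for illustration only, not the Service Fabric implementation.
    """

    def __init__(self, node_names, heartbeat_timeout=5.0):
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = {n: 0.0 for n in node_names}
        self.instances = {n: set() for n in node_names}

    def heartbeat(self, node, now):
        self.last_heartbeat[node] = now

    def place(self, node, instance):
        self.instances[node].add(instance)

    def detect_failures(self, now):
        # A node is considered failed if it has been silent longer than the timeout.
        failed = sorted(n for n, t in self.last_heartbeat.items()
                        if now - t > self.heartbeat_timeout)
        orphans = []
        for node in failed:
            orphans.extend(sorted(self.instances.pop(node)))
            del self.last_heartbeat[node]
        if orphans and not self.instances:
            raise RuntimeError("no healthy nodes left to host instances")
        for instance in orphans:
            # Move each orphaned instance to the least-loaded surviving node.
            target = min(self.instances, key=lambda n: (len(self.instances[n]), n))
            self.instances[target].add(instance)
        return failed
```

Removing all failed nodes before redistributing keeps an instance from being moved onto a node that is itself about to be declared dead.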
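The “every node knows the same thing” behaviour can be pictured with a simplified gossip-style merge: each node keeps a versioned entry per node, and when two nodes exchange views, the higher version wins. This is an assumption chosen for illustration; the post doesn’t say which synchronization protocol Service Fabric uses, and all names here are hypothetical.

```python
import itertools


class Node:
    """Toy node holding a versioned view of every node's state (hypothetical)."""

    def __init__(self, name):
        self.name = name
        # name -> (version, state); the higher version wins when views are merged
        self.view = {name: (0, "up")}

    def set_local_state(self, state):
        version, _ = self.view[self.name]
        self.view[self.name] = (version + 1, state)

    def merge(self, other_view):
        for name, (version, state) in other_view.items():
            if name not in self.view or version > self.view[name][0]:
                self.view[name] = (version, state)

    def query(self):
        # Any node can answer questions about the whole cluster from its own view.
        return {name: state for name, (_, state) in self.view.items()}


def gossip_round(nodes):
    # Every pair exchanges views; a real protocol would contact a few random peers.
    for a, b in itertools.combinations(nodes, 2):
        a.merge(b.view)
        b.merge(a.view)
```

With this all-pairs exchange, two rounds are enough for every node to hold the same view, so querying any one of them gives the same answer.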
When you deploy an application to the cluster, a notification is sent to one of the nodes in the cluster. This node in turn looks at what it knows of the other nodes, and determines where to place the various parts of the application. Notifications are then in turn sent to the selected nodes so they can take the appropriate actions.
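The placement decision made by the receiving node can be sketched as a simple policy: put each replica of a service on a different node, preferring the least-loaded ones. The least-loaded rule and the function name are hypothetical; a real placement engine would weigh more factors, which the post doesn’t describe.

```python
def place_replicas(node_loads, replica_count):
    """Pick distinct nodes for one service's replicas (hypothetical policy).

    node_loads maps node name -> number of instances it currently hosts.
    Chooses the least-loaded nodes, breaking ties by name for determinism.
    """
    if replica_count > len(node_loads):
        raise ValueError("not enough nodes to keep replicas on separate machines")
    ranked = sorted(node_loads, key=lambda n: (node_loads[n], n))
    chosen = ranked[:replica_count]
    for node in chosen:
        node_loads[node] += 1  # record the placement so later decisions see it
    return chosen
```

After choosing, the deciding node would notify each selected node so it can start its part of the application.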
But before we go too deep into this, let’s look at a Service Fabric application.
Service Fabric Application Model
Service Fabric operates on the notion that applications are composed of services. But the term ‘service’ is used fairly loosely. A service could be an application listening for a request, or a console application that runs in a loop, performing some action. A service is really just a process that you want to run.
Services, in turn, are composed of three parts: code, config, and data. Just as you would expect applications and services to be versionable, Service Fabric allows you to version each of these parts. This is extremely powerful, because now you can deploy an update to any one of them independently of the others.
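Here’s a minimal sketch of what independent versioning buys you, assuming a service is described by separate code, config, and data version strings. The `ServiceVersion` type and `upgrade` helper are hypothetical and only model the idea; they aren’t Service Fabric’s manifest format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ServiceVersion:
    """A service described by independently versioned parts (hypothetical model)."""
    name: str
    code: str
    config: str
    data: str


def upgrade(service, **new_versions):
    # Only the named parts change; the others keep their current versions.
    unknown = set(new_versions) - {"code", "config", "data"}
    if unknown:
        raise ValueError(f"unknown component(s): {sorted(unknown)}")
    return replace(service, **new_versions)
```

For example, `upgrade(cart, config="1.1")` rolls out a configuration change while the running code and data versions stay exactly as they were.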
The service acts as a containerized, functional unit. It can be developed and revised individually, start/run independently, and scaled in isolation from other services. But most importantly, the services, when acting together, form the higher level applications that are needed for today’s cloud scale solutions.
This separation provides several advantages. First off, we can easily upgrade and scale individual services as needed. Another key advantage of Service Fabric is that we can deploy as many applications into the cluster as its resources (CPU, memory, etc.) can support. This can even include multiple, different versions of the same application or service, each managed separately.
But this isn’t the end of the story. This example was fairly simple because the services in question are not stateful. And a solution doesn’t have to get very complex before the need to store state arises.
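To picture “as many applications as its resources can support,” this sketch admits deployments until CPU or memory runs out and tracks each deployment separately, so two versions of the same application can sit side by side. The capacity units and names are hypothetical; real resource governance in Service Fabric is more involved than a simple sum.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterCapacity:
    """Admit application deployments while CPU and memory remain (hypothetical)."""
    cpu: float
    memory_gb: float
    deployed: list = field(default_factory=list)

    def deploy(self, app, version, cpu, memory_gb):
        used_cpu = sum(d["cpu"] for d in self.deployed)
        used_mem = sum(d["memory_gb"] for d in self.deployed)
        if used_cpu + cpu > self.cpu or used_mem + memory_gb > self.memory_gb:
            return False  # the cluster's resources are the only limit
        # Each deployment is tracked on its own, even for the same app name.
        self.deployed.append({"app": app, "version": version,
                              "cpu": cpu, "memory_gb": memory_gb})
        return True

    def versions_of(self, app):
        return sorted(d["version"] for d in self.deployed if d["app"] == app)
```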
Stateless vs Stateful Services
As mentioned earlier, a key failure of most cloud solutions is that they attempt to follow a traditional “n-tier” architecture. The result is clear separation of concerns and externalized data stores. This in turn introduces both complexity and latency, since the solution must cross various artificial boundaries, such as remote calls to read and write data, or add more moving pieces, such as a caching layer.
To address this issue, Service Fabric has two types of services: stateless and stateful. A stateless service is entirely self-contained; the one in our example didn’t do anything besides write some output to the console. But the other type of service, a stateful service, has something that stateless services lack. A stateful service, as its name implies, can store data and, more importantly, leverage the Service Fabric to replicate that data across multiple instances of the service. This speeds up access because we can now durably store information within the fabric cluster, and since the data is replicated, we don’t sacrifice resiliency. We also simplify the solution by reducing its dependency on external services, any one of which could impact the availability of our application.
Stateful Service Fabric services (say that 5 times fast) work because they leverage the cluster for discoverability and communication. Each stateful service should run at least three instances. The cluster selects one of these copies to be the primary, and the others become secondary replicas. The instances of the service then leverage the Service Fabric to keep the replicas in quorum: as transactions occur against the primary, they are sent over and applied to each of the replicas.
This works because when you request an endpoint for the service, the fabric automatically routes the request to the elected primary. When changes occur (inserts, updates, deletes), these transactions are then replayed on the secondary replicas. Each replica stores its current state either in memory, locally on disk, or optionally remotely (although this last approach should be used with caution, for the reasons mentioned earlier).
Internally, the stateful service leverages another Service Fabric concept, the distributed collection. It’s these collections that actually do the work of storing data and working with the Service Fabric to ensure replication is performed. The service container provides an activation point for the collection, while also providing a process that can host any service endpoints that allow for interaction with the collection. What’s important to note here is that the only part of any application in the cluster that can interact with a given collection is its hosting service.
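One way to see why three instances is the suggested minimum is a majority-quorum write: with three replicas, a write needs two acknowledgements, so one replica can be down and writes still succeed. This toy `ReplicaSet` is a hypothetical sketch of that rule, not Service Fabric’s replicator, and it simplifies acknowledgement to checking which replicas are reachable before applying the change.

```python
class ReplicaSet:
    """Primary plus secondaries; a write succeeds only with a majority (toy model)."""

    def __init__(self, replica_count=3):
        if replica_count < 1:
            raise ValueError("need at least one replica")
        self.replicas = [dict() for _ in range(replica_count)]  # index 0 is the primary
        self.reachable = [True] * replica_count

    def quorum(self):
        return len(self.replicas) // 2 + 1

    def write(self, key, value):
        if not self.reachable[0]:
            raise RuntimeError("primary is down; a secondary must be promoted first")
        # Simplification: reachability stands in for collecting acknowledgements.
        acks = [i for i, ok in enumerate(self.reachable) if ok]
        if len(acks) < self.quorum():
            return False  # not enough replicas to acknowledge, so the write is rejected
        for i in acks:
            self.replicas[i][key] = value  # primary applies, secondaries receive a copy
        return True
```

With one secondary unreachable, writes still commit on two of three copies; lose a second one and the set refuses writes rather than risk losing data.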
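The routing-and-replay behaviour can be sketched as follows: clients always resolve to the primary, each replica keeps its own transaction log, secondaries replay whatever they haven’t seen yet, and on failure the secondary with the longest log is promoted. The class, method names, and promotion rule are hypothetical, and state is held only in memory here. Note that a change the old primary never shipped is lost on failover, which is exactly why writes are acknowledged by a quorum before being considered committed.

```python
class StatefulService:
    """Routes to the primary; secondaries replay its log; failover promotes one (toy)."""

    OPS = ("insert", "update", "delete")

    def __init__(self, replica_names):
        self.logs = {n: [] for n in replica_names}    # each replica's own transaction log
        self.state = {n: {} for n in replica_names}   # each replica's in-memory state
        self.primary = replica_names[0]

    def resolve_endpoint(self):
        # Clients never choose a replica; they are always sent to the current primary.
        return self.primary

    def execute(self, op, key, value=None):
        if op not in self.OPS:
            raise ValueError(f"unknown operation {op!r}")
        entry = (op, key, value)
        self.logs[self.primary].append(entry)
        self._apply(self.primary, entry)

    def replicate(self, secondary):
        # Replay, in order, every primary transaction this secondary hasn't seen yet.
        for entry in self.logs[self.primary][len(self.logs[secondary]):]:
            self.logs[secondary].append(entry)
            self._apply(secondary, entry)

    def _apply(self, replica, entry):
        op, key, value = entry
        if op == "delete":
            self.state[replica].pop(key, None)
        else:
            self.state[replica][key] = value

    def fail_primary(self):
        # Drop the failed primary and promote the secondary with the longest log.
        del self.logs[self.primary], self.state[self.primary]
        if not self.logs:
            raise RuntimeError("no replicas left")
        self.primary = max(self.logs, key=lambda n: (len(self.logs[n]), n))
        return self.primary
```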
Now some stateful services may operate just fine using a single set of primary/secondary instances. But as demand increases, you may need to scale out the service to allow for even greater throughput. This is where “partitions” come into play. When the service registers the distributed collection with the Service Fabric, it defines the number of partitions and the key ranges for those partitions.
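Defining “the number of partitions and the key ranges” can be illustrated by splitting an integer key space into contiguous ranges and looking up which range a key falls into. This sketch assumes integer keys (string keys would typically be hashed to an integer first, which is an assumption not covered by the post); the function names are hypothetical.

```python
import bisect


def make_partitions(low_key, high_key, partition_count):
    """Split the inclusive key range [low_key, high_key] into contiguous ranges."""
    total = high_key - low_key + 1
    if partition_count < 1 or partition_count > total:
        raise ValueError("partition_count must be between 1 and the number of keys")
    base, extra = divmod(total, partition_count)
    ranges, start = [], low_key
    for i in range(partition_count):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def partition_for(key, ranges):
    # Binary search over the range starts; ranges are sorted and contiguous.
    starts = [lo for lo, _ in ranges]
    index = bisect.bisect_right(starts, key) - 1
    if index < 0 or key > ranges[index][1]:
        raise KeyError(key)
    return index
```

For example, `make_partitions(0, 99, 4)` yields `[(0, 24), (25, 49), (50, 74), (75, 99)]`, and a key of 60 lands in partition 2.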
It should be stressed that the partition is a logical construct only. You could have a distributed collection that’s running on a single primary that has 10,000 partitions. That same collection could be spread across 10, 20, or more primary instances. It’s all based on your demand. How this is done is a bit more of an advanced topic, so we’ll leave that for another time.
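Because partitions are logical, the same set of partition IDs can be spread over one host or many. This sketch uses a simple round-robin assignment as a stand-in; it’s an assumption for illustration only, since the post deliberately leaves how Service Fabric actually does this for another time.

```python
def assign_partitions(partition_count, hosts):
    """Spread logical partitions over however many hosts exist (round-robin, hypothetical)."""
    if not hosts:
        raise ValueError("need at least one host")
    assignment = {host: [] for host in hosts}
    for partition in range(partition_count):
        assignment[hosts[partition % len(hosts)]].append(partition)
    return assignment
```

With 10,000 partitions and a single host, that host owns all of them; with ten hosts, each owns 1,000. The partition IDs themselves never change, only where they live.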
PaaS for the new generation of cloud solutions
So that’s it for this introduction. We’ve barely scratched the surface of what this framework can help you accomplish, so we hope you’ll join us for some of the other sessions we have planned that explore various aspects of Service Fabric more deeply.
Thank you, and until next time!