IaaS – The changing face of Windows Azure

I need to preface this post by saying I should not be considered, by any stretch of the imagination, a “network guy”. I know JUST enough to plug in an Ethernet cable, not fall for the old “the token fell out of the network ring” gag, and tracert my way through connectivity issues, thanks mainly to my past overindulgence in online role-playing games.

In June of 2012, we announced that we would be adding Infrastructure as a Service (IaaS) features to the Windows Azure Platform. While many believe that Platform as a Service (PaaS) is still the ultimate “sweet spot” with regard to cost/benefit ratios, the reality is that PaaS adoption is… well… challenging. After 25+ years of buying, installing, configuring, and maintaining hardware, nearly everyone in the IT industry tends to think in terms of servers, both physical and virtual. So the idea of having applications and data float around within a datacenter, not tied to specific locations, is just alien to many. This created a barrier to the adoption of PaaS, a barrier that we are hoping our IaaS services will help bridge (I’m not sure about “bridging barriers” as a metaphor, since I always visualize barriers as those concrete fence things on the side of highway construction sites, but we’ll just go with it).

Unfortunately, there’s still a lot of confusion about what our IaaS solution is and how to work with it. Over the last few months, I’ve run into this several times with partners, so I wanted to pull together some of my learnings into a single blog post, as much for my own personal reference as to be able to easily share it with all of you.

Some terminology

So I’d like to start by explaining a few terms as they are used within the Windows Azure Platform…

Cloud Service – This is a collection of virtual machines (either PaaS role instances or IaaS virtual machines) representing an isolation boundary that contains computational workloads. A Cloud Service can contain either PaaS compute instances or IaaS Virtual Machines, but not both. (UPDATE 4/16/2013: An IaaS VM-hosting Cloud Service will only appear in the cloud service tab of the management portal after a second VM has been added to it. Once visible, it will remain so until it is deleted.)

Availability Set – For PaaS solutions, the Windows Azure Fabric already knows to distribute the same workload across different physical hardware within the datacenter. But for IaaS, I need to tell it to do this with the specific virtual machines I’m creating. We do this by placing the virtual machines into an availability set.

Virtual Network – Because addressability to the PaaS or IaaS instances within Cloud Services is limited to only those ports that you declare (by configuring endpoints), it’s sometimes helpful to have a way to create bridges between those boundaries or even between them and on-premises networks. This is where Windows Azure Virtual Networks come into play.

The reason these items are important is that in Windows Azure you’re going to use them to define your solution. Each piece represents a way to group or arrange resources, and to control how those resources can be addressed.

You control the infrastructure, mostly…

Platform as a Service, or PaaS, handles a lot for you (no surprise, as that’s part of the value proposition). But with Infrastructure as a Service, IaaS, you take on some of that responsibility. The problem is that we are used to taking care of traditional datacenter deployments and either a) don’t understand what IaaS still does for us or b) just aren’t sure how this cloud stuff is supposed to be used. So we, through no fault of our own, try to do things the way we always have. And who could really blame us?

So let’s start with what Windows Azure IaaS still does for you. It obviously handles the physical hardware and hypervisor management. This includes provisioning the locations for our Virtual Machines, getting them deployed, and of course moving them around the datacenter in the case of a hardware failure or an upgrade of the host OS (the server that’s hosting our virtual machine). The Azure Fabric, our secret sauce as it were, also controls basic datacenter firewall configuration (what ports are exposed to the internet), load balancing, and addressability/visibility isolation (that Cloud Service thing I was talking about). This covers everything right up to the virtual machine itself. But that’s not where it stops. To help secure Windows Azure, we control how all the virtual machines talk to our network. This means that the Azure Fabric also has control of the virtual NIC that is installed into your VMs!

Now the reason this is important is that there are some things you’d normally try to do if you were creating a network in a traditional datacenter, like providing fixed IPs to the servers so you can easily do name resolution. Fixed IPs in a cloud environment are generally a bad idea, especially so if that cloud is built on the concept of having the flexibility to move stuff around the datacenter for you when it needs to. And when this happens in Windows Azure, it’s pretty much assured that the virtual NIC will get torn down and rebuilt, losing any customizations you made to it in the process. This is also a frequent cause of folks losing the ability to connect to their VMs (something that’s usually fixable by re-sizing/kicking the VM via the management portal). It also highlights one key, but not often thought of, feature that Windows Azure provides for you: server name resolution.

Virtual Machine Name Resolution

The link I just dropped does a pretty good job of explaining what’s available to you in Windows Azure. You can either let Windows Azure do it for you and leverage the names you provided for the virtual machines when you created them, or you can use Virtual Networking to bring your own DNS. Both work well, so it’s really a matter of selecting the right option. The primary constraint is that the Windows Azure provided name resolution will only work for virtual machines (be they IaaS machines or PaaS role instances) hosted in Windows Azure. If you need name resolution between the cloud and on-premises, you’re likely going to want to use your own DNS server.

The key here, again, is to not hard-code IP addresses. Pick the appropriate solution and let it do the work for you.
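To make that concrete, here’s a minimal Python sketch of the idea. The host name `sqlvm01` is hypothetical, standing in for whatever name you gave the VM when you created it:

```python
import socket

# Hypothetical VM name, as assigned when the machine was created.
# Azure-provided name resolution maps this to whatever IP the
# fabric has currently assigned to the instance.
VM_NAME = "sqlvm01"

def current_address(host: str) -> str:
    """Resolve the VM's name at call time rather than caching an IP."""
    return socket.gethostbyname(host)

# Resolve immediately before connecting; if the fabric has moved the
# VM and handed it a new address, we pick up the change for free.
print(f"{VM_NAME} is currently at {current_address(VM_NAME)}")
```

The point is simply that resolution happens at connect time, so the fabric is free to move the VM without breaking you.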

Load Balanced Servers

The next big task is how to load balance virtual machines in IaaS. For the most part, this isn’t really any different than how you’d do it for PaaS Cloud Services: create the VM and “attach” it to an existing Virtual Machine (this places both virtual machines within the same cloud service). Then, as long as both machines are watching the same ports, the traffic will be balanced between the two by the Windows Azure Fabric.

If you’re using the portal to create the VM, you’ll need to make sure you use the “create from gallery” option and not quick create. Then as you progress through the wizard, you’ll hit the step where it asks you if you want to join the new virtual machine to an existing virtual machine or leave it as standalone.

Now once they are both part of the same cloud service, we simply edit the available endpoints. In the management portal, you’ll select a Virtual Machine and either add or edit the endpoint using the tools menu across the bottom. Then you set the endpoint attributes manually (if it’s a new endpoint that’s not already load balanced), or choose to load balance it with a previously defined endpoint. Easy-peasy.
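If you want to watch the balancing in action, a quick probe loop works. This is just a sketch; it assumes your cloud service is reachable at a hypothetical `myservice.cloudapp.net` and that each VM serves a page identifying itself (something you’d add to your own app, since the load balancer doesn’t expose instance names on its own):

```python
from collections import Counter
from urllib.request import urlopen

# Hypothetical public endpoint of the cloud service; both VMs sit
# behind this single load-balanced address.
ENDPOINT = "http://myservice.cloudapp.net/whoami"

hits = Counter()
for _ in range(20):
    # Each request goes through the Azure load balancer, which picks
    # one of the VMs listening on the shared endpoint.
    with urlopen(ENDPOINT) as response:
        hits[response.read().decode().strip()] += 1

# With two healthy VMs behind the endpoint, both names should show up.
for instance, count in hits.items():
    print(f"{instance}: {count} responses")
```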

High Availability

Now that we have load balanced endpoints, the next step is to make sure that if one of our load balanced virtual machines goes offline (say, for a host OS upgrade or hardware failure), the service doesn’t become entirely unavailable. In Windows Azure Cloud Services, the Fabric automatically distributes the running instances across multiple fault domains. To put it simply, fault domains help ensure that workloads are spread across multiple pieces of hardware, so that a hardware failure on a ‘rack’ won’t take down both machines. When working with IaaS, we still have this as an option, but we need to tell the Azure Fabric that we want to take advantage of it by placing our virtual machines into an Availability Set so the Fabric knows it should distribute them.

You can configure a virtual machine that’s already deployed to join an Availability Set, or you can assign a new one to a set when you create/deploy it (providing you’re not using Quick Create, which you hopefully aren’t anyway, because you can’t place a quick-create VM into an existing cloud service). Both options work equally well, and we can create multiple Availability Sets within a Cloud Service.
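The fabric’s actual placement logic isn’t something we get to see, but the idea is easy to model. Here’s a toy sketch (the VM names are made up, and the two-fault-domain split is just illustrative) of why spreading an availability set across fault domains keeps the service alive through a rack failure:

```python
# Toy model: the fabric round-robins an availability set's VMs
# across fault domains (racks). Names and the two-domain split
# are illustrative, not pulled from a real deployment.
vms = ["web01", "web02", "web03", "web04"]
FAULT_DOMAINS = 2

placement = {vm: i % FAULT_DOMAINS for i, vm in enumerate(vms)}
print(placement)  # {'web01': 0, 'web02': 1, 'web03': 0, 'web04': 1}

# Simulate losing a whole rack: every VM outside the failed
# fault domain is still serving traffic.
failed_domain = 0
survivors = [vm for vm, fd in placement.items() if fd != failed_domain]
print(f"Rack {failed_domain} down, still serving: {survivors}")
```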

Virtual Networks

So you might say, this is all fine and dandy if the virtual machines are deployed as part of a single cloud service. But I can’t combine PaaS and IaaS into a single cloud service, and I also can’t do direct machine addressing if the machine I’m connecting to exists in another cloud service, or even on-premises. So how do I fix that? The answer is Windows Azure Virtual Networks.

In Windows Azure, the Cloud Service is an isolation boundary, fronted by a gatekeeper layer that serves as a combination load balancer and NAT. The items inside the cloud service can address each other directly, and any communication that comes in from outside of the cloud service boundary has to come through the gatekeeper. Think of the cloud service as a private network branch. This is good because it provides a certain level of network security, but bad in that we now have challenges if we’re trying to communicate across the boundary.

Virtual Network allows you to join resources across cloud service boundaries or, by leveraging an on-premises VPN gateway, to join cloud services and on-premises services. It acts as a bridge across the isolation boundaries, enabling direct addressability (providing there’s appropriate name resolution) without the need to publicly expose the individual servers/instances to the internet.
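One practical gotcha when you bridge to on-premises this way: the address space you give the virtual network shouldn’t overlap the ranges you already use in your own datacenter. The ranges below are hypothetical; a quick sanity check with Python’s ipaddress module looks like this:

```python
import ipaddress

# Hypothetical address spaces: what the corporate network already
# uses versus what we plan to assign to the Azure virtual network.
on_premises = ipaddress.ip_network("10.1.0.0/16")
azure_vnet = ipaddress.ip_network("10.2.0.0/16")

# Overlapping ranges would make routing across the VPN ambiguous,
# so verify the spaces are disjoint before creating the network.
if on_premises.overlaps(azure_vnet):
    raise ValueError("Virtual network range collides with on-premises space")
print(f"{azure_vnet} is safe to use alongside {on_premises}")
```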

Bringing it all together

So if we bring this all together, we now have a way to create complex solutions that mix and match different compute resources (we cannot currently join things like Service Bus, Azure Storage, etc… via Virtual Network). One such example might be the following diagram…

A single Windows Azure Virtual Network that combines an on-premises server, a PaaS Cloud Service, and both standalone and load balanced virtual machines. Now I can’t really speculate on where this could go next, but I think we have a fairly solid starting point for some exciting scenarios. And if we do for IaaS what we’ve done for the PaaS offering over the last few years… continuing to improve the tooling, expanding the feature set, and generally just making things more awesome… I think there’s a very bright future here.

But enough chest thumping/flag waving. Like many topics here, I created this to help me better understand these capabilities, and hopefully some of you may benefit from it as well. If not, I’ll at least share a few links I found handy:

Mike Washam – Windows Azure Virtual Machines

MSDN – Windows Azure Name Resolution

WindowsAzure.com – Load Balancing Virtual Machines

WindowsAzure.com – Manage the Availability of Virtual Machines

Until next time!

Windows Azure Web Sites – Quotas, Scaling, and Pricing

It hasn’t been easy making the transition from a consultant to someone that, for lack of a better explanation, is a cross between pre-sales and technical support. But I’ve come to love two aspects of this job. First off, I get to talk to many different people, and I’m constantly learning as much from their questions as I’m teaching them about the platform. Secondly, when not talking with partners about the platform, I’m digging up answers to questions. This gives me the perfect excuse… er… reason to dig into some of the features and learn more about them. I had to do this as a consultant too, but the issue there was that since I’d be asked to do it by paying clients, they would own the results. Now that I do this work on behalf of Microsoft, it’s much easier to share these findings with the community (providing it doesn’t violate non-disclosure agreements, of course). And since this blog has always been a way for me to document things so I can refer back to them, it’s a perfect opportunity to start sharing this research.

Today’s topic is Windows Azure Web Sites quotas and pricing. Currently, we (Microsoft) don’t, IMHO, do a very good job of making this information really clear. Some of it is available over on the pricing page, but for the rest you’ve got to dig it out of blog posts or from the Web Site dashboard’s usage overview in the management portal. So I decided it was time to consolidate a few things.

Usage Quotas

A key aspect of using any service is understanding its limits. And nowhere is this truer than in the often complex/confusing world of cloud computing services. But when someone slaps a “free” in front of a service, we tend to forget this. Well, here I am to remind you. Windows Azure Web Sites has several dials that we need to be aware of when selecting the level/scale of a site (Free, Shared, and Reserved).

File System/Storage: This is the total amount of space you have to store your site and content. There’s no timeframe on this one. If you reach the quota limit, you simply can’t write any new content to the system.

Egress Bandwidth: This is the amount of content that is served up by your web site. If you exceed this quota, your site will be temporarily suspended (no further requests) until the quota timeframe (1 day) resets.

CPU Time: This is the amount of time spent processing requests for your web site. Like the bandwidth quota, if you exceed it, your site will be temporarily suspended until the quota timeframe resets. There are two quota timeframes: a 5-minute limit and a daily limit.

Memory: This is the amount of RAM that the site can use at one shot (there’s no timeframe). If you exceed the quota, the long-running or abusive process will be terminated. And if this occurs often enough, your site may be suspended. Which is pretty good encouragement to rethink that process.

Database: There’s also up to 20 MB of database support for your related database (MySQL or Windows Azure SQL Database currently). I can’t find any details, but I’m hoping/guessing this will work much like the File Storage quota.

Now for the real meat of this. What are the quotas for each tier? For that I’ve created the following table.

Quota Resource | Free Tier | Shared Tier (per web site) | Reserved Tier (up to 100 sites)
File Storage | 1024 MB for all sites | 1024 MB | 10 GB
Egress Bandwidth | 165 MB/day per datacenter, 5 GB per region | Pay as you go, not included in base price | Pay as you go, not included in base price
CPU Time | 1 hr/day, 2.5 minutes of every 5 | 4 hrs/day, 2.5 minutes of every 5 | N/A
Memory | 1024 MB/hr | 512 MB/hr | N/A
Database | 20 MB | 20 MB | N/A

Now there’s an important but slightly confusing “but” to the free tier. At that level, you get a daily egress bandwidth quota per sub-region (aka datacenter), but there’s also a regional (US, EU, Asia) limit of 5 GB. The regional limit is the sum total of all the web sites you’re hosting, and it’s shared with any other services. So if you’re also using Blob storage to serve up images for your site, that will count against your “free” 5 GB. When you move to the shared/reserved tier, there’s no limit, but you pay for every gigabyte that leaves the datacenter.
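Here’s a quick back-of-the-napkin sketch of that free-tier math. The traffic numbers are invented for illustration; the 165 MB/day and 5 GB figures come from the table above:

```python
# Free-tier egress limits from the quota table.
PER_DATACENTER_MB_PER_DAY = 165
REGIONAL_FREE_MB = 5 * 1024  # 5 GB regional cap, shared with other services

# Hypothetical daily traffic: the site itself plus blob-hosted images,
# which count against the same regional allowance.
site_egress_mb = 120
blob_egress_mb = 80

if site_egress_mb > PER_DATACENTER_MB_PER_DAY:
    print("Site exceeds the daily datacenter quota and will be suspended")

combined = site_egress_mb + blob_egress_mb
print(f"Combined egress of {combined} MB counts against the "
      f"{REGIONAL_FREE_MB} MB regional allowance")
```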

Monitoring Usage

Now the next logical question is how to monitor the resources your sites are using. Fortunately, the most recent update to the Windows Azure portal has a dashboard that provides a quick glance at how much of each quota you’re using. This displays just below the usage grid on the “Dashboard” panel of the web site.

At a glance, you can tell where you are on any quota, which also makes it convenient to predict your usage. Run some common scenarios, see what they do to your numbers, and extrapolate from there.

You can also configure the site for diagnostics (again, via the management portal). This allows you to take the various performance indicators and save them to Windows Azure Storage. From there, you can download the files and set up automated monitors to alert you to problems. Just keep in mind that turning this on will consume resources and incur additional charges.
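As a rough illustration of what such an automated monitor might look like, here’s a sketch that scans downloaded diagnostic records for CPU time and warns as the daily quota approaches. The record format and the warning threshold are assumptions for the example, not the actual log schema:

```python
# Hypothetical diagnostic records after downloading from storage and
# parsing; the real log schema will differ, this is just the shape.
records = [
    {"counter": "CpuTime", "seconds": 1400},
    {"counter": "CpuTime", "seconds": 1500},
]

DAILY_CPU_QUOTA_SECONDS = 3600  # free tier: 1 hour of CPU per day
WARN_AT = 0.8                   # alert at 80% of quota

used = sum(r["seconds"] for r in records if r["counter"] == "CpuTime")
if used >= DAILY_CPU_QUOTA_SECONDS * WARN_AT:
    print(f"Warning: {used}s of {DAILY_CPU_QUOTA_SECONDS}s CPU quota used")
```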

Fortunately, there’s a pretty good article we’ve published on Monitoring Windows Azure Web Sites.

Scaling & Pricing

Now that we’ve covered your usage quotas and how to monitor your usage, it’s important to understand how we can scale the capacity of our web sites and the impact this has on pricing.

Scaling our web site is pretty straightforward. We can go from the Free tier, to Shared, to Reserved using the management portal. Select the web site, click on the level, and then save to “scale” your site. But before you do that, you’ll want to understand the pricing impacts.

At the Free tier, we get up to 10 web sites. When we move a web site to Shared, we will pay $0.02 per hour for each web site (at general availability). At this point, I can mix and match free (10 per sub-region/datacenter) and shared (100 per sub-region/datacenter) web sites. But things get a bit trickier when we move to Reserved. A reserved web site is a virtual machine dedicated to your needs. When you move a web site within a region to the reserved tier, all web sites in that same sub-region/datacenter (up to the limit of 100) will also be moved to reserved.

Now this might seem a bit confusing until you realize that at the reserved tier, you’re paying for the virtual machine and not an individual web site. So it makes sense to have all your sites hosted on that instance, maximizing your investment. Furthermore, if you are running enough shared tier web sites, it may be more cost effective to run them as reserved.
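The break-even math is easy to run. Using the preview-era rates quoted in this post ($0.02/hr per shared site, $0.12/hr for a small reserved instance; check current pricing before relying on these), a quick sketch:

```python
SHARED_RATE_PER_SITE = 0.02   # $/hr per shared-tier web site
RESERVED_SMALL_RATE = 0.12    # $/hr for one small reserved instance

for sites in range(1, 11):
    shared_cost = sites * SHARED_RATE_PER_SITE
    # One small reserved instance hosts up to 100 sites for a flat rate.
    cheaper = "reserved" if RESERVED_SMALL_RATE < shared_cost else "shared"
    print(f"{sites} sites: shared ${shared_cost:.2f}/hr vs "
          f"reserved ${RESERVED_SMALL_RATE:.2f}/hr -> {cheaper}")

# Past 6 shared sites, the flat-rate reserved instance wins.
```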

Back to scaling: if you scale back down to the free or shared tiers, the other sites will revert to their old states. For example, let’s assume you have two web sites, one at the free tier and one at the shared tier. I scale the free web site up to reserved, and now both sites are reserved. If I scale the original free tier site back to free, the other site returns to shared. If I instead scale the original shared site back to shared or free, then the original free site returns to its previous free tier. So when dealing with reserved sites, it’s important to remember what tier they were at previously.

The tiers are not our only option for scaling our web sites. We also have a slider labeled “instance count” if we are running a Shared or Reserved site. When running at the shared tier, this slider changes the number of processing threads servicing the web site, allowing us between 1 and 6 threads. Of course, if we increase the threads, there’s a greater risk of hitting our CPU usage quota. But this adjustment could come in real handy if we’re experiencing a short-term spike in traffic. Running at the reserved tier, the slider instead increases the number of virtual machine instances we run (and subsequently our cost). This option allows us to run up to 10 reserved instances.

Also at the reserved tier, we can increase the size of our virtual machine. By default, our reserved instance will be a “Small”, giving us a single CPU core and 1.75 GB of memory at a cost of $0.12/hr. We can increase the size to “Medium” and even “Large”, with each size increase doubling our resources and the price per hour ($0.24 and $0.48 respectively). This cost is per virtual machine instance, so if I’ve opted to run 3 instances, take my cost per hour for the size and multiply it by 3.

So what’s next?

This pretty much hits the limits of what we can do with scaling web sites. But fortunately, we’re running on a platform that’s built for scale. So it’s just a hop, skip, and a jump from Web Sites to Windows Azure Cloud Services (Platform as a Service) or Windows Azure Virtual Machines (Infrastructure as a Service). But that’s an article for another day.

BUILD 2012 – Not just for Windows anymore

Last week marked the second BUILD conference. In 2011, BUILD replaced the Microsoft PDC conference with an event so heavily Windows 8 focused that it was even hosted at buildwindows.com. While the URL didn’t change for 2012, the focus sure did, as this event also marked the latest round of big release news for Windows Azure. In this post (which I’m publishing directly from MS Word 2013, btw), I’m going to give a quick rundown of the Windows Azure related announcements. Think of this as your Cliff Notes version of the conference.

Windows Azure Service Bus for Windows Server – V1 Released

Previously released as a beta/preview back in June, this on-premises flavor of the Windows Azure Service Bus is now fully released and available for download. Admittedly, it’s strictly for brokered messaging for now. But it’s still a substantial step towards providing feature parity between public and private cloud solutions. Now we just need to hope that shops that opt to run this will run it as internal SaaS and not set up multiple silos. Don’t get me wrong, it’s nice to know we have the flexibility to do silos, but I’m hoping we learn from what we’ve seen in the public cloud and don’t fall back into old patterns.

One thing to keep in mind with this… it’s now possible for multiple versions of the Service Bus API to be running within an organization. To date, the public service has only had two major API versions, but going forward, we may need to be able to juggle even more. And while there will be a push to keep the hosted and on-premises versions at similar levels, there’s nothing requiring someone hosting it on-premises to always upgrade to the latest version. So as solution developers/architects, we’ll want to be prepared to be accommodating here.

Windows Azure Mobile Services – Windows Phone 8 Support

With Windows Phone 8 being formally launched the day before the BUILD conference, it only makes sense that we’d see related announcements. A key one was the addition of Windows Phone 8 support to Windows Azure Mobile Services. This makes Windows Phone 8 the 3rd supported platform for Mobile Services (after Windows Store and iOS apps). It follows an announcement earlier in the month that expanded support for items like sending email and different identity providers. So the Mobile Services team is definitely burning the midnight oil to get new features out to this great platform.

New Windows Azure Storage Scalability Targets

New scale targets have been announced for storage accounts created after June 7th, 2012. This change has been enabled by the new “flat network” topology that’s being deployed into the Windows Azure datacenters. In a nutshell, it allows the transactions-per-second scale targets to be increased by 4x and the upper limit of a storage account to be raised to 200 TB (2x). This new topology will continue to be rolled out through the end of the year, but will only affect storage accounts created after the June 7th date mentioned above. These scale target improvements (which, BTW, are separate from the published Azure Storage SLA) will really help reduce the amount of ‘sharding’ that needs to be done by those with higher throughput requirements.
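To see why higher per-account targets mean less sharding, here’s a rough sketch. The per-account transaction targets below are placeholders standing in for the old and new published numbers (check the storage team’s scalability post for the actual figures):

```python
import math

# Placeholder per-account scale targets; substitute the published
# figures. The post above says the new target is roughly 4x the old.
OLD_TPS_TARGET = 5_000
NEW_TPS_TARGET = OLD_TPS_TARGET * 4

required_tps = 35_000  # hypothetical workload throughput

def accounts_needed(tps: int, per_account: int) -> int:
    """How many storage accounts (shards) the workload must span."""
    return math.ceil(tps / per_account)

print(f"Old targets: {accounts_needed(required_tps, OLD_TPS_TARGET)} shards")
print(f"New targets: {accounts_needed(required_tps, NEW_TPS_TARGET)} shards")
# 7 shards drops to 2 once the flat-network targets apply.
```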

New 1.8 SDK – Windows Server 2012, .NET 4.5, and new Storage Client

BUILD also marked the launch of the new 1.8 Windows Azure SDK. This release is, IMHO, the most significant update to the SDK since version 1.3 launched almost 2 years ago. You could write a blog post about any one of the key features, but since they are all so closely related, and this is supposed to be a highlight post, I’m going to bundle them up.

The new SDK introduces the new “OS Family 3” for Windows Azure Cloud Services, giving us support for Windows Server 2012. When you combine this with the added support for .NET 4.5 and IIS 8, we can start taking advantage of technology like WebSockets. Unfortunately, WebSockets are not enabled by default, so there is some work you’ll need to do to take advantage of them. You may also need to tweak the internal Windows Firewall. A few older Guest OS’s were also deprecated, so you may want to refer to the latest update of the compatibility matrix.

The single biggest, and subsequently most confusing, piece of this release has to do with the new 2.0 Storage Client. This update includes some great features, including support for a preview release of the storage client toolkit for Windows Runtime (Windows Store) apps. However, there are some SIGNIFICANT changes to the client, so I’d recommend you review the list of Breaking Changes and Known Issues before you decide to start converting over. Fortunately, all the new features are in a new set of namespaces (Microsoft.WindowsAzure.StorageClient has become Microsoft.WindowsAzure.Storage), so you can mix and match old functionality with the new. But forewarned is forearmed, as they say. So read up before you dive into the new client headlong.

For more details on some of the known issues with this SDK and the workarounds, refer to the October 2012 release notes and you can learn about all the changes to the Visual Studio tools by checking out “What’s New in the Windows Azure Tools“.

HDInsight – Hadoop on Windows Azure

Technically, this was released the week before BUILD, but I’m going to touch on it nonetheless. A preview of HDInsight has been launched that allows you to help test out the new Apache™ Hadoop® on Windows Azure service. It features support for common frameworks such as Pig and Hive, and it also includes a local developer installation of the HDInsight Server and an SDK for writing jobs with .NET and Visual Studio.

It’s exciting to see Microsoft embracing these highly popular open source initiatives. So if you’re doing anything with big data, you may want to run over and check out the blog post for additional details.

Windows Azure – coming to China

Doug Hauger also announced that Microsoft has reached an agreement (a Memorandum of Understanding, aka an agreement to start negotiations) to license Windows Azure technologies to 21Vianet. This will in turn allow them to offer Windows Azure in China from local datacenters. While not yet a fully “done deal”, it’s a significant first step. So here’s hoping the discussions are concluded quickly, and that this is just the first of many such deals we’ll see struck in the coming year. So all you Aussies, hold out hope!

Other news

This was just the beginning. The Windows Azure team ran down a slew of other slightly less high-profile, but equally important, announcements on the team blog: items like a preview of the Windows Azure Store, general availability (GA) for the dedicated, distributed in-memory cache feature launched back in June with the 1.7 SDK, and finally the launch of the Visual Studio Team Foundation Service, which has been in preview for the last year.

In closing…

All in all, it was a GREAT week in the cloud. Or as James Staten put it on ZDNet, “You’re running out of excuses to not try Microsoft Windows Azure“. And this has just been the highlights. If you’d like to learn more, I highly recommend you run over and check out the session recordings from BUILD 2012 or talk to your local Microsoft representative.

PS – Don’t forget to snag your own copy of the great new Windows Azure poster!

Joining Microsoft

Seven years ago, I set out to take charge of my career. I’d spent the previous 13 years working as an FTE for various employers, both large and small, and realized that for the last 5-6 of those years, I’d basically been coasting along with the currents. If I wanted to go anywhere, I needed to take control and find a direction.

With that decision, I set out to pursue a position with a consulting firm. I figured it would provide me with challenges that would help me grow. Fortunately, just as I made this decision, my brother had a coworker leave to go to work for a local firm. I shared my info, and within a few days got a call. Even more fortunately, they had an immediate need for someone with my exact skills (knowing both the mainframe and .NET worlds). Things moved very rapidly, and in less than a month I joined Sogeti USA as a Senior Consultant.

I haven’t regretted that decision for a moment. Working at Sogeti has been a great experience. It has had its ups and downs like any job, but taken on the whole, I’ve really liked it here. I have a management team that I feel honestly cares about me and my career growth. I work with some great people, both locally and globally. And most of all, they provided me with the opportunity to seek out new ventures for myself and the company. In my seven years here, I’ve gone from being a local code-slinging, heads-down delivery resource to a national thought leader within the organization, helping steer its future.

So it was a very difficult decision for me to leave this behind. Colleagues I’ve come to consider friends and even family.

Now, over the last 3 years, I’ve been focused on this “cloud thing”. I went really deep on a technology I felt would help carry my career for the next 5-10 years, and in doing so I achieved some things I never really set out for. I gained the attention of, and made friends with, some REALLY smart people at Microsoft. I’m talking the kinds of people that, just when you think you know what you’re talking about, show you that you don’t know jack. I also became a Microsoft MVP for Windows Azure. And nobody was as surprised about this as I was.

Over these years, I’ve also learned of opportunities to work even closer with Windows Azure. But the opportunities never felt right, especially with two kids I would really like to see graduate from the same school system they’ve been in since kindergarten. That was until back in June of this year, when a position was posted on the Windows Azure ISV Incubation team. I thought long and hard on this, even talked to former Microsoft employees and family. And after weeks of reflection, I applied and was ultimately offered the position.

So starting Monday I’m going to join Microsoft as a Technical Evangelist in the US central region. I’m both excited and nervous about this change. Sogeti is a great company to work with and I wouldn’t hesitate to go back for a moment. But I feel that at this time I’ll truly be able to pursue my passion around cloud and maybe in some small way help steer the platform into the bright future I see ahead of it. Not a short term one of “wins” and industry hype. But one that is helping organizations of all sizes build the next generation of applications and solutions.

I’ll still be based in Minneapolis, and still active online and at local/regional events. I do have to set aside my MVP status (which I’d just received for the 3rd time). But honestly, that pales by comparison to stepping away from my role at Sogeti. And I’ll never forget that Sogeti has been the place that most helped me grow and get to where I’m at. So this next new step in my life wouldn’t have been possible without them.

So today, as I look at my surroundings, is a day of mixed emotions: hope and excitement about the future, but sadness at the ending… well, the changing… of a great partnership.

Cloud Computing News Digest for September 21st, 2012

I normally publish this over at my Sogeti blog at http://blogs.us.sogeti.com/ccdigest/, but that’s down at the moment, so we’re going to my backup copy. I know, the self-proclaimed “cloud guy” isn’t in the cloud. Well, there’s an old saying that goes something like ‘the cobbler’s children have no shoes’.

I’d say I’m late with this edition, but this is developing into enough of a pattern that I think I’m just going to start thinking of monthly as the new weekly. So on to the news…

The Cloud Security Alliance (CSA) and Fujitsu announced the launch of the Big Data Working Group. The intent of this organization is to help the industry by bringing forth best practices for security and privacy when working with big data. They will start focused on research across several industry verticals with their first report due sometime this fall.

At the 2012 CloudOpen conference this past August, SUSE announced their OpenStack-based, enterprise-level private cloud solution, called (amazingly enough) “SUSE Cloud”. This IaaS solution helps organizations deploy and manage private clouds with self-service and workload standardization capabilities.

I also found an article about a competitor to OpenStack: Eucalyptus. SearchCloudComputing has published a “deep dive” into using Eucalyptus 3.1. You’ll need to register as a member (it’s free) to read the full article.

In my job, I’m often asked what skills are needed for cloud. This article by Joe McKendrick does a nice job of covering the list. Not just for individuals, but for organizations as well.

When you talk to cloud vendors, they will eventually reference PUE (Power Usage Effectiveness) statistics in some way. But as this piece by David Linthicum over at Toolbox.com explains, the real savings are in the ability to adjust to changing needs and, in turn, change our consumption.

Last month, the world watched the 2012 Summer Olympics. And it turns out the cloud had a major hand in helping deliver that content around the globe: Windows Azure Media Services helped deliver live and on-demand video content to several broadcasters. Eyes weren’t just on the games, either, as Apica, a vendor of testing and monitoring solutions, monitored various Olympics-related web sites and scored them on their uptime and performance.

For this edition, I also found a presentation by Adrian Cockcroft of Netflix on Cassandra (another NoSQL database solution) performance and scalability on AWS. Even if you don’t plan to use Cassandra, I highly recommend listening to this and picking up what you can of their approach and learnings. The video lasts about an hour.

Pfizer (the drug… er… pharmaceutical company) has also ventured into the world of cloud computing to help with supply chain issues. If you’ve ever wondered about a truly critical delivery, what about getting lifesaving medicine to patients?

On the Google front, they haven’t been quiet. They recently launched the Google Cloud Partner Program, giving them a way to promote and leverage delivery partners, not unlike the programs already in place at Amazon and Microsoft.

Related to topics that are close to my heart, I have a great article on resilient solution engineering from Jesse Robbins on GameDay. Having all this capacity for disaster recovery and failover doesn’t do us much good if we don’t create solutions that can take advantage of it. And on the subject of architecture, just yesterday I ran across a great list of architectural principles taken from Will Larson’s “Introduction to Architecting Systems for Scale”. Definitely give this a read.

And to close out this edition, I have an infographic on enterprise cloud adoption. I’m not a big fan of infographics, but I found this one useful and figured I’d share it with all of you.

Avoiding the Chaos Monkey

Yesterday, I was pleased (and nervous) to be presenting at the Heartland Developers Conference in Omaha, NE. I’d been hoping to present at this event for a couple of years and was really pleased that one of my submissions was accepted, especially given that the topic was more architecture/concept than code. It was only my second time presenting this material, and the first time for a non-captive audience. And given that it was the 2pm slot, and only a handful of people fell asleep or left, I’m pretty pleased with how things went.

I’ve posted the deck for my Avoiding the Chaos Monkey presentation, so please feel free to take and reuse it. I just ask that you give proper credit, and I’d love any feedback on it. I received some great feedback from HDC on the material and will be making updates to show some real world scenarios and how applying the principles covered in this presentation can address them. I spoke to some of these during the presentation, but agreed with my colleague Eric that it would help to have more concrete and visual examples to drive the message home. I’ve already submitted the talk to two upcoming conferences, and hopefully it will get accepted at one. Meanwhile, feel free to snag a copy and drop me a comment with any feedback you have!

You don’t really want an SLA!

I don’t often do editorials (and when I do, they tend to ramble), but I feel I’m due, and this is a conversation I’ve been having a lot lately. I sit down to talk with clients about the cloud, and one of the first questions I always get is “what is the SLA?” And I hate it.

The fact is that an SLA is an insurance policy. If your vendor doesn’t provide a basic level of service, you get a check. Not unlike my homeowner’s insurance: if something happens, I get a check. The problem is that most of us NEVER want to have to collect on that check. If my house burns down, the insurance company will replace it, but all those personal mementos, the memories, the “feel” of the house are gone. So that’s a situation I’d rather avoid. What I REALLY want is safety. So I install a fire alarm, I make sure I have an extinguisher in the kitchen, I keep candles away from the drapes. I take measures to help reduce the risk that I’ll ever need to cash in my insurance policy.

When building solutions, we don’t want SLAs. What we REALLY want is availability. So we, as the solution owners, need to take steps to help us achieve it. We have to weigh the cost vs. the benefit (do I need an extinguisher or a sprinkler system?) and determine how much we’re willing to invest in actively working to achieve our own goals.

This is why, when I get asked the question, I usually respond by giving them the answer and then immediately jumping into a discussion about resiliency. What is a service degradation vs. an outage? How can we leverage redundancy? Can we decouple components and absorb service disruptions? These are the types of things we as architects need to start considering, not just for cloud solutions but for everything we build.
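To put some numbers behind the redundancy question, here’s a sketch of the standard availability math. The component availabilities are made-up examples, but the formulas (multiply for components in series, 1-(1-a)^n for redundant copies) are the general ones:

```python
# Illustrative availability figures for a web tier and its database.
web_tier = 0.99
database = 0.99

# Components in series: the solution is only up when both are.
serial = web_tier * database
print(f"Single web + single DB: {serial:.4f}")  # ~0.9801

def redundant(a: float, copies: int) -> float:
    """Availability of N redundant copies; down only if all fail."""
    return 1 - (1 - a) ** copies

# Two load balanced web servers in front of the same database.
improved = redundant(web_tier, 2) * database
print(f"Redundant web + single DB: {improved:.4f}")  # ~0.9899
```

Notice that chaining components always lowers availability, while redundancy claws it back; that’s the whole argument for designing for resiliency rather than shopping for an SLA.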

I continue to tell developers that the public cloud is a stepping stone. The patterns we’re using in the public cloud are lessons learned that will eventually get applied back on premises. As the private cloud becomes less vapor and more reality, the ability to think in these new patterns is what will make the next generation of apps truly useful. If a server goes down, how quickly does your load balancer see this and take that server out of rotation? How do the servers shift workloads?

When working towards availability, we need to keep several things in mind.

Failures will happen – how we deal with them is our choice. We can let the world stop, or we can figure out how to “degrade” our solution and keep whatever we can running.

How are we going to recover – when things return to normal, how does the solution “catch up” with what happened during the disruption?

The outage is less important than how fast we react – we need to know something has gone wrong before our clients call to tell us.
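One concrete way to “absorb” a disruption rather than letting the world stop is to retry transient failures with an exponential backoff. Here’s a minimal sketch; the flaky call is a stand-in for any downstream dependency:

```python
import random
import time

def call_downstream() -> str:
    """Stand-in for a dependency that fails transiently."""
    if random.random() < 0.5:
        raise ConnectionError("service temporarily unavailable")
    return "ok"

def with_backoff(attempts: int = 5, base_delay: float = 0.5) -> str:
    """Retry with exponential backoff instead of failing outright."""
    for attempt in range(attempts):
        try:
            return call_downstream()
        except ConnectionError:
            # Wait 0.5s, 1s, 2s, ... before trying again, degrading
            # gracefully instead of surfacing every transient blip.
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("dependency still down after retries")

print(with_backoff())
```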

We (aka solution/application architects) really need to start changing the conversation here. We need to steer away from SLAs entirely, and when we can’t manage that, at least get to more meaningful, scenario-based SLAs. This can mean that instead of saying “the email server will be available 99% of the time”, we switch to “99% of emails will be transmitted within 5 minutes”. This is much more meaningful for the end users, and it also gives us more flexibility in how we achieve it.

Anyway, enough rambling for now. I need to get a deck that discusses this ready for a presentation on Thursday, one that only about 20 minutes ago I realized I needed to do. Fortunately, I have an earlier draft of the session, and I definitely have the passion and know-how to make this happen. So time to get cracking!

Until next time!
