IaaS – The changing face of Windows Azure

I need to preface this post by saying I should not be considered, by any stretch of imagination, a “network guy”. I know JUST enough to plug in an Ethernet cable, not to fall for the old “the token fell out of the network ring” gag, and to tracert connectivity issues, thanks mainly to my past overindulgence in online role-playing games.

In June of 2012, we announced that we would be adding Infrastructure as a Service (IaaS) features to the Windows Azure Platform. While many believe that Platform as a Service (PaaS) is still the ultimate “sweet spot” with regards to cost/benefit ratios, the reality is that PaaS adoption is… well… challenging. After 25+ years of buying, installing, configuring, and maintaining hardware, nearly everyone in the IT industry tends to think in terms of servers, both physical and virtual. So the idea of having applications and data float around within a datacenter and not be tied to specific locations is just alien for many. This created a barrier to the adoption of PaaS, a barrier that we are hoping our IaaS services will help bridge (not sure about “bridging barriers” as a metaphor since I always visualize barriers as those concrete fence things on the side of highway construction sites, but we’ll just go with it).

Unfortunately, there’s still a lot of confusion about what our IaaS solution is and how to work with it. Over the last few months, I’ve run into this several times with partners, so I wanted to pull together some of my learnings into a single blog post, as much for my own personal reference as to be able to easily share it with all of you.

Some terminology

So I’d like to start by explaining a few terms as they are used within the Windows Azure Platform…

Cloud Service – This is a collection of virtual machines (either PaaS role instances or IaaS virtual machines) representing an isolation boundary that contains computational workloads. A Cloud Service can contain either PaaS compute instances, or IaaS Virtual Machines, but not both. (UPDATE 4/16/2013: A Cloud Service hosting IaaS VMs will only appear in the cloud service tab of the management portal after a second VM has been added to it. Once visible, it will remain so until it is deleted).

Availability Set – For PaaS solutions, the Windows Azure Fabric already knows to distribute the same workload across different physical hardware within the datacenter. But for IaaS, I need to tell it to do this with the specific virtual machines I’m creating. We do this by placing the virtual machines into an availability set.

Virtual Network – Because addressability to the PaaS or IaaS instances within Cloud Services is limited to only those ports that you declare (by configuring endpoints), it’s sometimes helpful to have a way to create bridges between those boundaries or even between them and on-premises networks. This is where Windows Azure Virtual Networks come into play.

The reason these items are important is that in Windows Azure you’re going to use them to define your solution. Each piece represents a way to group or arrange resources and control how they can be addressed.

You control the infrastructure, mostly…

Platform as a Service, or PaaS, handles a lot for you (no surprise as that’s part of the value proposition). But in Infrastructure as a Service, IaaS, you take on some of that responsibility. The problem is that we are used to taking care of traditional datacenter deployments and either a) don’t understand what IaaS still does for us or b) just aren’t sure how this cloud stuff is supposed to be used. So we, through no fault of our own, try to do things the way we always have. And who could really blame us?

So let’s start with what Windows Azure IaaS still does for you. It obviously handles the physical hardware and hypervisor management. This includes provisioning the locations for our Virtual Machines, getting them deployed, and of course moving them around the datacenter in the case of a hardware failure or host OS (the server that’s hosting our virtual machine) upgrades. The Azure Fabric, our secret sauce as it were, also controls basic datacenter firewall configuration (what ports are exposed to the internet), load balancing, and addressability/visibility isolation (that Cloud Service thing I was talking about). This covers everything right up to the virtual machine itself. But that’s not where it stops. To help secure Windows Azure, we control how all the virtual machines talk to our network. This means that the Azure Fabric also has control of the virtual NIC that is installed into your VMs!

Now the reason this is important is that there are some things you’d normally try to do if you were creating a network in a traditional datacenter, like possibly providing fixed IPs to the servers so you can easily do name resolution. Fixed IPs in a cloud environment are generally a bad idea, especially so if that cloud is built on the concept of having the flexibility to move stuff around the datacenter for you if it needs to. And if this happens in Windows Azure, it’s pretty much assured that the virtual NIC will get torn down and rebuilt and in the process lose any customizations you made to it. This is also a frequent cause for folks losing the ability to connect to their VMs (something that’s usually fixable by re-sizing/kicking the VM via the management portal). It also highlights one key, but not often thought of, feature that Windows Azure provides for you: server name resolution.

Virtual Machine Name Resolution

The link I just dropped does a pretty good job of explaining what’s available to you with Windows Azure. You can either let Windows Azure do it for you and leverage the names you provided for the virtual machines when you created them, or you can use Virtual Networking to bring your own DNS. Both work well, so it’s really a matter of selecting the right option. The primary constraint is that the Windows Azure provided name resolution will only work for virtual machines (be they IaaS machines or PaaS role instances) hosted in Windows Azure. If you need to provide name resolution between cloud and on-premises, you’ll likely want to use your own DNS server.

The key here again is to not hardcode IP addresses. Pick the appropriate solution and let it do the work for you.

Load Balanced Servers

The next big task is how to load balance virtual machines in IaaS. For the most part, this isn’t really any different than how you’d do it for PaaS Cloud Services: create the VM, and “attach” it to an existing Virtual Machine (this places both virtual machines within the same cloud service). Then, as long as both machines are watching the same ports, the traffic will be balanced between the two by the Windows Azure Fabric.

If you’re using the portal to create the VM, you’ll need to make sure you use the “create from gallery” option and not quick create. Then as you progress through the wizard, you’ll hit the step where it asks you if you want to join the new virtual machine to an existing virtual machine or leave it as standalone.

Now once they are both part of the same cloud service, we simply edit the available endpoints. In the management portal, you’ll select a Virtual Machine, and either add or edit the endpoint using the tools menu across the bottom. Then you set the endpoint attributes manually (if it’s a new endpoint that’s not already load balanced), or choose to load balance it with a previously defined endpoint. Easy-peasy. :)

High Availability

Now that we have load balanced endpoints, the next step is to make sure that if one of our load balanced virtual machines goes offline (say a host OS upgrade or hardware failure), the service doesn’t become entirely unavailable. In Windows Azure Cloud Services, the Fabric would automatically distribute the running instances across multiple fault domains. To put it simply, fault domains try to help ensure that workloads are spread across multiple pieces of hardware. This way, if there is a hardware failure on a ‘rack’, it won’t take down both machines. When working with IaaS, we still have this as an option, but we need to tell the Azure Fabric that we want to take advantage of it by placing our virtual machines into an Availability Set so the Azure Fabric knows it should distribute them.

We can configure a virtual machine that’s already deployed to join an Availability Set, or we can assign a new one to a set when we create/deploy it (providing we’re not using Quick Create, which you hopefully aren’t anyway because you can’t place a Quick Create VM into an existing cloud service). Both options work equally well, and we can create multiple Availability Sets within a Cloud Service.

Virtual Networks

So you might ask, this is all fine and dandy if the virtual machines are deployed as part of a single cloud service. But I can’t combine PaaS and IaaS into a single cloud service, and I also can’t do direct machine addressing if the machine I’m connecting to exists in another cloud service, or even on-premises. So how do I fix that? The answer is Windows Azure Virtual Networks.

In Windows Azure, the Cloud Service is an isolation boundary, fronted by a gatekeeper layer that serves as a combination load balancer and NAT. The items inside the cloud service can address each other directly, and any communication that comes in from outside of the cloud service boundary has to come through the gatekeeper. Think of the cloud service as a private network branch. This is good because it provides a certain level of network security, but bad in that we now have challenges if we’re trying to communicate across the boundary.

Virtual Network allows you to join resources across cloud service boundaries or, by leveraging an on-premises VPN gateway, to join cloud services and on-premises services. It acts as a bridge across the isolation boundaries and enables direct addressability (providing there’s appropriate name resolution) without the need to publicly expose the individual servers/instances to the internet.

Bringing it all together

So if we bring this all together, we now have a way to create complex solutions that mix and match different compute resources (we cannot currently join things like Service Bus, Azure Storage, etc… via Virtual Network). One such example might be the following diagram…

A single Windows Azure Virtual Network that combines an on-premises server, a PaaS Cloud Service, and both singular and load balanced virtual machines. Now I can’t really speculate on where this could go next, but I think we have a fairly solid starting point for some exciting scenarios. And if we do for IaaS what we’ve done for the PaaS offering over the last few years… continuing to improve the tooling, expanding the feature set, and generally just making things more awesome, I think there’s a very bright future here.

But enough chest thumping/flag waving. Like many topics here, I created this to help me better understand these capabilities and hopefully some of you may benefit from it as well. If not, I’ll at least share with you a few links I found handy:

Mike Washam – Windows Azure Virtual Machines

MSDN – Windows Azure Name Resolution

WindowsAzure.com – Load Balancing Virtual Machines

WindowsAzure.com – Manage the Availability of Virtual Machines

Until next time!

Windows Azure Web Sites – Quotas, Scaling, and Pricing

It hasn’t been easy making the transition from a consultant to someone that for lack of a better explanation is a cross between pre-sales and technical support. But I’ve come to love two aspects of this job. First off, I get to talk to many different people and I’m constantly learning as much from their questions as I’m helping teach them about the platform. Secondly, when not talking with partners about the platform, I’m digging up answers to questions. This gives me the perfect excuse… er… reason to dig into some of the features and learn more about them. I had to do this as a consultant, but the issue there is that since I’d be asked to do this by paying clients, they would own the results. But now that I do this work on behalf of Microsoft, it’s much easier to share these findings with the community (providing it doesn’t violate non-disclosure agreements of course). And since this blog has always been a way for me to document things so I can refer back to them, it’s a perfect opportunity to start sharing this research.

Today’s topic is Windows Azure Web Sites quotas and pricing. Currently we (Microsoft) don’t IMHO do a very good job of making this information real clear. Some of it is available over on the pricing page, but for the rest you’ve got to dig it out of blog posts or from the Web Site dashboard’s usage overview details in the management portal. So I decided it was time to consolidate a few things.

Usage Quotas

A key aspect of the use of any service is to understand the limits. And nowhere is this truer than in the often complex/confusing world of cloud computing services. But when someone slaps a “free” in front of a service, we tend to forget this. Well, here I am to remind you. Windows Azure Web Sites has several dials that we need to be aware of when selecting the level/scale of Windows Azure Web Sites (Free, Shared, and Reserved).

File System/Storage: This is the total amount of space you have to store your site and content. There’s no timeframe on this one. If you reach the quota limit, you simply can’t write any new content to the system.

Egress Bandwidth: This is the amount of content that is served up by your web site. If you exceed this quota, your site will be temporarily suspended (no further requests) until the quota timeframe (1 day) resets.

CPU Time: This is the amount of time that is spent processing requests for your web site. Like the bandwidth quota, if you exceed the quota, your site will be temporarily suspended until the quota timeframe resets. There are two quota timeframes, a 5 minute limit, and a daily limit.

Memory: This is the amount of RAM that the site can use at one shot (there’s no timeframe). If you exceed the quota, a long running or abusive process will be terminated. And if this occurs often enough, your site may be suspended. Which is pretty good encouragement to rethink that process.

Database: There’s also up to 20 MB of storage for your related database (MySQL or Windows Azure SQL Database currently). I can’t find any details, but I’m hoping/guessing this will work much like the File Storage quota.

Now for the real meat of this. What are the quotas for each tier? For that I’ve created the following table.

| Quota Resource | Free Tier | Shared Tier (per web site) | Reserved Tier (up to 100 sites) |
|---|---|---|---|
| File Storage | 1024 MB for all sites | 1024 MB | 10 GB |
| Egress Bandwidth | 165 MB/day per datacenter, 5 GB per region | Pay as you go, not included in base price | Pay as you go, not included in base price |
| CPU Time | 1 hr/day, 2.5 minutes of every 5 | 4 hrs/day, 2.5 minutes of every 5 | N/A |
| Memory | 1024 MB/hr | 512 MB/hr | N/A |
| Database | 20 MB | 20 MB | N/A |

Now there’s an important but slightly confusing “but” to the free tier. At that level, you get a daily egress bandwidth quota per sub-region (aka datacenter), but there’s also a regional (US, EU, Asia) limit (5 GB). The regional limit covers the sum total of all web sites you’re hosting and is shared with any other services. So if you’re also using Blob storage to serve up images for your site, that will count against your “free” 5 GB. When you move to the shared/reserved tier, there’s no limit, but you pay for every gigabyte that leaves the datacenter.

Monitoring Usage

Now the next logical question is how you monitor the resources your sites are using. Fortunately, the most recent update to the Windows Azure portal has a dashboard that provides a quick glance at how much you’re using of each quota. This displays just below the usage grid on the “Dashboard” panel of the web site.

At a glance you can tell where you are on any quota, which also makes it convenient for you to predict your usage. Run some common scenarios, see what they do to your numbers, and extrapolate from there.

You can also configure the site for diagnostics (again via the management portal). This allows you to take the various performance indicators and save them to Windows Azure Storage. From there you can download the files and set up automated monitors to alert you to problems. Just keep in mind that turning this on will consume resources and incur additional charges.

Fortunately, there’s a pretty good article we’ve published on Monitoring Windows Azure Web Sites.

Scaling & Pricing

Now that we’ve covered your usage quotas and how to monitor your usage, it’s important to understand how we can scale the capacity of our web sites and the impact this has on pricing.

Scaling our web site is pretty straightforward. We can go from the Free Tier, to Shared, to Reserved using the management portal. Select the web site, click on the level, and then save to “scale” your site. But before you do that, you will want to understand the pricing impacts.

At the Free tier, we get up to 10 web sites. When we move a web site to shared, we will pay $0.02 per hour for each web site (at general availability). Now at this point, I can mix and match free (10 per sub-region/datacenter) and shared (100 per sub-region/datacenter) web sites. But things get a bit trickier when we move to reserved. A reserved web site is a dedicated virtual machine for your needs. When you move a web site within a region to the reserved tier, all web sites in that same sub-region/datacenter (up to the limit of 100) will also be moved to reserved.

Now this might seem a bit confusing until you realize that at the reserved tier, you’re paying for the virtual machine and not an individual web site. So it makes sense to have all your sites hosted on that instance, maximizing your investment. Furthermore, if you are running enough shared tier web sites, it may be more cost effective to run them as reserved.
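To put a rough number on that last point, here is a small sketch that uses only the hourly prices quoted in this post ($0.02 per shared site, $0.12 for a small reserved instance). The class and method names are made up for illustration, and the rates are simply the ones mentioned here rather than anything pulled from a live price list, so treat it as back-of-the-napkin math.

```csharp
using System;

// Hypothetical helper; the rates below are the ones quoted in this post.
public static class SharedVersusReserved
{
    const decimal SharedPerSitePerHour = 0.02m;
    const decimal SmallReservedPerHour = 0.12m;

    // Hourly cost of running the given number of sites at the shared tier.
    public static decimal SharedCost(int siteCount)
    {
        return siteCount * SharedPerSitePerHour;
    }

    // Smallest number of shared sites whose combined hourly cost
    // reaches the cost of a single small reserved instance.
    public static int BreakEvenSiteCount()
    {
        return (int)Math.Ceiling(SmallReservedPerHour / SharedPerSitePerHour);
    }

    public static void Main()
    {
        Console.WriteLine("Break-even at {0} shared sites", BreakEvenSiteCount());
        Console.WriteLine("Seven shared sites cost {0} per hour", SharedCost(7));
    }
}
```

With these figures, six shared sites cost the same per hour as one small reserved instance, so anything beyond that is where reserved starts to look attractive (assuming a single small instance can actually carry the load).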

Back to scaling: if you scale back down to the free or shared tiers, the other sites will revert back to their old states. For example, let’s assume you have two web sites, one at the free tier and one at the shared tier. I scale the free web site up to reserved and now both sites are reserved. If I scale the original free tier site back to free, the other site returns to shared. If I opted to scale the original shared site back to shared or free, then the original free site returns to its previous free tier. So it’s important when dealing with reserved sites that you remember what tier they were at previously.

The tiers are not our only option for scaling our web sites. We also have a slider labelled instance count if we are running a Shared or Reserved site. When running at the shared tier, this slider will change the number of processing threads that are servicing the web site, allowing us between 1 and 6 threads. Of course, if we increase the threads, there’s a greater risk of hitting our CPU usage quota. But this adjustment could come in real handy if we’re experiencing a short term spike in traffic. Running at the reserved tier, the slider increases the number of virtual machine instances we run (and subsequently our cost). This option allows us to run up to 10 reserved instances.

Also at the reserved tier, we can increase the size of our virtual machine. By default, our reserved instance will be a “Small”, giving us a single CPU core and 1.75 GB of memory at a cost of $0.12/hr. We can increase the size to “Medium” and even “Large”, with each size increase doubling our resources and the price per hour ($0.24 and $0.48 respectively). This cost will be per virtual machine instance, so if I have opted to run 3 instances, take my cost per hour for the size and multiply it by 3.
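If you’d rather let the computer do the multiplication, the reserved tier math can be sketched roughly as follows. The enum and class names are invented for this example, and the rates and the 10 instance cap are just the figures quoted in this post, so double check them against current pricing before relying on them.

```csharp
using System;

// Hypothetical names; sizes and rates are the ones quoted in this post.
public enum ReservedSize { Small, Medium, Large }

public static class ReservedPricing
{
    // Hourly rate for a single reserved instance of the given size.
    public static decimal HourlyRate(ReservedSize size)
    {
        switch (size)
        {
            case ReservedSize.Small: return 0.12m;
            case ReservedSize.Medium: return 0.24m;
            case ReservedSize.Large: return 0.48m;
            default: throw new ArgumentOutOfRangeException("size");
        }
    }

    // Total hourly cost: rate for the size multiplied by the instance count.
    public static decimal HourlyCost(ReservedSize size, int instanceCount)
    {
        // The instance count slider tops out at 10 reserved instances.
        if (instanceCount < 1 || instanceCount > 10)
            throw new ArgumentOutOfRangeException("instanceCount");

        return HourlyRate(size) * instanceCount;
    }

    public static void Main()
    {
        // Three medium instances: 3 x $0.24 per hour.
        Console.WriteLine(HourlyCost(ReservedSize.Medium, 3));
    }
}
```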

So what’s next?

This pretty much hits the limits of what we can do with scaling web sites. But fortunately we’re running on a platform that’s built for scale. So it’s just a hop, skip, and jump from Web Sites to Windows Azure Cloud Services (Platform as a Service) or Windows Azure Virtual Machines (Infrastructure as a Service). But that’s an article for another day. :)

BUILD 2012 – Not just for Windows anymore

Last week marked the second BUILD conference. In 2011, BUILD replaced the Microsoft PDC conference in an event that was so heavily Windows 8 focused that it was even hosted at buildwindows.com. While the URL didn’t change for 2012, the focus sure did, as this event also marked the latest round of big release news for Windows Azure. In this post (which I’m publishing directly from MS Word 2013 btw), I’m going to give a quick rundown of the Windows Azure related announcements. Think of this as your Cliff Notes version of the conference.

Windows Azure Service Bus for Windows Server – V1 Released

Previously released as a beta/preview back in June, this on-premises flavor of the Windows Azure Service Bus is now fully released and available for download. Admittedly, it’s strictly for brokered messaging for now. But it’s still a substantial step towards providing feature parity between public and private cloud solutions. Now we just need to hope that shops that opt to run this will run it as internal SaaS and not set up multiple silos. Don’t get me wrong. It’s nice to know we have the flexibility to do silos, but I’m hoping we learn from what we’ve seen in the public cloud and don’t fall back to old patterns.

One thing to keep in mind with this… It’s now possible for multiple versions of the Service Bus API to be running within an organization. To date, the public service has only had two major API versions. But going forward, we may need to be able to juggle even more. And while there will be a push to keep the hosted and on-premises versions at similar versions, there’s nothing requiring someone hosting it on-premises to always upgrade to the latest version. So as solution developers/architects, we’ll want to be prepared to be accommodating here.

Windows Azure Mobile Services – Windows Phone 8 Support

With Windows Phone 8 being formally launched the day before the BUILD conference, it only makes sense that we’d see related announcements. And a key one of those was the addition of Windows Phone 8 support to Windows Azure Mobile Services. This announcement makes Windows Phone 8 the 3rd supported platform (alongside Windows Store & iOS apps) for Mobile Services. This adds to an announcement earlier in the month which expanded support for items like sending email and different identity providers. So the Mobile Services team is definitely burning the midnight oil to get new features out to this great platform.

New Windows Azure Storage Scalability Targets

New scale targets have been announced for storage accounts created after June 7th, 2012. This change has been enabled by the new “flat network” topology that’s being deployed into the Windows Azure Datacenters. In a nutshell, it allows the tps scale targets to be increased by 4x and the upper limit of a storage account to be raised to 200 TB (2x). This new topology will continue to be rolled out through the end of the year but will only affect storage accounts created after June 7th, 2012 as mentioned above. These scale target improvements (which BTW are separate from the published Azure Storage SLA) will really help reduce the amount of ‘sharding’ that needs to be done for those with higher throughput requirements.

New 1.8 SDK – Windows Server 2012, .NET 4.5, and new Storage Client

BUILD also marked the launch of the new 1.8 Windows Azure SDK. This release is IMHO the most significant update to the SDK since the 1.3 version was launched almost 2 years ago. You could write a blog post on any one of the key features, but since they are all so closely related and this is supposed to be a highlight post, I’m going to bundle it up.

The new SDK introduces the new “OS Family 3” to Windows Azure Cloud Services, giving us support for Windows Server 2012. Now when you combine this with the added support for .NET 4.5 and IIS 8, we can start taking advantage of technology like Web Sockets. Unfortunately Web Sockets are not enabled by default, so there is some work you’ll need to do to take advantage of them. You may also need to tweak the internal Windows Firewall. A few older Guest OSes were also deprecated, so you may want to refer to the latest update of the compatibility matrix.

The single biggest, and subsequently most confusing, piece of this release has to do with the new 2.0 Storage Client. Now this update includes some great features, including support for a preview release of the storage client toolkit for Windows Runtime (Windows Store) apps. However, there are some SIGNIFICANT changes to the client, so I’d recommend you review the list of Breaking Changes and Known Issues before you decide to start converting over. Fortunately, all the new features are in a new set of namespaces (Microsoft.WindowsAzure.StorageClient has become simply Microsoft.WindowsAzure.Storage). So this does allow you to mix and match old functionality with the new. But forewarned is forearmed as they say. So read up before you just dive into the new client headlong.

For more details on some of the known issues with this SDK and the workarounds, refer to the October 2012 release notes. You can also learn about all the changes to the Visual Studio tools by checking out “What’s New in the Windows Azure Tools”.

HDInsight – Hadoop on Windows Azure

Technically, this was released the week before BUILD, but I’m going to touch on it nonetheless. A preview of HDInsight has been launched that lets you help test out the new Apache™ Hadoop® on Windows Azure service. This will feature support for common frameworks such as Pig and Hive, and it also includes a local developer installation of the HDInsight Server and an SDK for writing jobs with .NET and Visual Studio.

It’s exciting to see Microsoft embracing these highly popular open source initiatives. So if you’re doing anything with big data, you may want to run over and check out the blog post for additional details.

Windows Azure – coming to China

Doug Hauger also announced that Microsoft has reached an agreement (Memorandum of Understanding, aka an agreement to start negotiations) which will license Windows Azure technologies to 21Vianet. This will in turn allow them to offer Windows Azure in China from local datacenters. While not yet a fully “done deal”, it’s a significant first step. So here’s hoping the discussions are concluded quickly and that this is just the first of many such deals we’ll see struck in the coming year. So all you Aussies, hold out hope! :)

Other news

This was just the beginning. The Windows Azure team ran down a slew of other slightly less high-profile but equally important announcements on the team blog. Items like a preview of the Windows Azure Store, GA (general availability) for the Windows Azure dedicated, distributed in-memory cache feature launched back in June with the 1.7 SDK, and finally the launch of the Visual Studio Team Foundation Service which has been in preview for the last year.

In closing…

All in all, it was a GREAT week in the cloud. Or as James Staten put it on ZDNet, “You’re running out of excuses to not try Microsoft Windows Azure”. And this has just been the highlights. If you’d like to learn more, I highly recommend you run over and check out the session recordings from BUILD 2012 or talk to your local Microsoft representative.

PS – Don’t forget to snag your own copy of the great new Windows Azure poster!

Local File Cache in Windows Azure

When creating a traditional on-premise application, it’s not uncommon to leverage the local file system as a place to store temporary files and thus increase system performance. But with Windows Azure Cloud Services, we’ve been taught that we shouldn’t write things to disk because the virtual machines that host our services aren’t durable. So we start going to remote durable storage for everything. This slows down our applications so we need to add back in some type of cache solution.

Previously, I discussed using the Windows Azure Caching Preview to create a distributed, in-memory cache. I love that we finally have a simple way to do this. But there are times when I think that caching something, for example an image file that doesn’t change often, within a single instance would be fine, especially if I don’t have to use up precious RAM on my virtual machines.

Well there is an option! Windows Azure Cloud Services all include, at no additional cost, an allocation of non-durable local disk space called, surprisingly enough, “Local Storage”. For each core you get 250 GB of essentially temporary disk space. And with a bit of investment, we can leverage that space as a local, file backed cache.

Extending System.Runtime.Caching

So .NET 4.0 introduced the System.Runtime.Caching namespace along with an abstract base class, ObjectCache, that can be extended to provide caching functionality with whatever storage system we want to use. Now this namespace also provides a concrete implementation called MemoryCache, but we want to use the file system. So we’ll create our own implementation, a class called FileCache.

Note: There’s already a codeplex project that provides a file based implementation of ObjectCache. But I still wanted to roll my own for the sake of explaining some of the challenges that will arise.

So I create a class library and add a reference to System.Runtime.Caching. Next up, let’s rename the default class “Class1.cs” to “FileCache.cs”. Lastly, inside of the FileCache class, I’ll add a using statement for the Caching namespace and make sure my new class inherits from ObjectCache.

If we try to build the class library now, things wouldn’t go very well because there are 18 different abstract members we need to implement. Fortunately I’m running the Visual Studio Power Tools, so it’s just a matter of right-clicking on ObjectCache where I indicated I’m inheriting from it and selecting the “Implement Abstract Class” option. This gives us shells for all 18 abstract members, but until we add some real implementation, our FileCache class won’t even be minimally useful.

I’ll start by fleshing out the Get method and adding a public property, CacheRootPath, to the class that designates where our file cache will be kept.

public string CacheRootPath
{
    get { return cacheRoot.FullName; }
    set
    {
        cacheRoot = new DirectoryInfo(value);
        if (!cacheRoot.Exists) // create if it doesn't exist
            cacheRoot.Create();
    }
}

public override bool Contains(string key, string regionName = null)
{
    string fullFileName = GetItemFileName(key,regionName);
    FileInfo fileInfo = null;

    if (File.Exists(fullFileName))
    {
        fileInfo = new FileInfo(fullFileName);

        // if item has expired, don't return it
        //TODO: 
        return true;
    }
    else
        return false;
}

// return type is an object, but we'll always return a stream
public override object Get(string key, string regionName = null)
{
    if (!Contains(key, regionName))
        return null;

    //TODO: wrap this in some exception handling
    MemoryStream memStream = new MemoryStream();

    // copy the file into memory so the file on disk isn't held open
    using (FileStream fileStream = new FileStream(GetItemFileName(key, regionName), FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        fileStream.CopyTo(memStream);
    }

    // rewind so the caller reads from the start of the cached content
    memStream.Position = 0;
    return memStream;
}

CacheRootPath is just a way for us to set the path to where our cache will be stored. The Contains method is a way to check and see if the file exists in the cache (and ideally should also be where we check to make sure the object isn’t expired), and the Get method leverages Contains to see if the item exists in the cache and retrieves it if it exists.

Now this is where I had my first real decision to make. Get must return an object, but what type of object should I return? In my case I opted to return a memory stream. Now I could have returned a file stream that was attached to the file on disk, but because this could lock access to the file, I wanted to have explicit control of that stream. Hence I opted to copy the file stream to a memory stream and return that to the caller.

You may also note that I left the expiration check alone. I did this for the demo because your needs for file expiration may differ. You could base this on FileInfo.CreationTimeUtc or FileInfo.LastAccessTimeUtc. Both are valid, as may be any other metadata you need to base it on. I do recommend one thing: make a separate method that does the expiration check. We will use it later.

Note: I’m specifically calling out the use of UTC. When in Windows Azure, UTC is your friend. Try to use it whenever possible.
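
Here’s a minimal sketch of what that separate expiration check could look like. It’s only one way to do it: the CacheExpiration class, the IsExpired name, and the maxAge parameter are my own inventions for illustration, and I’m assuming a simple “older than X” rule based on the file’s UTC creation time.

```csharp
using System;
using System.IO;

public static class CacheExpiration
{
    // Returns true when the item was created more than maxAge before utcNow.
    // Both timestamps are expected to be UTC.
    public static bool IsExpired(DateTime createdUtc, DateTime utcNow, TimeSpan maxAge)
    {
        return utcNow - createdUtc > maxAge;
    }

    // Convenience overload for a file on disk; a missing file counts as expired.
    public static bool IsExpired(FileInfo file, TimeSpan maxAge)
    {
        file.Refresh(); // pick up the current state of the file
        if (!file.Exists)
            return true;
        return IsExpired(file.CreationTimeUtc, DateTime.UtcNow, maxAge);
    }
}
```

Keeping the comparison in its own overload means both Contains and a cleanup timer can share the same rule.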

Next up, we have to shell out the three overloaded versions of AddOrGetExisting. These methods are important because even though I won’t be directly accessing them in my implementation, they are leveraged by the base class’s Add method. And thus, these methods are how we add items into the cache. The first two overloaded methods will call the lowest level implementation.

public override object AddOrGetExisting(string key, object value, CacheItemPolicy policy, string regionName = null)
{
    if (!(value is Stream))
        throw new ArgumentException("value parameter is not of type Stream");

    return this.AddOrGetExisting(key, value, policy.AbsoluteExpiration, regionName);
}

public override CacheItem AddOrGetExisting(CacheItem value, CacheItemPolicy policy)
{
    var tmpValue = this.AddOrGetExisting(value.Key, value.Value, policy.AbsoluteExpiration, value.RegionName);
    if (tmpValue != null)
        return new CacheItem(value.Key, (Stream)tmpValue);
    else
        return null;
}

The key item to note here is that in the first method, I do a check on the object to make sure I’m receiving a stream. Again, that was my design choice since I want to deal with the streams.

The final overload is where all the heavy work is…

public override object AddOrGetExisting(string key, object value, DateTimeOffset absoluteExpiration, string regionName = null)
{
    if (!(value is Stream))
        throw new ArgumentException("value parameter is not of type Stream");

    // if object exists, get it
    object tmpValue = this.Get(key, regionName);
    if (tmpValue != null)
        return tmpValue;
    else
    {
        //TODO: wrap this in some exception handling

        // create subfolder for region if it was specified
        if (regionName != null)
            cacheRoot.CreateSubdirectory(regionName);

        // add object to cache
        FileStream fileStream = File.Open(GetItemFileName(key, regionName), FileMode.Create);

        ((Stream)value).CopyTo(fileStream);
        fileStream.Flush();
        fileStream.Close();

        return null; // successfully added
    }
}

We start by checking to see if the object already exists and return it if found in the cache. Then we create a subdirectory if we have a region (region implementation isn’t required). Finally, we copy the value passed in to our file and save it. There really should be some exception handling in here to make sure we’re handling things in a way that’s a little more thread safe (what if the file gets created between when we check for it and when we start the write?). And the Get should be checking to make sure the file isn’t already open when doing its read. But I’m sure you can finish that out.
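
To give you a head start on that race condition, here’s a rough sketch of one way to close the gap between the check and the write. It leans on FileMode.CreateNew, which fails if the file already exists. The CacheFileWriter class and TryWriteNew method are hypothetical names of mine, not part of the FileCache shown above.

```csharp
using System;
using System.IO;

public static class CacheFileWriter
{
    // Tries to create the cache file and copy the value into it.
    // Returns false if the file already exists.
    public static bool TryWriteNew(string path, Stream value)
    {
        // Check first so the common "already cached" case doesn't raise an exception
        if (File.Exists(path))
            return false;

        try
        {
            // CreateNew throws an IOException if the file exists;
            // FileShare.None keeps other handles out until the write completes
            using (var fileStream = new FileStream(path, FileMode.CreateNew, FileAccess.Write, FileShare.None))
            {
                value.CopyTo(fileStream);
            }
            return true;
        }
        catch (IOException)
        {
            // If the file is there now, someone else created it between our check
            // and our open; any other I/O failure is re-thrown to the caller
            if (!File.Exists(path))
                throw;
            return false;
        }
    }
}
```

A false return here maps to the “already in the cache” case, so the caller could follow up with a Get.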

Now there are still about a dozen other methods that need to be fleshed out eventually. But these give us our basic get and add functions. What’s still missing is handling evictions from the cache. For that we’re going to use a timer.

public FileCache() : base()
{
    System.Threading.TimerCallback TimerDelegate = new System.Threading.TimerCallback(TimerTask);

    // time values should be based on polling interval
    timerItem = new System.Threading.Timer(TimerDelegate, null, 2000, 2000);
}

private void TimerTask(object StateObj)
{
    int a = 1;
    // check file system for size and if over, remove older objects


    //TODO: check polling interval and update timer if its changed
}

We’ll update the FileCache constructor to create a delegate using our new TimerTask method, and pass that into a Timer object. This will execute the TimerTask method at regular intervals in a separate thread. I’m using a hard-coded value, but we really should check to see if we have a specific polling interval set. Of course, we should also put some code into this method so it actually does things like check to see how much room we have in the cache and evict expired items (by checking via the private method I suggested earlier), etc…
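
As a starting point for that size check, here’s a sketch of a simple “delete the oldest files until we fit” pass that TimerTask could call. The CacheEviction class, the EvictOldest method, and the maxBytes limit are all assumptions of mine, and I’m using LastWriteTimeUtc as a stand-in for whatever notion of “oldest” suits your cache.

```csharp
using System;
using System.IO;
using System.Linq;

public static class CacheEviction
{
    // Deletes the least recently written files under root until the total size
    // is at or below maxBytes. Returns the number of files removed.
    public static int EvictOldest(DirectoryInfo root, long maxBytes)
    {
        var files = root.GetFiles("*", SearchOption.AllDirectories)
                        .OrderBy(f => f.LastWriteTimeUtc)
                        .ToList();
        long total = files.Sum(f => f.Length);
        int removed = 0;

        foreach (FileInfo file in files)
        {
            if (total <= maxBytes)
                break;
            try
            {
                long size = file.Length;
                file.Delete();
                total -= size;
                removed++;
            }
            catch (IOException)
            {
                // the file is in use (for example, being read); skip it this pass
            }
        }
        return removed;
    }
}
```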

The Implementation

With our custom caching class done (well, not done, but at least to a point where it’s minimally functional), it’s time to implement it. For this, I opted to set up an MVC Web Role that allows folks to upload an image file to Windows Azure Blob storage. Then, via a WCF/REST based service, it would retrieve the images twice. The first retrieval would be without using caching, the second would be with caching. I won’t bore you with all the details of this setup, so we’ll focus on just the wiring up of our custom FileCache.

We start appropriately enough with the role’s Global.asax.cs file, where we add a public static field that represents our cache (so it’s available anywhere in the web application):

public static Caching.FileCache globalFileCache = new Caching.FileCache();

And then I update the Application_Start method to retrieve our LocalResource setting and use it to set the CacheRootPath property of our caching object.

protected void Application_Start()
{
    AreaRegistration.RegisterAllAreas();

    RegisterGlobalFilters(GlobalFilters.Filters);
    RegisterRoutes(RouteTable.Routes);

    Microsoft.WindowsAzure.CloudStorageAccount.SetConfigurationSettingPublisher(
        (configName, configSetter) =>
            configSetter(RoleEnvironment.GetConfigurationSettingValue(configName))
    );

    globalFileCache.CacheRootPath = RoleEnvironment.GetLocalResource("filecache").RootPath;
}

Now ideally we could make it so that the CacheRootPath instead accepted the LocalResource object returned by GetLocalResource. This would then also mean that our FileCache could easily manage against the maximum size of the local storage resource. But I figured we’d keep any Windows Azure specific dependencies out of this base class and maybe later look at creating a WindowsAzureLocalResourceCache object. But that’s a task for another day.

Ok, now to wire up the cache into the service that will retrieve the blobs. Let’s start with the basic implementation:

public Stream GetImage(string Name, string container, bool useCache)
{
    Stream tmpStream = null; // could end up being a filestream or a memory stream

    var account = CloudStorageAccount.FromConfigurationSetting("ImageStorage"); 
    CloudBlobClient blobStorage = account.CreateCloudBlobClient();
    CloudBlob blob = blobStorage.GetBlobReference(string.Format(@"{0}/{1}", container, Name));
    tmpStream = new MemoryStream();
    blob.DownloadToStream(tmpStream);

    WebOperationContext.Current.OutgoingResponse.ContentType = "image/jpeg";
    tmpStream.Seek(0, 0); // make sure we start the beginning
    return tmpStream;
}

This method takes the name of a blob and its container, as well as a useCache parameter (which we’ll implement in a moment). It uses the first two values to get the blob and download it to a stream which is then returned to the caller with a content type of “image/jpeg” so it can be rendered by the browser properly.

To implement our cache we just need to add a few things. Before we try to set up the CloudStorageAccount, we’ll add these lines:

// if we're using the cache, lets try to get the file from there
if (useCache)
    tmpStream = (Stream)MvcApplication.globalFileCache.Get(Name);

if (tmpStream == null)
{

This code tries to use the globalFileCache object we defined in the Global.asax.cs file and retrieve the blob from the cache if it exists, provided we told the method useCache=true. If we couldn’t find the file (tmpStream == null), we’ll then fall into the block we had previously that will retrieve the blob image and return it.

But we still have to add in the code to add the blob to the cache. We’ll do that right after we DownloadToStream:

    // "fork off" the adding of the object to the cache so we don't have to wait for this
    Task tsk = Task.Factory.StartNew(() =>
    {
        Stream saveStream = new MemoryStream();
        blob.DownloadToStream(saveStream);
        saveStream.Seek(0, 0); // make sure we start the beginning
        MvcApplication.globalFileCache.Add(Name, saveStream, new DateTimeOffset(DateTime.Now.AddHours(1)));
    });
}

This uses an async task to add the blob to the cache. We do this asynchronously so that we don’t block returning the blob back to the requestor while the write to disk completes. We want this service to return the file back as quickly as possible. Note that the task downloads the blob again into its own stream rather than sharing tmpStream with the response.
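
If that second download bothers you, one alternative is to copy the bytes we already have and hand the copy to the background task. This is just a sketch, assuming the image fits comfortably in memory. CacheWriteHelper and AddCopyInBackground are made-up names, and the addToCache delegate stands in for the call to the cache’s Add method.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class CacheWriteHelper
{
    // Copies the downloaded bytes into an independent stream, then passes that
    // copy to addToCache on a background task. The original stream's position
    // is left alone, so it can still be returned to the caller.
    public static Task AddCopyInBackground(MemoryStream downloaded, Action<Stream> addToCache)
    {
        // ToArray copies the whole buffer regardless of the current Position
        var copy = new MemoryStream(downloaded.ToArray());
        return Task.Factory.StartNew(() => addToCache(copy));
    }
}
```

In the service method, the delegate would wrap the same globalFileCache.Add call shown above, and tmpStream would need to be cast to MemoryStream first.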

And that does it for our implementation. Now to testing it.

Fiddler is your friend

Earlier, you may have found yourself saying “self, why did he use a service for his implementation”. I did this because I wanted to use Fiddler to measure the performance of calls to retrieve the blob with and without caching. And by putting it in a service and letting Fiddler monitor the response times, I didn’t have to write up my own client and put timings around it.

To test my implementation, I fired up Fiddler and then launched the service. We should see calls in Fiddler to SimpleService.svc/GetImage, one with cache=false and one with cache=true. If we select those items, and select the Statistics tab, we should see some significant differences in the “Overall Elapsed” times of each call. In my little tests, I was seeing anywhere from a 50-90% reduction in the elapsed time.

In fact, if you run the tests several times by hitting refresh on the page, you may even notice that the first time you hit Windows Azure storage for a particular blob, you may have additional delay compared to subsequent calls. It’s only a guess, but we may be seeing Windows Azure storage doing some of its own internal caching there.

So hopefully I’ve described things well enough here and you can follow what we’ve done. But if not, I’m posting the code for you to reuse. Just make sure you update the storage account settings and please please please finish the half started implementation I’m providing you.

Here’s to speedy responses thanks to caching. Until next time.

The “traffic cop” pattern

So I like design patterns but don’t follow them closely. Problem is that there are too many names and it’s just so darn hard to find them. But one “pattern” I keep seeing an ask for is the ability to have something that only runs once across a group of Windows Azure instances. This can surface as a one-time startup task, or it could be the need to have something that runs constantly and, if one instance fails, another can realize this and pick up the work.

This latter example is often referred to as a “self-electing controller”. At the root of this is a pattern I’ve taken to calling a “traffic cop”. This mini-pattern involves having a unique resource that can be locked, and the process that gets the lock has the right of way. Hence the term “traffic cop”. In the past, aka my “mainframe days”, I used this with systems where I might be processing work in parallel and needed to make sure that a sensitive block of code could prevent a parallel process from executing it while it was already in progress. Critical when you have apps that are doing things like self-incrementing unique keys.
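
To make the idea concrete before we get into anything Azure specific, here’s a small sketch that uses an exclusive file lock as the unique resource. It’s purely illustrative: the FileTrafficCop class and its members are names I made up, and it assumes every participant can reach the same file path.

```csharp
using System;
using System.IO;

public sealed class FileTrafficCop : IDisposable
{
    private readonly string lockPath;
    private FileStream lockStream;

    public FileTrafficCop(string lockPath)
    {
        if (lockPath == null) throw new ArgumentNullException("lockPath");
        this.lockPath = lockPath;
    }

    // We have the right of way for as long as we hold the open handle
    public bool HasControl { get { return lockStream != null; } }

    // Tries to take the exclusive lock; returns true if this caller now has control
    public bool TryTakeControl()
    {
        if (HasControl) return true;
        try
        {
            // FileShare.None means no other handle can open the file while we hold it
            lockStream = new FileStream(lockPath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
            return true;
        }
        catch (IOException)
        {
            return false; // someone else holds the lock
        }
    }

    public void ReleaseControl()
    {
        if (lockStream != null)
        {
            lockStream.Dispose();
            lockStream = null;
        }
    }

    public void Dispose() { ReleaseControl(); }
}
```

Whoever opens the file first directs traffic; everyone else gets told to wait until the handle is released.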

In Windows Azure, the most common way to do this is to use a Windows Azure Storage blob lease. You’d think this comes up often enough that there’d be a post on how to do it already, but I’ve never really run across one. That is until today. Keep reading!

But before I dig into the meat of this, a couple footnotes… First is a shout out to my buddy Neil over at the Convective blog. I used Neil’s Azure Cookbook to help me with the blob leasing stuff. You can never have too many reference books in your Kindle library. Secondly, the Windows Azure Storage team is already working on some enhancements for the next Windows Azure .NET SDK that will give us some more ‘native’ ways of doing blob leases. These include taking advantage of the newest features in the 2012-02-12 storage service version. So the leasing techniques I have below may change in an upcoming SDK.

Blob based Traffic Cop

Because I want to get something that works for Windows Azure Cloud Services, I’m going to implement my traffic cop using a blob. But if you wanted to do this on-premises, you could just as easily get an exclusive lock on a file on a shared drive. So we’ll start by creating a new Cloud Service, add a worker role to it, and then add a public class to the worker role called “BlobTrafficCop”.

Shell this class out with a constructor that takes a CloudPageBlob, a property that we can test to see if we have control, and methods to Start and Stop control. This shell should look kind of like this:

class BlobTrafficCop
{
    public BlobTrafficCop(CloudPageBlob blob)
    {
    }

    public bool HasControl
    {
        get
        {
            return true;
        }
    }

    public void Start(TimeSpan pollingInterval)
    {
    }

    public void Stop()
    {
    }
}

Note that I’m using a CloudPageBlob. I specifically chose this over a block blob because I wanted to call out something. We could create a 1 TB page blob and not be charged for a single byte of storage unless we put something into it. In this demo, we won’t be storing anything, so I can create a million of these traffic cops and will only incur bandwidth and transaction charges. Now the amount I’m saving here isn’t even significant enough to be a rounding error. So just note this down as a piece of trivia you may want to use some day. It should also be noted that the size you set in the call to the Create method is arbitrary but MUST be a multiple of 512 (the size of a page). If you set it to anything that’s not a multiple of 512, you’ll receive an invalid argument exception.
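
If you ever compute that size instead of hard-coding it, a tiny helper can keep you on a page boundary. This is just arithmetic, sketched here with names of my own choosing (PageBlobSize, RoundUpToPage). Treating zero or negative requests as a single page is my assumption, not a storage rule.

```csharp
public static class PageBlobSize
{
    public const long PageSize = 512;

    // Rounds a requested size up to the next multiple of 512
    public static long RoundUpToPage(long requestedBytes)
    {
        if (requestedBytes <= 0)
            return PageSize; // assumption: always ask for at least one page
        return ((requestedBytes + PageSize - 1) / PageSize) * PageSize;
    }
}
```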

I’ll start putting some guts into this by doing a null argument check in my constructor and also saving the parameter to a private variable. The real work starts when I create three private helper methods to work with the blob lease: GetLease, RenewLease, and ReleaseLease.

GetLease has two parts, setting up the blob, and then acquiring the lease. Here’s how I go about creating the blob using the CloudPageBlob object that was handed in:

try
{
    myBlob.Create(512);
}
catch (StorageClientException ex)
{
    // conditionfailed will occur if there's already a lease on the blob
    if (ex.ErrorCode != StorageErrorCode.ConditionFailed)
    {
        myLeaseID = string.Empty;
        throw; // re-throw, preserving the original stack trace
    }
}

Now admittedly, this does require another round trip to Windows Azure Storage (WAS), so as a general rule, I’d make sure the blob was created when I deploy the solution and not each time I try to get a lease on it. But this is a demo and we want to make running it as simple as possible. So we’re putting this in. I’m trapping for a StorageClientException with a specific error code of ConditionFailed. This is what you will see if you issue the Create method against a blob that has an active lease on it. So we’re handling that situation. I’ll get to myLeaseID here in a moment.

The next block creates a web request to lease the blob and tries to get that lease.

try
{
    HttpWebRequest getRequest = BlobRequest.Lease(myBlob.Uri, 30, LeaseAction.Acquire, null);
    myBlob.Container.ServiceClient.Credentials.SignRequest(getRequest);
    using (HttpWebResponse response = getRequest.GetResponse() as HttpWebResponse)
    {
        myLeaseID = response.Headers["x-ms-lease-id"];
    }
}
catch (System.Net.WebException)
{
    // this will be thrown by GetResponse if theres already a lease on the blob
    myLeaseID = string.Empty;
}

BlobRequest.Lease will give me a template HttpWebRequest for the lease. I then use the credentials from the blob I received in the constructor to sign the request, and finally I execute the request and get its response. If things go well, I’ll get a response back and it will have a header with the id for the lease, which I’ll put into a private variable (the myLeaseID from earlier) so I can use it later when I need to renew the lease. I also trap for a WebException, which will be thrown if my attempt to get a lease fails because there’s already a lease on the blob.

RenewLease and ReleaseLease are both much simpler. Renew creates a request object, signs and executes it just like we did before. We’ve just changed the LeaseAction to Renew.

HttpWebRequest renewRequest = BlobRequest.Lease(myBlob.Uri, 30, LeaseAction.Renew, myLeaseID);
myBlob.Container.ServiceClient.Credentials.SignRequest(renewRequest);
using (HttpWebResponse response = renewRequest.GetResponse() as HttpWebResponse)
{
    myLeaseID = response.Headers["x-ms-lease-id"];
}

ReleaseLease is just a bit more complicated because we check the status code to make sure we released the lease properly. But again it’s mainly just creating the request and executing it, this time with the LeaseAction of Release.

HttpWebRequest releaseRequest = BlobRequest.Lease(myBlob.Uri, 30, LeaseAction.Release, myLeaseID);
myBlob.Container.ServiceClient.Credentials.SignRequest(releaseRequest);
using (HttpWebResponse response = releaseRequest.GetResponse() as HttpWebResponse)
{
    HttpStatusCode httpStatusCode = response.StatusCode;
    if (httpStatusCode == HttpStatusCode.OK)
        myLeaseID = string.Empty;
}

Ideally, I’d have liked to do a bit more testing of these to make sure there weren’t any additional exceptions I should handle. But I’m short on time so I’ll leave that for another day.

Starting and Stopping

Blob leases expire after an interval if they are not renewed. So it’s important that I have a process that regularly renews the lease, and another that will check to see if I can get the lease if I don’t already have it. To that end, I’m going to use System.Threading.Timer objects with a single delegate called TimerTask. This delegate is fairly simple, so we’ll start there.

private void TimerTask(object StateObj)
{
    // if we have control, renew the lease
    if (this.HasControl)
        RenewLease();
    else // we don't have control
        // try to get lease
        GetLease();

    renewalTimer.Change((this.HasControl ? TimeSpan.FromSeconds(45) : TimeSpan.FromMilliseconds(-1)), TimeSpan.FromSeconds(45));
    pollingTimer.Change((!this.HasControl ? myPollingInterval : TimeSpan.FromMilliseconds(-1)), TimeSpan.FromSeconds(45));
}

We start by checking that HasControl property we created in our shell. This property just checks to see if myLeaseID is a string with a length > 0.  If so, then we need to renew our lease. If not, then we need to try and acquire the lease. I then change the intervals on two System.Threading.Timer objects (we’ll set them up next), renewalTimer and pollingTimer. Both are private variables of our class.

If we have control, then the renewal timer will be set to fire again in 45 seconds (15 seconds before our lease expires), and continue to fire every 45 seconds after that. If we don’t have control, renewal will stop checking. pollingTimer works in reverse, polling if we don’t have a lease, and stopping when we do. I’m using two separate timers because the renewal timer needs to fire before the one-minute lease runs out if I’m to keep control. But the process that’s leveraging the traffic cop may want to control the interval at which we poll for control, so I want that on a separate timer.
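
The decision logic is easier to see when it’s pulled out of the timer calls. The sketch below restates it as three small functions. TrafficCopTiming and its members are names I made up for illustration, and Timeout.InfiniteTimeSpan assumes .NET 4.5 or later (it’s the same value as TimeSpan.FromMilliseconds(-1), which tells a timer not to fire).

```csharp
using System;
using System.Threading;

public static class TrafficCopTiming
{
    // Renew 15 seconds before a one-minute lease would lapse
    public static readonly TimeSpan RenewalInterval = TimeSpan.FromSeconds(45);

    // We have control when we're holding a non-empty lease id
    public static bool HasControl(string leaseId)
    {
        return !string.IsNullOrEmpty(leaseId);
    }

    // The renewal timer only runs while we hold the lease
    public static TimeSpan RenewalDueTime(bool hasControl)
    {
        return hasControl ? RenewalInterval : Timeout.InfiniteTimeSpan;
    }

    // The polling timer only runs while we don't
    public static TimeSpan PollingDueTime(bool hasControl, TimeSpan pollingInterval)
    {
        return hasControl ? Timeout.InfiniteTimeSpan : pollingInterval;
    }
}
```

Exactly one of the two timers is live at any moment, which is the whole point of the flip-flop in TimerTask.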

Now let’s start our traffic cop:

public void Start(TimeSpan pollingInterval)
{
    if (this.IsRunning)
        throw new InvalidOperationException("This traffic cop is already active. You must call 'stop' first.");

    this.IsRunning = true;

    myPollingInterval = pollingInterval;

    System.Threading.TimerCallback TimerDelegate = new System.Threading.TimerCallback(TimerTask);

    // start polling immediately for control
    pollingTimer = new System.Threading.Timer(TimerDelegate, null, TimeSpan.FromMilliseconds(0), myPollingInterval);
    // don't do any renewal polling
    renewalTimer = new System.Threading.Timer(TimerDelegate, null, TimeSpan.FromMilliseconds(-1), TimeSpan.FromSeconds(45));
}

We do a quick check to make sure we’re not already running, then set a flag to say we are (just a private boolean flag). I save off the control polling interval that was passed in and set up a TimerDelegate using the TimerTask method we set up a moment before. Now it’s just a matter of creating our Timers.

The polling timer will start immediately and fire again at the interval the calling process set. The renewal timer, since we’re just starting out attempts to get control, will not start, but will be set up to check every 45 seconds so we’re ready to renew the lease once we get it.

When we call the Start method, it essentially causes our polling timer to fire immediately (asynchronously). So when TimerTask is executed by that timer, HasControl will be false and we’ll try to get a lease. If we succeed, the polling timer will be stopped and the renewal timer will be activated.

Now to stop traffic:

public void Stop()
{
    // stop lease renewal
    if (renewalTimer != null)
        renewalTimer.Change(TimeSpan.FromMilliseconds(-1), TimeSpan.FromSeconds(45));
    // stop polling for a new lease
    if (pollingTimer != null)
        pollingTimer.Change(TimeSpan.FromMilliseconds(-1), myPollingInterval);

    // release a lease if we have one
    if (this.HasControl)
        ReleaseLease();

    this.IsRunning = false;
}

We’ll stop both timers, release any lease we have, and then reset our boolean “IsRunning” flag. Note that the code above only stops the timers; disposing of them is left as a cleanup step.

And that’s the basics of our TrafficCop class. Now for implementation….

Controlling the flow of traffic

Now the point of this is to give us a way to control when completely unrelated processes can perform an action. So let’s flip over to the WorkerRole.cs file and put some code to leverage the traffic cop into its Run method. We’ll start by creating an instance of the CloudPageBlob object that will be our lockable object and passed into our TrafficCop class.

var account = CloudStorageAccount.FromConfigurationSetting("TrafficStorage");

// create blob client
CloudBlobClient blobStorage = account.CreateCloudBlobClient();
CloudBlobContainer container = blobStorage.GetContainerReference("trafficcopdemo");
container.CreateIfNotExist(); // adding this for safety

// use a page blob; if it's empty, there are no storage costs
CloudPageBlob pageBlob = container.GetPageBlobReference("singleton");

This creates an object, but doesn’t actually create the blob. I made the conscious decision to go this route so that the TrafficCop class never has to directly manage storage credentials or the container. Your individual needs may vary. The nice thing is that once this is done, starting the cop is a VERY simple process:

myTrafficCop = new BlobTrafficCop(pageBlob);
myTrafficCop.Start(TimeSpan.FromSeconds(30));

So this will tell the cop to use a blob called “singleton” in the blob container “trafficcopdemo” as our lockable resource and to check for control every 30 seconds. But that’s not really interesting. If we ran this with two instances, what we’d see is that one instance would get control and keep it until something went wrong with renewing the lease. So I want to alter the infinite loop of this worker role so I can see the cop is doing its job and also that I can pass control back and forth.

So I’m going to alter the default loop so that it will sleep for 15 seconds every loop and each time through will write a message to the console that it either does or does not have control. Finally, I’ll use a counter so that if an instance has control, it will only keep control for 75 seconds then release it.

int controlcount = 0;
while (true)
{
    if (!myTrafficCop.IsRunning)
        myTrafficCop.Start(TimeSpan.FromSeconds(30));

    if (myTrafficCop.HasControl)
    {
        Trace.WriteLine(string.Format("Have Control: {0}", controlcount.ToString()), "TRAFFICCOP");
        controlcount++;
    }
    else
        Trace.WriteLine("Don't Have Control", "TRAFFICCOP");

    if (controlcount >= 4)
    {
        myTrafficCop.Stop();
        controlcount = 0;
        Thread.Sleep(TimeSpan.FromSeconds(15));
    }

    Thread.Sleep(TimeSpan.FromSeconds(15));
}

Not the prettiest code I’ve ever written, but it gets the job done.

Looking at the results

So to see the demo at work, we’re going to increase the instance count to 2, and I’m also going to disable diagnostics. Enabling diagnostics will just cause some extra messages in the console output that I want to avoid. Otherwise, you can leave it in there. Once that’s done, it’s just a matter of setting up the TrafficStorage configuration setting to point at a storage account and pressing F5 to run the demo. If everything goes well, the role should deploy, and we can see both instances running in the Windows Azure Compute Emulator UI (check the little blue flag in the tool tray to view the UI).

If everything is working as intended, you’ll see output sort of like this:

[image: console output from the two role instances]

Notice that the role is going back and forth with having control, just as we’d hoped. You may also note that the first message was that we didn’t have control. This is because our attempt to get control is happening asynchronously in a separate thread. Now you can change that if you need to, but in our case this isn’t necessary. I just wanted to point it out.

Now as I mentioned, this is just a mini-pattern. So for my next post I hope to wrap this in another class that demonstrates the self-electing controller. Again leveraging async processes to execute something for our role instance in a separate thread. But done so in a way where we don’t need to monitor and manage what’s happening ourselves. Meanwhile, I’ve uploaded the code. So please make use of it.

Until next time!

Service Bus and “pushing” notifications

I put quotes around the word ‘pushing’ in the title of this post because this isn’t a pure “push” scenario but more of a ‘solicited push’. Check out this blog post where Clemens Vasters discusses the differences and realize I’m more pragmatic than purist. :)

So on the last several projects I’ve worked on, I’ve wanted to have a push notification system that I could use to send messages to role instances so that they could take actions. There are several push notification systems out there, but I was after something simple that would be included as part of my Windows Azure services. I’ve put a version of this concept into several proposals, but this week finally received time to create a practical demo of the idea.

For this demo, I’ve chosen to use Windows Azure Service Bus Topics. Topics, unlike Windows Azure Storage queues, give me the capability to have multiple subscribers each receive a copy of a message. This was also an opportunity to dig into a feature of Windows Azure I haven’t worked with in over a year. Given how much the API has changed in that time, it was a frustrating, yet rewarding exercise.

The concept is fairly simple. Messages are sent to a centralized topic for distribution. Each role instance then creates its own subscriber with the appropriate filter on it so it receives the messages it cares about. This solution allows for multiple publishers and subscribers and will give me a decent amount of scale. I’ve heard reports/rumors of issues when you get beyond several hundred subscribers, but for this demo, we’ll be just fine.

Now for this demo implementation, I want to keep it simple. It should be a central class that can be used by worker or web roles to create their subscriptions and receive notifications with very little effort. And to keep this simplicity going, it should give me just as easy a way to send messages back out.

NotificationAgent

We’ll start by creating a class library for our centralized class, adding references to it for Microsoft.ServiceBus (so we can do our brokered messaging) and Microsoft.WindowsAzure.ServiceRuntime (for access to the role environment). I’m also going to create my NotificationTopic class.

Note: there are several supporting classes in the solution that I won’t cover in this article. If you want the full code for this solution, you can download it here.

The first method we’ll add to this is a constructor that takes the parameters we’ll need to connect to our service bus namespace as well as the name/path for the topic we’ll be using to broadcast notifications on. The first job of the constructor is creating a namespace manager so I can create topics and subscriptions, and a messaging factory that I’ll use to receive messages. I’ve split this out a bit so that my class can support being passed a TokenProvider (I hate demos that only use the service owner). But here are the important lines:

TokenProvider tokenProvider = TokenProvider.CreateSharedSecretTokenProvider(issuerName, issuerKey);
Uri namespaceAddress = ServiceBusEnvironment.CreateServiceUri("sb", baseAddress, string.Empty);
this.namespaceManager = new NamespaceManager(namespaceAddress, tokenProvider);
this.messagingFactory = MessagingFactory.Create(namespaceAddress, tokenProvider);

We create a URI and a security token to use for interaction with our service bus namespace. For the sake of simplicity I’m using the issuer name (owner) and the service administration key. I’d never recommend this for a production solution, but it’s fine for demonstration purposes. We use these to create a NamespaceManager and MessagingFactory.

Now we need to create the topic, if it doesn’t already exist.

try
{
    // doesn't always work, so wrap it
    if (!namespaceManager.TopicExists(topicName))
        this.namespaceManager.CreateTopic(topicName);
}
catch (MessagingEntityAlreadyExistsException)
{
    // ignore, timing issues could cause this
}

Notice that I check to see if the topic exists, but I also trap for the exception. That’s because I don’t want to assume the operation is single threaded. With this block of code running in many role instances, it’s possible that another instance creates the topic between our existence check and our create call. So I like to wrap them in a try/catch. You can also just catch the exception, but I’ve long liked to avoid the overhead of unnecessary exceptions.
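
Since this check-then-create-then-shrug dance shows up a lot, it can be wrapped in a small reusable helper. The sketch below is one hypothetical way to do it. The Idempotent class and EnsureExists method are my own names, and the delegates stand in for calls like TopicExists and CreateTopic.

```csharp
using System;

public static class Idempotent
{
    // Runs create() only when exists() reports false, and treats an
    // "already exists" exception of type TAlreadyExists as success.
    public static void EnsureExists<TAlreadyExists>(Func<bool> exists, Action create)
        where TAlreadyExists : Exception
    {
        try
        {
            if (!exists())
                create();
        }
        catch (TAlreadyExists)
        {
            // another instance won the race; the entity exists, which is all we need
        }
    }
}
```

With that in place, the topic setup would be a single call using MessagingEntityAlreadyExistsException as the type argument. Any other exception still bubbles up to the caller.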

Finally, I’ll create a TopicClient that I’ll use to send messages to the topic.

So by creating an instance of this class, I can properly assume that the topic exists, and I have all the items I need to send or receive messages.

Sending Messages

Next up, I create a SendMessage method that accepts a string message payload, the type of message, and a TimeSpan value that indicates how long the message should live. In this method we first create a BrokeredMessage, giving it an object that represents my notification message. We use the lifespan value that is passed in and set the type as a property. Finally, we send the message using the TopicClient we created earlier and do appropriate exception handling and cleanup.

try
{
    bm = new BrokeredMessage(msg);
    bm.TimeToLive = msgLifespan;
    // used for filtering
    bm.Properties[MESSAGEPROPERTY_TYPE] = messageType.ToString();
    topicClient.Send(bm);
    success = true;
}
catch (Exception)
{
    success = false;
    // TODO: do something
}
finally
{
    if (bm != null) // if was created successfully
        bm.Dispose();
}

Now the important piece here is the setting of a BrokeredMessage property. It’s this property that can be used later on to filter the messages we want to receive. So let’s not forget that. And you’ll also notice I have a TODO left to add some intelligent exception handling. Like logging the issue.

Start Receiving

This is when things get a little more complicated. Now the experts (meaning the folks I know/trust that responded to my inquiry), recommend that instead of going “old school” and having a thread that’s continually polling for responses, we instead leverage async processing. So we’re going to make use of delegates.

First we need to define a delegate for the callback method:

public delegate bool RecieverCallback(NotificationMessage message, NotificationMessageType type);

We then reference the new delegate in the method signature for the message receiving starter:

public void StartReceiving(RecieverCallback callback, NotificationMessageType msgType = NotificationMessageType.All)

More on this later….

Now inside this method we first need to create our subscriber. Since I want to have one subscriber for each role instance, I’ll need to get this from the Role Environment.

// need to parse out deployment ID
string instanceId = Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.CurrentRoleInstance.Id;
subscriptionName = instanceId.Substring(instanceId.IndexOf('.')+1);
SubscriptionDescription tmpSub = new SubscriptionDescription(topicName, subscriptionName);

Now is the point where we’ll add in a filter using the property that we set on the notification when we created it.

{
Filter tmpFilter = new SqlFilter(string.Format("{0} = '{1}'", MESSAGEPROPERTY_TYPE, msgType));
subscriptionClient.AddRule(SUBFILTER, tmpFilter);
}

I’m keeping it simple and using a SqlFilter using the property name we assigned when sending. So this subscription will only receive messages that match our filter criteria.
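As a rough illustration of what that filter expression ends up looking like, here is a small standalone sketch that only builds the string. The enum values and the property name "MessageType" are assumptions for the example (the post only refers to the constant MESSAGEPROPERTY_TYPE), and skipping the filter when the type is All is my guess at the intent, not something the sample confirms.

```csharp
using System;

// Hypothetical message types; the real enum in the sample may differ.
public enum NotificationMessageType { All, Alert, Refresh }

public static class FilterExpressions
{
    // Assumed property name; the post only refers to it as MESSAGEPROPERTY_TYPE.
    public const string MessagePropertyType = "MessageType";

    // Returns null when no filter is needed (the subscriber wants everything).
    public static string ForType(NotificationMessageType msgType)
    {
        if (msgType == NotificationMessageType.All)
            return null;

        // Same format string as the SqlFilter call above.
        return string.Format("{0} = '{1}'", MessagePropertyType, msgType);
    }
}
```

Under those assumptions, FilterExpressions.ForType(NotificationMessageType.Alert) produces MessageType = 'Alert', which is the text you would hand to the SqlFilter constructor.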

Now that all the setup is done, we’ll delete the subscription if it already exists (this gets rid of any messages and allows us to start clean) and create it anew using the NamespaceManager we instantiated in the class constructor. Then we start our async operation to retrieve messages:

asyncresult = subscriptionClient.BeginReceive(waittime, ReceiveDone, subscriptionClient);

Now in this, ReceiveDone is the callback method for the operation. This method is pretty straightforward. We make sure we’ve gotten a message (in case the operation simply timed out) and that we can get the payload. Then, using the delegate we set up earlier, we hand the message off to the caller. We end by starting another async call to get the next message.

if (result != null)
{
SubscriptionClient tmpClient = result.AsyncState as SubscriptionClient;
BrokeredMessage brokeredMessage = tmpClient.EndReceive(result);
//brokeredMessage.Complete(); // not really needed because your receive mode is ReceiveAndDelete
if (brokeredMessage != null)
{
NotificationMessage tmpMessage = brokeredMessage.GetBody<NotificationMessage>();

// do some type mapping here

recieverCallback(tmpMessage, tmpType);
}
}

// do receive for next message
asyncresult = subscriptionClient.BeginReceive(ReceiveDone, subscriptionClient);

Now I’ve added two null checks in this method just to help out in case a receive operation fails. Even then, I won’t guarantee this works for all situations. In my tests, when I set the lifespan of a message to less than 5 seconds, I still had some issues (I’m still sorting those out, but wanted to get this sample out).

Client side implementation

Whew! Lots of setup there. This is where our hard work pays off. We define a callback method we’re going to hand into our notification helper class using the delegate we defined. We’ll keep it super simple:

private bool NotificationRecieved(NotificationMessage message, NotificationMessageType type)
{
Console.WriteLine("Received Notification");
return true;
}

Now we need to instantiate our helper class and start the process of receiving messages. We can do this with a private variable to hold on to our object and a couple of lines in the role’s OnStart.

tmpNotifier = new NotificationTopic(ServiceNamespace, IssuerName, IssuerKey, TopicName);
tmpNotifier.StartReceiving(new NotificationTopic.RecieverCallback(NotificationRecieved), NotificationMessageType.All);

Now if we want to clean things up, we can also add some code to the role’s OnStop.

try
{
if (tmpNotifier != null)
tmpNotifier.StopReceiving();
}
catch (Exception e)
{
Console.WriteLine("Exception during OnStop: " + e.ToString());
}
base.OnStop();

And that’s all we need.

In Closing

So that’s it for our basic implementation. I’ve uploaded the demo for you to use at your own risk. You’ll need to update the WebRole, WorkerRole, and NotifierSample project with the information about your Service Bus namespace. To run the demo, you will want to set the cloud service project as the startup project, and launch it. Then right click on the NotifierSample project and start debugging on it as well.

While this demo may work fine for certain applications, there is definitely room for enhancement. We can tweak our message lifespan, wait timeouts, and even how many messages we retrieve at one time. And it’s also not the only way to accomplish this. But I think it’s a solid starting point if you need this kind of simple, self-contained notification service.

PS – As configured, this solution will require the ability to send outbound traffic on port 9354.

Meet Windows Azure–Christmas in June

November 2009 marked the release/launch of Windows Azure. In November of 2010, we received the 1.3 SDK and our first major updates to the service since its launch a year before. Over the next 18 months, there were numerous updates that added features. But we really didn’t have a fundamental shift in the product. All that changed on June 7th 2012.

The BIG NEWS

June 7th marked the Meet Windows Azure virtual conference. This three hour event was broadcast on the internet from San Francisco in front of a small, live audience. And in its first hour, it took the covers off of several HUGE new features:

  • Persistent Virtual Machines – IaaS style hosting of Windows or Linux based virtual machines
  • Windows Azure Web Sites – high density hosting
  • Dedicated Cache – a new distributed, in-memory dedicated cache feature
  • Windows Azure Virtual Network – create trust relationships with cloud hosted VM’s via your existing VPN gateway

Also announced were:

  • A new management portal – compatible with multiple browsers and devices (it’s a preview though, not 100% feature complete)
  • “Hosted Services” renamed to “cloud services”
  • new 1.7 SDK w/ Visual Studio 2012 support
  • updated Windows Azure Storage Pricing – transaction costs reduced by 90% and option to turn off geo-replication and save $0.032/gb
  • Media Services (already announced, but general preview now available)
  • Additional country support (89 total countries and 19 local currencies)

The reality is that bloggers all over the world are already working on posts on the new features. I have limited bandwidth these days (I’d love consulting if it wasn’t for all those pesky clients – just kidding folks), so I figured I’d provide you with some links to explore until I’m able to spend some time exploring the new features on your behalf and diving into them in detail.

Virtual Machines, Web Sites, and a new Cache option

The first update came out a day before the event from Bill Laing, Corporate Vice President of Server and Cloud at Microsoft (aka the person that owns the datacenter side of Windows Azure). In his Announcing New Windows Azure Services to Deliver “Hybrid Cloud” post, Bill gave a quick intro to what was coming. But this wasn’t much more than a teaser.

The next big post was from “the Gu” himself and posted as he was giving his kick-off presentation. In Meet the new Windows Azure, Scott was kind enough to dive into some of the new features complete with pictures. So even if you don’t have a subscription, you can see the preview of the new management portal (it’s a preview because it’s not yet 100% complete, so expect future updates). He also discussed the new Windows Azure Virtual Machines feature. Unlike the previous VM Role, Virtual Machines are persistent (the PaaS roles are all stateless) and MSFT is providing support not just for Windows Server 2008 R2 and Windows Server 2012 (RC) but also Linux distros CentOS 6.2, OpenSUSE, and Ubuntu. You may also see a pre-defined SQL Server 2012 image. So this indicates we may see more Microsoft server products available as Windows Azure Virtual Machine images.

The real wow factor of the event seemed to be Windows Azure Web Sites. For lack of a better explanation, this is a high density hosting solution for web sites that features both inexpensive shared hosting and dedicated (non-multi-tenant) hosting. With just a couple of clicks, you can deploy many common packages such as WordPress to Windows Azure Web Sites in a few minutes. And to top it all off, this supports multiple publishing models.

The distributed cache feature was the one I was really waiting for. I was fortunate enough to get early access to this feature because of a project I was working on. And I think someone at MSFT might have taken a bit of pity on me when I posted a while back that I was going to build my own distributed cache system. This new feature allows you to set aside Windows Azure Cloud Services resources (memory from our deployed compute instances) and use them to create a “ring” that is an in-memory distributed cache. Some call this a “free” cache, but I don’t like that term because you are paying for it. You’re just able to leverage any left-over memory you might have in existing instances. If there isn’t any, you’re forced to spin up new instances (maybe even a specific role that does nothing) to host it. And hosting those VM’s still costs you per hour. So “free” isn’t the word I’d use to describe the distributed cache, I prefer “awesome”.

Windows Azure Storage Pricing Changes

Now the most confusing announcement yesterday was some changes to Windows Azure pricing. It was so confusing that the storage team has published two separate blog posts on the subject. The first post simply announced that the “per unit” pricing for Azure Storage transactions went from 10,000 to 100,000 transactions per unit, all for the same $0.01 per unit. This is great news and takes away a pricing disparity between Windows Azure and Amazon Web Services.

The next big change is that the geo-replication features that were announced last fall (I can’t recall if it was at BUILD or the “Learn Windows Azure” event) can be turned off. Now Azure storage costs were already reduced to $0.125/GB back in March of 2012. Well, with this latest announcement, you can turn off geo-replication and save yourself an additional $0.032/GB.
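To put those two changes into numbers, here is a quick back-of-the-envelope sketch. It uses only the figures quoted above; the class and method names are made up for the example, and it assumes costs scale linearly with no rounding to whole units, which may not match how the bill is actually calculated.

```csharp
using System;

public static class StoragePricing
{
    // Figures quoted in the post.
    public const decimal PricePerTransactionUnit = 0.01m;    // dollars per unit
    public const decimal OldUnitSize = 10000m;               // transactions per unit, before
    public const decimal NewUnitSize = 100000m;              // transactions per unit, after
    public const decimal GeoReplicatedPerGb = 0.125m;        // dollars per GB
    public const decimal GeoReplicationSavingPerGb = 0.032m; // dollars per GB saved

    // Assumes linear pricing with no rounding up to whole units.
    public static decimal TransactionCost(decimal transactions, decimal unitSize)
    {
        return transactions / unitSize * PricePerTransactionUnit;
    }

    public static decimal StorageCost(decimal gigabytes, bool geoReplicated)
    {
        decimal rate = geoReplicated
            ? GeoReplicatedPerGb
            : GeoReplicatedPerGb - GeoReplicationSavingPerGb;
        return gigabytes * rate;
    }
}
```

Under those assumptions, one million transactions drop from $1.00 (TransactionCost(1000000m, StoragePricing.OldUnitSize)) to $0.10 with the new unit size, and 100 GB of storage goes from $12.50 with geo-replication to $9.30 without it.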

Brad Calder if you read this, thanks for taking the time to help clarify these changes! I would have simply said “it’s a net win!”

Videos, Videos, Videos

Now as you can see, there’s lots to cover. Fortunately, MSFT was prepared and posted a slew of new videos.

MeetWindowsAzure.com has a series of Chalk Talk videos covering many of the new features. These range from 10 to 30 minutes in length (with most being only just under 10 minutes) and are great “why should I care” introductions. And as if that weren’t enough, the WindowsAzure account/channel over on YouTube has posted over 20 “tech bite” sized videos of the new features ranging from 2 to 10 minutes in length. You can’t go wrong with these quick and simple intros.

Wrap-up

So it’s still pretty exciting right now. I was present for most of yesterday’s live broadcast. But I still spent a good portion of today sorting through the news to pull this post together. I think these new features merit an honest and open re-evaluation of Windows Azure for anyone that has dismissed it in the past. And for those of us that already like and use the platform, we have some great new tools to help us better deliver exciting solutions.

BTW, if you have a Windows Azure subscription and would like to test drive the preview of some of these new features, you can sign up for it here!

So until next time, I’m going to try and take some time to learn these new features and you can bet I’ll be bringing you along for the ride! Safe travels.

PS – I wonder if there are any surprises left in store for next week at TechEd North America 2012.

Meet Windows Azure (aka Learn Windows Azure v2)

Remember that massively popular “Learn Windows Azure” event that was held last December? Well the Windows Azure team evidently had so much fun with it that it’s back again on June 7th as Meet Windows Azure!

Last time around many of us (the Azure junkies) were also on twitter answering messages that came up. Well Magnus Mårtensson decided to help formalize that a bit more and set up a Lanyrd site so it will be that much easier for all the tweeps to get together.

And as if that wasn’t enough, we’re also starting an unofficial Blog Relay. So please create a post on this event and send a note to @noopman letting him know so we can get you added to the list. Here are the latest participants I have heard of:

MEET Windows Azure Blog Relay:

I hope to see you online on June 7th, look for @brentcodemonkey!

Partial Service Upgrades

So I was working on an answer for a Stack Overflow question yesterday and realized it was a topic that I hadn’t put down in my blog yet. So rather than just answer the question, I figured I’d blog about it here so I could include some screen shots and further explanation. The question was essentially: how can I control the deployment of individual parts of my service?

So for this, I created a simple Windows Azure service with a Web Role and a Worker Role. It’s already up and hosted when we start this.

NOTE: this post only details doing this via the windows.azure.com portal. We’ll leave doing it programmatically via the management API for another day.

Upgrading a Single Role

This is actually surprisingly simple. I open up the service, and select the role (not instances) I want to upgrade. Then I can right click and select upgrade, or click on the “upgrade” button on the toolbar.

Either option will launch the “Upgrade Deployment” dialog box. If you look at this box (and presuming you have the role selected), you’ll notice that the “Role to Upgrade” option lists the role you had selected. If you didn’t properly select the role, this may list “All”.

Take a peek at the highlighted section of the following screen shot for an example of how this should look.

[Screenshot: the “Upgrade Deployment” dialog box with the “Role to Upgrade” option highlighted]

Note: while creating this post, I did receive an unhandled exception message from the Silverlight portal. This has been reported to MSFT and I’ll update this when I get a response.

Manual Upgrades

I’ve run out of time today, but next time I’d like to cover doing a manual upgrade. Of course, I still have two posts in my PHP series I need to finish. So we’ll see which of these I eventually get back around to first.

Until next time!

A Custom High-Availability Cache Solution

For a project I’m working on, we need a simple, easy to manage session state service. The solution needs to be highly available and low latency, but not persistent. Our session caches will also be fairly small in size (< 5 MB per user). But given that our projected high end user load could be somewhere in the realm of 10,000-25,000 simultaneous users (not overly large by some standards), we have serious concerns about the quota limits that are present in today’s version of the Windows Azure Caching Service.

Now we looked around at Memcached, ehCache, MongoDB, and nCache, to name some. And while they all did various things we needed, there were also various pros and cons. Memcached didn’t have the high availability we wanted (unless you jumped through some hoops). MongoDB has high availability, but raised issues about deployment and management. ehCache and nCache, well… more of the same. Add to them all that anything that had an open source license would have to be vetted by the client’s legal team before we could use it (a process that can be time consuming for any organization).

So I spent some time coming up with something I thought we could reasonably implement.

The approach

I started by looking at how I would handle the high availability. Taking a note from Azure Storage, I decided that when a session is started, we would assign that session to a partition, and that partitions would be assigned to nodes by a controller, with a single node potentially handling multiple partitions (maybe primary for one and secondary for another, all depending on overall capacity levels).

The cache nodes would be Windows Azure worker roles, running on internal endpoints (to achieve low latency). Within the cache nodes will be three processes, a controller process, the session service process, and finally the replication process.

The important one here is the controller process. Since the controller process will attempt to run in all the cache nodes (aka role instances), we’re going to use a blob to control which one actually acts as the controller. The process will attempt to lock a blob via a lease, and if successful will write its name into that blob container. It will then load the current partition/node mappings from a simple Azure Storage table (I don’t predict us having more than a couple dozen nodes in a single cache) and verify that all the nodes are still alive. It then begins a regular process of polling the nodes via their internal endpoints to check on their capacity.

The controller also then manages the nodes by tracking when they fall in and out of service, and determining which nodes handle which partitions. If a node in a partition fails, it will assign a new node to that partition, and make sure that the new node is in different fault and upgrade domains from the current node. Internally, the two nodes in a partition will then replicate data from the primary to the secondary.
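Here is a minimal sketch of how the controller might pick that replacement node. The CacheNode type, its property names, and the “prefer the node with the most free capacity” tie-breaker are all assumptions I’m making for illustration; the only rule taken from the design above is that the partner must sit in a different fault domain and a different upgrade domain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical node record; the domain numbers would presumably come from the role environment.
public sealed class CacheNode
{
    public string InstanceId { get; set; }
    public int FaultDomain { get; set; }
    public int UpgradeDomain { get; set; }
    public int FreeCapacity { get; set; }
}

public static class NodePlacement
{
    // Picks a partner for 'current' that shares neither its fault domain nor its
    // upgrade domain, preferring the candidate with the most free capacity
    // (the capacity preference is an assumption). Returns null when none qualifies.
    public static CacheNode PickPartner(CacheNode current, IEnumerable<CacheNode> candidates)
    {
        return candidates
            .Where(n => n.InstanceId != current.InstanceId)
            .Where(n => n.FaultDomain != current.FaultDomain)
            .Where(n => n.UpgradeDomain != current.UpgradeDomain)
            .OrderByDescending(n => n.FreeCapacity)
            .FirstOrDefault();
    }
}
```

A null result here would be the controller’s cue that the cache can’t currently satisfy the placement rule, for example when every remaining node shares a domain with the survivor.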

Now there will also be a hook in the role instances so that the RoleEnvironment Changing and Changed events will alert the controller process that it may need to rescan. This could be a response to the controller being torn down (in which case the other instances will determine a new controller) or some node being torn down, so the controller needs to reassign the partitions that were assigned to those nodes to new nodes.

This approach should allow us to remain online without interruptions for our end users even while we’re in the middle of a service upgrade. Which is exactly what we’re trying to achieve.

Walkthrough of a session lifetime

So here’s how we see this happening…

  1. The service starts up, and the cache role instances identify the controller.
  2. The controller attempts to load any saved partition data and validate it (scanning the service topology)
  3. The consuming tier checks the blob container to get the instance ID of the controller, and asks it for a new session ID (and its resulting partition and node instance ID)
  4. The controller determines if there is room in an existing partition or creates a new partition.
  5. If a new partition needs to be created, it locates two new nodes (in separate domains) and notifies them of the assignment, then returns the primary node to the requestor.
  6. If a node falls out (crashes, is being rebooted), the session requestor gets a failure message and goes back to the controller for a new node for that partition.
  7. The controller provides the name of the previously identified secondary node (which is of course now the primary), and also takes on the process of locating a new node.
  8. The new secondary node will contact the primary node to begin replicating its state. The new primary will start sending state event change messages to the secondary.
  9. If the controller drops (crash/recycle), the other nodes will attempt to become the controller by leasing the blob. Once established as a controller, it will start over at step 2.
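The failover in steps 6 through 8 can be sketched as a tiny bit of controller bookkeeping. Everything here is hypothetical (the PartitionAssignment type, the string instance IDs, and the findReplacement callback); it only shows the promote-then-refill order described in the walkthrough, not the replication itself.

```csharp
using System;

// Hypothetical in-memory view of one partition as the controller might track it.
public sealed class PartitionAssignment
{
    public string Primary { get; private set; }
    public string Secondary { get; private set; }

    public PartitionAssignment(string primary, string secondary)
    {
        Primary = primary;
        Secondary = secondary;
    }

    // Handles a failed node. 'findReplacement' is an assumed callback that
    // returns the instance ID of a healthy node for the open slot.
    // Returns the instance ID callers should now treat as primary.
    public string HandleFailure(string failedNode, Func<string> findReplacement)
    {
        if (failedNode == Primary)
        {
            // Step 7: the secondary takes over, then a new secondary is located.
            Primary = Secondary;
            Secondary = findReplacement();
        }
        else if (failedNode == Secondary)
        {
            // Primary is unaffected; only the secondary slot needs refilling.
            Secondary = findReplacement();
        }
        return Primary;
    }
}
```

The value returned from HandleFailure is what the controller would hand back to the session requestor in step 7, while the new secondary goes off to start replicating from the promoted primary.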
Limits

So this approach does have some cons. We do have to write our own synchronization process and session providers. We also have to have our own aging mechanism to get rid of old session data. However, it’s my belief that these shouldn’t be horrible to create, so it’s something we can easily overcome.

The biggest limitation here is that because we’re going to be managing the in-memory cache ourselves, we might have to get a bit tricky (multi-gigabyte collections in memory) and we’re going to need to pay close attention to maximum session size (which we believe we can do).

Now admittedly, we’re hoping all this is temporary. There have been mentions publicly that there’s more coming to the Windows Azure Cache service. And we hope that we can, at that time, swap out our custom session provider for one that’s built to leverage whatever the vNext of Azure Caching becomes.

So while not ideal, I think this will meet our needs and do so in a way that’s not going to require months of development. But if you disagree, I’d encourage you to sound off via the site comments and let me know your thoughts.