The “traffic cop” pattern

So I like design patterns but don’t follow them closely. Problem is that there are too many names and its just so darn hard to find them. But one “pattern” I keep seeing an ask for is the ability to having something that only runs once across a group of Windows Azure instances. This can surface as one-time startup task or it could be the need to have something that run constantly and if one instance fails, another can realize this and pick up the work.

This later example is often referred to as a “self-electing controller”. At the root of this is a pattern I’ve taken to calling a “traffic cop”. This mini-pattern involves having a unique resource that can be locked, and the process that gets the lock has the right of way. Hence the term “traffic cop”. In the past, aka my “mainframe days”, I used this with systems where I might be processing work in parallel and needed to make sure that a sensitive block of code could prevent a parallel process from executing it while it was already in progress. Critical when you have apps that are doing things like self-incrementing unique keys.

In Windows Azure, the most common way to do this is to use a Windows Azure Storage blob lease. You’d think this comes up often enough that there’d be a post on how to do it already, but I’ve never really run across one. That is until today. Keep reading!

But before I dig into the meat of this, a couple footnotes… First is a shout out to my buddy Neil over at the Convective blob. I used Neil’s Azure Cookbook for help me with the blob leasing stuff. You can never have too many reference books in your Kindle library. Secondly, the Windows Azure Storage team is already working on some enhancements for the next Windows Azure .NET SDK that will give us some more ‘native’ ways of doing blob leases. These include taking advantage of the newest features of the 2012-02-12 storage features. So the leasing techniques I have below may change in an upcoming SDK.

Blob based Traffic Cop

Because I want to get something that works for Windows Azure Cloud Services, I’m going to implement my traffic cop using a blob. But if you wanted to do this on-premises, you could just as easily get an exclusive lock on a file on a shared drive. So we’ll start by creating a new Cloud Service, add a worker role to it, and then add a public class to the worker role called “BlobTrafficCop”.

Shell this class out with a constructor that takes a CloudPageBlob, a property that we can test to see if we have control, and methods to Start and Stop control. This shell should look kind of like this:

class BlobTrafficCop
{
    public BlobTrafficCop(CloudPageBlob blob)
    {
    }

    public bool HasControl
    {
        get
        {
            return true;
        }
    }

    public void Start(TimeSpan pollingInterval)
    {
    }

    public void Stop()
    {
    }
}

Note that I’m using a CloudPageBlob. I specifically chose this over a block blob because I wanted to call out something. We could create a 1tb page blob and won’t be charged for 1 byte of storage unless we put something into it. In this demo, we won’t be storing anything so I can create a million of these traffic cops and will only incur bandwidth and transaction charges. Now the amount I’m saving here isn’t even significant enough to be a rounding error. So just note this down as a piece of trivia you may want to use some day. It should also be noted that the size you set in the call to the Create method is arbitrary but MUST be a multiple of 512 (the size of a page). If you set it to anything that’s not a multiple of 512, you’ll receive an invalid argument exception.

I’ll start putting some buts into this by doing a null argument check in my constructor and also saving the parameter to a private variable. The real work starts when I create three private helper methods to work with the blob lease. GetLease, RenewLease, and ReleaseLease.

GetLease has two parts, setting up the blob, and then acquiring the lease. Here’s how I go about creating the blob using the CloudPageBlob object that was handed in:

try
{
    myBlob.Create(512);
}
catch (StorageClientException ex)
{
    // conditionfailed will occur if there's already a lease on the blob
    if (ex.ErrorCode != StorageErrorCode.ConditionFailed)
    {
        myLeaseID = string.Empty;
        throw ex; // re-throw exception
    }
}

Now admittedly, this does require another round trip to WAS, so as a general rule, I’d make sure the blob was created when I deploy the solution and not each time I try to get a lease on it. But this is a demo and we want to make running it as simple as possible. So we’re putting this in. I’m trapping for a StorageClientExcpetion with a specific error code of ConditionFailed. This is what you will see if you issue the Create method against a blob that has an active lease on it. So we’re handing that situation. I’ll get to myLeaseID here in a moment.

The next block creates a web request to lease the blob and tries to get that lease.

try
{
    HttpWebRequest getRequest = BlobRequest.Lease(myBlob.Uri, 30, LeaseAction.Acquire, null);
    myBlob.Container.ServiceClient.Credentials.SignRequest(getRequest);
    using (HttpWebResponse response = getRequest.GetResponse() as HttpWebResponse)
    {
        myLeaseID = response.Headers["x-ms-lease-id"];
    }
}
catch (System.Net.WebException)
{
    // this will be thrown by GetResponse if theres already a lease on the blob
    myLeaseID = string.Empty;
}

BlobRequest.lease will give me a template HttpWebRequest for the least. I then use the blob I received in the constructor to sign the request, and finally I execute the request and get its response. If things go well, I’ll get a response back and it will have a header with the id for the lease which I’ll put into a private variable (the myLeaseID from earlier) which I can use later when I need to renew the lease. I also trap for a WebException which will be thrown if my attempt to get a lease fails because there’s already a lease on the blob.

RenewLease and ReleaseLease are both much simpler. Renew creates a request object, signs and executes it just like we did before. We’ve just changed the LeaseAction to Renew.

HttpWebRequest renewRequest = BlobRequest.Lease(myBlob.Uri, 30, LeaseAction.Renew, myLeaseID);
myBlob.Container.ServiceClient.Credentials.SignRequest(renewRequest);
using (HttpWebResponse response = renewRequest.GetResponse() as HttpWebResponse)
{
    myLeaseID = response.Headers["x-ms-lease-id"];
}

ReleaseLease is just a bit more complicated because we check the status code to make sure we released the lease properly. But again its mainly just creating the request and executing it, this time with the LeaseAction of Release.

HttpWebRequest releaseRequest = BlobRequest.Lease(myBlob.Uri, 30, LeaseAction.Release, myLeaseID);
myBlob.Container.ServiceClient.Credentials.SignRequest(releaseRequest);
using (HttpWebResponse response = releaseRequest.GetResponse() as HttpWebResponse)
{
    HttpStatusCode httpStatusCode = response.StatusCode;
    if (httpStatusCode == HttpStatusCode.OK)
        myLeaseID = string.Empty;
}

Ideally, I’d have liked to do a bit more testing of these to make sure there weren’t any additional exceptions I should handle. But I’m short on time so I’ll leave that for another day.

Starting and Stopping

Blob leases expire after an interval if they are not renewed. So its important that I have a process that regularly renews the lease, and another that will check to see to see if I can get the lease if I don’t already have it. To that end, I’m going to use System.Threading.Timer objects with a single delegate called TimerTask. This delegate is fairly simple, so we’ll start there.

private void TimerTask(object StateObj)
{
    // if we have control, renew the lease
    if (this.HasControl)
        RenewLease();
    else // we don't have control
        // try to get lease
        GetLease();

    renewalTimer.Change((this.HasControl ? TimeSpan.FromSeconds(45) : TimeSpan.FromMilliseconds(-1)), TimeSpan.FromSeconds(45));
    pollingTimer.Change((!this.HasControl ? myPollingInterval : TimeSpan.FromMilliseconds(-1)), TimeSpan.FromSeconds(45));
}

We start by checking that HasControl property we created in our shell. This property just checks to see if myLeaseID is a string with a length > 0.  If so, then we need to renew our lease. If not, then we need to try and acquire the lease. I then change the intervals on two System.Threading.Timer objects (we’ll set them up next), renewalTimer and pollingTimer. Both are private variables of our class.

If we have control, then the renewal timer will be set to fire again in 45 seconds(15 seconds before our lease expires), and continue to fire every 45 seconds after that. If we don’t have control, renewal will stop checking. pollingTimer works in reverse, polling if we don’t have a lease, and stopping when we do. I’m using two separate timers because the renewal timer needs to fire every minute if I’m to keep control. But the process that’s leveraging may want to control the interval at which we poll for control, so I want that on a separate timer.

Now lets start our traffic cop:

public void Start(TimeSpan pollingInterval)
{
    if (this.IsRunning)
        throw new InvalidOperationException("This traffic cop is already active. You must call 'stop' first.");

    this.IsRunning = true;

    myPollingInterval = pollingInterval;

    System.Threading.TimerCallback TimerDelegate = new System.Threading.TimerCallback(TimerTask);

    // start polling immediately for control
    pollingTimer = new System.Threading.Timer(TimerDelegate, null, TimeSpan.FromMilliseconds(0), myPollingInterval);
    // don't do any renewal polling
    renewalTimer = new System.Threading.Timer(TimerDelegate, null, TimeSpan.FromMilliseconds(-1), TimeSpan.FromSeconds(45));
}

We do a quick check to make sure we’re not already running, then set a flag to say we are (just a private boolean flag). I save off the control polling interval that was passed in and set up a TimerDelegate using the TimerTask method we set up a moment before. Now it’s just a matter of creating our Timers.

The polling timer will start immediately and fire again at the interval the calling process set. The renewal timer, since we’re just starting out attempts to get control, will not start, but will be set up to check every 45 seconds so we’re ready to renew the lease once we get it.

When we call the start method, it essentially causes our polling timer to fire immediately (asyncronously). So when TaskTimer is executed by that timer, HasControl will be false and we’ll try to get a lease. If we succeed, the polling timer will be stopped and the renewal timer will be activated.

Now to stop traffic:

public void Stop()
{
    // stop lease renewal
    if (renewalTimer != null)
        renewalTimer.Change(TimeSpan.FromMilliseconds(-1), TimeSpan.FromSeconds(45));
    // start polling for new lease
    if (pollingTimer != null)
        pollingTimer.Change(TimeSpan.FromMilliseconds(-1), myPollingInterval);

    // release a lease if we have one
    if (this.HasControl)
        ReleaseLease();

    this.IsRunning = false;
}

We’ll stop and dispose of both timers,  release any locks we have, and then reset our boolean “IsRunning” flag.

And that’s the basics of our TrafficCop class. Now for implementation….

Controlling the flow of traffic

Now the point of this is to give us a way to control when completely unrelated processes can perform an action. So let’s flip over to the WorkerRole.cs file and put some code to leverage the traffic copy into its Run method. We’ll start by creating an instance of the CloudPageBlog object that will be our lockable object and passed into our TrafficCop class.

var account = CloudStorageAccount.FromConfigurationSetting("TrafficStorage");

// create blob client
CloudBlobClient blobStorage = account.CreateCloudBlobClient();
CloudBlobContainer container = blobStorage.GetContainerReference("trafficcopdemo");
container.CreateIfNotExist(); // adding this for safety

// use a page blog, if its empty, there's no storage costs
CloudPageBlob pageBlob = container.GetPageBlobReference("singleton");

This creates an object, but doesn’t actually create the blob. I made the conscious decision to go this route and keep any need for the TrafficCop class to have to directly manage storage credentials or the container out of things. Your individual needs may vary. The nice thing is that once this is done, starting the cop is a VERY simple process:

myTrafficCop = new BlobTrafficCop(pageBlob);
myTrafficCop.Start(TimeSpan.FromSeconds(60));

So this will tell the copy to use a blob called “singleton” in the blob container “trafficcopdemo” as our controlling process and to check for control every 30 seconds. But that’s not really interesting. If we ran this with two instances, what we’d see is that one instance would get control and keep it until something went wrong with getting the lease. So I want to alter the infinite loop of this worker role so  I can see the cop is doing its job and also that I can pass control back and forth.

So I’m going to alter the default loop so that it will sleep for 15 seconds every loop and each time through will write a message to the console that it either does or does not have control. Finally, I’ll use a counter so that if an instance has control, it will only keep control for 75 seconds then release it.

int controlcount = 0;
while (true)
{
    if (!myTrafficCop.IsRunning)
        myTrafficCop.Start(TimeSpan.FromSeconds(30));

    if (myTrafficCop.HasControl)
    {
        Trace.WriteLine(string.Format("Have Control: {0}", controlcount.ToString()), "TRAFFICCOP");
        controlcount++;
    }
    else
        Trace.WriteLine("Don't Have Control", "TRAFFICCOP");

    if (controlcount >= 4)
    {
        myTrafficCop.Stop();
        controlcount = 0;
        Thread.Sleep(TimeSpan.FromSeconds(15));
    }

    Thread.Sleep(TimeSpan.FromSeconds(15));
}

Not the prettiest code I’ve ever written, but it gets the job done.

Looking at the results

So to see the demo at work, we’re going to increase the instance count to 2, and I’m also going to disable diagnostics. Enabling diagnostics will just cause some extra messages in the console output that I want to avoid. Otherwise, you can leave it in there. Once that’s done, it’s just a matter of setting up the TrafficStorage configuration setting to point at a storage account and pressing F5 to run the demo. If everything goes well, the role should deploy, and we can see both instances running in the Windows Azure Compute Emulator UI (check the little blue flag in the tool tray to view the UI).

If everything is working as intended, you’ll see output sort of like this:

image

Notice that the role is going back and forth with having control, just as we’d hoped. You may also note that the first message was that we didn’t have control. This is because our attempts to get control is happening asynchronously in a separate thread. Now you can change that if you need to, but in out case this isn’t necessary. I just wanted to point it out.

Now as I mentioned, this is just a mini-pattern. So for my next post I hope to wrap this in another class that demonstrates the self-electing controller. Again leveraging async processes to execute something for our role instance in a separate thread. But done so in a way where we don’t need to monitor and manage what’s happening ourselves. Meanwhile, I’ve uploaded the code. So please make use of it.

Until next time!

About these ads

One Response to The “traffic cop” pattern

  1. smarx says:

    See also http://blog.smarx.com/posts/managing-concurrency-in-windows-azure-with-leases. BTW, a zero-byte block blob should cost about the same amount as a zero-byte page blob. I typically just use block blobs for this.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 1,129 other followers

%d bloggers like this: