Local File Cache in Windows Azure

 

When creating a traditional on-premises application, it’s not uncommon to leverage the local file system as a place to store temporary files and thus increase system performance. But with Windows Azure Cloud Services, we’ve been taught that we shouldn’t write things to disk because the virtual machines that host our services aren’t durable. So we start going to remote durable storage for everything. This slows down our applications, so we need to add back in some type of caching solution.

Previously, I discussed using the Windows Azure Caching Preview to create a distributed, in-memory cache. I love that we finally have a simple way to do this. But there are times when I think that caching something within a single instance would be fine, for example an image file that doesn’t change often, especially if I don’t have to use up precious RAM on my virtual machines.

Well, there is an option! Windows Azure Cloud Services all include, at no additional cost, an allocation of non-durable local disk space called, surprisingly enough, “Local Storage”. For each core you get 250 GB of essentially temporary disk space. And with a bit of investment, we can leverage that space as a local, file-backed cache.
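If you haven’t used Local Storage before, you reserve a chunk of this space by declaring a LocalStorage resource in the service definition. Here’s a minimal sketch of what that declaration might look like (the service and role names are placeholders; “filecache” is the resource name we’ll look up later in this post):

<?xml version="1.0" encoding="utf-8"?>
<ServiceDefinition name="FileCacheDemo" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="MvcWebRole" vmsize="Small">
    <LocalResources>
      <!-- non-durable local disk space; contents survive a role recycle only if cleanOnRoleRecycle is false -->
      <LocalStorage name="filecache" sizeInMB="1024" cleanOnRoleRecycle="false" />
    </LocalResources>
  </WebRole>
</ServiceDefinition>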

Extending System.Runtime.Caching

.NET 4.0 introduced the System.Runtime.Caching namespace along with an abstract base class, ObjectCache, that can be extended to provide caching functionality on top of whatever storage system we want to use. The namespace also provides a concrete implementation called MemoryCache, but we want to use the file system. So we’ll create our own implementation, a FileCache class.

Note: There’s already a CodePlex project that provides a file-based implementation of ObjectCache. But I still wanted to roll my own for the sake of explaining some of the challenges that arise.

So I create a class library and add a reference to System.Runtime.Caching. Next up, let’s rename the default class “Class1.cs” to “FileCache.cs”. Lastly, inside the FileCache class, I’ll add a using statement for the Caching namespace and make sure my new class inherits from ObjectCache.

Now if we try to build the class library, things won’t go very well because there are 18 different abstract members we need to implement. Fortunately, I’m running the Visual Studio Power Tools, so it’s just a matter of right-clicking on ObjectCache where I indicated I’m inheriting from it and selecting “Implement Abstract Class”. This gives us shells for all 18 abstract members, but until we add some real implementation, our FileCache class won’t even be minimally useful.
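Here’s roughly what the skeleton looks like at this point (a sketch: the cacheRoot field backs the CacheRootPath property we’ll add next, and timerItem is a timer we’ll wire up later for evictions):

using System;
using System.IO;
using System.Runtime.Caching;

namespace Caching
{
    public class FileCache : ObjectCache
    {
        // root folder that backs the cache; set via the CacheRootPath property
        private DirectoryInfo cacheRoot;

        // timer used later to poll for evictions
        private System.Threading.Timer timerItem;

        // ... shells for the 18 abstract members go here ...
    }
}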

I’ll start by fleshing out the Get method and adding a public property, CacheRootPath, to the class that designates where our file cache will be kept.

public string CacheRootPath
{
    get { return cacheRoot.FullName; }
    set
    {
        cacheRoot = new DirectoryInfo(value);
        if (!cacheRoot.Exists) // create if it doesn't exist
            cacheRoot.Create();
    }
}

public override bool Contains(string key, string regionName = null)
{
    string fullFileName = GetItemFileName(key, regionName);

    if (File.Exists(fullFileName))
    {
        FileInfo fileInfo = new FileInfo(fullFileName);

        // if the item has expired, don't return it
        //TODO: check fileInfo against the expiration policy and return false if expired
        return true;
    }

    return false;
}

// return type is an object, but we'll always return a stream
public override object Get(string key, string regionName = null)
{
    if (Contains(key, regionName))
    {
        //TODO: wrap this in some exception handling
        MemoryStream memStream = new MemoryStream();
        using (FileStream fileStream = new FileStream(GetItemFileName(key, regionName), FileMode.Open, FileAccess.Read))
        {
            fileStream.CopyTo(memStream);
        }
        memStream.Position = 0; // rewind so the caller can read from the start

        return memStream;
    }

    return null;
}

CacheRootPath is just a way for us to set the path where our cache will be stored. The Contains method checks whether the file exists in the cache (and ideally should also be where we verify the item hasn’t expired), and the Get method leverages Contains to see if the item exists in the cache, retrieving it if so.
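One thing the snippets above gloss over is the GetItemFileName helper they both call. It’s not part of the abstract class, just a private helper that maps a key (and optional region) to a path under the cache root. A minimal sketch might look like this (a real implementation should also sanitize keys that aren’t legal file names):

// hypothetical sketch of the GetItemFileName helper used above:
// map a cache key (and optional region) to a file path under the cache root
private string GetItemFileName(string key, string regionName = null)
{
    return regionName == null
        ? Path.Combine(cacheRoot.FullName, key)
        : Path.Combine(cacheRoot.FullName, regionName, key);
}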

Now this is where I had my first real decision to make. Get must return an object, but what type of object should I return? In my case, I opted for a memory stream. I could have returned a file stream attached to the file on disk, but because that could lock access to the file, I wanted explicit control over the stream. Hence I opted to copy the file stream into a memory stream and return that to the caller.

You may also note that I left the expiration check as a TODO. I did this for the demo because your needs for file expiration may differ. You could base it on FileInfo.CreationTimeUtc or FileInfo.LastAccessTimeUtc; both are valid, as is any other metadata you care to use. I do recommend one thing, though: put the expiration check in a separate method. We’ll use it later (a sketch follows the note below).

Note: I’m specifically calling out the use of UTC. When in Windows Azure, UTC is your friend. Try to use it whenever possible.
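For illustration, here’s one hypothetical shape for that helper, assuming a simple fixed time-to-live (the cacheItemLifetime field is my invention, not part of the posted code):

// a hypothetical expiration check, assuming a fixed time-to-live;
// your policy might instead look at LastAccessTimeUtc or stored metadata
private readonly TimeSpan cacheItemLifetime = TimeSpan.FromHours(1);

private bool IsExpired(FileInfo fileInfo)
{
    // note the use of UTC, per the tip above
    return DateTime.UtcNow - fileInfo.CreationTimeUtc > cacheItemLifetime;
}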

Next up, we have to flesh out the three overloaded versions of AddOrGetExisting. These methods are important because, even though I won’t be calling them directly in my implementation, they are leveraged by the base class’s Add methods. In other words, these methods are how items get added to the cache. The first two overloads simply delegate to the third, lowest-level implementation.

public override object AddOrGetExisting(string key, object value, CacheItemPolicy policy, string regionName = null)
{
    if (!(value is Stream))
        throw new ArgumentException("value parameter is not of type Stream");

    return this.AddOrGetExisting(key, value, policy.AbsoluteExpiration, regionName);
}

public override CacheItem AddOrGetExisting(CacheItem value, CacheItemPolicy policy)
{
    var tmpValue = this.AddOrGetExisting(value.Key, value.Value, policy.AbsoluteExpiration, value.RegionName);
    if (tmpValue != null)
        return new CacheItem(value.Key, (Stream)tmpValue);
    else
        return null;
}

The key thing to note here is that in the first method, I check the incoming object to make sure I’m receiving a Stream. Again, that was my design choice, since I want to deal exclusively in streams.
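As an aside, because the base class’s Add overloads delegate to AddOrGetExisting (returning true only when the item was newly added), callers can use the familiar Add method directly. A quick usage sketch, with hypothetical paths:

FileCache fileCache = new FileCache { CacheRootPath = @"C:\temp\filecache" };

using (Stream imageStream = File.OpenRead(@"C:\temp\kitten.jpg"))
{
    // true if the item was newly added, false if it already existed
    bool added = fileCache.Add("kitten.jpg", imageStream, DateTimeOffset.UtcNow.AddHours(1));
}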

The final overload is where all the heavy work is…

public override object AddOrGetExisting(string key, object value, DateTimeOffset absoluteExpiration, string regionName = null)
{
    if (!(value is Stream))
        throw new ArgumentException("value parameter is not of type Stream");

    // if the object already exists in the cache, return it
    object tmpValue = this.Get(key, regionName);
    if (tmpValue != null)
        return tmpValue;

    //TODO: wrap this in some exception handling

    // create a subfolder for the region if one was specified
    if (regionName != null)
        cacheRoot.CreateSubdirectory(regionName);

    // write the incoming stream out to our cache file
    using (FileStream fileStream = File.Open(GetItemFileName(key, regionName), FileMode.Create))
    {
        ((Stream)value).CopyTo(fileStream);
        fileStream.Flush();
    }

    return null; // null signals the item was successfully added
}

We start by checking to see if the object already exists and return it if found in the cache. Then we create a subdirectory if a region was specified (implementing regions isn’t required). Finally, we copy the value passed in to our file and save it. There really should be some exception handling in here to make things a little more thread-safe (what if the file gets created between when we check for it and when we start the write?); one possible approach is sketched below. And Get should be checking to make sure the file isn’t already open for writing when doing its read. But I’m sure you can finish that out.
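For example (a sketch of one possible hardening, not the posted code), opening the file with FileMode.CreateNew turns that race into a catchable exception:

// FileMode.CreateNew makes a concurrent create throw an IOException we can
// catch, instead of two threads silently clobbering each other's file
try
{
    using (FileStream fileStream = File.Open(GetItemFileName(key, regionName), FileMode.CreateNew))
    {
        ((Stream)value).CopyTo(fileStream);
    }
}
catch (IOException)
{
    // another thread won the race; the item is already in the cache
}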

Now there are still about a dozen other methods that need to be fleshed out eventually. But these give us our basic get and add functions. What’s still missing is handling evictions from the cache. For that, we’re going to use a timer.

public FileCache() : base()
{
    System.Threading.TimerCallback TimerDelegate = new System.Threading.TimerCallback(TimerTask);

    // hard-coded for the demo; the due time and period should really come from
    // a configurable polling interval (timerItem is the Timer field declared
    // in the class skeleton earlier)
    timerItem = new System.Threading.Timer(TimerDelegate, null, 2000, 2000);
}

private void TimerTask(object StateObj)
{
    //TODO: check the cache folder's size and, if over the limit, evict older objects
    //TODO: evict expired items (via the expiration-check helper suggested earlier)
    //TODO: check the polling interval and update the timer if it has changed
}

We update the FileCache constructor to create a delegate from our new TimerTask method and pass it into a Timer object. This executes TimerTask at regular intervals on a separate thread. I’m using a hard-coded interval, but we really should check to see if a specific polling interval has been set. Of course, we should also put some real code into this method so it actually does things like check how much room is left in the cache and evict expired items (by calling the private expiration-check method I suggested earlier). A sketch of such an eviction pass follows.
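Here’s a sketch of what such an eviction pass might look like, assuming the IsExpired helper from earlier (a fuller version would also enforce a size limit):

// an eviction pass TimerTask could call on each tick
private void EvictExpiredItems()
{
    foreach (FileInfo fileInfo in cacheRoot.GetFiles("*", SearchOption.AllDirectories))
    {
        if (IsExpired(fileInfo))
        {
            try { fileInfo.Delete(); }
            catch (IOException) { /* file is in use; try again next pass */ }
        }
    }
}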

The Implementation

With our custom caching class done (well, not done, but at least at a point where it’s minimally functional), it’s time to implement it. For this, I opted to set up an MVC Web Role that allows folks to upload an image file to Windows Azure Blob storage. Then, via a WCF/REST-based service, it retrieves the image twice: the first retrieval without caching, the second with it. I won’t bore you with all the details of this setup, so we’ll focus on just the wiring up of our custom FileCache.

We start, appropriately enough, in the role’s Global.asax.cs file, where we add a public static field that represents our cache (so it’s available anywhere in the web application):

public static Caching.FileCache globalFileCache = new Caching.FileCache();

Then I update the Application_Start method to retrieve our LocalResource setting and use it to set the CacheRootPath property of our caching object:

protected void Application_Start()
{
    AreaRegistration.RegisterAllAreas();

    RegisterGlobalFilters(GlobalFilters.Filters);
    RegisterRoutes(RouteTable.Routes);

    Microsoft.WindowsAzure.CloudStorageAccount.SetConfigurationSettingPublisher(
        (configName, configSetter) =>
            configSetter(RoleEnvironment.GetConfigurationSettingValue(configName))
    );

    globalFileCache.CacheRootPath = RoleEnvironment.GetLocalResource("filecache").RootPath;
}

Now ideally, we could make it so that CacheRootPath instead accepted the LocalResource object returned by GetLocalResource. That would also let our FileCache manage itself against the maximum size of the local storage resource. But I figured we’d keep Windows Azure-specific dependencies out of this base class and maybe later look at creating a WindowsAzureLocalResourceCache object. But that’s a task for another day.
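For what it’s worth, such a wrapper could be as thin as this sketch (hypothetical; the size limit is just stashed away for the eviction timer to enforce someday):

using Microsoft.WindowsAzure.ServiceRuntime;

public class WindowsAzureLocalResourceCache : FileCache
{
    // the resource's size limit, which the eviction pass could honor
    private readonly int maxCacheSizeInMegabytes;

    public WindowsAzureLocalResourceCache(LocalResource resource)
    {
        CacheRootPath = resource.RootPath;
        maxCacheSizeInMegabytes = resource.MaximumSizeInMegabytes;
    }
}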

OK, now to wire the cache into the service that retrieves the blobs. Let’s start with the basic implementation:

public Stream GetImage(string Name, string container, bool useCache)
{
    Stream tmpStream = null; // a memory stream, whether it comes from the cache or from blob storage

    var account = CloudStorageAccount.FromConfigurationSetting("ImageStorage");
    CloudBlobClient blobStorage = account.CreateCloudBlobClient();
    CloudBlob blob = blobStorage.GetBlobReference(string.Format(@"{0}/{1}", container, Name));
    tmpStream = new MemoryStream();
    blob.DownloadToStream(tmpStream);

    WebOperationContext.Current.OutgoingResponse.ContentType = "image/jpeg";
    tmpStream.Seek(0, SeekOrigin.Begin); // make sure we start at the beginning
    return tmpStream;
}

This method takes the name of a blob and its container, as well as a useCache parameter (which we’ll make use of in a moment). It uses the first two values to get the blob and download it to a stream, which is then returned to the caller with a content type of “image/jpeg” so the browser can render it properly.

To implement our cache, we just need to add a few things. Before we try to set up the CloudStorageAccount, we’ll add these lines:

// if we're using the cache, lets try to get the file from there
if (useCache)
    tmpStream = (Stream)MvcApplication.globalFileCache.Get(Name);

if (tmpStream == null)
{

This code uses the globalFileCache object we defined in the Global.asax.cs file to try to retrieve the blob from the cache, provided the method was called with useCache=true. If we couldn’t find the file (tmpStream == null), we fall into the block we had previously, which retrieves the blob image and returns it.

But we still have to add the code that puts the blob into the cache. We’ll do that right after the DownloadToStream call:

    // "fork off" the adding of the object to the cache so we don't have to wait for it
    Task tsk = Task.Factory.StartNew(() =>
    {
        Stream saveStream = new MemoryStream();
        blob.DownloadToStream(saveStream);
        saveStream.Seek(0, SeekOrigin.Begin); // make sure we start at the beginning
        MvcApplication.globalFileCache.Add(Name, saveStream, DateTimeOffset.UtcNow.AddHours(1)); // UTC, per the earlier tip
    });
}

This uses an async task to add the blob to the cache. We do this asynchronously so that we don’t block returning the blob to the requestor while the write to disk completes. We want this service to return the file as quickly as possible.

And that does it for our implementation. Now to test it.

Fiddler is your friend

Earlier, you may have found yourself asking, “self, why did he use a service for his implementation?” I did this because I wanted to use Fiddler to measure the performance of calls that retrieve the blob with and without caching. By putting the retrieval in a service and letting Fiddler monitor the response times, I didn’t have to write my own client and put timings around it.

To test my implementation, I fired up Fiddler and then launched the service. We should see calls in Fiddler to SimpleService.svc/GetImage, one with cache=false and one with cache=true. If we select those items and open the Statistics tab, we should see significant differences in the “Overall Elapsed” times of the two calls. In my own little tests, I was seeing anywhere from a 50-90% reduction in elapsed time.


In fact, if you run the tests several times by hitting refresh on the page, you may even notice that the first hit to Windows Azure storage for a particular blob carries additional delay compared to subsequent calls. It’s only a guess, but we may be seeing Windows Azure storage doing some internal caching of its own there.

So hopefully I’ve described things well enough here that you can follow what we’ve done. But if not, I’m posting the code for you to reuse. Just make sure you update the storage account settings, and please, please, please finish the half-started implementation I’m providing.

Here’s to speedy responses thanks to caching. Until next time.
