Azure Storage – Overview
April 9, 2009
Sorry to have taken so long to get back to my blog. I’ve been distracted with my personal life and, admittedly, a bit overwhelmed by the directions I could take with my next topic. I could jump straight into a hands-on session, contrast Azure Storage with SQL Data Services, discuss the differences between RDBMS and cloud data storage needs… there’s just so much that could be done.
In the end, though, I settled on breaking this up into a couple of more easily digestible pieces, starting with an overview of what Azure Storage is and how it’s structured. So here it is…
As I believe I’ve mentioned previously, Azure Storage is not an RDBMS so much as an abstraction of the local file system for Windows Azure. Since Windows Azure has abstracted away the hardware, there’s no more tossing a file out onto a network share so it can be accessed by multiple processes. Instead, using a Storage Account, we can create different containers into which we can shove different types of data. Like almost everything in Windows Azure, this storage is exposed as a series of REST (representational state transfer) based APIs.
So let’s start there, with the Storage Account. When I started my web role project, I talked about creating a local storage account. We need to do the same thing in Windows Azure by creating a Storage Account project. We give it a name and description and end up with a pile of information that seems overwhelming at first: three endpoints and two access keys. The endpoints relate to the three types of entities/things/objects that can be stored: Blobs, Queues, and Tables.
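To make those three endpoints a little more concrete, here’s a quick sketch of how they’re derived from the account name. “myaccount” is a placeholder, not a real account:

```python
# Each of the three services gets its own endpoint, built from the
# storage account name. ("myaccount" is a placeholder for illustration.)
ACCOUNT = "myaccount"

endpoints = {
    "blob":  f"https://{ACCOUNT}.blob.core.windows.net",
    "queue": f"https://{ACCOUNT}.queue.core.windows.net",
    "table": f"https://{ACCOUNT}.table.core.windows.net",
}

for service, uri in endpoints.items():
    print(service, uri)
```

The two access keys are what you use to sign requests against these endpoints; having two lets you rotate one key while the other stays in service.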
Blob Storage is exactly what it sounds like. You can insert large objects up to 50 GB in size (2 GB in development storage) and organize them using containers. Each container and blob can have metadata associated with it. We can get a list of containers and iterate through the blobs in each container. We can also read an entire blob, or just a range of bytes within it.
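Since the API is REST-based, a partial read is just an HTTP GET with a Range header. Here’s a sketch of what that request looks like; the account, container, and blob names are placeholders, and a real request would also need an Authorization header signed with one of your access keys:

```python
# Sketch of the REST request for reading a byte range from a blob.
# "myaccount", "photos", and "vacation.jpg" are placeholder names.
account, container, blob = "myaccount", "photos", "vacation.jpg"

url = f"https://{account}.blob.core.windows.net/{container}/{blob}"
headers = {
    "Range": "bytes=0-1023",  # ask for just the first 1 KB of the blob
}
print("GET", url, headers["Range"])
```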
Queue Storage is where we have queues, a common enough concept that needs little explanation. Queues provide a reliable way of passing messages between processes. While a queue can hold an unlimited number of messages, each message is limited to a maximum of 8 KB in size. When a process reads a message from the queue, it is expected to process and remove it. Once read, the message is hidden from any other processes looking at that queue for a given period of time. If the message is not removed before that interval expires, it becomes visible to all processes once again.
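That visibility-timeout contract is the part worth internalizing, so here’s a toy in-memory model of it. This is not the Azure API, just an illustration of the read/hide/delete behavior described above:

```python
import time

# Toy in-memory model of the queue visibility-timeout behavior --
# not the Azure API, just an illustration of the contract.
class ToyQueue:
    def __init__(self):
        self._messages = []  # each entry is [body, visible_again_at]

    def put(self, body):
        self._messages.append([body, 0.0])  # visible immediately

    def get(self, visibility_timeout=30.0):
        now = time.monotonic()
        for msg in self._messages:
            if msg[1] <= now:                         # currently visible?
                msg[1] = now + visibility_timeout     # hide from other readers
                return msg[0]
        return None

    def delete(self, body):
        self._messages = [m for m in self._messages if m[0] != body]

q = ToyQueue()
q.put("resize image 42")
m = q.get(visibility_timeout=30.0)  # message is now hidden from other readers
assert q.get() is None              # a second reader sees nothing...
q.delete(m)                         # ...so we finish our work and delete it
```

If we had crashed before calling `delete`, the message would have reappeared after 30 seconds for some other worker to pick up, which is what makes the queue reliable.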
And then there is Table Storage. Tables in Azure Storage are not like the tables you’re used to. A table is a logical container that can be spread across multiple partitions in storage (for load balancing). Tables contain entities (think rows), which are in turn comprised of properties and their values (think columns). Each entity is identified by its Partition Key and Row Key. However, Table Storage does not enforce any schema; it’s up to the consuming client to enforce any such rules.
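In other words, an entity is really just a property bag keyed by its (PartitionKey, RowKey) pair. A toy sketch of that shape, with made-up data, shows why “schemaless” matters: two entities in the same table can carry completely different properties.

```python
# Toy illustration of Table Storage's shape: entities are property bags
# keyed by (PartitionKey, RowKey), with no schema shared between them.
table = {}

def insert(entity):
    key = (entity["PartitionKey"], entity["RowKey"])
    table[key] = entity

insert({"PartitionKey": "customers", "RowKey": "001", "Name": "Alice"})
# A second entity with entirely different properties is perfectly legal:
insert({"PartitionKey": "customers", "RowKey": "002", "Email": "bob@example.com"})

print(table[("customers", "001")]["Name"])
```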
That wasn’t so bad, was it? 🙂 Well, nearly everything looks good from 10,000 feet, but I’ll dig into the details of each of these in the coming weeks/months. For now, I believe it’s important to point out a couple of differences between Azure Storage and Development Storage. Development storage is not accessible to any process outside of the local machine. Ok, that’s not entirely true, as you can use various port-forwarding tools to redirect requests. But even then, development storage wasn’t built to scale well, so I wouldn’t recommend trying this. Additionally, while I mentioned that Tables do not require a schema, they do when you’re dealing with Development Storage. You also can’t create/drop tables dynamically in development storage. Lastly, the URIs for accessing development storage are fixed, unlike in the cloud.
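To show what I mean by fixed URIs: development storage always listens on the same well-known local ports under a single fixed account name, `devstoreaccount1`, rather than deriving the hostname from your account:

```python
# Development storage endpoints are fixed, well-known local URIs --
# one port per service, all under the single fixed account name.
DEV_ACCOUNT = "devstoreaccount1"

dev_endpoints = {
    "blob":  f"http://127.0.0.1:10000/{DEV_ACCOUNT}",
    "queue": f"http://127.0.0.1:10001/{DEV_ACCOUNT}",
    "table": f"http://127.0.0.1:10002/{DEV_ACCOUNT}",
}
```

Compare that with the cloud, where the account name becomes part of the hostname (e.g. `https://myaccount.blob.core.windows.net`), so code that hard-codes the local form needs a configuration switch before it moves to the cloud.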
Now, if you’ve read this and are feeling a bit gutsy, you can jump right into using Azure Storage by accessing the sample StorageClient project that comes with the Azure SDK. Or you can also search around the web and find several good blog articles that discuss accessing storage directly via the API. Understanding this approach, and more importantly thinking about ways to use code generation tools to create more traditional style CRUD layers for accessing it is what I’ll be focusing on in my next blog posting.
I promise there won’t be as much of a time lag between posts as there was this time. So please check back in the near future when I do another “hands-on session”, this time with an Azure Worker Role, Queues, and maybe even a Table.
Until then, I’d like to leave you with links to a couple of excellent resources: