Page, after page, after page (Year of Azure Week 8)

To view the live demo of this solution, check out Episode 57 of Microsoft’s Channel 9 Cloud Cover Show.

Last week, I blogged on writing page blobs one page at a time. I also said that I’d talk more this week about why I was doing this. So here we are with the second half of the demo, the receiver. It’s going to download the page blob as it’s being uploaded.

We’ll skip over setting up the CloudStorageAccount, CloudBlobClient, and CloudBlobContainer (btw, I really need to write a reusable method that streamlines all this for me). This works exactly as it did for the transmitter part of our solution.
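
If you’re following along, here’s a minimal sketch of the setup being skipped over. I’m assuming a connection string in an appSetting named "DataConnectionString" and a container named "filestore"; both are placeholder names of mine, not values from the actual demo.

// Setup sketch only. Requires references to Microsoft.WindowsAzure.StorageClient
// and System.Configuration; "DataConnectionString" and "filestore" are placeholders.
CloudStorageAccount account = CloudStorageAccount.Parse(
    ConfigurationManager.AppSettings["DataConnectionString"]);
CloudBlobClient blobClient = account.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("filestore");
container.CreateIfNotExist(); // no-op if the container is already there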

The first thing we need to do is pull a list of blobs and iterate through them. To do this we create a foreach loop using the following line of code:

foreach (CloudPageBlob pageBlob in container.ListBlobs().OfType<CloudPageBlob>())

Note the LINQ-y OfType part. My buddy Neil Mackenzie shared this tip with me via his new Azure Development Cookbook. It allows me to make sure I’m only retrieving page blobs from storage. A nice trick to help ensure I don’t accidentally throw an exception by trying to treat a block blob like a page blob.

Quick product plug… I highly recommend Neil’s book. Not because I helped edit it, but because Neil did an excellent job writing it. There’s a SLEW of great tips and tricks contained in its pages.

Ok, moving on…

Now I need to get the size metadata tag I added to the blob in the transmitter. While the line above does get me a reference to the page blob, it doesn’t populate the metadata property. To get those values, I need to call pageBlob.FetchAttributes. I follow this up by creating a save name for the file and associating it with a file stream.

pageBlob.FetchAttributes(); // have to get the metadata
long totalBytesToWrite = long.Parse(pageBlob.Metadata["size"]);

//string fileName = string.Format(@"D:\Personal Projects\{0}", Path.GetFileName(pageBlob.Attributes.Uri.LocalPath));
string fileName = Path.GetFileName(pageBlob.Attributes.Uri.LocalPath);
FileStream theFileStream = new FileStream(fileName, FileMode.OpenOrCreate);

Now we’re ready to start receiving the data from the blob’s populated pages. We use GetPageRanges to see where the blob has data, we check that against the last endpoint we read, and we sleep for one second if we’ve already read all the available information (waiting for more pages to be written). And we’ll keep doing that until we’ve written the total size of the blob.

long lastread = 0; // last byte read
while (totalBytesToWrite > 0)
{
    foreach (PageRange pageRange in pageBlob.GetPageRanges())
    {
        // if we have more to write…
        if (pageRange.EndOffset > lastread)
        {
            // hidden region to write pages to file
        }
    }
    Thread.Sleep(1000); // wait for more pages to be written
}

Ok, there are a couple of things I need to call out here. My sample assumes that the pages in the blob will be written in succession. It also assumes that we’re only going to write the blobs that exist when my application started (I only list the blobs in the container once). So if blobs get added after we have retrieved our list, or we restart the receiver, we will see some odd results. What I’m doing is STRICTLY for demonstration purposes. We’ll talk more about that later in this post.

The last big chunk of code is associating the blob with a BlobStream and writing it to a file. We do this again, one page at a time…

BlobStream blobStream = pageBlob.OpenRead();

// Seek to the correct starting offset in the page blob stream
blobStream.Seek(lastread, SeekOrigin.Begin);

byte[] streambuffer = new byte[512];

long numBytesToRead = (pageRange.EndOffset + 1 - lastread);
while (numBytesToRead > 0)
{
    int n = blobStream.Read(streambuffer, 0, streambuffer.Length);
    if (n == 0)
        break;

    numBytesToRead -= n;
    int bytesToWrite = (int)(totalBytesToWrite - n > 0 ? n : totalBytesToWrite);
    lastread += n;
    totalBytesToWrite -= bytesToWrite;

    theFileStream.Write(streambuffer, 0, bytesToWrite);
    theFileStream.Flush(); // just for demo purposes
}

You’ll notice that I’m using a stream to read the blob and not retrieving the individual pages. If I wanted to do that, I’d need to go to the Azure Storage REST API, which lets me request a specific range of bytes from a blob using the Get Blob operation. And while that’s fine, I can also demonstrate what I’m after using the stream. And since we’ve already established that I’m a bit of a lazy coder, we’ll just use the managed client. Smile
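
For the curious, here’s roughly what that ranged REST call could look like. This is only a sketch: blobSasUri is my placeholder for a SAS URL to the blob (so we can skip Shared Key request signing), and the x-ms-range header stands in for the standard Range header, which HttpWebRequest restricts.

// Sketch only: pull a specific byte range via the Get Blob REST operation.
// Requires System.Net and System.IO; 'blobSasUri' is a placeholder SAS URL.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(blobSasUri);
request.Method = "GET";
request.Headers.Add("x-ms-version", "2009-09-19");
request.Headers.Add("x-ms-range", "bytes=0-511"); // just the first page

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream responseStream = response.GetResponseStream())
{
    byte[] pageBuffer = new byte[512];
    int bytesRead = responseStream.Read(pageBuffer, 0, pageBuffer.Length);
    // pageBuffer now holds up to 512 bytes from the requested range
}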

The rest of this code snippet consists of some fairly ugly counter/position management code that handles writing the blob to the file. The most important part is that we use bytesToWrite to decide whether to write the entire 512-byte buffer or only as much data as remains in our blob. This is where my “size” attribute comes in: I use it to determine where the file stored in the series of 512-byte pages actually ends. For example, a 1,300-byte file occupies three 512-byte pages (1,536 bytes), so the last 236 bytes are just zero padding that shouldn’t land in the output file. Some files may be forgiving of the extra bytes, but some aren’t. So if you’re using page blobs, you may need to make sure you manage this.

So why are we doing all this?

If you put a breakpoint in the transmitter app and write 3-4 pages, then step through the receiver app, you’ll see that it reads those pages and then keeps hitting the Sleep call until we go back to the transmitter and write a few more. What we’re illustrating here is that, unlike a block blob, I can actually read a page blob while it is being written.

You can imagine that this could come in handy if you need to push large files around, basically using page blobs as an intermediary buffer for streaming files between two endpoints. And after a bit more work, we could start adding restart semantics to this demo.

Now my demo just goes sequentially through the blob (this is the “STRICTLY for demonstration” thing I mentioned above). But once you realize that the buffers don’t have to be 512 bytes and can instead be up to 4 MB, and that a 4 MB operation against Azure storage may take a few seconds to process, you start thinking about multi-threading the upload/download of the file. That could yield a huge increase in throughput while also avoiding the delay of waiting for the upload to complete before starting the download.
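
Just to make that idea concrete, here’s a rough, hypothetical sketch of what a parallel download might look like. It reuses pageBlob and fileName from the demo above, ignores the trailing-padding/size trimming we just discussed, and skips error handling entirely, so treat it as a starting point rather than working code.

// Hypothetical sketch (requires System.Threading.Tasks): pull each populated
// page range on its own thread and write it at the matching offset in the file.
Parallel.ForEach(pageBlob.GetPageRanges(), pageRange =>
{
    byte[] buffer = new byte[pageRange.EndOffset - pageRange.StartOffset + 1];
    int filled = 0;

    // each thread gets its own read stream positioned at its range
    using (BlobStream rangeStream = pageBlob.OpenRead())
    {
        rangeStream.Seek(pageRange.StartOffset, SeekOrigin.Begin);
        while (filled < buffer.Length)
        {
            int n = rangeStream.Read(buffer, filled, buffer.Length - filled);
            if (n == 0) break;
            filled += n;
        }
    }

    // each thread opens its own file handle and writes at the range's offset
    using (FileStream fs = new FileStream(fileName, FileMode.OpenOrCreate,
        FileAccess.Write, FileShare.ReadWrite))
    {
        fs.Seek(pageRange.StartOffset, SeekOrigin.Begin);
        fs.Write(buffer, 0, filled);
    }
});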

So the end result here is that my demo has little practical application. But I hope what it has done is made you think a bit about the oft-overlooked page blob. I’m just as guilty as you are of this oversight. So in closing, I want to thank Sushma, one of my Azure trainees these past two weeks. Sushma, if you read this, know that your simple question helped teach me new respect for page blobs. And for that… thank you!

BTW, the complete code for this example is available here for anyone that wants it. Just remember to clear the blobs between runs. Smile

Until next time!
