Page, after page, after page (Year of Azure Week 8)

To view the live demo of this solution, check out Episode 57 of Microsoft’s Channel 9 Cloud Cover Show.

Last week, I blogged about writing page blobs one page at a time. I also said I’d talk more this week about why I was doing this. So here we are with the second half of the demo, the receiver. It’s going to download the page blob as it’s being uploaded.

We’ll skip over setting up the CloudStorageAccount, CloudBlobClient, and CloudBlobContainer (btw, I really need to write a reusable method that streamlines all this for me). This works exactly as it did for the transmitter part of our solution.
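
If you’re wondering what that reusable method might look like, here’s a minimal sketch of my own (not part of the demo solution) that wraps the usual boilerplate, assuming the development storage account just like the transmitter does:

private static CloudBlobContainer GetContainer(string containerName)
{
    // same boilerplate as the transmitter: account -> client -> container
    CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
    CloudBlobClient client = account.CreateCloudBlobClient();
    CloudBlobContainer container = client.GetContainerReference(containerName);
    container.CreateIfNotExist(); // safe to call repeatedly
    return container;
}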

The first thing we need to do is pull a list of blobs and iterate through them. To do this, we create a foreach loop using the following line of code:

foreach (CloudPageBlob pageBlob in container.ListBlobs().OfType<CloudPageBlob>())

Note the “linqy” OfType part. My buddy Neil Mackenzie shared this tip with me via his new Azure Development Cookbook. It lets me make sure I’m only retrieving page blobs from storage, a nice trick that helps me avoid accidentally throwing an exception by trying to treat a block blob like a page blob.

Quick product plug… I highly recommend Neil’s book. Not because I helped edit it, but because Neil did an excellent job writing it. There’s a SLEW of great tips and tricks contained in its pages.

Ok, moving on…

Now I need to get the size metadata tag I added to the blob in the transmitter. While the line above gets me a reference to the page blob, it doesn’t populate the metadata property. To get those values, I need to call pageBlob.FetchAttributes. I follow this up by creating a save name for the file and associating it with a file stream.

pageBlob.FetchAttributes(); // have to get the metadata
long totalBytesToWrite = long.Parse(pageBlob.Metadata["size"]);

//string fileName = string.Format(@"D:\Personal Projects\{0}", Path.GetFileName(pageBlob.Attributes.Uri.LocalPath));
string fileName = Path.GetFileName(pageBlob.Attributes.Uri.LocalPath);
FileStream theFileStream = new FileStream(fileName, FileMode.OpenOrCreate);

Now we’re ready to start receiving the data from the blob’s populated pages. We use GetPageRanges to see where the blob has data, check that against the last endpoint we read, and sleep for 1 second if we’ve already read all the available information (waiting for more pages to be written). We’ll keep doing that until we’ve written the total size of the blob.

long lastread = 0; // last byte read
while (totalBytesToWrite > 0)
{
    foreach (PageRange pageRange in pageBlob.GetPageRanges())
    {
        // if there's data beyond what we've already read…
        if (pageRange.EndOffset > lastread)
        {
            // hidden region to write pages to file
        }
    }
    Thread.Sleep(1000);  // wait for more pages to be written
}

Ok, there’s a couple of things I need to call out here. My sample assumes that the pages in the blob will be written in succession. It also assumes that we’re only going to process the blobs that existed when my application started (I only list the blobs in the container once). So if blobs get added after we’ve retrieved our list, or we restart the receiver, we’ll see some odd results. What I’m doing is STRICTLY for demonstration purposes. We’ll talk more about that later in this post.

The last big chunk of code associates the blob with a BlobStream and writes it to a file. Again, we do this one page at a time…

BlobStream blobStream = pageBlob.OpenRead();

// Seek to the correct starting offset in the page blob stream
blobStream.Seek(lastread, SeekOrigin.Begin);

byte[] streambuffer = new byte[512];

long numBytesToRead = (pageRange.EndOffset + 1 - lastread);
while (numBytesToRead > 0)
{
    int n = blobStream.Read(streambuffer, 0, streambuffer.Length);
    if (n == 0)
        break;

    numBytesToRead -= n;
    int bytesToWrite = (int)(totalBytesToWrite - n > 0 ? n : totalBytesToWrite);
    lastread += n;
    totalBytesToWrite -= bytesToWrite;

    theFileStream.Write(streambuffer, 0, bytesToWrite);
    theFileStream.Flush(); // just for demo purposes
}

You’ll notice that I’m using a stream to read the blob rather than retrieving the individual pages. If I wanted to do that, I’d need to go to the Azure Storage REST API, which lets me get a specific range of bytes from a blob using the Get Blob operation. And while that’s fine, I can also demonstrate what I’m after using the stream. And since we’ve already established that I’m a bit of a lazy coder, we’ll just use the managed client.
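
For the curious, a ranged Get Blob call would look roughly like the sketch below. This isn’t part of the demo; it’s my own rough take, reusing lastread, pageRange, and theFileStream from the snippets above, and assuming blobStorage is the CloudBlobClient we created during the setup we skipped over:

// ask the service for just the byte range we haven't read yet
Uri blobUri = pageBlob.Uri;
if (blobStorage.Credentials.NeedsTransformUri)
    blobUri = new Uri(blobStorage.Credentials.TransformUri(blobUri.ToString()));

HttpWebRequest rangeRequest = (HttpWebRequest)WebRequest.Create(blobUri);
rangeRequest.Method = "GET";
rangeRequest.Headers["x-ms-version"] = "2009-09-19";
rangeRequest.Headers["x-ms-range"] = string.Format("bytes={0}-{1}", lastread, pageRange.EndOffset);
blobStorage.Credentials.SignRequest(rangeRequest);

using (WebResponse rangeResponse = rangeRequest.GetResponse())
using (Stream rangeStream = rangeResponse.GetResponseStream())
{
    rangeStream.CopyTo(theFileStream); // write just that range to the local file
}

You’d still want to use the size metadata to trim any trailing padding, just like the stream-based version does.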

The rest of the read loop above is some fairly ugly counter/position management code that handles writing the blob to the file. The most important part is that we use bytesToWrite to decide whether to write the entire 512 byte buffer or only as much data as remains in our blob. This is where my “size” attribute comes in; I use it to determine where the file stored in the series of 512 byte pages actually ends. Some files may be forgiving of the extra bytes, but some aren’t, so if you’re using page blobs you may need to manage this.

So why are we doing all this?

So if you put a breakpoint in the transmitter app and write 3-4 pages, then put a breakpoint in the receiver app, you’ll see that it reads those pages and then keeps hitting the Sleep call until you go back to the transmitter and write a few more. What we’re illustrating here is that, unlike a block blob, I can actually read a page blob while it is being written.

You can imagine that this could come in handy if you need to push large files around, basically using page blobs as an intermediary buffer for streaming files between two endpoints. And with a bit more work, we could start adding restart semantics to this demo.

Now, my demo just goes sequentially through the blob (this is the “STRICTLY for demonstration” thing I mentioned above). But our buffers don’t have to be 512 bytes; they can be up to 4MB, and a 4MB operation against Azure storage may take a few seconds to process. That gets us thinking about multi-threading the upload/download of the file, potentially realizing a huge increase in throughput while also avoiding the delay of having to wait until the upload completes before starting the download.
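
I haven’t built that into the demo, but a rough sketch of the multi-threaded upload idea might look like this. It’s my own illustration, not the demo code; it assumes the whole file fits in memory, that pageBlob is a CloudPageBlob on the uploading side (like last week’s transmitter), and that each 4MB chunk maps to its own non-overlapping page range:

const int chunkSize = 4 * 1024 * 1024; // 4MB, a multiple of 512
byte[] fileBytes = File.ReadAllBytes("somebigfile.bin"); // hypothetical source file
long paddedLength = ((fileBytes.LongLength + 511) / 512) * 512;
pageBlob.Create(paddedLength);

int chunkCount = (int)((paddedLength + chunkSize - 1) / chunkSize);
Parallel.For(0, chunkCount, i =>
{
    long offset = (long)i * chunkSize;
    int length = (int)Math.Min(chunkSize, paddedLength - offset);
    byte[] chunk = new byte[length]; // zero-padded out to a 512 byte boundary
    Array.Copy(fileBytes, offset, chunk, 0, Math.Min(length, fileBytes.LongLength - offset));

    using (MemoryStream chunkStream = new MemoryStream(chunk))
    {
        // each call writes a non-overlapping range, so the writes can run side by side
        pageBlob.WritePages(chunkStream, offset);
    }
});

It’s only a sketch (no retries, no streaming from disk), but it shows why page ranges make the parallel approach possible in the first place.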

So the end result here is that my demo has little practical application. But I hope it has made you think a bit about the oft-overlooked page blob. I’m just as guilty of this oversight as anyone. So in closing, I want to thank Sushma, one of my Azure trainees these past two weeks. Sushma, if you read this, know that your simple question helped teach me a new respect for page blobs. And for that… thank you!

BTW, the complete code for this example is available here for anyone who wants it. Just remember to clear the blobs between runs.

Until next time!

A page at a time (Page Blobs–Year of Azure Week 7)

Going to make this quick. I’m sitting in SeaTac airport in Seattle, enjoying the free wifi, and need to knock this out quickly as we’re going to start boarding in about 10-15 minutes. I’ve been here in Seattle doing some Azure training and working with a client that’s trying to move to Azure when my topic for this week fell into my lap.

One of the attendees at my training wanted to see an example of working with page blobs. I poked around and couldn’t find one that I liked, so I decided I’d come up with one. Then, later in the afternoon, we had a discussion about an aspect of their client, and the idea of the random access abilities of page blobs came to mind. So while I haven’t had a chance to fully prove out my idea yet, I do want to share the first part of it with you.

The sample below focuses on how to take a stream, divide it into chunks, and write those chunks to an Azure Storage page blob. Now, in the sample I keep each write to storage at 512 bytes (the size of a page), but you could use any multiple of 512. I just wanted to demonstrate the chunk/reassemble process.

We start off by setting up the account and creating the stream that we’ll write to Azure blob storage:

MemoryStream streams = new MemoryStream();

// create storage account
var account = CloudStorageAccount.DevelopmentStorageAccount;

// create blob client
CloudBlobClient blobStorage = account.CreateCloudBlobClient();
CloudBlobContainer container = blobStorage.GetContainerReference("guestbookpics");
container.CreateIfNotExist(); // adding this for safety

string uniqueBlobName = string.Format("image_{0}.jpg", Guid.NewGuid().ToString());

System.Drawing.Image imgs = System.Drawing.Image.FromFile("waLogo.jpg");
imgs.Save(streams, ImageFormat.Jpeg);

You may remember this code from the block blob samples I did a month or two back.

Next up, I need to create the page blob:

CloudPageBlob pageBlob = container.GetPageBlobReference(uniqueBlobName);
pageBlob.Properties.ContentType = "image/jpeg";
pageBlob.Metadata.Add("size", streams.Length.ToString());
pageBlob.Create(23552);

Notice that I’m setting it to a fixed size. This isn’t ideal, but in my case I know exactly what size the file I’m uploading is and this is about twice what I need. We’ll get to why I’ve done that later. The important part is that the size MUST be a multiple of 512. No partial pages allowed!
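
If you don’t want to hard-code the size, one alternative (my own helper, not part of the sample) is to just round the stream length up to the next 512 byte boundary:

private static long RoundUpToPageBoundary(long length)
{
    const long pageSize = 512; // page blobs only deal in whole 512 byte pages
    return ((length + pageSize - 1) / pageSize) * pageSize;
}

// usage: pageBlob.Create(RoundUpToPageBoundary(streams.Length));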

And finally, we start reading my stream into a byte array buffer, converting that buffer into a memory stream (I know there’s got to be a way to avoid this, but I was in a hurry when I wrote the code for this update), and writing each “page” to the page blob.

streams.Seek(0, SeekOrigin.Begin);

byte[] streambuffer = new byte[512];
int numBytesToRead = (int)streams.Length;
int numBytesRead = 0;
while (numBytesToRead > 0)
{
    // Read may return anything from 0 to the buffer size.
    int n = streams.Read(streambuffer, 0, streambuffer.Length);

    // The end of the stream has been reached.
    if (n == 0)
        break;

    MemoryStream theMemStream = new MemoryStream();
    theMemStream.Write(streambuffer, 0, streambuffer.Length);
    theMemStream.Position = 0;

    pageBlob.WritePages(theMemStream, numBytesRead);

    numBytesRead += n;
    numBytesToRead -= n;
}

Simple enough, and it works pretty well to boot! You’ll also notice that I’m doing this one 512 byte page at a time. This is just for demonstration purposes; the maximum size you can write in one call (based on the REST API documentation) is 4MB. But as part of my larger experiment, the one-page-at-a-time method means I can use smaller sample files.

The one piece we’re missing, however, is the ability to shrink the page blob down to the actual minimum size I need. For that, we’re going to use the code snippet below:

Uri requestUri = pageBlob.Uri;
if (blobStorage.Credentials.NeedsTransformUri)
    requestUri = new Uri(blobStorage.Credentials.TransformUri(requestUri.ToString()));

HttpWebRequest request = BlobRequest.SetProperties(requestUri, 200, pageBlob.Properties, null, 12288);
blobStorage.Credentials.SignRequest(request);

using (WebResponse response = request.GetResponse())
{
    // call succeeded
}

You’ll notice this is being done via a REST request directly to blob storage; resizing a blob isn’t supported by the storage client library. I also need to give credit for this last snippet to the Azure Storage Team.

As I mentioned, I’m in a hurry and wanted to get this out before boarding, so you’ll need to wait until next week to see why I’m playing with this; hopefully the potential will excite you. Until then, I’ll try to refine the code a bit and get the entire solution posted online for you.

Until next time!

Introduction to Azure Storage Analytics (YOA Week 6)

Since my update on Storage Analytics last week was so short, I really wanted to dive back into it this week. Fortunately, it’s new enough that there was some new ground to tread here. Which is great, because I hate just putting up another blog post that doesn’t really add anything new.

Steve Marx posted his sample app last week and gave us a couple of nice methods for updating the storage analytics settings. The Azure Storage team also did two solid posts on working with Metrics and Logging. However, neither of them dove deep into working with the API, and I wanted more meat on exactly how to do this. By digging through Steve’s code and the MSDN documentation on the API, I can hopefully shed some additional light on it.

Storage Service Properties (aka enabling logging and metrics)

So the first step is turning this on. Well, actually it’s understanding what we’re turning on and why, but we’ll get to that in a few. Steve posted a sample ‘Save’ method on his blog. This is an implementation of the Azure Storage Analytics API’s “Set Storage Service Properties” call. The key to that method is an XML document that contains the analytics settings. It looks something like this:

<?xml version="1.0" encoding="utf-8"?>
<StorageServiceProperties>
  <Logging>
    <Version>version-number</Version>
    <Delete>true|false</Delete>
    <Read>true|false</Read>
    <Write>true|false</Write>
    <RetentionPolicy>
      <Enabled>true|false</Enabled>
      <Days>number-of-days</Days>
    </RetentionPolicy>
  </Logging>
  <Metrics>
    <Version>version-number</Version>
    <Enabled>true|false</Enabled>
    <IncludeAPIs>true|false</IncludeAPIs>
    <RetentionPolicy>
      <Enabled>true|false</Enabled>
      <Days>number-of-days</Days>
    </RetentionPolicy>
  </Metrics>
</StorageServiceProperties>

Cool stuff, but what does it all mean? Well, fortunately, it’s all explained in the API documentation. Also fortunately, I won’t make you click a link to look at it. I’m nice that way.

Version – the service version / interface number, there to help with service versioning later on; just use “1.0” for now.

Logging->Read/Write/Delete – these nodes determine if we’re going to log reads, writes, or deletes. So you can get just the granularity of logging you want.

Metrics->Enabled – turn metrics capture on/off

Metrics->IncludeAPIs – set to true if you want to capture statistics for API operations as well (like saving/updating the analytics settings). At least I think that’s what it does; I’m still playing with/researching this one.

RetentionPolicy – use this to enable/disable a retention policy and set the number of days to retain information. Without a policy, data will be retained FOREVER, or at least until your 20TB limit is reached, so I recommend you set a policy and leave it on at all times. The maximum value you can set is 365 days. To learn more about retention policies, check out the MSDN article on them.
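
If you’d rather build that settings document in code than hand-write the string, a quick LINQ to XML sketch might look like the following. This is my own illustration (not from Steve’s sample), and the true/false and day values are just examples:

// build the StorageServiceProperties payload with System.Xml.Linq
XDocument settings = new XDocument(
    new XDeclaration("1.0", "utf-8", null),
    new XElement("StorageServiceProperties",
        new XElement("Logging",
            new XElement("Version", "1.0"),
            new XElement("Delete", "true"),
            new XElement("Read", "true"),
            new XElement("Write", "true"),
            new XElement("RetentionPolicy",
                new XElement("Enabled", "true"),
                new XElement("Days", "7"))),
        new XElement("Metrics",
            new XElement("Version", "1.0"),
            new XElement("Enabled", "true"),
            new XElement("IncludeAPIs", "true"),
            new XElement("RetentionPolicy",
                new XElement("Enabled", "true"),
                new XElement("Days", "7")))));

// the string we'd send as the body of the PUT request
string payload = settings.Declaration + settings.ToString();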

Setting Service Properties

Now Steve did a slick little piece of code, but given that I’m not what I’d call “MVC fluent” (I’ve been spending too much time doing middle/back-end services, I guess), it took a bit of deciphering, at least for me, to figure out what was happening. And I’ve done low-level Azure Storage REST operations before. So I figured I’d take a few minutes to explain what’s happening in his “Save” method.

First off, Steve sets up the HTTP request we’re going to send to Azure Storage:

var creds = new StorageCredentialsAccountAndKey(Request.Cookies["AccountName"].Value, Request.Cookies["AccountKey"].Value);
var req = (HttpWebRequest)WebRequest.Create(string.Format("http://{0}.{1}.core.windows.net/?restype=service&comp=properties", creds.AccountName, service));
req.Method = "PUT";
req.Headers["x-ms-version"] = "2009-09-19";

 

So this code snags the Azure Storage account credentials from the cookies (where they were stored when you entered them). They are then used to generate an HttpWebRequest object using the account name and the service (blob/table/queue) whose settings we want to update. Lastly, we set the method and the x-ms-version header for the request. Note: the service name was posted to this method by the JavaScript on Steve’s MVC-based page.

Next up, we need to digitally sign our request using the account credentials and the length of our analytics configuration XML document.

            req.ContentLength = Request.InputStream.Length;
            if (service == "table")
                creds.SignRequestLite(req);
            else
                creds.SignRequest(req);

Now what’s happening here is that our XML document came to this method via the JavaScript/AJAX post, through Request.InputStream. We sign the request using the StorageCredentialsAccountAndKey object we created earlier, calling SignRequestLite for the Table service or SignRequest for the blob or queue service.

Next up, we need to copy our XML configuration settings to our request object…

            using (var stream = req.GetRequestStream())
            {
                Request.InputStream.CopyTo(stream);
                stream.Close();
            }

 

This chunk of code uses GetRequestStream to get the stream we’ll copy our payload to, copies the payload over, and then closes the stream so we’re ready to send the request.

            try
            {
                req.GetResponse();
                return new EmptyResult();
            }
            catch (WebException e)
            {
                Response.StatusCode = 500;
                Response.TrySkipIisCustomErrors = true;
                return Content(new StreamReader(e.Response.GetResponseStream()).ReadToEnd());
            }

It’s that first line that we care about: req.GetResponse sends our request to the Azure Storage service. The rest of this snippet is really just about exception handling and returning results back to the AJAX code.

Where to Next

I had hoped to have time this week to create a nice little wrapper around the XML payload, so you could just have an analytics configuration object that you could hand a connection to and set properties on, but I ran out of time (again). I hope to get to it and actually put something out before we get the official update to the StorageClient library. Meanwhile, I think you can see how easy it is to generate your own REST requests to get (which we didn’t cover here) and set (which we did) the Azure Storage Analytics settings.
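
Since we didn’t cover the “get” side, here’s a rough sketch of what it might look like. It’s my own, mirroring Steve’s Save method with a GET instead of a PUT, and it assumes the same creds and service variables shown above:

var getReq = (HttpWebRequest)WebRequest.Create(
    string.Format("http://{0}.{1}.core.windows.net/?restype=service&comp=properties",
        creds.AccountName, service));
getReq.Method = "GET";
getReq.Headers["x-ms-version"] = "2009-09-19";

if (service == "table")
    creds.SignRequestLite(getReq);
else
    creds.SignRequest(getReq);

using (var resp = getReq.GetResponse())
using (var reader = new StreamReader(resp.GetResponseStream()))
{
    // the response body is the same StorageServiceProperties XML document shown earlier
    string currentSettings = reader.ReadToEnd();
}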

For more information, be sure to check out Steve Marx’s sample project and the MSDN Storage Analytics API documentation.

Azure Tools for Visual Studio 1.4 August Update–Year of Azure Week 5

Good evening, folks. It’s 8pm on Friday, August 5th, 2011 (aka International Beer Day) as I write this. Last week’s update to my Year of Azure series was weak, but this week’s will be even lighter. Just too much to do and not enough time, I’m afraid.

As you can guess from the title of this update, I’d like to talk about the new 1.4 SDK update. Now, I could go to great lengths about all the updates, but the Windows Azure team blog already did, and Wade and Steve covered them in this week’s Cloud Cover show. So instead, I’d like to focus on just one aspect of this update, Azure Storage Analytics.

I can’t tell you all how thrilled I am. The best part of being a Microsoft MVP is all the great people you get to know. The second best part is getting to have an impact on the evolution of a product you’re passionate about. And while I hold no real illusion that anything I’ve said or done led to the introduction of Azure Storage Analytics, I can say it’s something I (and others) have specifically asked for.

I don’t have enough time this week to write up anything. Fortunately, Steve Marx has already put together the basics on how to interact with it. If that’s not enough, I recommend you go and check out the MSDN documentation on the new Storage Analytics API.

One thing I did run across while reading through the documentation tonight is that the special container that analytics information gets written to, $logs, has a 20TB limit, and that this limit is independent of the 100TB limit on the rest of the storage account. The container is also billable for data stored and for read/write actions. Delete operations are a bit different, though: if you delete logs manually, it’s billable, but if the delete happens as a result of the retention policies you set, it’s not.
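
If you want to poke around at what lands in that container, a minimal sketch would look something like the following (my own snippet, assuming blobClient is a CloudBlobClient for the storage account):

CloudBlobContainer logsContainer = blobClient.GetContainerReference("$logs");
foreach (IListBlobItem logBlob in logsContainer.ListBlobs(
    new BlobRequestOptions { UseFlatBlobListing = true }))
{
    Console.WriteLine(logBlob.Uri); // log blobs are organized by service, date, and hour
}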

So again, apologies for an extremely weak update this week. But I’m going to try to ramp things up, take what Steve did, and give you a nice code snippet that you can easily reuse. If possible, I’ll see if I can’t get that cranked out this weekend.
