Generate Identity / Sequence Values from Azure Storage

by

If you’re developing a business application, you frequently need to use sequential integers to provide friendly IDs for things.  For example, if you are writing a billing and invoicing application, you’ll probably want to create something like “Invoice Number”.  If you’re using SQL Server or Oracle, this is relatively straightforward – you’ll either use an Identity column (@@IDENTITY, SCOPE_IDENTITY()) or, in Oracle, a SEQUENCE.  We’re so used to just having these available and having them just work “automagically” but these structures do quite a bit. 

Here are a handful of requirements for IDENTITYs/SEQUENCEs:

  • Can’t return the same value twice.  The values have to keep getting incremented by 1 and the value never gets re-used.
  • A sequence is shared by all callers to the database and must be thread-safe.  Funky behavior isn’t allowed just because the database is under load. 
  • Fast. Got to be fast.  Get the value.  Return.  Done.  Minimal locking/blocking calls to the sequence from other connections.
  • Persistent.  If the database is shut down or goes down, the value of the sequence doesn’t go back to zero.

Now, if you’re writing an application with Azure Storage (not SQL Azure), you don’t have any comparable functionality at the moment.  If you want to create that “Invoice Number” or “Order Number” how are you going to do this? 

Yah…once you start thinking about it, it’s pretty tricky, huh? 

I put some time in to this and came up with a way to do it using a some utility classes, Azure Storage Queues and a Worker Role.  I wanted to be able to have multiple sequences (for example, Employee sequence, Order sequence, Customer, etc.) so every sequence has a name.  When a piece of code wants a new identity value, it calls in to SequenceFactory, asks for an instance of a Sequence by name, and SequenceFactory returns an instance of a Sequence class.  The Sequence class is basically just a wrapper around an Azure Queue that is filled with sequential integers (1, 2, 3, 4, 5, etc).  Sequence has a method called NextValue() that calls in to it’s Queue and returns the next Int32 value.  Since Azure Queues are thread-safe, this ensures that no one ever get the same Int32 value twice. 

image

Behind the scenes, there’s an Azure Worker Role that knows about all the Queues used by the sequences and is continually checking them to make sure that they are filled up with values.  The Worker Role also is in charge of creating new Sequences and storing the highest integer used by each Sequence so that the queues could be restarted and refilled from scratch if necessary (think RESEED and CHECK_IDENT).  Disclaimer: the current version of this code can only have one instance of the Worker Role running at a time due to a threading issue. 

Here’s a Visual Studio 2010 Activity Diagram that shows the process for some client code getting the next value from a sequence.

image

Here’s a Visual Studio 2010 Activity Diagram that shows how the sequences are managed from the Worker Role.

image

In the sample code, I created a Web Role that provides an ASP.NET page that allows you to create named sequences and get values.

image

Click here to download the code.

 

Enjoy.

-Ben

— Looking for training on Windows Azure & Azure Storage?  Check out our Hands-on Windows Azure Application Architecture & Development course.


6 Responses to "Generate Identity / Sequence Values from Azure Storage"
  1. Hi

    Is there any reason why could can’t simply use a table storage entity to keep track of the next sequence number? Something along these lines:

    1. Read the next sequence number.
    2. Increment the sequence number.
    3. Create the invoice with the sequence number.
    4. Save the changes from step 2-3 in a batch.
    5. On failure due to etag version repeat.

    It does away with the queue and the worker role. You may sacrifice some performance but you get simplicity instead.

    I know this post is almost two years old, but I just found it. 🙂

    • …besides, I think A queue guarantees that each entry is read at least once not that it is read exactly once. In practice it’s read exactly once, but are there any guarantees?

      • Sorry for the spamming. I didn’t think that solution through properly before I posted.

        Entity Group Transactions can’t span partitions and that will cause problems. But what about the solution below?

        The counter is stored in a property of a dedicated entity. When you need a new number then you:
        1. Read the next sequence number
        2. Increment the sequence counter
        3. Save the changes and on etag version failure go to 1
        4. Use the new sequence number for e.g. a new Invoice number.

        This can leave gaps if saving the Invoice fails, but SQL Server Identity columns have that problem too and as far as I can tell so does the solution you suggested.

        • Hi Johan,

          I have to admit that 1) it’s been a really long time since I wrote this post and that 2) I’ve never tried this at scale. It was more of an experiment for me to see if I could recreate an identity column. If I were going to do this again, the simplest way to do it would probably to use SQL Azure and just use the functionality in SQL Server.

          Thanks for commenting!

          -Ben

Leave a Reply to Johan Levin Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.