This is Chapter 3 of Azure Cosmos DB for .NET Developers. Previous: Chapter 2: Document Thinking.
In the first two chapters, we talked about trees and boxes. Your domain model is a tree. Relational databases store boxes. Cosmos DB stores trees — your domain object serialized to JSON, structure intact, no chopping required.
Great. Now let's actually start building something.
But before we write any code, we need to understand how Cosmos DB is organized. There are four concepts you need to know, and one of them — the partition key — is the most important design decision you'll make in any Cosmos application. We're going to spend some time on it because getting it right matters more than almost anything else.
(We're also going to punt on the question of "where do I run this?" for a bit. Cloud? Local emulator? We'll get there in the next chapter. First, let's understand what we're building against.)
The Four Things: Database, Container, Partition Key, Id
Cosmos DB has a nesting structure that's straightforward but important to understand.
Database
A Cosmos DB database is the outermost grouping. It's roughly analogous to a database in SQL Server: a logical grouping of related data. Most applications have one. You might have more if you have genuinely separate workloads that need different throughput configurations, but for most projects, one database is fine.
Not much drama here. Create one, give it a name, move on.
Container
A container lives inside a database. This is where your documents actually live. If you're coming from the relational world, the closest analog to a container is a table, but the analogy crumbles almost immediately. A container can hold documents with completely different shapes. Your Order documents and your Customer documents can live in the same container, side by side, with totally different structures. Try that with a SQL Server table.
(How you tell them apart is with a discriminator property: a field on each document that names its type. Cosmos itself doesn't care what shape a document has. That's a topic we'll get to in the data modeling chapter. For now, just know that a container isn't limited to one type of document.)
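Here's a rough sketch of what "different shapes, same container" can look like. It only builds JSON in memory with System.Text.Json and doesn't touch Cosmos at all. The entityType property name and the sample values are assumptions for illustration.

```csharp
using System;
using System.Text.Json;

// Two unrelated shapes that could sit side by side in one container.
// "entityType" is a hypothetical discriminator name chosen for this sketch.
var order = new
{
    id = "order-1001",
    entityType = "Order",
    customerId = "customer-42",
    lines = new[] { new { sku = "ABC-1", quantity = 2 } }
};

var customer = new
{
    id = "customer-42",
    entityType = "Customer",
    name = "Mary Smith",
    email = "mary@example.com"
};

string orderJson = JsonSerializer.Serialize(order);
string customerJson = JsonSerializer.Serialize(customer);

Console.WriteLine(orderJson);
Console.WriteLine(customerJson);

// Reading code can branch on the discriminator without knowing the full shape.
using JsonDocument doc = JsonDocument.Parse(orderJson);
string kind = doc.RootElement.GetProperty("entityType").GetString() ?? "";
Console.WriteLine($"This document is a: {kind}");
```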
You need at least one container. I've been pleasantly surprised by how far I can get with just a single container in my apps. But there are definitely reasons why you might want to add more containers and the decision is typically dictated by the access patterns and indexing needs of the data.
(BTW, I'm trying really hard to not get bogged down in the details yet because it gets complicated fast.)
Partition Key
Partition keys are about 90% of the game in Cosmos DB. Get it right and you live an easy life of delight and fulfillment as you endlessly marvel at the glory of Cosmos DB's performance and scalability. Get it wrong and you're well and truly hosed. Doomed.
So. Anyway. Partition Keys.
Every container has a partition key definition that gets set when you create the container. There are two types of partition keys: single-value and hierarchical. The single-value partition key uses a single property in your documents. Hierarchical uses multiple properties on your documents as the partition key.
Cosmos DB containers are happy to store just about any document (data) as long as it conforms to the partition key. Compare that to relational databases where a table has a fixed schema that's enforced by the database engine. Cosmos DB doesn't enforce much about your data beyond the partition key.
Once you define the partition key, all the documents in the container will conform to that definition and that forms the foundation of how Cosmos DB will physically store your data and handle scalability for your container.
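To make the two flavors concrete, here's a minimal sketch of the container definitions using the Microsoft.Azure.Cosmos SDK. It only builds the definition objects in memory. The container names and property paths are assumptions for illustration, and the list-of-paths constructor assumes a reasonably recent SDK version with hierarchical partition key support.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Azure.Cosmos;

// Single-value partition key: one property path.
var singleKey = new ContainerProperties("NotesContainer", "/ownerId");

// Hierarchical partition key: an ordered list of property paths.
var hierarchicalKey = new ContainerProperties(
    "AppContainer",
    new List<string> { "/tenantId", "/entityType" });

Console.WriteLine(singleKey.PartitionKeyPath);
Console.WriteLine(string.Join(", ", hierarchicalKey.PartitionKeyPaths));

// Creating the container needs a live account or emulator and a CosmosClient,
// so those calls are left commented out here:
// Database database = await client.CreateDatabaseIfNotExistsAsync("NotesDb");
// Container container = await database.CreateContainerIfNotExistsAsync(singleKey);
```

Either way, the paths refer to property names in the stored JSON, and they're locked in at creation time.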
I'm going to spend the next few sections on partition keys because they're the fundamental unit of scalability in Cosmos DB. If you understand partition keys, you understand why Cosmos works the way it works.
Detour: Defining 'Identity'
There's a concept in data modeling called identity. It describes how you identify individual pieces of data. If we're talking about humans out in real life, we've all got names and that's pretty good at telling the difference between say "Mary Smith" and "Saul Rosenberg". In C#, identity for a class defaults to reference equality: two variables are the same object only if they point at the same instance. A type can override Equals() and GetHashCode() to define identity by value instead, but a matching hash code on its own proves nothing, because different objects can share one. In SQL Server, it's the primary key value for a table.
Identity establishes uniqueness.
Up to this point, I've been deliberately avoiding abbreviating "Partition Key". The natural abbreviation is "PK" but that also happens to be the common abbreviation for "Primary Key" in relational database land. Cosmos PK != SQL Server PK.
In SQL Server a PK is unique and establishes identity. In Cosmos DB, the PK isn't unique but it's the start of uniqueness — the start of establishing identity.
The id Property
Partition keys are only the first part of establishing document identity. The other part is the id property. Every document in a container has one.
Since "id" is typically thought of as the abbreviation for "identity", you'd think that id would establish document identity. It doesn't. That id value doesn't even have to be unique in the container. (Pro tip: you probably should try to make it unique. More on this later.)
Document identity comes from the combination of the partition key value and the id value.
Every document in Cosmos DB must have an id property. It's a string. It uniquely identifies the document within its partition. Set it yourself. With the Microsoft.Azure.Cosmos SDK used in this book, a write whose document has no id is rejected with a bad request error rather than being handed a generated Guid. (Some tools, such as the Data Explorer in the Azure portal, will generate one for you.)
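If the "partition key plus id" idea feels abstract, here's a toy model of it. This is plain in-memory C#, not the Cosmos SDK, and the DocumentIdentity type is a hypothetical name invented for the sketch.

```csharp
using System;
using System.Collections.Generic;

// Toy in-memory stand-in for a container: documents are keyed by
// (partition key value, id), not by id alone.
var store = new Dictionary<DocumentIdentity, string>();

store[new DocumentIdentity("tenant-a", "settings")] = "Tenant A settings";
store[new DocumentIdentity("tenant-b", "settings")] = "Tenant B settings";

// Same id, different partition key value: two distinct documents.
Console.WriteLine(store.Count); // prints 2

// Same partition key value and same id: the existing document is replaced.
store[new DocumentIdentity("tenant-a", "settings")] = "Tenant A settings, v2";
Console.WriteLine(store.Count); // prints 2

// Hypothetical type for this sketch; records compare by value.
public record DocumentIdentity(string PartitionKeyValue, string Id);
```

Two documents share the id "settings" and nothing breaks, because the partition key value is different.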
Why Partition Keys Matter More Than Anything Else
I remember about 10-15 years ago, I kept hearing people talk about relational database sharding. "We need to shard the database." "Our sharding strategy needs work." At conferences, in blog posts, in architecture reviews. It was everywhere for a while. I'm pretty sure I even wrote my own sharding engine at some point just to try it out.
Now that I think back on all that sharding chatter, it was the world slowly starting to notice the fundamental scalability limitations of relational databases. When your data lives in tables with foreign keys and JOINs, scaling horizontally ("scaling out") is hard. You can scale vertically (bigger server, "scaling up") for a while, but eventually you hit a ceiling of what you're willing to pay or the limits of available hardware. Sharding — splitting data across multiple database instances based on some subset of a key — was the industry's attempt to solve that. And it was painful. Manually managed. Application-level routing. Cross-shard queries and transactions were a nightmare.
What those architects were homing in on, without having the vocabulary for it yet, was the essential design insight that Cosmos DB is built on: the partition key.
In Cosmos DB, you can't scale any smaller than the partition key. That's the fundamental unit. Cosmos automatically distributes partitions across physical hardware. When load increases, Cosmos splits and rebalances partitions across more machines. You don't manage shards. You don't configure routing. You chose your partition key when you created the container, and Cosmos handles the rest.
But that also means your partition key choice determines how Cosmos can scale your data. If you choose poorly, Cosmos can't help you. If you choose well, Cosmos scales almost invisibly.
All of the performance characteristics, all of the cost behavior, all of the query patterns — they flow from this one decision. The partition key is the most important thing in your Cosmos architecture. Get it right and most other things fall into place. Get it wrong and you'll fight the system forever.
One more thing: the partition key is set when the container is created and cannot be changed later. You're stuck with it.
Let's say you've figured out that your partition key config is just plain wrong and you want to change it. The only solution is to create a new container and migrate the data to the new container. It's do-able...but it's kind of a pain. (Guess how I know. Yah. I've messed it up before.)
My Recommended Partition Key for the Chronically Impatient
Before we get into how you choose, I'm going to give you my TL;DR partition key recommendation.
(Man! You're impatient! Sheesh!)
Here's the recommendation:
- Create a container that uses hierarchical partition keys
- Set the partition key paths to /tenantId and /entityType, in that order.
  - tenantId is the owner of the data (me, you, someone else, etc.)
  - entityType is the type of data (Order, Address, Claims, Person, Settings, etc.)
This is what I use all the time and it's served me well.
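Here's a small sketch of what that recommendation looks like from the document's side, using the SDK's PartitionKeyBuilder. The TenantDocument class and the sample values are hypothetical, and nothing here talks to a database.

```csharp
using System;
using Microsoft.Azure.Cosmos;

var settings = new TenantDocument
{
    id = Guid.NewGuid().ToString(),
    tenantId = "tenant-001",
    entityType = "Settings"
};

// The key value lists one component per path, in the same order as the
// container's /tenantId, /entityType definition.
PartitionKey partitionKey = new PartitionKeyBuilder()
    .Add(settings.tenantId)
    .Add(settings.entityType)
    .Build();

Console.WriteLine(partitionKey);

// With a live container, this value travels with the document, e.g.:
// await container.UpsertItemAsync(settings, partitionKey);

// Hypothetical document shape; only these three properties matter here.
public class TenantDocument
{
    public string id { get; set; } = string.Empty;
    public string tenantId { get; set; } = string.Empty;
    public string entityType { get; set; } = string.Empty;
}
```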
Now that that's out of the way, let's get into the actual thought process.
Choosing a Partition Key: The Cardinality Sweet Spot
So how do you choose? Let's start with the two extremes, because they illustrate the tradeoff.
Extreme 1: Maximum Uniqueness
You could make your partition key a Guid.NewGuid().ToString(). Every single document gets its own unique partition. That's the ultimate in granular scalability and Cosmos can spread your documents across as many physical partitions as it needs.
But it leads to a problem: cross-partition queries. If you need to find all orders for a specific customer, and every order has a random GUID as its partition key, Cosmos has to check every partition to answer your query. That's called a fan-out query, and it's expensive. It consumes more request units, takes more time, and gets worse as your data grows. (More on this later.)
Extreme 2: Everything in One Partition
On the other end, you could set every document's partition key to the same value: "my_partition_key". Now everything is in one partition. Queries are simple — Cosmos knows exactly where to look.
But you've short-circuited Cosmos's ability to scale on your behalf. All your data is on one physical machine. And there's an upper limit — a single logical partition can hold a maximum of 20 GB. Hit that ceiling and you're stuck. You've turned your infinitely scalable document database into a single box that tops out at 20 gigs.
Yah. Not good.
The Middle Ground
What you're aiming for is something in between.
Now, I'm a musician by training. I have a degree in music performance. I did not take a lot of CS courses. So when I tell you that the term we need here is cardinality, I want you to know that I find this term unnecessarily fancy. It basically means "uniqueness." How many distinct values does a property have?
Low cardinality = few unique values. Think of a Status field with three possible values: "Active", "Inactive", "Pending". That's low cardinality. Low potential for uniqueness.
High cardinality = many unique values. Think of a GUID. That's extremely high cardinality. Every value is unique.
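If you want to see cardinality as a number, it's just a distinct count. A quick sketch with made-up data (the Order record and its values are assumptions for illustration):

```csharp
using System;
using System.Linq;

// Hypothetical sample data for this sketch.
var orders = new[]
{
    new Order("order-1", "customer-1", "Active"),
    new Order("order-2", "customer-1", "Pending"),
    new Order("order-3", "customer-2", "Active"),
    new Order("order-4", "customer-2", "Inactive"),
    new Order("order-5", "customer-3", "Active"),
    new Order("order-6", "customer-3", "Active"),
    new Order("order-7", "customer-4", "Pending"),
    new Order("order-8", "customer-4", "Active"),
};

// Cardinality = how many distinct values a property takes.
int statusCardinality = orders.Select(o => o.Status).Distinct().Count();
int customerCardinality = orders.Select(o => o.CustomerId).Distinct().Count();
int orderIdCardinality = orders.Select(o => o.Id).Distinct().Count();

Console.WriteLine($"Status: {statusCardinality} distinct values");       // 3
Console.WriteLine($"CustomerId: {customerCardinality} distinct values"); // 4
Console.WriteLine($"Id: {orderIdCardinality} distinct values");          // 8

public record Order(string Id, string CustomerId, string Status);
```

Status stays at 3 no matter how many orders pile up. CustomerId grows with your customer base, and Id grows with every single document.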
Here's how cardinality maps to your partition key choice:
quadrantChart
    title Partition Key Cardinality Tradeoff
    x-axis Low Cardinality --> High Cardinality
    y-axis Low Scalability --> High Scalability
    quadrant-1 Scales great but cross-partition query pain
    quadrant-2 Unlikely in practice
    quadrant-3 Doesn't scale and hits size limits
    quadrant-4 Unlikely in practice
    Same value for all docs: [0.1, 0.1]
    Status field 3 values: [0.2, 0.2]
    Category or Type: [0.3, 0.35]
    Customer or Tenant ID: [0.55, 0.65]
    Order ID: [0.75, 0.8]
    Guid per document: [0.9, 0.9]
Low cardinality, low scalability: everything jammed into a few partitions. Hits size limits. Bad.
High cardinality, high scalability: everything spread across unique partitions. Scales beautifully. But every query that isn't a point read becomes a cross-partition fan-out. Expensive.
The sweet spot is the middle of that diagonal. You want a partition key with enough cardinality to distribute data well, but that also aligns with how you actually query your data.
The practical heuristic: what do you query by most often? If most of your queries are "get all orders for customer X" or "get all documents for tenant Y," then customerId or tenantId is probably your partition key. Your most common query runs within a single partition (fast and cheap), and you have enough unique values to distribute data across partitions as you grow.
Don't overthink it. Pick the property that most often appears in your WHERE clauses. That's usually your partition key.
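Here's a toy model of why that heuristic works. It treats a container as a dictionary of logical partitions and counts how many of them a "get all orders for customer-1" lookup has to read under two different key choices. This is plain in-memory C#, not how Cosmos is actually implemented (real fan-out happens across physical partitions), and the Order record is made up for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var orders = new List<Order>
{
    new("order-1", "customer-1"),
    new("order-2", "customer-1"),
    new("order-3", "customer-2"),
    new("order-4", "customer-3"),
};

// Group documents into buckets, one bucket per partition key value.
Dictionary<string, List<Order>> PartitionBy(Func<Order, string> keyOf) =>
    orders.GroupBy(keyOf).ToDictionary(g => g.Key, g => g.ToList());

var byCustomer = PartitionBy(o => o.CustomerId);
var byOrderId = PartitionBy(o => o.Id);

// Key = customerId: the query names its partition, so one bucket is read.
List<Order> direct = byCustomer["customer-1"];
Console.WriteLine($"customerId key: read 1 of {byCustomer.Count} partitions, found {direct.Count} orders");

// Key = order id: nothing says where customer-1's orders live,
// so every bucket has to be checked.
int visited = 0;
var found = new List<Order>();
foreach (List<Order> bucket in byOrderId.Values)
{
    visited++;
    found.AddRange(bucket.Where(o => o.CustomerId == "customer-1"));
}
Console.WriteLine($"order id key: read {visited} of {byOrderId.Count} partitions, found {found.Count} orders");

public record Order(string Id, string CustomerId);
```

Same answer both times. The difference is how much of the container had to be searched to get it.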
A Quick Note on Hierarchical Partition Keys
Cosmos DB also supports hierarchical partition keys — partition keys with multiple levels. Instead of a single property like /customerId, you can define a key like /tenantId, /customerId. This gives you finer-grained distribution while still allowing efficient queries at different levels of the hierarchy.
We'll get deeper into hierarchical partition keys when we talk about data modeling. For now, just know they exist and they're useful when your data has natural grouping levels. (I love hierarchical partition keys.)
Let's See Some Code
Enough concepts. Let's look at some code and store a tree. This whole book assumes that you're writing an application using .NET Core so the natural place to start is with the Microsoft.Azure.Cosmos SDK.
We're going to keep this simple. One domain class, a couple of basic operations. The goal isn't to build a production application — it's to see what the raw SDK looks like and get a feel for some code.
A Simple Domain Model
public class Note
{
    public string id { get; set; } = Guid.NewGuid().ToString();
    public string OwnerId { get; set; } = string.Empty;
    public string Title { get; set; } = string.Empty;
    public string Body { get; set; } = string.Empty;
    public DateTime CreatedDate { get; set; } = DateTime.UtcNow;
    public List<string> Tags { get; set; } = new();
}
A couple of things to notice. The id property is lowercase — Cosmos DB requires that specific property name for the document identifier. If you're used to C# conventions, this will bug you...don't worry, we'll fix it later.
The OwnerId is going to be our partition key. All of a user's notes will be in the same partition. That means "get all notes for user X" is a single-partition query.
And the Tags property is a List<string>. In a relational database, tags would be a separate table with a foreign key back to Notes. Here, they're just a list that's nested inside the document. Part of the tree. No boxes needed. Easy.
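One thing worth seeing before we connect to anything: the property names in the stored JSON aren't necessarily the names in your C# class. The sketch below uses System.Text.Json with a camel-case policy as a stand-in. That's an assumption for illustration, not the serializer the Cosmos SDK uses internally, but a camel-case naming policy should produce the same names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

var note = new Note
{
    OwnerId = "user-123",
    Title = "Naming check",
    Tags = { "cosmos" }
};

var options = new JsonSerializerOptions
{
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    WriteIndented = true
};

string json = JsonSerializer.Serialize(note, options);
Console.WriteLine(json);

// Same shape as the Note class above, repeated so the sketch runs on its own.
public class Note
{
    public string id { get; set; } = Guid.NewGuid().ToString();
    public string OwnerId { get; set; } = string.Empty;
    public string Title { get; set; } = string.Empty;
    public string Body { get; set; } = string.Empty;
    public DateTime CreatedDate { get; set; } = DateTime.UtcNow;
    public List<string> Tags { get; set; } = new();
}
```

OwnerId comes out as ownerId, and id stays id. Since partition key paths refer to the JSON, a container for these documents would be defined with /ownerId rather than /OwnerId.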
Connecting, Saving, and Reading
using Microsoft.Azure.Cosmos;

// Connect
var client = new CosmosClient(
    "https://localhost:8081", // we'll talk about this endpoint next chapter
    "YOUR_KEY_HERE",
    new CosmosClientOptions
    {
        SerializerOptions = new CosmosSerializationOptions
        {
            PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase
        }
    }
);

// Get references to database and container
var database = client.GetDatabase("NotesDb");
var container = database.GetContainer("NotesContainer");

// Save a note
var note = new Note
{
    OwnerId = "user-123",
    Title = "First Cosmos Note",
    Body = "Look ma, no boxes!",
    Tags = new List<string> { "cosmos", "getting-started" }
};

var saveResponse = await container.UpsertItemAsync(
    note,
    new PartitionKey(note.OwnerId)
);
Console.WriteLine($"Saved. Cost: {saveResponse.RequestCharge} RUs");

// Read it back
var readResponse = await container.ReadItemAsync<Note>(
    note.id,
    new PartitionKey(note.OwnerId)
);
var loaded = readResponse.Resource;
Console.WriteLine($"Title: {loaded.Title}");
Console.WriteLine($"Tags: {string.Join(", ", loaded.Tags)}");
Console.WriteLine($"Read cost: {readResponse.RequestCharge} RUs");
That's a tree going into Cosmos and a tree coming back out. The Note object with its nested Tags list gets serialized to JSON, stored as a single document, and deserialized back to the same structure. No shredding. No boxes. No adapter layer.
A few things to notice before we move on:
UpsertItemAsync creates the document if it doesn't exist, or replaces it if it does (matched by partition key + id). I use upsert rather than create in most situations — it's simpler to reason about.
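new PartitionKey(note.OwnerId): you construct a PartitionKey object yourself and pass it along with the operation. A point read like ReadItemAsync requires it. Writes like UpsertItemAsync and queries are more forgiving, and that's the trap. If you leave it off, the SDK won't throw an error. A write makes the SDK dig the value out of the serialized document, and a query with no partition key fans out across partitions. Both work. Both quietly cost more than they needed to.
RequestCharge (on saveResponse and readResponse): that's the cost of the operation in Request Units (RUs), the currency Cosmos DB bills in. I predict you'll get in the habit of watching that number.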
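For completeness, here's roughly what a single-partition query looks like with the SDK. Treat it as a sketch: the NoteQueries class and QueryByOwnerAsync method are names I made up, it assumes documents carry a camel-cased ownerId property that is also the container's partition key, and it needs a live container to actually run.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class NoteQueries
{
    // Returns every document for one owner, reading only that owner's partition.
    public static async Task<List<T>> QueryByOwnerAsync<T>(Container container, string ownerId)
    {
        QueryDefinition query = new QueryDefinition(
                "SELECT * FROM c WHERE c.ownerId = @ownerId")
            .WithParameter("@ownerId", ownerId);

        var options = new QueryRequestOptions
        {
            // Leaving this unset would turn the query into a cross-partition fan-out.
            PartitionKey = new PartitionKey(ownerId)
        };

        var results = new List<T>();
        double totalCharge = 0;

        using FeedIterator<T> iterator =
            container.GetItemQueryIterator<T>(query, requestOptions: options);

        // Results come back in pages; each page reports its own RU charge.
        while (iterator.HasMoreResults)
        {
            FeedResponse<T> page = await iterator.ReadNextAsync();
            totalCharge += page.RequestCharge;
            results.AddRange(page);
        }

        Console.WriteLine($"Query cost: {totalCharge} RUs");
        return results;
    }
}
```

Calling it would look something like `var notes = await NoteQueries.QueryByOwnerAsync<Note>(container, "user-123");`.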
We'll dig into all of these patterns — and discover what the SDK doesn't tell you — in the next chapter when we actually run this code.
What's Next
We've got the mental model: Database → Container → Partition Key → id. We understand why the partition key choice is the most important decision in a Cosmos application. We've got cardinality as the framework for making that choice. And we've got some code that saves and reads a document.
But we haven't actually run anything yet. In the next chapter, we'll set up a local development environment, execute this code against the Cosmos DB emulator, and then look at what's actually stored in the database. Because what Cosmos stores isn't quite what you sent it — there are some extra fields in there, and understanding them matters more than you'd think.