by Arpit Kumar
27 Jun, 2023
7 minute read
Troubleshooting a hot MongoDb shard

Caveats of MongoDb sharding while working at scale


I recently had the opportunity to work on a streaming job that involved writing data to MongoDB every 30 seconds. The primary objective of this job was to process events and gather statistics for a specific 30-second window. However, due to the high volume of traffic, the MongoDB updates were becoming slower and impacting overall performance.

In order to accommodate a larger user base and ensure scalability, we made the decision to shard the database. While some members of the team had prior experience with sharding MongoDB, this was my first time working with MongoDB at such a scale.

By implementing sharding, we aimed to distribute the data across multiple MongoDB shards, thereby distributing the workload and improving performance. This approach would allow us to handle the increased traffic and maintain the desired level of responsiveness.

We made the decision to shard our database into three shards, with each shard consisting of one primary and two secondary replicas. To ensure a smooth transition, we chose to perform the sharding process gradually over a week during off-peak hours. This allowed us to minimize any potential disruptions to the system while migrating data and balancing the shards.

During the data migration process, we encountered an unexpected issue. Despite the data distribution across the shards being nearly equal, we noticed that one particular shard was receiving an overwhelming majority of the update queries, accounting for about 90% of them. Recognising the importance of maintaining an even load distribution, I felt compelled to investigate further.

The shard key used was a composite key consisting of “_id” and a timestamp field “t”. This key determined the start of the time window for which statistics were pushed to the document. Now, before we proceed, let’s delve into the concept of sharding in MongoDB.

Understanding Sharding Concepts


Sharded Cluster Production Architecture
source @ (mongodb.com)

Let’s go over keywords used in context of mongodb

Shard - A shard is a physical server that stores a subset of data from a database. It is a technique used to distribute data across multiple servers in order to improve performance and scalability.

Shard Key - A shard key is a field or set of fields that is used to determine which shard a document will be stored on. The shard key must be unique for each document in the database.

Example -

db.my_collection.ensureSharding( { _id: ‘hashed’} )

In this example, the _id field is used as the shard key. This means that documents will be distributed across shards based on the value of the _id field. The hashed keyword tells MongoDB to hash the value of the _id field before using it as the shard key. This helps to distribute documents evenly across shards.

Here is an explanation of the different types of shard keys:

  • Simple shard keys are based on a single field. For example, the _id field in the above example is a simple shard key.
  • Composite shard keys are based on multiple fields. For example, a composite shard key could be based on the _id field and the country field.
  • Hashed shard keys are based on the hash of a field or combination of fields. For example, the hashed keyword in the above example tells MongoDB to hash the value of the _id field before using it as the shard key.

Note - As per hashing implementation it seems to be md5 hash.

Shard keys are important because they determine how documents are distributed across shards. Choosing the right shard key can improve the performance of your MongoDB application.

Balancer - A balancer is a component that distributes traffic across the shards in a MongoDB cluster. The balancer ensures that all shards receive an equal amount of traffic.

Config Servers - Config servers are responsible for storing the cluster configuration information, such as the shard key, the number of shards, and the location of the shards.

Mongos Routers - Mongos routers are responsible for routing client requests to the appropriate shard. Mongos routers also provide a single point of entry for clients, which makes it easier to manage and administer a MongoDB cluster.

Chunk as a key aspect of sharding

In MongoDB, chunk implementation is a key aspect of sharding, which is a technique used to distribute and partition data across multiple servers or “shards” to achieve horizontal scalability. Each shard contains a subset of the data, and the chunks define the ranges or divisions of data within a shard.

When a collection is sharded, MongoDB automatically splits the data into chunks based on the shard key values.

MongoDB uses a process called chunk migration to balance the data distribution across shards dynamically. As the data grows or the distribution becomes imbalanced, MongoDB redistributes the chunks to ensure an even distribution of data across shards. This migration process takes place in the background and is transparent to applications.

The size of a chunk is determined by the configuration of the cluster and can vary based on factors such as hardware capabilities, workload patterns, and performance requirements. The goal is to have chunks that are large enough to minimize the overhead of metadata management while being small enough to allow for efficient data distribution and migration.

Now this is an important piece of implementation detail - each chunk is assigned a range based on the shard key values. For example, if the shard key is a numeric field, MongoDB may assign a chunk for the range of 0-100, another chunk for 101-200, and so on. The ranges are exclusive, meaning that the lower bound is inclusive, but the upper bound is exclusive.


Chunk Keyspace

So if you remember from the initial discussion we were using the ObjectID field “_id” as part of the shard key, but it poses a lot of challenges and limitations. Here are a few issues which after understanding the implementation details became clear to me, some of this is directly from the documentation:

  1. Monotonically Increasing Values: ObjectIDs generated by MongoDB are composed of a timestamp, machine identifier, process identifier, and a counter. The timestamp component ensures that each ObjectID is unique and also provides a natural ordering. However, when using ObjectIDs as shard keys, this can result in a monotonically increasing pattern. As new data is added, it tends to be appended to the end of the shard, potentially leading to hotspots and imbalanced data distribution.
  2. Write Scalability: When ObjectIDs are used as shard keys, write operations can become a bottleneck. Since new data is typically appended at the end of the shard, multiple write operations occurring concurrently can contend for the same chunk, causing contention and reduced write scalability.
  3. Limited Query Flexibility: ObjectIDs are primarily designed to provide uniqueness and ordering within a single shard. When used as a shard key, it may limit the flexibility of queries that span multiple shards. For example, if a query needs to retrieve data based on a range of ObjectIDs, it may involve querying multiple shards, leading to slower performance and increased complexity.
  4. Data Distribution and Chunk Migration: With ObjectID as the shard key, the data distribution across shards may become imbalanced due to the monotonically increasing pattern. MongoDB’s automatic chunk migration may struggle to evenly distribute the chunks, resulting in suboptimal data distribution and performance.

In addition to this, the second component of our shard key was also the starting timestamp of the processing window. This is also in effect a monotonically increasing key. All our new keys were based on recent timestamps which made them lie in the same chunk range and ultimately created a hot shard for updates.

It’s important to consider the trade-offs and choose a sharding strategy that aligns with your specific query patterns and requirements.

Final Thoughts and Resources

As a general rule, using a hashed key for sharding can help distribute data uniformly across a sharded cluster. However, this approach may not be optimal if your query pattern primarily consists of range queries, as it can lead to increased scatter-gather operations across different physical shards.

If you are considering using sharding for your MongoDB applications, I encourage you to learn more about it and to consider the factors that are important to your specific needs.

There are a number of resources available to help you learn more about MongoDB sharding, including the MongoDB documentation, the MongoDB community, and the MongoDB training courses. Definitely read these resources before you start the sharding process -

Recent Posts

Understanding Asynchronous I/O in Linux - io_uring
Explore the evolution of I/O multiplexing from `select(2)` to `epoll(7)`, culminating in the advanced io_uring framework
Building a Rate Limiter or RPM based Throttler for your API/worker
Building a simple rate limiter / throttler based on GCRA algorithm and redis script
MicroVMs, Isolates, Wasm, gVisor: A New Era of Virtualization
Exploring the evolution and nuances of serverless architectures, focusing on the emergence of MicroVMs as a solution for isolation, security, and agility. We will discuss the differences between containers and MicroVMs, their use cases in serverless setups, and highlights notable MicroVM implementations by various companies. Focusing on FirecrackerVM, V8 isolates, wasmruntime and gVisor.

Get the "Sum of bytes" newsletter in your inbox
No spam. Just the interesting tech which makes scale possible.