MongoDB – Sharding

For most organizations, the main reason for using a NoSQL database is the ability to deal with the storage and compute demands of storing and querying large amounts of data. MongoDB Sharding is the method by which MongoDB handles large amounts of data.

It can be seen as the process in which large datasets are split into smaller datasets that are stored across multiple MongoDB Instances. This is done because querying on large datasets could lead to high CPU utilization on the MongoDB Server.

The following image shows the structure of a MongoDB Database:

A sizable number of Collections make up each MongoDB Database. Each Collection is composed of numerous documents that contain data in the form of Key-Value pairs. A huge Collection is divided into several smaller Collections, known as Shards, through MongoDB Sharding. Large Collections can be divided into smaller units called “Shards” so that MongoDB can process queries while not overtaxing the server.

MongoDB Sharding can be implemented by creating a Cluster of MongoDB Instances. The following image shows how MongoDB Sharding works in a Cluster.

The three main components of Sharded Cluster are as follows:

  • Shard
  • Config Servers
  • Query Routers

1) Shard

Shard is the most basic unit of a Shared Cluster that is used to store a subset of the large dataset that has to be divided. Shards are designed in such a way that they are capable of providing high data availability and consistency.

2) Config Servers

Config Servers are supposed to store the metadata of the MongoDB Sharded Cluster. This metadata consists of information about what subset of data is stored in which Shard. This information can be used to direct user queries accordingly. Each Sharded Cluster is supposed to have exactly 3 Config Servers.

3) Query Routers

Query Routers can be seen as Mongo Instances that form an interface to the client applications. The Query Routers are responsible for forwarding user queries to the right Shard.

Benefits of MongoDB Sharding

MongoDB Sharding is important because of the following reasons:

  • In a setup in which MongoDB Sharding has not been implemented, the Master nodes handle the potentially large number of write operations whereas the Slave Nodes are responsible for reading operations and maintaining backups. Since MongoDB Sharding utilizes Replica Sets, queries are distributed equally among all nodes.
  • The storage capacity of the Sharded Cluster can be increased without performing any complex hardware restructuring by simply adding additional Shards to the Cluster.
  • The Cluster’s other Shards will continue to function even if one or more of them fall offline, making it possible to access the data contained in those live Shards without any problems. 

Steps to Set up MongoDB Sharding

MongoDB Sharding can be set up by implementing the following steps:

Step 1: Creating a Directory for Config Server

The first step is performed in order to set up MongoDB Sharding would be to create a separate directory for Config Server. This can be done using the following command:

mkdir /data/configdb

Step 2: Starting MongoDB Instance in Configuration Mode

One Server has to be set up as the Configuration Server. Suppose you have a Server named “ConfServer” which would be used as the Configuration Server, the following command can be executed to perform that operation:

mongod –configdb ConfServer: 27019

Step 3: Starting Mongos Instance

The Mongos Instance can be launched after the Configuration Server has been configured by running the following command and adding the name of your Configuration Server:

mongos –configdb ConfServer: 27019

Step 4: Connecting to Mongos Instance

A connection can be formed to the Mongos Instance by running the following command from the Mongo Shell:

mongo –host ConfServer –port 27017

Step 5: Adding Servers to Clusters

All Servers that have to be included in the Cluster can be added by the following command:

sh.addShard("SA:27017")

“SA” here has to be replaced with the name of your Server that has to be added to the Cluster. This command can be executed for all Servers that have to be added to the Cluster.

Step 6: Enabling Sharding for Database

Sharding for the necessary database must be enabled after the Sharded Cluster has been configured. The command below can be used to achieve this:

sh.enableSharding(db_test)

In the above command, “db_test” has to be replaced with the name of the database that you wish to Shard.

Limitations of MongoDB Sharding

The limitations of MongoDB Sharding are as follows:

  • Setting up MongoDB Sharding is a complex operation and hence, careful planning and high maintenance is required.
  • There are certain MongoDB operations that cannot be executed in a Sharded Cluster. For example, the geoSpace command.
  • Once a Collection in MongoDB has been sharded, there is no way to un-shard it and restore the Collection in the original format. 

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *