MongoDB is an open-source, cross-platform, and distributed document-based database designed for ease of application development and scaling. It is a NoSQL database developed by MongoDB Inc.
MongoDB’s name is derived from the word “humongous”, which means huge or enormous. MongoDB is built to store a huge amount of data and still perform fast.
MongoDB is not a Relational Database Management System (RDBMS). It is referred to as a “NoSQL” database. SQL-based databases standardize data under schemas and tables, where each table has a fixed structure. MongoDB does not do so: instead of enforcing schemas, it saves data in collections as JSON-like documents. In contrast to conventional SQL (RDBMS) databases, it has no tables, rows, or columns.
How does it work?
Let’s look at how things work behind the scenes. MongoDB runs as a database server, and the data is kept in databases on that server. In other words, the MongoDB environment gives you a server that you can launch and use to host several databases.
Because MongoDB is a NoSQL database, the data is stored in collections and documents. The database, collections, and documents are related to each other as follows:
- A MongoDB database contains collections, just as a MySQL database contains tables. You are allowed to create multiple databases and multiple collections.
- Inside a collection we have documents. These documents contain the data we want to store in the MongoDB database, and a single collection can contain multiple documents. Collections are schema-less, which means one document does not need to have the same structure as another.
- Documents are made up of fields. Fields are key-value pairs in the documents, much like columns in a relational database. The value of a field can be of any BSON data type, such as double, string, or boolean.
- The data stored in MongoDB is in the format of BSON documents. BSON stands for Binary JSON, a binary representation of JSON documents. In other words, the MongoDB server converts the JSON data into a binary form known as BSON, which can be stored and queried more efficiently.
- You can store nested data in MongoDB documents. This nesting lets you model complex relationships between data and store them in the same document, which makes working with and retrieving the data efficient. In SQL, by contrast, obtaining related data from two tables requires a join. A BSON document can be up to 16 MB in size.
NOTE: In the MongoDB server, you are allowed to run multiple databases.
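To make the JSON-to-BSON conversion described above more concrete, here is a minimal sketch of a BSON encoder in Python. It is not MongoDB's own implementation and covers only a handful of types (double, string, embedded document, boolean, 32-bit integer); the function names are made up for this illustration.

```python
import struct


def encode_cstring(text: str) -> bytes:
    """A BSON cstring is UTF-8 bytes followed by a single null byte."""
    return text.encode("utf-8") + b"\x00"


def encode_element(name: str, value) -> bytes:
    """Encode one field as: type byte + field name (cstring) + payload."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return b"\x08" + encode_cstring(name) + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        # Sketch only: 32-bit integers; larger values raise struct.error.
        return b"\x10" + encode_cstring(name) + struct.pack("<i", value)
    if isinstance(value, float):
        return b"\x01" + encode_cstring(name) + struct.pack("<d", value)
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + encode_cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + encode_cstring(name) + encode_document(value)
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


def encode_document(doc: dict) -> bytes:
    """A document is: int32 total size + encoded elements + trailing null byte."""
    body = b"".join(encode_element(key, value) for key, value in doc.items())
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


encoded = encode_document({"hello": "world"})
print(encoded.hex())
```

Because every document and string carries its length up front, a reader can skip over fields it does not need without parsing them, which is part of why the binary form is efficient to traverse.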
For example, suppose we have a database named Codelivly. Inside this database we have two collections, and each collection holds two documents. In these documents, we store our data in the form of fields, as shown in the image below:
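The same hierarchy can be sketched with plain Python dictionaries and lists. This only mirrors the shape of the data (server, database, collection, document, field); the collection names and field values are invented for the illustration and no MongoDB driver is involved.

```python
# One "server" holding one database named Codelivly (name from the text).
# Collection names and all field values below are hypothetical.
server = {
    "Codelivly": {
        "articles": [
            {"_id": 1, "title": "Intro to MongoDB", "tags": ["mongodb", "nosql"]},
            # Documents in the same collection may have different fields.
            {"_id": 2, "title": "Indexes", "author": {"name": "Sam", "role": "editor"}},
        ],
        "authors": [
            {"_id": 1, "name": "Sam"},
            {"_id": 2, "name": "Alex", "active": True},
        ],
    }
}

for database_name, collections in server.items():
    for collection_name, documents in collections.items():
        for document in documents:
            field_names = sorted(document)
            print(database_name, collection_name, field_names)
```

Note that the second article carries a nested `author` document while the first does not, which is the schema-less behavior described earlier.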
How is MongoDB different from RDBMS?
MongoDB is a NoSQL database, and it is freely available. It is a document-oriented database that stores data as documents in BSON, a binary representation of JSON. MongoDB does not use SQL to query data. It has a rich data model and supports distributed servers.
| # | Aspect | RDBMS | MongoDB |
|---|---|---|---|
| 1 | Concept | RDBMS is a relational database management system that works on relational databases. | MongoDB is a non-relational, document-oriented database management system that works on document-based databases. |
| 2 | Hierarchical data | Difficult to store hierarchical data. | Has built-in support for storing hierarchical data. |
| 3 | Scalability | RDBMS is typically scaled vertically: performance increases by adding RAM or CPU to a single server. | MongoDB is horizontally scalable as well: capacity increases by adding servers to the cluster. |
| 4 | Schema | A schema needs to be defined in RDBMS before using a database. | Schema can be dynamically created and accessed in MongoDB. |
| 5 | SQL injection | Vulnerable to SQL injection attacks when queries are built from unsanitized input. | Does not use SQL, so classic SQL injection does not apply; unsanitized input can still lead to query (NoSQL) injection. |
| 6 | Principle | Follows the ACID principle: Atomicity, Consistency, Isolation, and Durability. | Usually described in terms of the CAP theorem: Consistency, Availability, and Partition tolerance. |
| 7 | Basis (record) | The database uses rows. | The database uses documents. |
| 8 | Basis (attribute) | The database uses columns. | The database uses fields. |
| 9 | Performance | RDBMS is slower in processing large hierarchical data. | MongoDB is generally fast in processing large hierarchical data. |
| 10 | Joins | RDBMS supports complex joins. | MongoDB has limited join support (the $lookup aggregation stage) and favors embedding related data. |
| 12 | Query language | RDBMS uses SQL to query the database. | MongoDB uses its own query language, with queries written as JSON-like documents. |
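The "Hierarchical data" and "Joins" rows can be illustrated side by side. The sketch below uses Python's built-in sqlite3 module for the relational half and a plain dictionary as a stand-in for a MongoDB document; the table names, field names, and values are assumptions made for the example.

```python
import sqlite3

# Relational side: customer and order data live in two tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        product TEXT,
        quantity INTEGER
    );
    INSERT INTO customers VALUES (11, 'Ada');
    INSERT INTO orders VALUES (111, 11, 'Keyboard', 2);
    INSERT INTO orders VALUES (112, 11, 'Mouse', 1);
    """
)

# Reading a customer together with their orders needs a join.
joined_rows = conn.execute(
    """
    SELECT c.name, o.order_id, o.product, o.quantity
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE c.customer_id = ?
    ORDER BY o.order_id
    """,
    (11,),
).fetchall()
conn.close()

# Document side: the orders are embedded in the customer document,
# so one lookup returns everything without a join.
customer_document = {
    "_id": 11,
    "name": "Ada",
    "orders": [
        {"order_id": 111, "product": "Keyboard", "quantity": 2},
        {"order_id": 112, "product": "Mouse", "quantity": 1},
    ],
}
embedded_rows = [
    (customer_document["name"], o["order_id"], o["product"], o["quantity"])
    for o in customer_document["orders"]
]

print(joined_rows == embedded_rows)
```

Both halves yield the same information; the difference is where the relationship is resolved: at query time by the join, or at write time by embedding.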
Understanding the Pros and Cons of MongoDB
Advantages of MongoDB
MongoDB keeps frequently accessed data in RAM, which allows quicker performance while executing queries.
Data that is read from RAM rather than from the hard drive is returned more quickly. To achieve higher performance levels, a system must have enough RAM and well-chosen indexes.
MongoDB is a document-based database solution. It has features such as replication and GridFS.
These features increase data availability. It is also easy to access documents using indexes.
MongoDB is sometimes claimed to perform up to 100 times faster than relational databases. The actual gain depends heavily on the workload, the data model, and the indexes in use.
MongoDB offers a simple query syntax that is much easier to grasp than SQL. It provides an expressive query language that users find helpful during development.
This feature has allowed users to confidently select NoSQL structures. It also provides quicker learning and training opportunities than SQL databases.
MongoDB’s schema is not predefined. It has a dynamic schema design that works well with unstructured data.
Businesses evolve, and so do the data they keep. It is critical to have a versatile database model that can adapt to these changes.
MongoDB uses sharding when handling large datasets. Sharding is the process of dividing a large dataset and distributing it across multiple servers.
If a server cannot handle its share of the data because of its size, MongoDB automatically divides the data further without pausing activity.
Scalability is one of the most important advantages of MongoDB. As seen, MongoDB uses “sharding”, which expands the storage capacity.
Unlike SQL databases that use vertical scalability, sharding allows MongoDB to use horizontal scalability.
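The idea of distributing documents across servers by a shard key can be sketched in a few lines. This is a deliberately simplified model: it hashes the key and takes the result modulo the number of shards, whereas a real MongoDB cluster assigns ranges of (hashed) key values to shards and rebalances them over time. The function name, the key field, and the shard count are assumptions for the illustration.

```python
import hashlib
from collections import defaultdict


def shard_for(shard_key_value, num_shards: int) -> int:
    """Map a shard key value to a shard number using a stable hash."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 3  # hypothetical cluster size
shards = defaultdict(list)

# Route 1,000 hypothetical customer documents to shards by customer_id.
for customer_id in range(1, 1001):
    document = {"customer_id": customer_id}
    shards[shard_for(customer_id, NUM_SHARDS)].append(document)

for shard_number in sorted(shards):
    print(f"shard {shard_number}: {len(shards[shard_number])} documents")
```

Since the hash is stable, a query for one `customer_id` can be sent straight to the shard that holds it instead of to every server.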
An ad-hoc query is a query that is not defined in advance. It is written to obtain information if and when required.
MongoDB offers strong support for ad-hoc queries. This allows an application to handle queries that were not anticipated when the data model was designed.
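As a rough illustration of what an ad-hoc query looks like, the sketch below evaluates MongoDB-style filter documents against an in-memory list. The operators `$gt`, `$gte`, `$lt`, `$lte`, and `$regex` are real MongoDB query operators, but the `matches` and `find` functions here are a toy stand-in written for this example, not the server's query engine, and the sample data is invented.

```python
import operator
import re

COMPARATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(document: dict, query: dict) -> bool:
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$regex":
                    if not isinstance(value, str) or re.search(operand, value) is None:
                        return False
                elif op in COMPARATORS:
                    if value is None or not COMPARATORS[op](value, operand):
                        return False
                else:
                    raise ValueError(f"operator not covered by this sketch: {op}")
        elif value != condition:
            return False
    return True


def find(collection, query, projection=None):
    """Return matching documents, optionally limited to the projected fields."""
    results = []
    for document in collection:
        if matches(document, query):
            if projection:
                results.append({k: document[k] for k in projection if k in document})
            else:
                results.append(dict(document))
    return results


products = [
    {"name": "Mouse", "price": 25, "stock": 10},
    {"name": "Monitor", "price": 180, "stock": 3},
    {"name": "Keyboard", "price": 45, "stock": 0},
]

print(find(products, {"price": {"$gte": 20, "$lt": 50}}, projection=["name"]))
print(find(products, {"name": {"$regex": "^Mo"}}, projection=["name"]))
```

The point is that the filter is just data built at run time, so the application does not need to know the queries in advance.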
MongoDB is in the class of “Document Stores”; here the term document refers to a self-contained record of data.
MongoDB does not tamper with the data while storing it. Its product documentation is also maintained for each version and edition, which gives users a dependable reference.
MongoDB offers technical support for the various services that it provides. There is technical support for the community forums, Atlas or Cloud Manager as well as Enterprise or Ops Manager.
In case of any issues, the professional customer support team is ready to assist clients.
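Disadvantages of MongoDB
A transaction groups several operations so that they either all succeed or all fail together. MongoDB supports multi-document ACID (Atomicity, Consistency, Isolation, and Durability) transactions from version 4.0 onward.
Most applications do not require transactions, but a few need them to update multiple documents and collections. Multi-document transactions were a late addition and cost more than single-document writes, and without them a failed multi-step update can leave data inconsistent. This is one of MongoDB’s commonly cited limitations.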
Joining documents in MongoDB can be a tedious task. It does not support joins the way a relational database does.
Join-like functionality exists in the aggregation framework through the $lookup stage, but it is less mature and less flexible than SQL joins.
Users can also implement joins manually in application code. However, obtaining data from multiple collections then requires multiple queries, which can result in scattered code and wasted time.
MongoDB offers high-speed performance with the right indexes. If indexes are missing or poorly chosen, MongoDB will perform very slowly.
Fixing the indexes also takes time. This is another of the major limitations of MongoDB.
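Why indexes matter so much can be seen by counting how many documents have to be examined with and without one. The sketch below uses a Python dictionary as a stand-in for an index; MongoDB's real indexes are B-tree structures, so this shows only the principle. The data set and function names are made up for the example.

```python
from collections import defaultdict

# 10,000 hypothetical documents spread evenly over 100 cities.
documents = [{"_id": i, "city": f"city-{i % 100}"} for i in range(10_000)]


def scan(docs, field, value):
    """Without an index: every document must be examined."""
    examined = 0
    hits = []
    for document in docs:
        examined += 1
        if document.get(field) == value:
            hits.append(document)
    return hits, examined


def build_index(docs, field):
    """Map each field value to the positions of the documents that hold it."""
    index = defaultdict(list)
    for position, document in enumerate(docs):
        index[document.get(field)].append(position)
    return index


def lookup(docs, index, value):
    """With an index: only the matching documents are touched."""
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)


city_index = build_index(documents, "city")
scan_hits, scan_examined = scan(documents, "city", "city-7")
index_hits, index_examined = lookup(documents, city_index, "city-7")

print("collection scan examined:", scan_examined)
print("index lookup examined:", index_examined)
```

The index is not free: it has to be built once and updated on every insert, update, and delete, which is the cost side of the trade-off.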
MongoDB limits a single document to 16 MB. Nesting of documents is also limited to 100 levels.
Another of the major limitations of MongoDB is the duplication of data. Duplication makes data sets harder to manage because the relations between them are not well defined.
Eventually, the duplication of data may lead to inconsistency, because copies of the same value in different documents are not kept in sync automatically.
MongoDB requires a large amount of storage because limited join functionality encourages duplicating data. The resulting redundancy takes up unnecessary storage space.
Dwight Merriman and Eliot Horowitz founded MongoDB after encountering development and scalability issues with traditional relational database approaches while building web applications at DoubleClick, an online advertising company now owned by Google Inc. To represent the idea of supporting large amounts of data, the database’s name was derived from the word humongous.
Merriman and Horowitz helped form 10Gen Inc. in 2007 to commercialize MongoDB and related software. The company was renamed MongoDB Inc. in 2013 and went public in October 2017 under the ticker symbol MDB.
The DBMS was released as open-source software in 2009 under the terms of Version 3.0 of the Free Software Foundation’s GNU Affero General Public License. Since October 2018, new server releases are licensed under the Server Side Public License (SSPL), in addition to the commercial licenses offered by MongoDB Inc.
MongoDB has been used by organizations such as MetLife for customer service applications, other websites such as Craigslist for data archiving, and the CERN physics lab for data aggregation and discovery. The New York Times has also used MongoDB to power a form-building application for photo submissions.
The below example shows how a document can be modeled in MongoDB.
- The _id field is added by MongoDB to uniquely identify the document in the collection.
- Note that the order data (OrderID, Product, and Quantity), which in an RDBMS would normally be stored in a separate table, is stored in MongoDB as an embedded document within the customer document itself. This is one of the key differences in how data is modeled in MongoDB.
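A document of the kind described in the two points above might look roughly like the following, written here as a Python dictionary and printed as JSON. The field names OrderID, Product, and Quantity and the CustomerID value 11 come from the text; the `_id` string, the customer name, and the order values are placeholders invented for the illustration.

```python
import json

customer = {
    # Hypothetical 24-character hexadecimal identifier of the kind MongoDB generates.
    "_id": "563479cc8a8a4246bd27d784",
    "CustomerID": 11,
    "CustomerName": "Example Customer",  # placeholder value
    # Order data is embedded in the customer document instead of a separate table.
    "OrderData": [
        {"OrderID": 111, "Product": "Keyboard", "Quantity": 2},
        {"OrderID": 112, "Product": "Mouse", "Quantity": 1},
    ],
}

print(json.dumps(customer, indent=2))

# All of a customer's orders are reachable from the one document.
total_quantity = sum(order["Quantity"] for order in customer["OrderData"])
print("total quantity ordered:", total_quantity)
```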
Below are a few of the common terms used in MongoDB
- _id – This is a mandatory field in all MongoDB documents. The _id field holds a unique value and functions like the document’s primary key. If you create a new document without an _id field, MongoDB will create one for you. So, in the case of the above customer table, MongoDB will assign a unique identifier, displayed as 24 hexadecimal characters, to each document in the collection.
- Collection – This is a grouping of MongoDB documents. A collection is the equivalent of a table created in an RDBMS such as Oracle or MS SQL. A collection exists within a single database. As seen in the introduction, collections don’t enforce any sort of structure.
- Cursor – This is a pointer to the result set of a query. Clients can iterate through a cursor to retrieve results.
- Database – This is a container for collections, just as a database in an RDBMS is a container for tables. Each database gets its own set of files on the file system. A MongoDB server can store multiple databases.
- Document – A record in a MongoDB collection is basically called a document. The document, in turn, will consist of field names and values.
- Field – A name-value pair in a document. A document has zero or more fields. Fields are analogous to columns in relational databases. The following diagram shows an example of fields with key-value pairs. In the example below, CustomerID and 11 form one of the key-value pairs defined in the document.
A quick note on the difference between the _id field and a standard field: the _id field uniquely identifies each document in a collection and, if you do not supply it, MongoDB adds it automatically when the document is inserted.
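The automatically generated _id is an ObjectId: 12 bytes made up of a 4-byte timestamp, a 5-byte random value, and a 3-byte counter, usually shown as 24 hexadecimal characters. The sketch below builds an identifier with the same layout using only the Python standard library. It is an approximation for illustration; the function names are made up, and real drivers ship their own ObjectId implementation.

```python
import itertools
import os
import struct
import time

# 5 random bytes chosen once per process, and a counter with a random start.
_PROCESS_RANDOM = os.urandom(5)
_COUNTER = itertools.count(int.from_bytes(os.urandom(3), "big"))


def new_object_id(now=None) -> str:
    """Build a 12-byte identifier and return it as 24 hexadecimal characters."""
    timestamp = int(time.time() if now is None else now)
    counter = next(_COUNTER) % 0x1000000  # keep the counter within 3 bytes
    raw = struct.pack(">I", timestamp) + _PROCESS_RANDOM + counter.to_bytes(3, "big")
    return raw.hex()


def ensure_id(document: dict) -> dict:
    """Mimic insert behavior: add an _id only if the document has none."""
    if "_id" in document:
        return document
    return {"_id": new_object_id(), **document}


inserted = ensure_id({"CustomerID": 11})
print(inserted["_id"], len(inserted["_id"]))
```

Because the timestamp comes first, identifiers created later sort after earlier ones at one-second granularity.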
Why Use MongoDB?
Below are a few of the reasons why one should start using MongoDB:
- Since MongoDB is a NoSQL type database, instead of having data in a relational type format, it stores the data in documents. This makes MongoDB very flexible to real business world situations and requirements.
- Ad hoc queries
- MongoDB supports search by field, range queries, and regular expression searches. Queries can be made to return specific fields within documents.
- Indexes can be created to improve the performance of searches within MongoDB. Any field in a MongoDB document can be indexed.
- MongoDB offers high availability with replica sets. A replica set contains two or more MongoDB instances.
- Each replica set member may act in the role of the primary or secondary replica at any time.
- The primary replica is the main server that interacts with the client and, by default, performs all the read/write operations.
- The secondary replicas keep a copy of the primary’s data using built-in replication.
- When the primary replica fails, the replica set automatically elects one of the secondaries, which then becomes the primary server.
- Load balancing
- MongoDB uses the concept of sharding to scale horizontally by splitting data across multiple MongoDB instances.
- MongoDB can run over multiple servers, balancing the load and/or duplicating data to keep the system up and running in case of hardware failure.
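The replica set behavior in the list above (a primary takes the writes, secondaries copy them, and a secondary takes over when the primary fails) can be sketched as a small simulation. This is a toy model under simplifying assumptions: replication is instantaneous, and the election simply picks the healthy member with the most data as long as a majority of members is still up. The class names and member names are invented for the example.

```python
class Member:
    """One server in the simulated replica set."""

    def __init__(self, name):
        self.name = name
        self.data = []
        self.healthy = True


class ReplicaSet:
    def __init__(self, names):
        self.members = [Member(name) for name in names]
        self.primary = self.members[0]

    def write(self, document):
        """Writes go to the primary and are copied to healthy secondaries."""
        if self.primary is None:
            raise RuntimeError("no primary available")
        self.primary.data.append(document)
        for member in self.members:
            if member is not self.primary and member.healthy:
                member.data.append(document)

    def fail(self, name):
        """Mark a member as down and elect a new primary if needed."""
        for member in self.members:
            if member.name == name:
                member.healthy = False
        if self.primary is not None and not self.primary.healthy:
            self._elect()

    def _elect(self):
        healthy = [member for member in self.members if member.healthy]
        # Simplification: a new primary needs a majority of members to be up.
        if len(healthy) * 2 <= len(self.members):
            self.primary = None
            return
        self.primary = max(healthy, key=lambda member: len(member.data))


replica_set = ReplicaSet(["node-a", "node-b", "node-c"])
replica_set.write({"order": 1})
replica_set.fail("node-a")  # the primary goes down
replica_set.write({"order": 2})  # handled by the newly elected primary

print("primary is now:", replica_set.primary.name)
```

The failed member keeps only the data it had before going down, which is why a recovering node has to catch up from the current primary before it can serve again.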
Data Modelling in MongoDB
As we saw in the Introduction section, MongoDB data has a flexible schema. Unlike SQL databases, which require you to declare the schema of a table before inserting data, MongoDB’s collections do not enforce document structure. This kind of adaptability is what makes MongoDB so effective.
When modeling data in MongoDB, keep the following things in mind:
- What are the needs of the application – Look at the business needs of the application and see what data and the type of data are needed for the application. Based on this, ensure that the structure of the document is decided accordingly.
- What are data retrieval patterns – If you foresee heavy query usage then consider the use of indexes in your data model to improve the efficiency of queries.
- Are frequent inserts, updates, and removals happening in the database? Reconsider the use of indexes or incorporate sharding if required in your data modeling design to improve the efficiency of your overall MongoDB environment.