MongoDB – Datatypes

In this article, you will learn about one of the leading NoSQL databases – MongoDB. You will understand the fundamentals of MongoDB and how data is stored in a NoSQL database, but the majority of the article will concentrate on the data types supported by MongoDB.
MongoDB is a cross-platform, document-oriented NoSQL database. It is known for horizontal scalability and high availability, and for some workloads it can outperform a comparable SQL database such as MySQL.

In a document database such as MongoDB, data is stored as a set of key-value pairs. Here is an example.

"name":  "Codelivly"

A set of related key-value pairs stored together is known as a document. Here is an example of a document that contains data about an employee.

{
"employee_name":  "John Doe",
"employee_skills":  "UI Design",
"employee_salary":  40000,
"employee_status":  true,
}

Introduction to Data Types

You can see in the document above that we have stored multiple values for an employee. This is similar to how we would store data within a row in a traditional RDBMS. A collection is a grouping of similar documents. Collections are the NoSQL equivalent of RDBMS tables, with a few key differences that we will not go over in this article.

In the above document, you can see that we have 4 different key-value pairs. The values can be of different types: in this case, employee_name and employee_skills hold String values, employee_salary is a number, and employee_status is a boolean.

Having these (and more) data types in MongoDB allows us to store data compactly and to run type-aware queries, such as numeric range comparisons and date filters, on the stored data.

Using the correct data type for storing the data fields in a document is crucial to the success of the database system. Here are some of the most used data types available in MongoDB.

  • String
  • Integer
  • Boolean
  • Double
  • Date
  • Min/Max keys
  • Arrays
  • Timestamp
  • Object
  • Null
  • Symbol
  • Regular Expressions

We will have a look at all of these with examples but before that, let’s have a look at JSON and BSON to understand how MongoDB stores the data.

JSON and BSON

JSON stands for JavaScript Object Notation. It is a very common format used by APIs and web services to return data to the client. This format is widely used because of its simplicity and ease of parsing. Most modern programming languages can parse JSON with their standard library or a widely available package.

JSON objects are simple associative containers that store data as a collection of key-value pairs. Each key is associated with a value, which can be a number, string, boolean, null, array, or another object.

MongoDB presents data as JSON-like documents, but stores them in a binary-encoded form called BSON, which stands for Binary JSON. BSON’s binary structure encodes type and length information, which allows it to be traversed more quickly than text JSON.

In a nutshell, MongoDB stores data in BSON format both internally and over the network, but this does not preclude using MongoDB as a JSON database. Anything that can be represented in JSON can be natively stored in MongoDB and retrieved as JSON.
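
To make the idea of "type and length information" concrete, here is a minimal sketch of a BSON encoder in plain Node.js. It is an illustration only, not MongoDB's or any driver's implementation: the function name encodeBson is hypothetical, it handles just strings, 32-bit integers, doubles, and booleans, and the rule of storing small whole numbers as int32 is an assumption made for this example.

```javascript
// Minimal BSON encoder sketch: supports only string, int32, double and boolean values.
function encodeBson(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const name = Buffer.from(key + "\0", "utf8"); // field name as a NUL-terminated string
    let type;
    let payload;
    if (typeof value === "string") {
      const text = Buffer.from(value + "\0", "utf8");
      const length = Buffer.alloc(4);
      length.writeInt32LE(text.length); // byte length, including the trailing NUL
      type = 0x02;
      payload = Buffer.concat([length, text]);
    } else if (typeof value === "boolean") {
      type = 0x08;
      payload = Buffer.from([value ? 1 : 0]);
    } else if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      type = 0x10; // assumption for this sketch: small whole numbers become int32
      payload = Buffer.alloc(4);
      payload.writeInt32LE(value);
    } else if (typeof value === "number") {
      type = 0x01;
      payload = Buffer.alloc(8);
      payload.writeDoubleLE(value);
    } else {
      throw new TypeError(`unsupported value for key ${key}`);
    }
    // Each element is: one type byte, the field name, then the value bytes.
    parts.push(Buffer.from([type]), name, payload);
  }
  const body = Buffer.concat(parts);
  const size = Buffer.alloc(4);
  size.writeInt32LE(4 + body.length + 1); // total size: prefix + elements + terminator
  return Buffer.concat([size, body, Buffer.from([0])]);
}

const encoded = encodeBson({ hello: "world" });
console.log(encoded.length);          // 22
console.log(encoded.toString("hex")); // 160000000268656c6c6f0006000000776f726c640000
```

Every value is preceded by a type byte, and the document and each string carry their byte length, so a reader can skip over fields without scanning them character by character.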

Different MongoDB data types

Let’s have a look at each data type offered by MongoDB with examples and understand the best use cases for them.

  • String – BSON strings are UTF-8 encoded and are the most commonly used data type in MongoDB. When serializing and deserializing BSON, the driver for each programming language converts between the language’s native string format and UTF-8.
{
"employee_name":  "John Doe",
"employee_skills":  "UI Design",
"employee_salary":  40000,
"employee_status":  true,
}

The above document contains two keys with String values: employee_name and employee_skills. Strings are the most basic values and are used to store text.

  • Integer – The integer data type is used to store whole numbers. MongoDB supports two forms: 32-bit signed integers and 64-bit signed integers.
{
"employee_name":  "John Doe",
"employee_skills":  "UI Design",
"employee_salary":  40000,
"employee_status":  true,
}

The key employee_salary stores a whole number, so it can be stored as an integer.
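
The difference between the two integer forms is only the range of values they can hold. The following sketch, written in plain JavaScript, reports the narrowest form that fits a given whole number. The helper integerWidth is hypothetical and is not part of MongoDB or its drivers.

```javascript
// Hypothetical helper: report the narrowest BSON integer form that can hold a value.
const INT32_MIN = -(2n ** 31n);
const INT32_MAX = 2n ** 31n - 1n;
const INT64_MIN = -(2n ** 63n);
const INT64_MAX = 2n ** 63n - 1n;

function integerWidth(value) {
  const n = BigInt(value); // throws a RangeError for non-integer numbers such as 1.5
  if (n >= INT32_MIN && n <= INT32_MAX) return "int32";
  if (n >= INT64_MIN && n <= INT64_MAX) return "int64";
  throw new RangeError(`${n} does not fit in a 64-bit signed integer`);
}

console.log(integerWidth(40000));      // int32
console.log(integerWidth(2147483648)); // int64 (one past the 32-bit maximum)
```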

  • Double – The double datatype is used to store floating-point values in 8 bytes (64-bit IEEE 754). Here is an example of a document that contains a double value in the field employee_score.
{
"employee_name":  "John Doe",
"employee_skills":  "UI Design",
"employee_score":  97.67,
"employee_status":  true,
}
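
As a small illustration of the 8-byte representation, the sketch below writes the score into a buffer and reads it back using Node.js Buffer methods. It also shows the usual caveat of binary floating point, which applies to doubles in any system and is not specific to MongoDB.

```javascript
const score = 97.67;

// A double occupies exactly 8 bytes; writing and reading it back returns the same value.
const bytes = Buffer.alloc(8);
bytes.writeDoubleLE(score);
const restored = bytes.readDoubleLE(0);
console.log(bytes.length, restored === score); // 8 true

// Binary floating point cannot represent every decimal fraction exactly.
console.log(0.1 + 0.2 === 0.3); // false
```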
  • Boolean – The boolean datatype is used to store boolean (true or false) values. In the example below, the field employee_status stores the value true, so this field is of the type boolean.
{
"employee_name":  "John Doe",
"employee_skills":  "UI Design",
"employee_score":  97.67,
"employee_status":  true,
}

Booleans use less storage than an integer or string and avoid any unexpected side effects of comparison.

  • Arrays – An array is an ordered list of values. It can hold values of the same or of different data types. In MongoDB, an array is written using square brackets ([]).
{
"employee_name":  "John Doe",
"employee_skills":  ["UI Design", "Graphic Design", "2D Animation"],
"employee_score":  97.67,
"employee_status":  true,
}

In the above example, the employee_skills field contains an array of type String where each value within the array is a String.

Here is another example where instead of an array of a simple type (String), documents are embedded within the array.

{
"item_code": "1234-ABCD",
"item_price": 49.99,
"item_stock": [{
"warehouse": "Warehouse A",
"qty": 1200
}, {
"warehouse": "Warehouse B",
"qty": 900
}],
}

In the above document, the field item_stock contains an array of embedded documents.
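
Once such a document has been read into an application, the embedded documents can be processed like any other array. Here is a short sketch in plain JavaScript that works on the same item document; no database connection is involved.

```javascript
const item = {
  item_code: "1234-ABCD",
  item_price: 49.99,
  item_stock: [
    { warehouse: "Warehouse A", qty: 1200 },
    { warehouse: "Warehouse B", qty: 900 },
  ],
};

// Sum the quantity across all embedded warehouse documents.
const totalQty = item.item_stock.reduce((sum, entry) => sum + entry.qty, 0);

// List the warehouses that hold at least 1000 units.
const wellStocked = item.item_stock
  .filter((entry) => entry.qty >= 1000)
  .map((entry) => entry.warehouse);

console.log(totalQty);    // 2100
console.log(wellStocked); // [ 'Warehouse A' ]
```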

  • Date – The Date type stores a point in time as a signed 64-bit integer: the number of milliseconds since the Unix epoch (Jan 1, 1970, UTC). Because the value is signed, negative values represent dates before 1970. In the shell, a date can be returned either as a string or as a date object:
    • Date(): Returns the current date as a string.
    • new Date(): Returns a date object, which the shell wraps in ISODate().
    • new ISODate(): Also returns a date object wrapped in ISODate().

Here is an example of how a date is stored in a document.

{
"student_name": "Bob Stan",
"student_dob": ISODate("2006-02-10T10:50:42.389Z"),
"student_marks": 78.98
}

In the above example, the stored date can be converted to a readable format in JavaScript with new Date("2006-02-10T10:50:42.389Z"). Printed on a machine set to India Standard Time, it gives the following output; the exact text depends on the local time zone.

Fri Feb 10 2006 16:20:42 GMT+0530 (India Standard Time)

Internally, Date objects are stored as a signed 64-bit integer representing the number of milliseconds since the Unix epoch (Jan 1, 1970).
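
The millisecond representation can be inspected with the standard JavaScript Date object. The sketch below uses only built-in JavaScript and does not touch MongoDB.

```javascript
const dob = new Date("2006-02-10T10:50:42.389Z");
const millis = dob.getTime(); // milliseconds since 1970-01-01T00:00:00Z

// A date before the epoch has a negative millisecond value.
const beforeEpoch = new Date("1969-12-31T23:59:59.000Z").getTime();

console.log(millis);                         // 1139568642389
console.log(new Date(millis).toISOString()); // 2006-02-10T10:50:42.389Z
console.log(beforeEpoch);                    // -1000
```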

  • Min/Max keys – MinKey and MaxKey are special internal data types. They are used to compare a value against the lowest and highest BSON elements: MinKey sorts before every other BSON value and MaxKey sorts after every other value.
  • Object – The Object data type stores embedded documents, also known as nested documents: a document stored as the value of a field inside another document.
{
"item_code": "1234-ABCD",
"item_price": 49.99,
"item_dimensions": {
"item_height": 1200,
"item_width": 100,
"item_depth": 900,
},
"item_availability": true,
}

In the above example, the item_dimensions field is an embedded document as it contains its own set of key-value pairs. This field therefore is of the type Object.
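
MongoDB queries address fields inside an embedded document with dot notation, for example "item_dimensions.item_height". The sketch below mimics that lookup in plain JavaScript. The helper getByPath is hypothetical and only illustrates the idea; it is not how the server resolves paths.

```javascript
// Hypothetical helper: resolve a dotted path such as "a.b" against a nested document.
function getByPath(doc, path) {
  return path.split(".").reduce(
    (current, key) =>
      current !== null && typeof current === "object" ? current[key] : undefined,
    doc
  );
}

const product = {
  item_code: "1234-ABCD",
  item_price: 49.99,
  item_dimensions: { item_height: 1200, item_width: 100, item_depth: 900 },
  item_availability: true,
};

console.log(getByPath(product, "item_dimensions.item_height")); // 1200
console.log(getByPath(product, "item_dimensions.item_weight")); // undefined
```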

  • Timestamp – The timestamp type is a special type for internal MongoDB use and is not associated with the regular Date type. This internal timestamp type is a 64-bit value where the most significant 32 bits are seconds since the Unix epoch and the least significant 32 bits are an incrementing ordinal for operations within a given second.

Here is what the timestamp value looks like in the document when it is queried.

{
"item_code": "1234-ABCD",
"item_price": 49.99,
"item_created": Timestamp(1412180887, 1),
"item_availability": true,
}

Because the Timestamp type is mainly intended for internal use, the Date type is usually the better choice for tracking when a document was created or updated in an application. If a document is inserted with an empty new Timestamp() value in a top-level field, the server replaces it with the current timestamp value.
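
The 64-bit layout described above (seconds in the most significant 32 bits, ordinal in the least significant 32 bits) can be sketched with JavaScript BigInt arithmetic. The helpers packTimestamp and unpackTimestamp are hypothetical names used only for this illustration.

```javascript
// Pack seconds and ordinal into one 64-bit value (seconds in the high 32 bits).
function packTimestamp(seconds, ordinal) {
  return (BigInt(seconds) << 32n) | BigInt(ordinal);
}

// Split a packed value back into its two 32-bit halves.
function unpackTimestamp(packed) {
  return {
    seconds: Number(packed >> 32n),
    ordinal: Number(packed & 0xffffffffn),
  };
}

const packed = packTimestamp(1412180887, 1);
const { seconds, ordinal } = unpackTimestamp(packed);

console.log(seconds, ordinal);                       // 1412180887 1
console.log(new Date(seconds * 1000).toISOString()); // 2014-10-01T16:28:07.000Z
```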

  • Null – The null datatype is used to store null or non-existent values. Here is what a field in a document with a null value would look like when queried.
{
"item_code": "1234-ABCD",
"item_price": 49.99,
"item_color": null,
"item_availability": true,
}

For an equality match on null, this behaves like the following document, where the field is completely absent, although the two documents are stored differently.

{
"item_code": "1234-ABCD",
"item_price": 49.99,
"item_availability": true,
}
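
In MongoDB, a query filter such as { item_color: null } matches documents where the field is null as well as documents where the field is missing, while { item_color: { $exists: false } } matches only the latter. The sketch below imitates both checks in plain JavaScript. The predicates matchesNull and fieldExists are hypothetical and stand in for the server's matching logic.

```javascript
const withNull = { item_code: "1234-ABCD", item_price: 49.99, item_color: null, item_availability: true };
const withoutField = { item_code: "1234-ABCD", item_price: 49.99, item_availability: true };

// Mimics an equality match on null: true when the field is null or absent.
const matchesNull = (doc, field) => !(field in doc) || doc[field] === null;

// Mimics an existence check: true when the field is present, even if its value is null.
const fieldExists = (doc, field) => field in doc;

console.log(matchesNull(withNull, "item_color"), matchesNull(withoutField, "item_color")); // true true
console.log(fieldExists(withNull, "item_color"), fieldExists(withoutField, "item_color")); // true false
```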
  • ObjectID – This datatype is used to store a document’s unique ID. Every document in a collection must have a unique _id value, and ObjectID is the default type for it. It is a 12-byte value made up of a 4-byte timestamp, a 5-byte random value, and a 3-byte incrementing counter, which together make collisions very unlikely.

Here is an example.

{
"_id": ObjectId("5349b4ddd2781d08c09890f3"),
"item_code": "1234-ABCD",
"item_price": 49.99,
"item_availability": true,
}

If you insert a document without an _id field, one is added automatically with an ObjectID value.
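
The 12 bytes of an ObjectID are usually shown as 24 hexadecimal characters, so the parts can be pulled apart with simple string slicing. This sketch assumes the layout of a 4-byte timestamp, a 5-byte random value, and a 3-byte counter; the function parseObjectId is a hypothetical helper, not a driver API.

```javascript
// Hypothetical helper: split a 24-character hex ObjectID string into its three parts.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new TypeError("expected 24 hexadecimal characters");
  }
  const timestampSeconds = parseInt(hex.slice(0, 8), 16);
  return {
    createdAt: new Date(timestampSeconds * 1000), // first 4 bytes: seconds since the epoch
    random: hex.slice(8, 18),                     // next 5 bytes: random value
    counter: parseInt(hex.slice(18, 24), 16),     // last 3 bytes: incrementing counter
  };
}

const parts = parseObjectId("5349b4ddd2781d08c09890f3");
console.log(parts.createdAt.toISOString());
console.log(parts.random, parts.counter);
```

Because the timestamp comes first, sorting by _id roughly orders documents by creation time.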

  • Binary – This datatype is used to store binary data in a field. It corresponds to the BLOB type in a relational DBMS. MongoDB limits each document to 16 MB, so binary data can be embedded with the Binary data type only if it and the other fields together stay under that limit.

Here is an example.

{
"_id": ObjectId("5349b4ddd2781d08c09890f3"),
"item_code": "1234-ABCD",
"item_price": 49.99,
"item_availability": true,
"item_picture": BinData(1, "wekud3298eyx2398ey293..."),
}

BinData is how the shell displays binary data: the first argument is the binary subtype and the second is the base64 representation of the content.
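
The base64 text is only a display form of the underlying bytes. The sketch below converts bytes to base64 and back with the Node.js Buffer class and adds a rough size check against the 16 MB limit. The five sample bytes are a stand-in for real image data, and the check is an approximation because a real document also needs room for its other fields.

```javascript
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024; // the 16 MB per-document limit

const picture = Buffer.from("hello", "utf8"); // stand-in for real image bytes
const base64 = picture.toString("base64");
const decoded = Buffer.from(base64, "base64");

// Rough check only: the other fields and BSON overhead also count toward the limit.
const fitsInline = picture.length < MAX_DOCUMENT_BYTES;

console.log(base64);                  // aGVsbG8=
console.log(decoded.equals(picture)); // true
console.log(fitsInline);              // true
```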

  • Undefined – This datatype is used to store the undefined value in a field. Note that MongoDB differentiates between null and undefined but the shell casts both to null. This behavior can, however, be changed.
{
"item_code": "1234-ABCD",
"item_price": 49.99,
"item_color": undefined,
"item_availability": true,
}

The Undefined type is deprecated, so avoid it in new documents.

  • Regular Expression – This datatype is used to store regular expressions (regexes) in a field. They can be used for pattern matching on string fields in queries. Here is an example.
{
"item_code": "1234-ABCD",
"item_price": 49.99,
"item_color": undefined,
"item_prefix": /%_Y675%/,
}
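
A regular expression can also be used as the match value in a query filter, for example { item_code: /^1234-/ }. The sketch below applies the same pattern to an in-memory array with plain JavaScript. The second item is made-up sample data for the illustration.

```javascript
const items = [
  { item_code: "1234-ABCD", item_price: 49.99 },
  { item_code: "9876-WXYZ", item_price: 19.99 }, // hypothetical second item
];

// Match item codes that start with the prefix "1234-".
const startsWith1234 = /^1234-/;
const matching = items.filter((item) => startsWith1234.test(item.item_code));

console.log(matching.map((item) => item.item_code)); // [ '1234-ABCD' ]
```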
  • JavaScript with Scope – It is possible to store a JavaScript function in a field. Functions with closures can also be stored; they bind to the scope of the MongoDB session when they are executed.

BSON defines two types for code: JavaScript, for functions without closures, and JavaScript with Scope, for functions with closures. JavaScript with Scope is deprecated as of MongoDB 4.4.

These are the most prominent data types in MongoDB. BSON supports more data types than JSON. Over time, some older and rarely used types have been deprecated, while support for newer types continues to improve.
