MongoDB – Data Modelling

Data in MongoDB has a flexible schema. Documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection’s documents may hold different types of data.

What is Data Modeling?

Data modeling is the blueprint for creating a full-fledged database system. A data model’s primary function is to provide visual information about the relationship between two or more data points. The layout/design would then be critical in maintaining petabyte-scale data repositories to store data from multiple business functions and teams, ranging from sales to marketing and beyond.

Designing a data model is a continuous, evolving process. It requires multiple feedback loops and direct contact with stakeholders to incorporate new data models or revise definitions in an existing one.

Formalized schemas and techniques are used to develop competent data models, ensuring a standard, consistent, and predictable way to run business processes and strategize data resources in an organization.

On the basis of the level of details or specificity, data models for a database system can be conceptualized into three categories: Conceptual data models, Logical data models, and Physical data models. 


Let’s learn about them briefly. 

  • Conceptual Data Models: Conceptual data models can be described as rough drawings offering the big picture. They answer where the data from across business functions will be stored in the database system and which relationships it will take part in. A conceptual data model typically contains the entity classes, their characteristics and constraints, the relationships between them, and the security and data integrity requirements.
  • Logical Data Models: Logical data models offer more in-depth information about the relationships between data sets. At this stage, the data types and relations in use become clear. Logical data models are often skipped in agile business environments, but they are useful in projects that are data-driven and require extensive procedure implementation.
  • Physical Data Models: A physical data model provides a schema/layout for how data is stored within a database. It offers a finalized proposition that can be implemented in a specific database system.

Embedded & Normalized Data Models in MongoDB Data Modeling

Figure: Embedded versus normalized data models

When data professionals begin building data models in MongoDB, they are faced with the decision of embedding the information or storing it separately in a collection of documents. As a result, there are two concepts for efficient MongoDB Data Modeling:

  1. The Embedded Data Model, and
  2. The Normalized Data Model.

Embedded Data Model

The embedded data model, also called a denormalized data model, is applied when two data sets are related. It expresses the relationship between data elements by keeping the related data in a single document structure. You can save the embedded information in an array or in a field, depending upon the requirements.

Normalized Data Model

In a normalized data model, object references are used to model relationships between data elements/documents. This model reduces duplication of data, so many-to-many relationships can be represented fairly easily without duplicating content. Normalized data models are best for modeling large hierarchical datasets and for referencing across collections.
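To make the contrast concrete, here is a minimal sketch in plain JavaScript, with no database involved, that turns an embedded document into a normalized form. The `normalize` helper, the `user_id` field, and the generated `_id` values are hypothetical and chosen only for illustration.

```javascript
// Embedded form: the addresses live inside the user document.
const embeddedUser = {
  _id: "u1",
  name: "Joe Karlsson",
  addresses: [
    { street: "123 Sesame St", city: "Anytown" },
    { street: "123 Avenue Q", city: "New York" },
  ],
};

// Hypothetical helper: split one embedded document into a user document
// plus separate address documents that reference the user by _id.
function normalize(user) {
  const { addresses, ...userFields } = user;
  const addressDocs = addresses.map((address, index) => ({
    _id: `${user._id}-addr-${index}`,
    user_id: user._id, // reference back to the owning user
    ...address,
  }));
  return { user: userFields, addresses: addressDocs };
}

const normalized = normalize(embeddedUser);
console.log(normalized.user);
console.log(normalized.addresses);
```

In the normalized form, reading a user together with its addresses takes a second lookup, which is the usual price paid for less duplication.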

Defining Relationships in MongoDB Data Modeling

The most important consideration in your MongoDB data modeling project is defining relationships for your schema. These relationships define how your system will use data. MongoDB Data Modeling defines three types of relationships: one-to-one, one-to-many, and many-to-many.

One-to-one Relationship

A great example of this relationship is your name, because one user can have only one name. One-to-one data can be modeled as key-value pairs in your database. Look at the example given below:

{
    "_id": "ObjectId('AAA')",
    "name": "Joe Karlsson",
    "company": "MongoDB",
    "twitter": "@JoeKarlsson1",
    "twitch": "joe_karlsson",
    "tiktok": "joekarlsson",
    "website": "joekarlsson.com"
}

One-to-many Relationship

Consider the following scenario: you are creating a page for an e-commerce site with a schema that displays product information. The system stores many elements that together make up a single product, and the schema could hold thousands of such subparts and relationships. The same one-to-many shape appears in the example below, where a single user document embeds an array of addresses:

{
    "_id": "ObjectId('AAA')",
    "name": "Joe Karlsson",
    "company": "MongoDB",
    "twitter": "@JoeKarlsson1",
    "twitch": "joe_karlsson",
    "tiktok": "joekarlsson",
    "website": "joekarlsson.com",
    "addresses": [
        { "street": "123 Sesame St", "city": "Anytown", "cc": "USA" },  
        { "street": "123 Avenue Q",  "city": "New York", "cc": "USA" }
    ]
}

Many-to-many Relationship

Imagine a to-do app to better understand many-to-many relationships. A user in the application may have many tasks, and each task may be assigned to multiple users. To preserve these relationships, references exist in both directions: each user references its tasks, and each task references its owners. Let’s look at this with the help of the following example:

Users:

{
    "_id": ObjectId("AAF1"),
    "name": "Kate Monster",
    "tasks": [ObjectId("ADF9"), ObjectId("AE02"), ObjectId("AE73")]
}

Tasks:

{
    "_id": ObjectId("ADF9"),
    "description": "Write blog post about MongoDB schema design",
    "due_date": ISODate("2014-04-01"),
    "owners": [ObjectId("AAF1"), ObjectId("BB3G")]
}
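The two-way references above can be followed in either direction. The sketch below does this in plain JavaScript over in-memory arrays that stand in for the two collections. Plain string ids replace ObjectIds, and the second user, the second task, and both helper functions are hypothetical additions for illustration.

```javascript
// In-memory stand-ins for the two collections; string ids replace ObjectIds.
const users = [
  { _id: "AAF1", name: "Kate Monster", tasks: ["ADF9", "AE02"] },
  { _id: "BB3G", name: "Rod", tasks: ["ADF9"] }, // hypothetical second user
];
const tasks = [
  {
    _id: "ADF9",
    description: "Write blog post about MongoDB schema design",
    owners: ["AAF1", "BB3G"],
  },
  { _id: "AE02", description: "Review pull request", owners: ["AAF1"] }, // hypothetical task
];

// Follow the references stored on a user to the task documents.
function tasksForUser(userId) {
  const user = users.find((u) => u._id === userId);
  return user.tasks.map((taskId) => tasks.find((t) => t._id === taskId));
}

// Follow the references stored on a task to the user documents.
function ownersOfTask(taskId) {
  const task = tasks.find((t) => t._id === taskId);
  return task.owners.map((ownerId) => users.find((u) => u._id === ownerId));
}

console.log(tasksForUser("AAF1").map((t) => t.description));
console.log(ownersOfTask("ADF9").map((u) => u.name));
```

Against a real database, the same traversal would typically be a second query with `$in` or a `$lookup` aggregation stage rather than an in-memory search.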

MongoDB Data Modeling Schema

By default, MongoDB has a flexible schema: documents in a collection do not need to be identical in structure. This is a paradigm shift from the SQL view of data, where tables define a fixed data type for every column and all rows must conform to it.

What is a flexible schema?

In a flexible schema model, there is no need to fix the data type of a field, because the type can differ across documents. A flexible schema is advantageous when adding, removing, or changing fields in an existing collection, or when updating documents to a new structure.

Let’s explain with an example. In the example below, there are two documents in the same collection:

{ "_id" : ObjectId("5b98bfe7e8b9ab9875e4c80c"),
     "StudentName" : "George  Beckonn",
     "ParentPhone" : 75646344,
     "age" : 10
}
{ "_id" : ObjectId("5b98bfe7e8b9ab98757e8b9a"),
     "StudentName" : "Fredrick  Wesonga",
     "ParentPhone" : false,
}

The first document has the field ‘age’, but the second does not. Furthermore, the field ‘ParentPhone’ holds a number in the first document, whereas in the second it is set to false, a boolean value.
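One way to see this variation in application code is to collect the value types observed for each field. The sketch below does so in plain JavaScript over copies of the two documents, with `_id` omitted. The `fieldTypes` helper is hypothetical, and it reports JavaScript `typeof` results rather than BSON types.

```javascript
// Copies of the two documents above (without _id), as plain objects.
const students = [
  { StudentName: "George  Beckonn", ParentPhone: 75646344, age: 10 },
  { StudentName: "Fredrick  Wesonga", ParentPhone: false },
];

// Hypothetical helper: map each field name to the set of JavaScript
// types seen for that field across all documents.
function fieldTypes(docs) {
  const types = {};
  for (const doc of docs) {
    for (const [field, value] of Object.entries(doc)) {
      if (!types[field]) types[field] = new Set();
      types[field].add(typeof value);
    }
  }
  return types;
}

const summary = fieldTypes(students);
for (const [field, seen] of Object.entries(summary)) {
  console.log(field, [...seen].join(", "));
}
```

A field that reports more than one type, such as `ParentPhone` here, is a candidate for a validation rule if the application expects a single type.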

What is Rigid Schema?

In a rigid schema, all documents in a collection share the same structure, which makes it easier to set up document validation rules that enhance data integrity during insert and update operations. Some examples of the data types available for a rigid schema are: String, Number, Boolean, Date, Buffer, ObjectId, Array, Mixed, Decimal128, and Map.

The example below shows what a sample Mongoose schema looks like:

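const mongoose = require('mongoose');

// Each field is declared with the type its values must have.
const userSchema = new mongoose.Schema({
    userId: Number,
    Email: String,
    Birthday: Date,
    Adult: Boolean,
    Binary: Buffer,
    height: mongoose.Schema.Types.Decimal128,
    units: []
});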

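An example of its use is as follows:

// Compile the schema into a model and create a document from it.
const User = mongoose.model('Users', userSchema);
const newUser = new User();
newUser.userId = 1;
newUser.Email = 'example@gmail.com';
newUser.Birthday = new Date();
newUser.Adult = false;
newUser.Binary = Buffer.alloc(0);
newUser.height = 12.45;
newUser.units = ['Circuit network Theory', 'Algebra', 'Calculus'];
// save() returns a promise; recent Mongoose versions no longer accept a callback.
newUser.save().then(() => console.log('saved')).catch((err) => console.error(err));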

What is Schema Validation?

Schema validation lets the server check data on its end. Validation rules are applied to insert and update operations. Rules can also be added to an existing collection using the ‘collMod’ command; existing documents are not checked against the new rules until they are modified.

A validator can be specified when creating a new collection with the ‘db.createCollection()’ command. Since version 3.6, MongoDB supports JSON Schema validation through the ‘$jsonSchema’ operator.

db.createCollection("students", {
   validator: {$jsonSchema: {
         bsonType: "object",
         required: [ "name", "year", "major", "gpa" ],
         properties: {
            name: {
               bsonType: "string",
               description: "must be a string and is required"
            },
            gender: {
               bsonType: "string",
               description: "must be a string and is not required"
            },
            year: {
               bsonType: "int",
               minimum: 2017,
               maximum: 3017,
               exclusiveMaximum: false,
               description: "must be an integer in [ 2017, 3017 ] and is required"
            },
            major: {
               enum: [ "Math", "English", "Computer Science", "History", null ],
               description: "can only be one of the enum values and is required"
            },
            gpa: {
               bsonType: [ "double" ],
               minimum: 0,
               description: "must be a double and is required"
            }
         }
   } }
})

To insert a new document into the collection, follow the example below:

db.students.insert({
   name: "James Karanja",
   year: NumberInt(2016),
   major: "History",
   gpa: NumberInt(3)
})

The insert fails because the document violates the validation rules: the supplied year is below the specified minimum, and gpa is an integer where the validator requires a double.

WriteResult({
   "nInserted" : 0,
   "writeError" : {
      "code" : 121,
      "errmsg" : "Document failed validation"
   }
})
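To see why the insert above is rejected, the sketch below re-checks two of the rules in plain JavaScript. It is a simplified stand-in, not MongoDB's validator: the `findViolations` helper and the `{ type, value }` tagging are hypothetical. The tags are needed because BSON distinguishes int from double, while plain JavaScript numbers do not.

```javascript
// Two of the $jsonSchema rules from the "students" validator.
const rules = {
  year: { bsonType: "int", minimum: 2017, maximum: 3017 },
  gpa: { bsonType: "double", minimum: 0 },
};

// Hypothetical helper: list every rule a tagged document breaks.
function findViolations(doc, rules) {
  const violations = [];
  for (const [field, rule] of Object.entries(rules)) {
    const entry = doc[field];
    if (entry === undefined) {
      violations.push(`${field}: missing`);
      continue;
    }
    if (entry.type !== rule.bsonType) {
      violations.push(`${field}: expected ${rule.bsonType}, got ${entry.type}`);
    }
    if (rule.minimum !== undefined && entry.value < rule.minimum) {
      violations.push(`${field}: below minimum ${rule.minimum}`);
    }
    if (rule.maximum !== undefined && entry.value > rule.maximum) {
      violations.push(`${field}: above maximum ${rule.maximum}`);
    }
  }
  return violations;
}

// The inserted values: NumberInt(2016) and NumberInt(3).
const insertedDoc = {
  year: { type: "int", value: 2016 },
  gpa: { type: "int", value: 3 },
};
const violations = findViolations(insertedDoc, rules);
console.log(violations);
```

Both reported problems have to be fixed before the insert succeeds, for example a year of 2017 or later and a gpa stored as a double.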

You can also specify validation rules with query expressions, with the exception of the $where, $text, $near, and $nearSphere operators.

db.createCollection( "contacts",
   { validator: { $or:
      [
         { phone: { $type: "string" } },
         { email: { $regex: /@mongodb\.com$/ } },
         { status: { $in: [ "Unknown", "Incomplete" ] } }
      ]
   }
} )
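The `$or` validator above accepts a document as soon as any one of its three conditions holds. The following plain-JavaScript function is a hypothetical, simplified equivalent of that logic, written only to show which documents would pass. It does not call MongoDB.

```javascript
// Hypothetical plain-JavaScript equivalent of the $or validator above.
function passesContactValidator(doc) {
  const phoneIsString = typeof doc.phone === "string";
  const emailMatches =
    typeof doc.email === "string" && /@mongodb\.com$/.test(doc.email);
  const statusAllowed = ["Unknown", "Incomplete"].includes(doc.status);
  // One satisfied condition is enough, as with $or.
  return phoneIsString || emailMatches || statusAllowed;
}

console.log(passesContactValidator({ phone: "555-0100" }));
console.log(passesContactValidator({ email: "someone@example.com" }));
console.log(passesContactValidator({ status: "Unknown" }));
```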


Schema Validation Levels in MongoDB Data Modeling

Validation is applied to write operations, and the validation level controls how strictly the rules apply to documents that already exist. There are three levels of validation:

  • Strict: Validation rules are applied to all inserts and updates.
  • Moderate: Validation rules are applied to inserts and to updates of existing documents that already fulfill the validation criteria. Updates to existing documents that do not fulfill the criteria are not checked.
  • Off: Validations are off; hence no validation criteria are applied to any document.

For example, let’s insert the data below in a ‘clients’ collection.

db.clients.insert([
{
    "_id" : 1,
    "name" : "Brillian",
    "phone" : "+1 778 574 666",
    "city" : "Beijing",
    "status" : "Married"
},
{
    "_id" : 2,
    "name" : "James",
    "city" : "Peninsula"
}
])

After applying the moderate validation level using:

db.runCommand( {
   collMod: "clients",
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "phone", "name" ],
      properties: {
         phone: {
            bsonType: "string",
            description: "must be a string and is required"
         },
         name: {
            bsonType: "string",
            description: "must be a string and is required"
         }
      }
   } },
   validationLevel: "moderate"
} )

Hence, the validation rules will only be applied to updates of the document with the ‘_id’ of 1, since it already matches the criteria. The second document does not meet the validation criteria because it has no ‘phone’ field, so updates to it will not be validated.
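The decision described above can be written down as a small function. This is a simplified sketch in plain JavaScript, not MongoDB's implementation: `meetsCriteria` and `isWriteValidated` are hypothetical names, and the criteria check only covers the two required string fields from the validator.

```javascript
// Simplified check matching the validator above: phone and name must be strings.
function meetsCriteria(doc) {
  return typeof doc.phone === "string" && typeof doc.name === "string";
}

// Decide whether a write is checked against the rules for a given level.
// existingDoc is the stored document being updated (unused for inserts).
function isWriteValidated(level, operation, existingDoc) {
  if (level === "off") return false;
  if (level === "strict" || operation === "insert") return true;
  // "moderate" update: only checked when the stored document is already valid.
  return meetsCriteria(existingDoc);
}

const clients = [
  { _id: 1, name: "Brillian", phone: "+1 778 574 666", city: "Beijing", status: "Married" },
  { _id: 2, name: "James", city: "Peninsula" },
];

for (const client of clients) {
  console.log(client._id, isWriteValidated("moderate", "update", client));
}
```

Under "strict", the same update to the second document would be validated and would fail until a ‘phone’ field is added.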

Schema Validation Actions

Schema validation actions determine what happens to documents that violate the validation criteria. MongoDB provides two actions: Error and Warn.

Error: This action rejects the insert or update if the validation criteria are not met.

Warn: This action records every violation in the MongoDB log but allows the insert or update operation to complete. For example:

db.createCollection("students", {
   validator: {$jsonSchema: {
         bsonType: "object",
         required: [ "name", "gpa" ],
         properties: {
            name: {
               bsonType: "string",
               description: "must be a string and is required"
            },
       
            gpa: {
               bsonType: [ "double" ],
               minimum: 0,
               description: "must be a double and is required"
            }
         }
   } },
   validationAction: "warn"
})

If we insert a document like this:

db.students.insert( { name: "Amanda", status: "Updated" } );

The ‘gpa’ field is missing, but because the validation action is set to ‘warn’, the document will still be saved, and a warning will be recorded in the MongoDB log.
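The difference between the two actions can be summarized in a few lines. The sketch below is a plain-JavaScript illustration under simplifying assumptions: `applyWrite` and `hasNameAndGpa` are hypothetical, and the validity check is reduced to the two required fields.

```javascript
// Simplified validity check: both required fields are present with usable types.
const hasNameAndGpa = (doc) =>
  typeof doc.name === "string" && typeof doc.gpa === "number";

// Hypothetical model of the two validation actions.
function applyWrite(doc, isValid, validationAction) {
  if (isValid(doc)) return { accepted: true, logged: false };
  if (validationAction === "warn") return { accepted: true, logged: true };
  return { accepted: false, logged: false }; // "error": the write is rejected
}

const amanda = { name: "Amanda", status: "Updated" };
console.log(applyWrite(amanda, hasNameAndGpa, "warn"));
console.log(applyWrite(amanda, hasNameAndGpa, "error"));
```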

MongoDB Data Modeling Schema Design Patterns

There exist 12 patterns in the MongoDB Data Modeling Schema Design. Let’s discuss them briefly.

Figure: Main schema design patterns and their use cases
  • Approximation: Fewer writes and calculations are needed because only approximate values are saved.
  • Attribute: On large documents, index and query only a subset of fields.
  • Bucket: When streaming data or working with IoT applications, bucketing values reduces the number of documents, and pre-aggregation simplifies data access.
  • Computed: By performing computations at write time or at regular intervals, the application avoids repeating them on every read.
  • Document Versioning: Document versioning allows different versions of documents to coexist.
  • Extended Reference: Many joins are avoided by embedding only the frequently accessed fields.
  • Outlier: Data models and queries are designed for typical use cases and not influenced by outliers.
  • Pre-Allocation: When the document structure is known in advance, pre-allocation reduces memory reallocation and improves performance.
  • Polymorphic: Polymorphic is useful when similar documents don’t have the same structure.
  • Schema Versioning: Schema versioning is useful when the schema evolves during the application’s lifetime; it avoids downtime and technical debt.
  • Subset: A subset is useful when the application uses only some of the data, because a smaller dataset will fit into RAM and improve performance.
  • Tree: The tree pattern is suited for hierarchical data. The application needs to manage updates to the graph.
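As an illustration of one entry in the list, the Bucket pattern with pre-aggregation can be sketched in plain JavaScript. The sensor readings, field names, and the `bucketByHour` helper below are hypothetical. A real application would store each resulting bucket as one document.

```javascript
// Hypothetical sensor readings: one object per measurement.
const readings = [
  { sensorId: "s1", timestamp: "2024-01-01T10:05:00Z", temperature: 20 },
  { sensorId: "s1", timestamp: "2024-01-01T10:35:00Z", temperature: 22 },
  { sensorId: "s1", timestamp: "2024-01-01T11:10:00Z", temperature: 24 },
];

// Group readings into one bucket per sensor and hour, keeping running
// totals (count and sum) so averages need no scan of the raw values.
function bucketByHour(items) {
  const buckets = new Map();
  for (const { sensorId, timestamp, temperature } of items) {
    const hour = timestamp.slice(0, 13); // e.g. "2024-01-01T10"
    const key = `${sensorId}|${hour}`;
    if (!buckets.has(key)) {
      buckets.set(key, { sensorId, hour, measurements: [], count: 0, sum: 0 });
    }
    const bucket = buckets.get(key);
    bucket.measurements.push({ timestamp, temperature });
    bucket.count += 1; // pre-aggregated values stored next to the raw data
    bucket.sum += temperature;
  }
  return [...buckets.values()];
}

const buckets = bucketByHour(readings);
for (const b of buckets) {
  console.log(b.hour, b.count, b.sum / b.count);
}
```

Three readings collapse into two bucket documents here. With real streaming data the reduction is much larger, which is the point of the pattern.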

Conclusion

In this blog post, we discussed MongoDB Data Modeling and its components in detail. We started with data modeling in general, and then covered the types of data models and relationships you need to know when working on a MongoDB Data Modeling project.
