Data Modeling

Why Data Modeling Matters

A data model defines how information is structured, connected, and stored before it is ever written to a database.

[Figure: disconnected values (Alice, Order#21, Paid, ItemA) become connected records after modeling (Alice, order 21, Paid; Bob, order 45, Pending). How you structure data affects latency, complexity, and scalability.]

Backend systems depend on structured data to function correctly.
Engineers must decide what data exists and how it relates.
The organization of data directly impacts performance and scalability.

Details

Backend systems are built around data. Every feature—users, posts, orders, payments—relies on storing and retrieving structured information reliably.

Before choosing a database or writing queries, engineers must define the data model. This includes identifying what data exists, how different pieces of data relate to each other, and how everything should be organized.

The structure chosen at this stage has long-term consequences. A poor data model leads to inefficient queries, duplicated data, and difficulty scaling the system. Fixing it later is expensive and often requires major refactoring.

The flow is straightforward: the application defines requirements, the data model organizes those requirements, and the database stores the result. Without a clear data model, the database becomes disorganized and hard to maintain.

Entities

Entities represent the core objects in a system—the fundamental pieces of data the application manages.

[Figure: three example entities: User (name), Order (userId, total), and Product (name, price). Each entity defines its own data but connects to others through relationships.]

Entities model real-world concepts like users, orders, or products that the system needs to track.
Each entity defines a distinct type of data with its own attributes.
Entities map directly to storage structures such as tables, collections, or documents.

Details

Entities are the foundation of any data model. They represent the main objects that exist in the system, such as users, posts, comments, orders, or products. These are derived directly from what the application needs to support.

Each entity groups together related data. For example, a User entity may include fields like name, email, and created date, while a Post entity includes title and content. This grouping keeps data organized and meaningful.

In practice, entities map to database structures. In relational databases, they become tables. In NoSQL systems, they may be represented as collections or documents. This mapping is what connects application concepts to actual stored data.
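As a minimal sketch of this mapping, the example below pairs a hypothetical User entity with a relational table, using Python's built-in sqlite3 module. The field names follow the ones mentioned above; the sample values and the in-memory database are assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class User:
    """Application-side view of the User entity (hypothetical fields)."""
    id: int
    name: str
    email: str
    created_at: str


conn = sqlite3.connect(":memory:")

# The same entity expressed as a relational table.
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name TEXT, email TEXT, created_at TEXT)"
)

# Store one entity instance as a row, then read it back.
alice = User(1, "Alice", "alice@example.com", "2024-01-01")
conn.execute("INSERT INTO users VALUES (?, ?, ?, ?)", astuple(alice))

row = conn.execute("SELECT id, name, email, created_at FROM users").fetchone()
loaded = User(*row)
print(loaded == alice)  # True
```

In a document store, the same entity would more likely be saved as a single document in a users collection rather than as a row.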

Defining entities clearly early on is critical. Poorly defined entities lead to confusion, duplicated data, and inefficient queries later in development.

Relationships Between Entities

Relationships define how entities are connected, enabling systems to link and query related data efficiently.

Relationships describe how one entity is associated with another.
Common types include one-to-one, one-to-many, and many-to-many.
These connections determine how data is stored and queried.

Details

Entities rarely exist in isolation. Most systems require data to be connected—for example, users create posts, and posts have comments. These connections are defined through relationships.

There are several common relationship types. A one-to-one relationship means each record in one entity corresponds to exactly one record in another, such as a user and their profile. A one-to-many relationship means one record can be linked to many others, such as a user having multiple posts. A many-to-many relationship allows records on both sides to be linked to many records on the other, such as students and courses, where each student can enroll in several courses and each course has several students.

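These relationships are implemented using references, such as foreign keys in relational databases. A many-to-many relationship typically uses a separate join table that holds one reference to each side. References let the system link data across entities without duplicating it unnecessarily.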
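The three relationship types can be sketched with foreign keys in SQLite, as below. The table and column names (profiles, enrollments, and so on) are hypothetical, chosen to match the examples in the text; this is one possible layout, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- One-to-one: UNIQUE on the reference allows at most one profile per user.
    CREATE TABLE profiles (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL UNIQUE REFERENCES users(id),
        bio     TEXT
    );

    -- One-to-many: many posts may reference the same user.
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        title   TEXT NOT NULL
    );

    -- Many-to-many: a join table holds one reference to each side.
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE courses  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(id),
        course_id  INTEGER NOT NULL REFERENCES courses(id),
        PRIMARY KEY (student_id, course_id)
    );
""")

conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO posts (user_id, title) VALUES (?, ?)",
    [(1, "First post"), (1, "Second post")],
)
conn.executemany("INSERT INTO students (id, name) VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO courses (id, title) VALUES (?, ?)", [(1, "Math"), (2, "Science")])
conn.executemany("INSERT INTO enrollments VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])

# One-to-many lookup: all posts for one user.
post_titles = [r[0] for r in conn.execute(
    "SELECT title FROM posts WHERE user_id = 1 ORDER BY id"
)]

# Many-to-many lookup: all students enrolled in Math, via the join table.
math_students = [r[0] for r in conn.execute("""
    SELECT s.name
    FROM students s
    JOIN enrollments e ON e.student_id = s.id
    JOIN courses c     ON c.id = e.course_id
    WHERE c.title = 'Math'
    ORDER BY s.name
""")]

print(post_titles)    # ['First post', 'Second post']
print(math_students)  # ['Alice', 'Bob']
```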

Properly defining relationships is critical for querying data efficiently. It determines how easily the system can retrieve related information and directly impacts performance and data consistency.

Schema Design

A schema defines the structure of data, ensuring consistency, integrity, and clear organization in the database.

[Figure: incoming data (Alice, alice@email.com) is checked against schema rules (name: required, email: unique, type: string) before being stored. The schema filters data: only valid, structured entries are stored.]

Schemas define fields, data types, and relationships for each entity.
Constraints enforce rules like required values and uniqueness.
A well-designed schema keeps data consistent and predictable.

Details

A schema is the blueprint for how data is stored in a database. It defines what fields exist, what types of data they hold, and how different entities are connected.

For example, a Users structure might include fields like id, email, and created_at, while a Posts structure includes id, user_id, title, and content. Each field has a defined data type, such as string, integer, or timestamp.

Schemas also enforce constraints. These rules ensure data integrity—for example, requiring certain fields to exist, enforcing unique values, or maintaining valid relationships between entities.
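The sketch below shows how such constraints might reject invalid rows, assuming SQLite and the Users and Posts fields named above. The helper try_insert is hypothetical and exists only for this illustration. Note that SQLite enforces NOT NULL, UNIQUE, and foreign keys as shown, but is lenient about declared column types unless a table is created as STRICT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE users (
        id         INTEGER PRIMARY KEY,
        email      TEXT NOT NULL UNIQUE,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        title   TEXT NOT NULL,
        content TEXT
    );
""")


def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    # Valid user.
    try_insert("INSERT INTO users (id, email) VALUES (?, ?)", (1, "alice@example.com")),
    # Rejected: email must be unique.
    try_insert("INSERT INTO users (id, email) VALUES (?, ?)", (2, "alice@example.com")),
    # Rejected: email is required.
    try_insert("INSERT INTO users (id, email) VALUES (?, ?)", (3, None)),
    # Rejected: user 99 does not exist, so the reference is invalid.
    try_insert("INSERT INTO posts (user_id, title) VALUES (?, ?)", (99, "Orphan")),
    # Valid post for an existing user.
    try_insert("INSERT INTO posts (user_id, title) VALUES (?, ?)", (1, "Hello")),
]
print(results)  # [True, False, False, False, True]
```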

A well-defined schema prevents inconsistent or invalid data from entering the system. It provides a clear contract between the application and the database, making the system easier to reason about and maintain.

Normalization

Normalization organizes data into separate structures to reduce duplication and maintain consistency.

[Figure: before normalization, user data repeats across rows (Alice/Math, Alice/Science, Bob/Math). After normalization, Users (Alice, Bob) and Courses (Math, Science) are stored separately and linked instead of duplicated.]

Data is split into multiple entities to avoid repeating the same information.
Relationships are used to connect related data instead of duplicating it.
This improves consistency, storage efficiency, and maintainability.

Details

Normalization is the process of structuring data so that each piece of information is stored only once. Instead of repeating the same data across multiple records, it is separated into distinct entities.

For example, user information like name should not be repeated in every post. Instead, user data is stored in a Users structure, and posts reference the user through a user_id.

This approach reduces redundancy and ensures consistency. If a user's name changes, it only needs to be updated in one place rather than across many records.

Normalization also improves storage efficiency and simplifies updates. However, it often requires joining data across multiple entities when querying, which can impact performance if not managed carefully.
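A small sketch of the normalized layout described above, assuming SQLite and hypothetical sample rows: the user's name lives only in users, posts carry a user_id, and a single update is visible to every post through a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        title   TEXT NOT NULL
    );
    INSERT INTO users (id, name) VALUES (1, 'Alice');
    INSERT INTO posts (user_id, title) VALUES (1, 'First'), (1, 'Second');
""")

# The name is stored once, so one UPDATE is enough.
conn.execute("UPDATE users SET name = 'Alicia' WHERE id = 1")

# Reading posts with author names requires a join across both tables.
rows = conn.execute("""
    SELECT p.title, u.name
    FROM posts p
    JOIN users u ON u.id = p.user_id
    ORDER BY p.id
""").fetchall()
print(rows)  # [('First', 'Alicia'), ('Second', 'Alicia')]
```

The join in the final query is the cost mentioned above: every read that needs the author's name touches two tables.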

Denormalization

Denormalization duplicates data intentionally to improve read performance and reduce query complexity.

[Figure: a normalized read follows a reference from the user (Alice, ID 1) to the course (Math, UserID 1), which takes multiple lookups. A denormalized read fetches pre-joined records (Alice - Math, Alice - Science) directly. After renaming Alice to Alicia, one copy reads Alicia - Math while another still reads Alice - Science: duplicated data can fall out of sync.]

Data is duplicated to avoid expensive joins across multiple entities.
This allows faster reads and simpler queries in high-traffic systems.
The tradeoff is potential inconsistency if duplicated data is not kept in sync.

Details

Denormalization takes the opposite approach to normalization. Instead of strictly separating data, some information is intentionally duplicated across entities to make queries faster.

For example, instead of storing only a user_id in posts, the system may also store user_name directly in the Posts structure. This avoids needing to join with the Users data every time posts are queried.

This approach improves performance, especially in read-heavy systems where minimizing database joins is critical. It also simplifies query logic since all needed data can often be retrieved in a single operation.

The downside is consistency. If duplicated data changes, such as a user's name, it must be updated in multiple places. If not handled properly, this can lead to stale or inconsistent data across the system.
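The tradeoff can be sketched as follows, assuming SQLite and a posts table that carries a duplicated user_name column as described above. The helpers count_stale and rename_user are hypothetical; updating every copy in one transaction is one way to keep the duplicates in sync, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL REFERENCES users(id),
        user_name TEXT NOT NULL,  -- duplicated copy of users.name
        title     TEXT NOT NULL
    );
    INSERT INTO users (id, name) VALUES (1, 'Alice');
    INSERT INTO posts (user_id, user_name, title)
    VALUES (1, 'Alice', 'First'), (1, 'Alice', 'Second');
""")


def count_stale():
    """Count posts whose duplicated user_name no longer matches users.name."""
    return conn.execute("""
        SELECT COUNT(*)
        FROM posts p JOIN users u ON u.id = p.user_id
        WHERE p.user_name <> u.name
    """).fetchone()[0]


def rename_user(user_id, new_name):
    """Update the name and every duplicated copy in a single transaction."""
    with conn:
        conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
        conn.execute("UPDATE posts SET user_name = ? WHERE user_id = ?", (new_name, user_id))


# Reads need no join: the author's name is already on each post.
feed = conn.execute("SELECT title, user_name FROM posts ORDER BY id").fetchall()

# Updating only the users table leaves the copies in posts out of date.
conn.execute("UPDATE users SET name = 'Alicia' WHERE id = 1")
stale_before = count_stale()

# Updating all copies together restores consistency.
rename_user(1, "Alicia")
stale_after = count_stale()

print(feed)                       # [('First', 'Alice'), ('Second', 'Alice')]
print(stale_before, stale_after)  # 2 0
```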

Example Data Model

Real-world applications combine entities and relationships to form structured data models that reflect how the system operates.

[Figure: User, Order, Product, and Payment entities linked by relationships such as "creates". Entities form structure; relationships define how data flows between them.]

Applications define multiple entities that work together as a system.
Relationships connect these entities to represent real interactions.
This structure enables efficient querying and data organization.

Details

A typical social system can be modeled using three main entities: users, posts, and comments. Each represents a different part of the application’s functionality.

Users create posts, and posts can have multiple comments. Comments are tied back to both the post and the user who created them. These relationships define how data flows through the system.

In a database, this structure is represented through separate tables or collections, such as Users, Posts, and Comments. Relationships are implemented using references, such as user identifiers in posts and post identifiers in comments.
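A minimal sketch of this Users, Posts, and Comments model, assuming SQLite; the column names and sample rows are hypothetical. Posts reference their author, and comments reference both the post and the commenting user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        title   TEXT NOT NULL
    );
    CREATE TABLE comments (
        id      INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id),
        user_id INTEGER NOT NULL REFERENCES users(id),
        body    TEXT NOT NULL
    );
    INSERT INTO users (id, name) VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO posts (id, user_id, title) VALUES (1, 1, 'Hello world');
    INSERT INTO comments (post_id, user_id, body)
    VALUES (1, 2, 'Nice post'), (1, 1, 'Thanks');
""")

# Follow the references: post -> author, post -> comments -> commenter.
thread = conn.execute("""
    SELECT p.title, author.name, c.body, commenter.name
    FROM posts p
    JOIN users author    ON author.id = p.user_id
    JOIN comments c      ON c.post_id = p.id
    JOIN users commenter ON commenter.id = c.user_id
    WHERE p.id = 1
    ORDER BY c.id
""").fetchall()

for title, author, body, commenter in thread:
    print(f"{title} (by {author}): {commenter} says {body!r}")
```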

This example shows how abstract concepts like entities and relationships come together in practice. A well-structured data model mirrors how the application behaves, making it easier to store, retrieve, and manage data efficiently.

Designing Data Models

Data modeling is a structured process that transforms application requirements into a scalable and efficient database design.

[Figure: three steps: identify entities, define relationships, optimize structure.]

Start by identifying core entities from application requirements.
Define how those entities relate to each other.
Design schemas, normalize data, and add indexes for performance.

Details

Designing a data model begins with understanding the application’s requirements. Engineers first identify the main entities the system needs, such as users, orders, or products.

Next, relationships between these entities are defined. This determines how data is connected and how it will be queried. Clear relationships are essential for building efficient and understandable systems.

Once entities and relationships are established, schemas are designed. This includes defining fields, data types, and constraints, followed by normalization to reduce duplication and improve consistency.

Finally, indexes are added to optimize performance for common queries. This step ensures the system can scale and handle real-world usage efficiently.
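As an illustration of the indexing step, the sketch below asks SQLite for its query plan before and after adding an index on a commonly filtered column. The table, the index name idx_posts_user_id, and the generated rows are assumptions; the exact plan wording varies between SQLite versions, so the example only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, title TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO posts (user_id, title) VALUES (?, ?)",
    [(i % 100, f"Post {i}") for i in range(1000)],
)

# A common query: fetch all posts for one user.
QUERY = "SELECT title FROM posts WHERE user_id = ?"


def plan(sql, params):
    """Return SQLite's plan description for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY, (42,))  # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_posts_user_id ON posts (user_id)")
plan_after = plan(QUERY, (42,))   # with the index: a search using idx_posts_user_id

print(plan_before)
print(plan_after)
```

Indexes speed up reads on the indexed column but add work to every insert and update, so they are usually added for queries the application is known to run often.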

This entire process turns high-level application requirements into a structured data model that the database can store and serve reliably.


