Explainbytes logoExplainbytes

Introduction To Database Management Systems

Getting started with database management system fundamentals

What is a Database Management System (DBMS)?

A DBMS is software that stores, retrieves, and manages data in databases. It provides abstractions for data models (relational, document, key-value), guarantees for transactions, and interfaces for querying and maintaining data.

Databases are at the heart of most applications — understanding DBMS internals helps you design data models, choose the right storage engines, and reason about consistency and performance.

┌─────────────────────────────────────────────────────────────────┐
│                       DBMS Architecture                         │
├─────────────────────────────────────────────────────────────────┤
│  Client API  ──►  Query Processor  ──►  Storage Engine  ──►  Disk │
└─────────────────────────────────────────────────────────────────┘

Why Learn DBMS Internals?

  • Data correctness: Ensuring transactions are ACID when needed.
  • Performance: Indexes, query plans, and storage formats determine speed.
  • Scalability: Replication and sharding strategies let systems scale.

What We'll Cover

From models to distributed databases:

Fundamentals

TopicDescription
Data ModelsRelational, document, key-value, graph
IndexesB-trees, hash indexes, covering indexes
TransactionsACID properties, isolation levels
StorageRow vs column stores, write-ahead logs

Building Blocks

TopicDescription
Query OptimizationParsing, planning, cost-based optimization
ReplicationLeader-follower, multi-leader, eventual consistency
ShardingHorizontal partitioning, consistent hashing
CachingQuery caches, materialized views, memcached/redis patterns

Advanced Topics

TopicDescription
Distributed TransactionsTwo-phase commit, optimistic concurrency
Consensus for MetadataPaxos/Raft for config/leader election
Analytical DatabasesOLAP, columnar storage, vectorized execution

Real-World Examples

SystemKey Concepts
PostgreSQLMVCC, planner, extensibility
MySQL/InnoDBB-tree indexes, clustered indexes
MongoDBDocument model, replica sets, sharding

Prerequisites

┌─────────────────────────────────────────────────────────────────┐
│                    Recommended Background                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ✓ SQL basics                                                   │
│  ✓ Basic data structures & algorithms                           │
│  ✓ Understanding of transactions and concurrency at high level  │
│                                                                 │
│  Nice to have:                                                  │
│  ○ Knowledge of a NoSQL datastore (Redis, MongoDB)              │
│  ○ Familiarity with distributed systems concepts                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

DBMS Mindset

1. Data is the system's state — model it well

Design schemas for access patterns. Denormalize when necessary for read performance, but be mindful of update complexity.

2. Think in terms of cost

Indexing speeds reads but slows writes and uses space. Choose trade-offs based on read/write profile.

3. Boundaries & Consistency

Decide where strong consistency is required and where eventual consistency suffices. Use idempotent operations to handle retries.

Important Concepts & Numbers

ConceptNotes
MVCCMulti-version concurrency control avoids read locks for many workloads
WALWrite-ahead log for durability and crash recovery
Replication lagPractical indicator of eventual consistency

Key Components (Illustrative)

Indexes

TypeUse case
B-treeRange queries, ordered scans
HashExact-match lookups
GiST/GINFull-text, geo, custom indexing

Transactions & Isolation

Isolation LevelGuaranteesAnomalies
Read UncommittedLowestDirty reads
Read CommittedNo dirty readsNon-repeatable reads possible
Repeatable ReadRepeatable readsPhantom reads possible (DB-specific)
SerializableStrongestHigher contention/abort rates

Scaling Strategies

  • Vertical scaling: Bigger machines — simple but limited.
  • Read replicas: Offload reads to followers.
  • Sharding: Partition data across nodes; requires routing and resharding logic.

Observability & Tools

ToolPurpose
EXPLAIN / EXPLAIN ANALYZEInspect query plans
pg_stat_activity / performance_schemaMonitor active queries
Backup toolsConsistent backups, point-in-time recovery

How to Use This Documentation

  1. Start with data models and indexing.
  2. Practice by designing schemas for concrete use cases (e-commerce, analytics).
  3. Experiment with replication and small-scale sharding in a sandbox.

Key Takeaways

┌─────────────────────────────────────────────────────────────────┐
│                      DBMS Design Principles                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Model data for queries, not just storage                     │
│  2. Balance read/write trade-offs with indexes and caching       │
│  3. Plan for durability and backups                              │
│  4. Use replication/sharding appropriately for scale             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Next Steps

  1. Relational Databases — Schema design, normalization, indexes
  2. NoSQL & Key-Value Stores — Use cases and patterns
  3. Distributed Databases — Replication, sharding, consensus

Happy modeling — data matters!