
Key-Value Stores

Understanding key-value storage systems

Overview

A Key-Value Store is the simplest form of NoSQL database. It stores data as a collection of key-value pairs, where the key is a unique identifier and the value is an opaque blob. This simplicity enables extremely fast lookups and high write throughput, making key-value stores a fundamental building block in distributed systems.

Data model

┌─────────────────────────────────────────┐
│           Key-Value Store               │
├─────────────────┬───────────────────────┤
│      Key        │        Value          │
├─────────────────┼───────────────────────┤
│  user:1001      │  {"name": "Alice"}    │
│  session:abc123 │  {token, expiry, ...} │
│  cache:product5 │  <binary blob>        │
│  counter:views  │  42                   │
└─────────────────┴───────────────────────┘

Key characteristics

  • Key: Unique identifier (string, number, or binary). Often hashed for distribution.
  • Value: Opaque blob — the store doesn't interpret the structure (JSON, binary, serialized objects)
  • Metadata: Optional TTL, version/timestamp, content type (see the sketch below)
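
A minimal sketch of one stored entry, assuming a Python representation; the Entry class and its field names are illustrative, not any particular store's format.

Code
from dataclasses import dataclass, field
from typing import Optional
import time

@dataclass
class Entry:
    """One key-value pair plus optional metadata the store may track."""
    key: str                      # unique identifier (often hashed for placement)
    value: bytes                  # opaque blob; the store does not parse it
    ttl: Optional[float] = None   # seconds until expiry, if any
    version: int = 0              # version/timestamp for conflict detection
    created_at: float = field(default_factory=time.time)

    def is_expired(self) -> bool:
        return self.ttl is not None and time.time() > self.created_at + self.ttl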

Common use cases

Use Case          Example                        Why Key-Value?
Caching           Redis, Memcached               Sub-millisecond reads
Session storage   User sessions by session ID    Fast lookups, TTL support
Feature flags     Config by feature name         Simple reads, atomic updates
Rate limiting     Counters per user/IP           Atomic increment operations
User profiles     Profile by user ID             Schema flexibility
Shopping carts    Cart by session/user           Fast reads/writes

Core operations

Code
# Basic operations
PUT(key, value)      # Insert or update
GET(key) → value     # Retrieve by key
DELETE(key)          # Remove entry
 
# Extended operations
EXISTS(key) → bool   # Check existence
EXPIRE(key, ttl)     # Set time-to-live
INCR(key)            # Atomic increment
MGET(keys[])         # Batch get
MPUT(pairs[])        # Batch put
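
For reference, here is a toy in-memory version of these operations in Python. The class name and method signatures are hypothetical; a real store adds persistence, concurrency control, and networking on top.

Code
class KVStore:
    """Toy in-memory key-value store illustrating the core operations."""

    def __init__(self):
        self._data = {}                     # plain dict: hash-based O(1) access

    def put(self, key, value):              # insert or update
        self._data[key] = value

    def get(self, key, default=None):       # retrieve by key
        return self._data.get(key, default)

    def delete(self, key):                  # remove entry
        self._data.pop(key, None)

    def exists(self, key):                  # check existence
        return key in self._data

    def incr(self, key, amount=1):          # atomic only within a single process here
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]

    def mget(self, keys):                   # batch get
        return {k: self._data.get(k) for k in keys}

    def mput(self, pairs):                  # batch put
        self._data.update(pairs)

store = KVStore()
store.put("user:1001", {"name": "Alice"})
store.incr("counter:views")
print(store.get("user:1001"), store.get("counter:views"))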

Design considerations

Performance

  • O(1) lookups: Hash-based key access provides constant-time reads
  • In-memory vs. persistent: Trade-off between speed and durability
  • Write patterns: Append-only logs, LSM trees for write-heavy workloads

Data organization

┌──────────────────────────────────────────────────────────┐
│                    In-Memory Store                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│  │  Hash Table │  │  Skip List  │  │   Bloom     │       │
│  │   (index)   │  │  (sorted)   │  │   Filter    │       │
│  └─────────────┘  └─────────────┘  └─────────────┘       │
└──────────────────────────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────┐
│                   Persistent Storage                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│  │     WAL     │  │   SSTable   │  │  Snapshots  │       │
│  │ (write log) │  │  (sorted)   │  │  (backup)   │       │
│  └─────────────┘  └─────────────┘  └─────────────┘       │
└──────────────────────────────────────────────────────────┘
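
A rough sketch of the write path through a write-ahead log, assuming a simple JSON-lines record format; the file name and helper methods are made up for illustration.

Code
import json, os

class WALStore:
    """Append every mutation to a log before applying it; replay the log on restart."""

    def __init__(self, path="kv.wal"):
        self.path = path
        self.data = {}
        self._replay()                              # rebuild in-memory state from the log

    def _replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path) as f:
            for line in f:
                op = json.loads(line)
                if op["op"] == "put":
                    self.data[op["key"]] = op["value"]
                elif op["op"] == "del":
                    self.data.pop(op["key"], None)

    def _append(self, record):
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())                    # durable on disk before acknowledging

    def put(self, key, value):
        self._append({"op": "put", "key": key, "value": value})
        self.data[key] = value

    def delete(self, key):
        self._append({"op": "del", "key": key})
        self.data.pop(key, None)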

Partitioning (Sharding)

Distribute keys across multiple nodes to scale horizontally.

Consistent hashing

        Node A              Node B              Node C
           │                   │                   │
    ┌──────┴──────┐     ┌──────┴──────┐     ┌──────┴──────┐
    │  Keys 0-33  │     │  Keys 34-66 │     │  Keys 67-99 │
    └─────────────┘     └─────────────┘     └─────────────┘

Hash ring with virtual nodes:
    0 ────── A ────── B ────── A ────── C ────── B ────── 100
              │               │               │
           key:user1       key:sess2       key:cart3

Benefits of consistent hashing:

  • Adding or removing a node moves only about K/N keys on average (K = total keys, N = nodes)
  • Virtual nodes balance load across heterogeneous hardware
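
A minimal consistent-hash ring with virtual nodes, assuming MD5 as the ring hash; the class and method names are illustrative.

Code
import bisect, hashlib

class HashRing:
    """Consistent hashing: each physical node owns many points (virtual nodes) on a ring."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []                        # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        """Walk clockwise from the key's hash to the first virtual node."""
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["A", "B", "C"])
print(ring.node_for("user:1001"), ring.node_for("cart:xyz"))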

Partitioning strategies

Strategy              Pros                   Cons
Hash partitioning     Even distribution      No range queries
Range partitioning    Range queries          Hot spots possible
Consistent hashing    Minimal rebalancing    Complexity

Replication

Keep copies of data on multiple nodes for availability and durability.

┌─────────────────────────────────────────────────────────┐
│                    Replication                          │
│                                                         │
│   Write ──► Leader ──► Follower 1                       │
│                    └──► Follower 2                      │
│                    └──► Follower 3                      │
│                                                         │
│   Read  ◄── Any replica (eventual consistency)          │
│         ◄── Leader only (strong consistency)            │
└─────────────────────────────────────────────────────────┘

Replication strategies

  • Leader-follower: Writes go to leader, replicated to followers
  • Leaderless: Any node accepts writes (Dynamo-style)
  • Synchronous: Wait for replicas before acknowledging (durable, slower)
  • Asynchronous: Acknowledge immediately, replicate in background (fast, potential data loss; see the sketch below)
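
A toy sketch of the synchronous vs. asynchronous trade-off in a leader-follower setup, with in-process replicas standing in for remote nodes; all names here are illustrative.

Code
import threading

class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

class Leader:
    def __init__(self, followers, synchronous=True):
        self.local = Replica()
        self.followers = followers
        self.synchronous = synchronous

    def put(self, key, value):
        self.local.apply(key, value)
        if self.synchronous:
            # Wait for every follower before acknowledging: durable, higher latency.
            for f in self.followers:
                f.apply(key, value)
        else:
            # Acknowledge immediately, replicate in the background: fast, but the
            # update is lost if the leader dies before replication runs.
            threading.Thread(
                target=lambda: [f.apply(key, value) for f in self.followers]).start()
        return "ACK"

leader = Leader([Replica(), Replica()], synchronous=False)
leader.put("session:abc123", {"token": "t", "expiry": 3600})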

Consistency models

Tunable consistency (Dynamo-style)

N = Replication factor (total replicas)
W = Write quorum (replicas that must acknowledge write)
R = Read quorum (replicas that must respond to read)

Strong consistency: W + R > N
Example: N=3, W=2, R=2 → Always read latest write
Configuration        W   R   Guarantee (N=3)
Strong consistency   2   2   Always reads the latest write
Fast writes          1   3   Reads contact all replicas, so they still see the latest
Fast reads           3   1   Writes reach every replica before acknowledgment
Eventual             1   1   Fastest, reads may be stale
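
The quorum rule can be checked mechanically; a small hypothetical helper that mirrors the table above:

Code
def quorum_properties(n, w, r):
    """Classify an (N, W, R) configuration using the W + R > N rule."""
    return {
        "strongly_consistent": w + r > n,   # read and write sets must overlap
        "write_latency": f"waits for {w}/{n} replicas",
        "read_latency": f"waits for {r}/{n} replicas",
    }

for name, (w, r) in {"strong": (2, 2), "fast writes": (1, 3),
                     "fast reads": (3, 1), "eventual": (1, 1)}.items():
    print(name, quorum_properties(3, w, r))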

Conflict resolution

When concurrent writes occur:

  • Last-write-wins (LWW): Timestamp-based, simple but can lose data
  • Vector clocks: Track causality, detect conflicts (see the sketch below)
  • CRDTs: Conflict-free data types that auto-merge
  • Application-level: Return conflicts to application to resolve
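
As one example, a bare-bones vector clock comparison, assuming each clock is a plain dict mapping node IDs to counters; the function names are illustrative.

Code
def compare(vc_a, vc_b):
    """Return 'a_before_b', 'b_before_a', 'equal', or 'concurrent' (conflict)."""
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a_before_b"          # b has seen everything a has: b supersedes a
    if b_le_a:
        return "b_before_a"
    return "concurrent"              # neither dominates: a real conflict to resolve

print(compare({"A": 2, "B": 1}, {"A": 2, "B": 3}))   # a_before_b
print(compare({"A": 3},         {"B": 1}))           # concurrent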

TTL and eviction

Expiration policies

Code
# Set TTL on write
SET user:session:123 "data" EX 3600  # Expires in 1 hour
 
# Passive expiration: Check on access
GET key → if expired → return null, delete
 
# Active expiration: Background thread samples and deletes
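
A sketch combining both expiration styles over a plain dict, assuming absolute per-key expiry timestamps; it mirrors the idea above rather than any specific system's implementation.

Code
import random, time

class TTLStore:
    def __init__(self):
        self.data = {}      # key -> value
        self.expiry = {}    # key -> absolute expiry time (epoch seconds)

    def set(self, key, value, ex=None):
        self.data[key] = value
        if ex is not None:
            self.expiry[key] = time.time() + ex

    def get(self, key):
        # Passive expiration: check (and clean up) on access.
        if key in self.expiry and time.time() > self.expiry[key]:
            self.data.pop(key, None)
            self.expiry.pop(key, None)
            return None
        return self.data.get(key)

    def sweep(self, sample_size=20):
        # Active expiration: a background task samples keys with a TTL and deletes expired ones.
        keys = random.sample(list(self.expiry), min(sample_size, len(self.expiry)))
        now = time.time()
        for k in keys:
            if now > self.expiry[k]:
                self.data.pop(k, None)
                self.expiry.pop(k, None)

store = TTLStore()
store.set("user:session:123", "data", ex=3600)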

Eviction policies (when memory is full)

Policy        Description               Use Case
LRU           Least Recently Used       General caching
LFU           Least Frequently Used     Hot data retention
TTL           Expire oldest TTL first   Session data
Random        Random eviction           Simple, fast
No eviction   Reject writes             Critical data
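
As an example, LRU eviction falls out of an ordered map almost directly; a minimal capacity-bounded cache (illustrative, not any particular system's implementation):

Code
from collections import OrderedDict

class LRUCache:
    """Evict the least recently used key once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()           # insertion order doubles as recency order

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # drop the least recently used entry

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2); cache.get("a"); cache.put("c", 3)
print(list(cache.data))                     # ['a', 'c'] ('b' was evicted)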

Example systems

System      Type         Key Features
Redis       In-memory    Rich data types, persistence, clustering
Memcached   In-memory    Simple, multi-threaded, distributed
DynamoDB    Managed      Auto-scaling, serverless, global tables
etcd        Distributed  Strong consistency, leader election
RocksDB     Embedded     LSM-tree, high write throughput

Designing a distributed key-value store

High-level architecture

┌─────────────────────────────────────────────────────────────┐
│                        Clients                              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Coordinator Layer                        │
│         (routing, load balancing, request handling)         │
└─────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
        ┌──────────┐   ┌──────────┐   ┌──────────┐
        │ Shard 1  │   │ Shard 2  │   │ Shard 3  │
        │┌────────┐│   │┌────────┐│   │┌────────┐│
        ││Leader  ││   ││Leader  ││   ││Leader  ││
        │├────────┤│   │├────────┤│   │├────────┤│
        ││Follower││   ││Follower││   ││Follower││
        ││Follower││   ││Follower││   ││Follower││
        │└────────┘│   │└────────┘│   │└────────┘│
        └──────────┘   └──────────┘   └──────────┘

Design checklist

  1. Partitioning: Use consistent hashing with virtual nodes
  2. Replication: 3 replicas per partition for durability
  3. Consistency: Tunable quorums (default: eventual, option for strong)
  4. Failure detection: Gossip protocol or heartbeats
  5. Handoff: Hinted handoff for temporary failures
  6. Anti-entropy: Merkle trees to sync replicas (see the sketch below)
  7. Persistence: WAL + periodic snapshots
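
Item 6 can be sketched as a Merkle-style comparison: hash buckets of keys, compare digests, and exchange data only for buckets that differ. The bucket count and hashing scheme here are arbitrary choices for illustration.

Code
import hashlib

def bucket_hashes(data, buckets=16):
    """Group keys into buckets and hash each bucket's contents."""
    groups = [[] for _ in range(buckets)]
    for key in sorted(data):
        idx = int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets
        groups[idx].append(f"{key}={data[key]}")
    return [hashlib.md5("|".join(g).encode()).hexdigest() for g in groups]

def differing_buckets(replica_a, replica_b, buckets=16):
    """Only buckets whose digests disagree need to exchange actual key-value data."""
    ha, hb = bucket_hashes(replica_a, buckets), bucket_hashes(replica_b, buckets)
    return [i for i in range(buckets) if ha[i] != hb[i]]

a = {"user:1": "alice", "user:2": "bob"}
b = {"user:1": "alice", "user:2": "bobby"}      # replica b has a divergent value
print(differing_buckets(a, b))                   # indices of buckets to reconcile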

Operational considerations

Monitoring metrics

  • Latency: p50, p95, p99 for GET/PUT operations
  • Throughput: Operations per second
  • Cache hit ratio: For caching use cases
  • Memory usage: Eviction rates, fragmentation
  • Replication lag: Time for replicas to catch up

Common issues

Issue              Cause                Solution
Hot keys           Skewed access        Key hashing, local caching
Memory pressure    Data growth          Eviction policies, sharding
Replication lag    High write volume    Tune async settings
Split brain        Network partition    Quorum-based consensus
