Distributed Task Scheduler

Design a distributed task scheduler. Learn task queues, cron-style scheduling, priority queues, worker pools, at-least-once execution, and how to prevent duplicate task execution.

A task scheduler is a system that executes work at a specific time or at recurring intervals. In a single-server application, you can use a cron job or an in-memory timer. In a distributed system, however, running scheduled tasks across multiple servers introduces problems: duplicate execution, missed tasks after crashes, and coordination overhead.

A distributed task scheduler solves these problems by providing reliable, fault-tolerant task scheduling across a cluster of machines.


1. Types of Scheduled Tasks

| Type | Example | Trigger |
| --- | --- | --- |
| Immediate (async) | Send a welcome email after user signup. | Event-driven; pushed to a task queue immediately. |
| Delayed | Send a reminder email 24 hours after cart abandonment. | Scheduled to execute at a specific future timestamp. |
| Recurring (Cron) | Generate daily sales reports at midnight. | Runs on a fixed schedule (e.g., every day, every hour). |
| Periodic Polling | Check for expired subscriptions every 5 minutes. | Repeats at fixed intervals. |
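
To show how the four trigger types differ in practice, here is a minimal sketch that computes the next execution time for each. The `Trigger` shapes, `nextRunAt`, and the `anchorMs` field are hypothetical names chosen for illustration, and the recurring case handles only fixed intervals aligned to an anchor, not full cron expressions.

```typescript
// Hypothetical trigger shapes, one per task type in the table above.
type Trigger =
    | { kind: "immediate" }
    | { kind: "delayed"; delayMs: number }
    | { kind: "recurring"; intervalMs: number; anchorMs: number }
    | { kind: "polling"; intervalMs: number };

// Compute the next execution time (epoch ms) for a trigger, given the current time.
function nextRunAt(trigger: Trigger, nowMs: number): number {
    switch (trigger.kind) {
        case "immediate":
            return nowMs; // enqueue right away
        case "delayed":
            return nowMs + trigger.delayMs;
        case "recurring": {
            // Next boundary strictly after now, aligned to the anchor (e.g. midnight UTC).
            const elapsed = Math.floor((nowMs - trigger.anchorMs) / trigger.intervalMs);
            return trigger.anchorMs + (elapsed + 1) * trigger.intervalMs;
        }
        case "polling":
            return nowMs + trigger.intervalMs;
    }
}

const DAY_MS = 24 * 60 * 60 * 1000;
// Daily at midnight UTC: the epoch (0) is itself a midnight, so it can serve as the anchor.
const dailyReport: Trigger = { kind: "recurring", intervalMs: DAY_MS, anchorMs: 0 };
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the calculation easy to test.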

2. Architecture of a Distributed Task Scheduler

Components

| Component | Responsibility |
| --- | --- |
| Task Scheduler | Accepts task submissions, stores them with their execution time, and enqueues them when the time arrives. |
| Task Database | Persistent store for task metadata (ID, payload, scheduled time, status, retry count). |
| Task Queue | A work queue (Redis, SQS, or Kafka) that workers pull from. Typically configured for at-least-once delivery, so a task may occasionally be delivered more than once. |
| Workers | Stateless processes that pull tasks from the queue, execute them, and report results. |
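
The interaction between the scheduler and the task database can be sketched with an in-memory stand-in. Everything here (`TaskRecord`, `InMemoryTaskStore`, `submit`, `takeDue`) is a hypothetical name for illustration; a real deployment would use a persistent database and a separate queue.

```typescript
// Hypothetical task record; the fields mirror the metadata listed for the Task Database.
type TaskStatus = "PENDING" | "ENQUEUED";

interface TaskRecord {
    id: number;
    payload: string;
    scheduledAt: number; // epoch milliseconds
    status: TaskStatus;
    retryCount: number;
}

// In-memory stand-in for the task database (an assumption for illustration).
class InMemoryTaskStore {
    private tasks: TaskRecord[] = [];
    private nextId = 1;

    // Accept a submission and store it with its execution time.
    submit(payload: string, scheduledAt: number): TaskRecord {
        const task: TaskRecord = {
            id: this.nextId++,
            payload,
            scheduledAt,
            status: "PENDING",
            retryCount: 0,
        };
        this.tasks.push(task);
        return task;
    }

    // Return the tasks whose time has arrived and mark them ENQUEUED,
    // so that a later scan does not return them again.
    takeDue(now: number): TaskRecord[] {
        const due: TaskRecord[] = [];
        for (const task of this.tasks) {
            if (task.status === "PENDING" && task.scheduledAt <= now) {
                task.status = "ENQUEUED";
                due.push(task);
            }
        }
        return due;
    }
}
```

A scheduler loop would call `takeDue` periodically and push each returned record onto the task queue for workers to pull.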

3. Preventing Duplicate Execution

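The most critical problem in distributed task scheduling is ensuring that each task takes effect exactly once: not zero times (missed) and not twice (duplicate). Because most queues deliver at least once, this means two things in practice: the scheduler must not enqueue the same task twice, and task handlers should be idempotent so that an occasional redelivery is harmless.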

The Problem

If you have 3 scheduler instances and a cron job set to run at midnight:

Scheduler 1: "It's midnight! Execute daily report task."
Scheduler 2: "It's midnight! Execute daily report task."
Scheduler 3: "It's midnight! Execute daily report task."

Result: The daily report runs 3 times.

Solution 1: Leader Election

Only one scheduler instance is elected as the leader at any time. The leader is responsible for scanning the task database and enqueuing due tasks. If the leader crashes, a new leader is elected.

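Implementation: Use a distributed lock with a time-to-live (a lease) in Redis, ZooKeeper, or etcd. The example below uses a single Redis key set with NX and an expiry; the Redlock algorithm applies the same idea across several independent Redis nodes.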

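```typescript
// Assumes an ioredis-style client (`redis`) and app-provided `db`, `taskQueue`, `sleep`, and INSTANCE_ID.
const LEADER_KEY = "scheduler:leader";
const LEASE_SECONDS = 30;

// Extend the lease only if this instance still holds it (atomic check-and-expire).
const RENEW_SCRIPT = `
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("EXPIRE", KEYS[1], ARGV[2])
end
return 0`;

async function acquireOrRenewLeadership(): Promise<boolean> {
    // Try to take the lease: "NX" sets the key only if it does NOT exist,
    // and "EX" makes it expire after 30 seconds if the leader crashes.
    const acquired = await redis.set(LEADER_KEY, INSTANCE_ID, "EX", LEASE_SECONDS, "NX");
    if (acquired === "OK") return true;
    // The key exists: we are still the leader only if we hold it and can renew it.
    const renewed = await redis.eval(RENEW_SCRIPT, 1, LEADER_KEY, INSTANCE_ID, LEASE_SECONDS);
    return renewed === 1;
}

async function schedulerLoop() {
    while (true) {
        if (await acquireOrRenewLeadership()) {
            // I am the leader: scan for due tasks
            const dueTasks = await db.query(
                "SELECT * FROM tasks WHERE status = 'PENDING' AND scheduled_at <= NOW()"
            );
            for (const task of dueTasks) {
                // A crash between these two calls re-enqueues the task on the
                // next scan, so delivery is at-least-once.
                await taskQueue.enqueue(task);
                await db.update(task.id, { status: "ENQUEUED" });
            }
        }
        await sleep(5000); // Check every 5 seconds, well within the 30-second lease
    }
}
```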

Solution 2: Database-Level Locking

Use a database UPDATE ... WHERE clause with a status check to ensure only one scheduler can claim a task:

```sql
-- Only one scheduler can transition a task from PENDING to ENQUEUED
UPDATE tasks
SET status = 'ENQUEUED', claimed_by = 'scheduler-1'
WHERE id = 42
  AND status = 'PENDING';

-- Returns affected_rows = 1 (success) or 0 (another scheduler already claimed it)
```

Because the UPDATE is atomic, only one scheduler succeeds. The others see affected_rows = 0 and skip the task.
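
The claim logic can be illustrated with an in-memory analogue of the conditional UPDATE. This is only a sketch: `TaskRow` and `claimTask` are hypothetical names, and the atomicity here comes from the function running synchronously in a single thread, whereas in a real database it comes from the engine's row-level locking.

```typescript
// In-memory stand-in for the tasks table (an assumption for illustration).
interface TaskRow {
    id: number;
    status: "PENDING" | "ENQUEUED";
    claimedBy: string | null;
}

// Mirrors: UPDATE tasks SET status = 'ENQUEUED', claimed_by = ?
//          WHERE id = ? AND status = 'PENDING'
// Returns the number of affected rows: 1 on success, 0 if already claimed.
function claimTask(rows: TaskRow[], id: number, schedulerId: string): number {
    let affected = 0;
    for (const row of rows) {
        if (row.id === id && row.status === "PENDING") {
            row.status = "ENQUEUED";
            row.claimedBy = schedulerId;
            affected++;
        }
    }
    return affected;
}

const rows: TaskRow[] = [{ id: 42, status: "PENDING", claimedBy: null }];
const first = claimTask(rows, 42, "scheduler-1");  // 1: this scheduler enqueues the task
const second = claimTask(rows, 42, "scheduler-2"); // 0: already claimed, so skip it
```

Each scheduler enqueues the task only when its own claim returns 1, so no coordination between schedulers is needed beyond the shared database.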


4. Task States and Lifecycle

Every task moves through a well-defined state machine:

PENDING → ENQUEUED → RUNNING → SUCCEEDED
                        ↓
                      FAILED → RETRYING → ENQUEUED (retry)
                        ↓
                     DEAD (max retries exceeded → Dead Letter Queue)

| State | Description |
| --- | --- |
| PENDING | Task is stored in the database, waiting for its scheduled execution time. |
| ENQUEUED | The scheduler has placed the task in the work queue. |
| RUNNING | A worker has picked up the task and is executing it. |
| SUCCEEDED | The task completed successfully. |
| FAILED | The task failed. Will be retried if retries remain. |
| RETRYING | The task is waiting for its next retry attempt (with exponential backoff). |
| DEAD | The task has exhausted all retry attempts. Moved to a Dead Letter Queue for manual inspection. |
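
One way to enforce the lifecycle is an explicit transition table that rejects any move the diagram does not allow. The sketch below is illustrative only; `TRANSITIONS`, `canTransition`, and `nextStateAfterFailure` are hypothetical names.

```typescript
type TaskState =
    | "PENDING" | "ENQUEUED" | "RUNNING" | "SUCCEEDED"
    | "FAILED" | "RETRYING" | "DEAD";

// Allowed transitions, taken from the state diagram above.
const TRANSITIONS: Record<TaskState, TaskState[]> = {
    PENDING: ["ENQUEUED"],
    ENQUEUED: ["RUNNING"],
    RUNNING: ["SUCCEEDED", "FAILED"],
    FAILED: ["RETRYING", "DEAD"],
    RETRYING: ["ENQUEUED"],
    SUCCEEDED: [], // terminal
    DEAD: [],      // terminal
};

// True if the state machine permits moving from `from` to `to`.
function canTransition(from: TaskState, to: TaskState): boolean {
    return TRANSITIONS[from].indexOf(to) !== -1;
}

// Decide where a FAILED task goes next, based on how many retries it has used.
function nextStateAfterFailure(retryCount: number, maxRetries: number): TaskState {
    return retryCount < maxRetries ? "RETRYING" : "DEAD";
}
```

Checking `canTransition` before every status update catches bugs such as a late worker marking an already DEAD task as SUCCEEDED.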

Handling Stuck Tasks

If a worker crashes mid-execution, the task remains in the RUNNING state forever (it is neither succeeded nor failed). To handle this:

  • Visibility timeout: When a worker picks up a task, it has a fixed time window (e.g., 5 minutes) to complete it. If the task is not acknowledged within that window, the queue makes it visible again for another worker to pick up.
  • Heartbeat: For long-running tasks, the worker periodically sends a heartbeat to extend its visibility timeout. If heartbeats stop, the task is assumed stuck and re-queued.
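
Both mechanisms can be sketched with a small in-memory queue that hides a task for a fixed window after handing it out. The `VisibilityQueue` class and its methods are hypothetical and only loosely modeled on how managed queues behave; the current time is passed in explicitly so the behavior is easy to follow.

```typescript
interface QueuedTask {
    id: number;
    invisibleUntil: number; // epoch ms; the task is hidden from workers until this time
}

class VisibilityQueue {
    private tasks: QueuedTask[] = [];
    private timeoutMs: number;

    constructor(timeoutMs: number) {
        this.timeoutMs = timeoutMs;
    }

    enqueue(id: number): void {
        this.tasks.push({ id, invisibleUntil: 0 });
    }

    // Hand out the first visible task and hide it for the timeout window.
    receive(now: number): number | null {
        for (const task of this.tasks) {
            if (task.invisibleUntil <= now) {
                task.invisibleUntil = now + this.timeoutMs;
                return task.id;
            }
        }
        return null;
    }

    // Heartbeat: the worker is still alive, so extend the window.
    heartbeat(id: number, now: number): void {
        for (const task of this.tasks) {
            if (task.id === id) task.invisibleUntil = now + this.timeoutMs;
        }
    }

    // Acknowledge: the task finished, so remove it permanently.
    ack(id: number): void {
        this.tasks = this.tasks.filter((task) => task.id !== id);
    }
}
```

If a worker crashes, it stops calling `heartbeat` and never calls `ack`, so the task becomes visible again once the window expires and another worker can receive it.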

5. Priority Queues

Not all tasks are equal. A password reset email (urgent) should be processed before a weekly analytics report (low priority).

Implementation with Multiple Queues

High Priority Queue:    [Password reset, Payment retry, Security alert]
Medium Priority Queue:  [Welcome email, Order confirmation]
Low Priority Queue:     [Analytics report, Data cleanup]

Workers check queues in priority order: drain the high-priority queue first, then medium, then low. This ensures urgent tasks are never blocked behind a backlog of low-priority work.
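
A minimal sketch of strict priority polling, using plain arrays as stand-ins for the three queues. The names `Priority`, `PRIORITY_ORDER`, and `pollStrict` are hypothetical.

```typescript
type Priority = "high" | "medium" | "low";

const PRIORITY_ORDER: Priority[] = ["high", "medium", "low"];

// Take the next task from the highest-priority queue that is not empty.
function pollStrict(queues: Record<Priority, string[]>): string | undefined {
    for (const priority of PRIORITY_ORDER) {
        const task = queues[priority].shift();
        if (task !== undefined) return task;
    }
    return undefined; // every queue is empty
}

const queues: Record<Priority, string[]> = {
    high: ["Password reset"],
    medium: ["Welcome email"],
    low: ["Analytics report"],
};
```

Repeated calls to `pollStrict(queues)` return the password reset first, then the welcome email, then the analytics report, and finally `undefined` once all three queues are drained.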

Starvation Prevention

A risk with strict priority is starvation — low-priority tasks never execute because high-priority tasks keep arriving. Mitigate this with:

  • Weighted fair queuing: Pull 60% from high, 30% from medium, 10% from low.
  • Aging: Increase a task's priority the longer it waits. After 1 hour, a low-priority task is promoted to medium.
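
Both mitigations can be sketched in a few lines. The weights come from the text; promoting a task one level per full hour of waiting is an assumption that generalizes the single example given, and `pickQueue` and `effectivePriority` are hypothetical names. Note that picking a queue at random according to its weight is a simple approximation of weighted fair queuing, not a full implementation.

```typescript
type Priority = "high" | "medium" | "low";

// Weights from the text, in percent: 60 high, 30 medium, 10 low.
const WEIGHTS: [Priority, number][] = [["high", 60], ["medium", 30], ["low", 10]];

// Pick which queue to poll next. `roll` is a number in [0, 1), e.g. Math.random().
function pickQueue(roll: number): Priority {
    const point = roll * 100;
    let cumulative = 0;
    for (const [priority, weight] of WEIGHTS) {
        cumulative += weight;
        if (point < cumulative) return priority;
    }
    return "low"; // fallback for rounding at the upper edge
}

const HOUR_MS = 60 * 60 * 1000;

// Aging: promote a task one level for every full hour it has waited (assumed rule).
function effectivePriority(base: Priority, waitedMs: number): Priority {
    const levels: Priority[] = ["low", "medium", "high"];
    const promotions = Math.floor(waitedMs / HOUR_MS);
    const index = Math.min(levels.indexOf(base) + promotions, levels.length - 1);
    return levels[index];
}
```

A worker would call `pickQueue(Math.random())` before each poll and fall back to the other queues if the chosen one is empty.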

6. Retry Strategies

When a task fails, the retry strategy determines when to try again:

Exponential Backoff with Jitter

Retry 1: Wait 1 second  + random(0-500ms)
Retry 2: Wait 2 seconds + random(0-500ms)
Retry 3: Wait 4 seconds + random(0-500ms)
Retry 4: Wait 8 seconds + random(0-500ms)
Retry 5: Wait 16 seconds + random(0-500ms)

If retry 5 also fails, the maximum retry count has been reached and the task moves to the Dead Letter Queue.

  • Exponential backoff prevents overwhelming a recovering service with immediate retries.
  • Jitter (random delay) prevents the thundering herd problem: if 1,000 tasks all failed at the same time and retried at exactly the same intervals, they would all hit the recovering service simultaneously.
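
The schedule above can be computed with a short function. This is a sketch using the constants from the example (1-second base delay, up to 500 ms of jitter, 5 retries); the function names are hypothetical, and the random source is injectable so the result can be checked deterministically.

```typescript
const BASE_DELAY_MS = 1000;
const MAX_JITTER_MS = 500;
const MAX_RETRIES = 5;

// Delay before retry number `attempt` (1-based): 1 s, 2 s, 4 s, ... plus jitter.
function retryDelayMs(attempt: number, random: () => number = Math.random): number {
    const backoff = BASE_DELAY_MS * Math.pow(2, attempt - 1);
    const jitter = Math.floor(random() * MAX_JITTER_MS);
    return backoff + jitter;
}

// Returns the delay before the next retry, or null when retries are exhausted
// and the task should move to the Dead Letter Queue.
function nextRetryDelayMs(retriesSoFar: number, random: () => number = Math.random): number | null {
    if (retriesSoFar >= MAX_RETRIES) return null;
    return retryDelayMs(retriesSoFar + 1, random);
}
```

Production systems often also cap the maximum delay so that a large retry count cannot push the wait to hours; that cap is omitted here to match the schedule in the text.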

7. Real-World Systems

| System | Type | Key Feature |
| --- | --- | --- |
| Celery (Python) | Task queue + scheduler | Supports periodic tasks (Celery Beat), priority queues, and multiple broker backends (Redis, RabbitMQ). |
| Sidekiq (Ruby) | Task queue | Redis-backed, multi-threaded worker. Supports scheduled jobs and retry with backoff. |
| BullMQ (Node.js) | Task queue | Redis-backed, supports delayed jobs, priority queues, rate limiting, and repeatable jobs. |
| AWS Step Functions | Workflow orchestrator | Visual workflow editor for chaining Lambda functions with retry, branching, and parallel execution. |
| Temporal | Workflow engine | Durable execution engine. If a worker crashes, the workflow resumes from the exact point of failure (not from the beginning). |
| Kubernetes CronJobs | Cron scheduler | Runs containerized cron jobs on a schedule. Handles pod lifecycle and failure restart. |

[!TIP] In system design interviews, when asked about background processing, always address: (1) how tasks are stored and enqueued, (2) how duplicate execution is prevented (leader election or DB locking), (3) retry strategy with exponential backoff, and (4) Dead Letter Queues for failed tasks.