Unique ID Generation
Generate globally unique identifiers in distributed systems. Learn UUID, Snowflake IDs, ULID, database auto-increment, and how to choose the right strategy for your scale.
Unique ID Generation in Distributed Systems
Every record in a database needs a unique identifier. In a single-database system, this is trivial — the database's AUTO_INCREMENT counter handles it. But in a distributed system with multiple databases, multiple data centers, and thousands of writes per second, generating globally unique IDs without a central coordinator is a hard engineering problem.
1. Why Not Just Use Auto-Increment?
Before diving into solutions, let's understand why the naive approach fails at scale.
| Problem | Explanation |
|---|---|
| Single Point of Failure | Auto-increment relies on a single database to generate IDs. If that database goes down, no service can create new records. |
| Bottleneck | Every write across all microservices must ask the same database for the next ID, creating a severe performance bottleneck. |
| Sharding Conflicts | If you shard your database into 3 servers, each generating its own auto-increment IDs, all three will generate id = 1, id = 2, etc. — causing collisions when data is merged or queried across shards. |
| Predictability | Sequential IDs leak business information. An attacker who creates an account and gets user_id = 50042 knows you have roughly 50,000 users. |
Workaround: Ranged Auto-Increment
You can give each database shard a distinct starting offset and a shared step equal to the number of shards, so the generated sequences never overlap:
Shard 1: IDs start at 1, increment by 3 → 1, 4, 7, 10, ...
Shard 2: IDs start at 2, increment by 3 → 2, 5, 8, 11, ...
Shard 3: IDs start at 3, increment by 3 → 3, 6, 9, 12, ...
This avoids collisions but is inflexible — adding a 4th shard requires reassigning the increment step across all existing shards, which is operationally painful.
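The offset-and-step scheme above can be sketched in a few lines. This is a minimal illustration, not a database feature: the helper name shardIds is hypothetical, and in practice the database does this for you (MySQL, for example, exposes the auto_increment_increment and auto_increment_offset system variables).

```typescript
// Hypothetical helper: the first `count` IDs a shard would hand out,
// given its starting offset and the shared step (= number of shards).
function shardIds(offset: number, step: number, count: number): number[] {
  return Array.from({ length: count }, (_, k) => offset + k * step);
}

const SHARD_COUNT = 3;
const perShard = [1, 2, 3].map((offset) => shardIds(offset, SHARD_COUNT, 4));
console.log(perShard); // [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]

// No ID appears on more than one shard.
const allIds = perShard.flat();
const collisionFree = new Set(allIds).size === allIds.length;
console.log(collisionFree); // true
```

Changing SHARD_COUNT to 4 changes every sequence, which is why adding a shard forces a reconfiguration of all existing ones.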
2. UUID (Universally Unique Identifier)
A UUID is a 128-bit identifier, typically displayed as a 36-character string like:
550e8400-e29b-41d4-a716-446655440000
UUID v4 (Random)
UUID v4 fills 122 of its 128 bits with random data; the remaining 6 bits encode the version and variant. The probability that two randomly generated UUID v4s collide is astronomically small: approximately $1$ in $2^{122}$ ($\approx 5.3 \times 10^{36}$). For context, you would need to generate 1 billion UUIDs per second for about 86 years to have a 50% chance of a single collision.
import { v4 as uuidv4 } from 'uuid';
const userId = uuidv4();
// e.g. "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d"
Pros:
- Fully decentralized: Any node can generate a UUID independently without coordinating with any other node. No central authority needed.
- Simple: One function call. No infrastructure required.
Cons:
- Large: 128 bits (16 bytes) vs. 64 bits for a Snowflake ID. This increases storage, index size, and network transfer costs.
- Terrible for database indexing: UUID v4 is random, not sequential. When used as a primary key in a B-Tree index (PostgreSQL, MySQL), new inserts land in random positions, causing page splits and index fragmentation. This can degrade write performance by roughly 2-10x compared to sequential IDs, depending on table size and how much of the index fits in memory.
- Not sortable by time: You cannot sort UUIDs to determine which record was created first.
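The collision figure quoted for UUID v4 can be checked with the standard birthday approximation, $n \approx \sqrt{2 N \ln 2}$, where $N = 2^{122}$ is the number of possible random values. The sketch below is only a back-of-the-envelope calculation; the generation rate of 1 billion per second is the one assumed in the text.

```typescript
const RANDOM_BITS = 122; // UUID v4: 128 bits minus 6 fixed version/variant bits
const idSpace = 2 ** RANDOM_BITS; // a power of two, so exact as a double

// Birthday approximation: number of IDs needed for a 50% collision chance.
const idsFor50Percent = Math.sqrt(2 * idSpace * Math.LN2);

const RATE_PER_SECOND = 1e9;
const SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;
const yearsTo50Percent = idsFor50Percent / RATE_PER_SECOND / SECONDS_PER_YEAR;

console.log(idsFor50Percent.toExponential(2)); // "2.71e+18"
console.log(yearsTo50Percent.toFixed(0)); // "86"
```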
UUID v7 (Time-Ordered — Modern Standard)
UUID v7 was introduced in RFC 9562 (2024) to fix the indexing problem. It embeds a Unix timestamp in the most significant bits, making it time-sortable while remaining globally unique.
UUID v7 structure (128 bits):
┌──────────────────────────────────────────────────────┐
│ 48-bit timestamp (ms) │ ver │ random │ var │ random │
└──────────────────────────────────────────────────────┘
- Pros: Time-sortable. B-Tree friendly (new IDs land at or near the right edge of the index; IDs created in the same millisecond may interleave). Fully decentralized.
- Cons: Still 128 bits (larger than Snowflake).
[!TIP] If you need UUIDs, prefer UUID v7 over UUID v4 for database primary keys. The time-ordering keeps inserts clustered at the end of the index, which largely eliminates the fragmentation problem.
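To make the UUID v7 layout concrete, here is a minimal sketch that assembles one by hand: a 48-bit millisecond timestamp, then the version and variant bits, with the rest random. The function name uuidv7Sketch is hypothetical, the code assumes a Node.js runtime for randomBytes, and it omits the optional same-millisecond monotonicity techniques described in RFC 9562. In production, a maintained library is the safer choice.

```typescript
import { randomBytes } from 'node:crypto';

// Hypothetical helper: builds a UUID v7 string from a millisecond timestamp.
function uuidv7Sketch(nowMs: number = Date.now()): string {
  const bytes = randomBytes(16); // start fully random, then overwrite fixed fields

  // Bytes 0..5: 48-bit big-endian Unix timestamp in milliseconds.
  let ts = BigInt(nowMs);
  for (let i = 5; i >= 0; i--) {
    bytes[i] = Number(ts & 0xffn);
    ts >>= 8n;
  }

  bytes[6] = (bytes[6] & 0x0f) | 0x70; // version nibble = 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant bits = 10

  const hex = bytes.toString('hex');
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join('-');
}

const earlier = uuidv7Sketch(1000);
const later = uuidv7Sketch(2000);
console.log(earlier < later); // true: string order follows timestamp order
```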
3. Twitter Snowflake IDs
Twitter developed the Snowflake algorithm in 2010 to generate unique, time-sortable, 64-bit integer IDs at massive scale (tens of thousands of IDs per second per machine).
Snowflake ID Structure (64 bits)
┌─────────────────────────────────────────────────────────────────┐
│ 0 │ 41-bit Timestamp (ms) │ 10-bit Machine ID │ 12-bit Seq │
└─────────────────────────────────────────────────────────────────┘
1 41 bits 10 bits 12 bits
Total: 1 + 41 + 10 + 12 = 64 bits
| Component | Bits | Purpose | Capacity |
|---|---|---|---|
| Sign bit | 1 | Always 0. Ensures the ID is a positive number. | — |
| Timestamp | 41 | Milliseconds since a custom epoch (e.g., Twitter's epoch: Nov 4, 2010). | ~69 years of timestamps |
| Machine ID | 10 | Unique identifier for the server generating the ID. Often split as 5 bits datacenter + 5 bits machine. | 1,024 machines |
| Sequence | 12 | Counter that increments for IDs generated in the same millisecond on the same machine. | 4,096 IDs per millisecond per machine |
Throughput: Each machine can generate $4,096$ unique IDs per millisecond, which equals $\sim4$ million IDs per second per machine. With 1,024 machines, the system supports $\sim4$ billion IDs per second globally.
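The capacity figures follow directly from the bit widths. The short calculation below reproduces them; the constant names are illustrative only.

```typescript
const TIMESTAMP_BITS = 41;
const MACHINE_BITS = 10;
const SEQUENCE_BITS = 12;

const idsPerMs = 2 ** SEQUENCE_BITS; // 4,096
const idsPerSecondPerMachine = idsPerMs * 1000; // 4,096,000
const machineCount = 2 ** MACHINE_BITS; // 1,024
const idsPerSecondGlobal = idsPerSecondPerMachine * machineCount; // 4,194,304,000

const MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;
const yearsOfTimestamps = 2 ** TIMESTAMP_BITS / MS_PER_YEAR; // about 69.7

console.log({ idsPerMs, idsPerSecondPerMachine, machineCount, idsPerSecondGlobal });
console.log(yearsOfTimestamps.toFixed(1)); // "69.7"
```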
Implementation
class SnowflakeGenerator {
  private sequence = 0n;
  private lastTimestamp = -1n;
  // Custom epoch: January 1, 2024 (UTC), in milliseconds
  private static readonly EPOCH = 1704067200000n;

  constructor(
    private machineId: bigint // 0 to 1023 (10 bits)
  ) {
    if (machineId < 0n || machineId > 1023n) {
      throw new Error("Machine ID must be between 0 and 1023");
    }
  }

  nextId(): bigint {
    let timestamp = BigInt(Date.now()) - SnowflakeGenerator.EPOCH;
    if (timestamp < this.lastTimestamp) {
      // Clock moved backward: refuse rather than risk issuing a duplicate ID
      throw new Error("Clock moved backward; refusing to generate ID");
    }
    if (timestamp === this.lastTimestamp) {
      // Same millisecond: increment sequence
      this.sequence = (this.sequence + 1n) & 4095n; // 12-bit mask
      if (this.sequence === 0n) {
        // Sequence overflow: busy-wait for the next millisecond
        while (timestamp <= this.lastTimestamp) {
          timestamp = BigInt(Date.now()) - SnowflakeGenerator.EPOCH;
        }
      }
    } else {
      this.sequence = 0n; // New millisecond: reset sequence
    }
    this.lastTimestamp = timestamp;
    // Layout: timestamp (41 bits) | machine ID (10 bits) | sequence (12 bits)
    return (timestamp << 22n) | (this.machineId << 12n) | this.sequence;
  }
}
// Usage:
const generator = new SnowflakeGenerator(1n); // Machine #1
console.log(generator.nextId()); // a 64-bit bigint; the exact value depends on the current time
Pros:
- Compact: 64 bits (fits in a BIGINT column, half the size of a UUID).
- Time-sortable: IDs are roughly ordered by creation time.
- Extremely high throughput with zero coordination at runtime.
- Database-friendly: Sequential inserts into B-Tree indexes.
Cons:
- Machine ID assignment: Each server needs a unique Machine ID. This requires a coordination mechanism at startup (e.g., ZooKeeper, etcd, Kubernetes pod ordinal index).
- Clock sensitivity: If the system clock jumps backward (e.g., NTP correction), the generator may produce duplicate IDs or must halt until the clock catches up.
[!WARNING] Clock drift is a real problem. In production Snowflake implementations, you must handle backward clock jumps by either refusing to generate IDs until the clock advances past the last timestamp, or by alerting operators. Google's Spanner uses GPS and atomic clocks specifically to keep clock uncertainty small and bounded.
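Because the layout is fixed, a Snowflake ID can be taken apart again with shifts and masks. The sketch below assumes the same custom epoch as the generator above (January 1, 2024 UTC); the names decodeSnowflake and SnowflakeParts are hypothetical.

```typescript
// Assumption: must match the epoch used by whichever generator produced the ID.
const EPOCH = 1704067200000n;

interface SnowflakeParts {
  timestampMs: bigint; // Unix time in milliseconds
  machineId: bigint; // 0..1023
  sequence: bigint; // 0..4095
}

function decodeSnowflake(id: bigint): SnowflakeParts {
  return {
    timestampMs: (id >> 22n) + EPOCH, // top 41 bits, shifted back to Unix time
    machineId: (id >> 12n) & 0x3ffn, // middle 10 bits
    sequence: id & 0xfffn, // low 12 bits
  };
}

// Hand-built sample: 12,345 ms after the epoch, machine 1, sequence 7.
const sampleId = (12345n << 22n) | (1n << 12n) | 7n;
const decoded = decodeSnowflake(sampleId);
// The timestamp decodes to 2024-01-01T00:00:12.345Z.
console.log(new Date(Number(decoded.timestampMs)).toISOString());
console.log(decoded.machineId, decoded.sequence);
```

This is also why Snowflake IDs are not fully opaque: anyone who knows the epoch can recover the creation time and machine ID.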
4. ULID (Universally Unique Lexicographically Sortable Identifier)
ULID combines the best of UUIDs (no coordination) and Snowflake (time-sortable) into a 128-bit identifier encoded as a compact 26-character string.
ULID Structure
ULID: 01ARZ3NDEKTSV4RRFFQ69G5FAV
┌──────────────────────────────────────────────┐
│ 48-bit timestamp (ms) │ 80-bit randomness │
└──────────────────────────────────────────────┘
10 characters 16 characters
(Crockford Base32) (Crockford Base32)
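import { monotonicFactory } from 'ulid';
// A monotonic factory keeps IDs ordered even when both fall in the same millisecond;
// plain ulid() only guarantees ordering across different milliseconds.
const ulid = monotonicFactory();
const id1 = ulid(); // e.g. "01HXYZ1234ABCDEFGH567890AB"
const id2 = ulid(); // e.g. "01HXYZ1234ABCDEFGH567890AC"
console.log(id1 < id2); // true → Lexicographically sortable!
Why Crockford's Base32?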
- Uses only uppercase letters and digits (0-9, A-Z), excluding I, L, O, U to avoid human confusion.
- URL-safe: no padding or punctuation characters such as "=".
- Case-insensitive: 01ARZ3 is the same as 01arz3.
Pros:
- Fully decentralized (no Machine ID coordination).
- Time-sortable AND lexicographically sortable (string comparison works correctly).
- URL-safe, human-readable (26 characters vs. 36 for UUID).
- Database-friendly (sequential inserts).
Cons:
- 128 bits (larger than Snowflake's 64 bits).
- With the default generator, the 80 random bits mean that IDs created within the same millisecond sort in random order (not strictly sequential like Snowflake's sequence counter), unless a monotonic generator is used.
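The ULID structure is simple enough to assemble by hand, which helps show where the sortability comes from. The sketch below is illustrative only: encodeBase32 and ulidSketch are hypothetical names, the code assumes a Node.js runtime for randomBytes, and it does not implement the monotonic mode offered by the ulid library.

```typescript
import { randomBytes } from 'node:crypto';

// Crockford Base32 alphabet: digits plus letters, without I, L, O, U.
const CROCKFORD = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

// Encodes the low (length * 5) bits of `value`, most significant character first.
function encodeBase32(value: bigint, length: number): string {
  let out = '';
  for (let i = 0; i < length; i++) {
    out = CROCKFORD[Number(value & 31n)] + out;
    value >>= 5n;
  }
  return out;
}

function ulidSketch(nowMs: number = Date.now()): string {
  const random80 = BigInt('0x' + randomBytes(10).toString('hex')); // 80 random bits
  // 10 characters of timestamp followed by 16 characters of randomness.
  return encodeBase32(BigInt(nowMs), 10) + encodeBase32(random80, 16);
}

console.log(encodeBase32(1469918176385n, 10)); // "01ARYZ6S41"
console.log(ulidSketch().length); // 26
```

Because the timestamp occupies the leading characters and the alphabet is in ascending ASCII order, plain string comparison orders IDs by creation millisecond.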
5. Choosing the Right ID Strategy
Decision Matrix
| Criteria | UUID v4 | UUID v7 | Snowflake | ULID |
|---|---|---|---|---|
| Size | 128 bits | 128 bits | 64 bits | 128 bits |
| Sortable | No | Yes | Yes | Yes |
| Coordination | None | None | Machine ID required | None |
| DB Index Friendly | No (random) | Yes | Yes | Yes |
| Human Readable | Medium | Medium | No (big integer) | Good |
| URL Safe | Yes (hyphens are URL-safe) | Yes (hyphens are URL-safe) | Yes | Yes |
| Throughput | Unlimited | Unlimited | ~4M/sec/machine | Unlimited |
Real-World Usage
| Company / System | ID Strategy | Why |
|---|---|---|
| Twitter/X | Snowflake | 64-bit integer for high-throughput tweet IDs |
| Discord | Snowflake variant | Message ordering in real-time chat |
| Instagram | Snowflake variant | Photo IDs across sharded PostgreSQL |
| MongoDB | ObjectId (custom) | 12-byte ID with timestamp + per-process random value + counter |
| Stripe | Prefixed random | ch_1234abc — type-prefixed random strings for API clarity |
| GitHub | UUID v4 | Repository and user identifiers |
[!IMPORTANT] Never expose auto-increment IDs in public APIs. An attacker can enumerate your resources (e.g., GET /api/users/1, GET /api/users/2, ...) to scrape data or estimate your user count. Use opaque identifiers (UUID, ULID, or Snowflake) for any externally visible ID.
6. Handling Clock Drift in Snowflake Systems
Since Snowflake IDs embed a timestamp, they are vulnerable to clock drift — when a server's clock is slightly ahead or behind the true time. This can happen due to:
- NTP (Network Time Protocol) corrections jumping the clock backward.
- Virtual machine migration causing time discontinuities.
- Hardware clock inaccuracies.
Mitigation Strategies
1. Refuse to generate IDs on backward clock jump: If currentTimestamp < lastTimestamp, the generator throws an error or blocks until the clock catches up. This guarantees monotonicity but causes temporary unavailability.
2. Use a logical clock offset: Instead of relying on the wall clock, maintain a logical offset. If the clock jumps backward by 5ms, add a +5ms offset to all future timestamps until the wall clock catches up.
3. Use GPS/Atomic clocks (Google Spanner): Google's TrueTime API uses GPS receivers and atomic clocks in every data center to provide a globally consistent time source with bounded uncertainty. This does not remove drift entirely, but it keeps the uncertainty small and known, at the cost of specialized hardware.
[!TIP] For most systems, Strategy 1 (block on backward jump) is sufficient. Clock jumps from NTP are typically small (a few milliseconds) and infrequent. If you're building a globally distributed database, study Google Spanner's TrueTime approach.
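Strategy 2 can be approximated with a small wrapper around the wall clock. The sketch below is a simplified variant, not the exact offset bookkeeping described above: instead of tracking an explicit offset, it never lets the reported time fall below the last value it returned, so time appears to stand still until the wall clock catches up. The class name LogicalClock and the injectable wallClock parameter are hypothetical, added so the behavior can be demonstrated with scripted readings.

```typescript
class LogicalClock {
  private last = -1;

  // wallClock is injectable for testing; it defaults to the system clock.
  constructor(private readonly wallClock: () => number = () => Date.now()) {}

  // Returns a timestamp that never decreases, even if the wall clock does.
  now(): number {
    const wall = this.wallClock();
    if (wall > this.last) {
      this.last = wall;
    }
    return this.last;
  }
}

// Scripted wall-clock readings with a backward jump from 105 to 101.
const readings = [100, 105, 101, 103, 106];
let cursor = 0;
const clock = new LogicalClock(() => readings[cursor++]);

const observed = readings.map(() => clock.now());
console.log(observed); // [100, 105, 105, 105, 106]
```

A generator built on such a clock keeps issuing IDs during the jump, but every ID in that window shares one timestamp, so the 12-bit sequence can run out sooner than usual.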