UUID v4 vs v7: collision math, database indexes, forensic decoding — obfus.link

1. Insight

The problem this article addresses and why it matters.

v4 was the answer for fifteen years, and now it isn't

UUID v4 — 122 bits of randomness, no structure — became the default for "I need a unique identifier" in roughly 2008 and held that position until the late 2010s when database performance teams started measuring what v4 ids actually cost. The problem is the absence of structure: v4 ids are uniformly random, so inserting them into a B-tree index produces random insertion points across the index pages. Every insert touches a random page, every page eventually gets touched by every concurrent writer, and the index becomes a hot mess of cache misses, page splits, and write amplification.

RFC 9562 (published 2024, replacing the older RFC 4122) introduced UUID v7 to fix this: a 48-bit Unix millisecond timestamp prefix, 6 fixed version and variant bits, and 74 bits of randomness. The timestamp prefix never decreases as time advances, so inserts arrive at the right edge of the index, mid-index page splits drop dramatically, and the cache footprint shrinks. Native generation is arriving in databases: PostgreSQL ships uuidv7() from version 18. Support in other engines varies by release, so check the release notes for MySQL, CockroachDB, and the like before assuming a built-in exists, and generate ids in the application where it does not.
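To make the bit layout concrete, here is a minimal sketch that packs a millisecond timestamp and ten random bytes into the v7 shape. The function name buildUuidV7 is hypothetical, and the random bytes are passed in as a parameter only so the layout can be inspected with fixed input; in real use they should come from a CSPRNG.

```typescript
// Hypothetical helper: lays out a v7 UUID from a timestamp and 10 random bytes.
// 80 random bits go in; 6 are overwritten by version and variant, leaving 74.
function buildUuidV7(unixMs: number, random: Uint8Array): string {
  if (!Number.isInteger(unixMs) || unixMs < 0 || unixMs > 0xffffffffffff) {
    throw new RangeError("timestamp must fit in 48 bits");
  }
  if (random.length !== 10) {
    throw new RangeError("expected exactly 10 random bytes");
  }
  const bytes = new Uint8Array(16);
  // Bytes 0-5: 48-bit big-endian timestamp. Division is used instead of
  // bit shifts because JavaScript bitwise operators truncate to 32 bits.
  let t = unixMs;
  for (let i = 5; i >= 0; i--) {
    bytes[i] = t % 256;
    t = Math.floor(t / 256);
  }
  bytes.set(random, 6);
  bytes[6] = (bytes[6] & 0x0f) | 0x70; // version nibble = 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant bits = 10
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}
```

Calling it with all-zero random bytes shows which characters are fixed by the format: the timestamp fills the first twelve hex digits, the thirteenth is always 7, and the seventeenth is always 8, 9, a, or b.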

Why "just use v7 everywhere" isn't the answer

v7 trades one weakness for another. Because the timestamp is in the prefix, two ids created in the same millisecond by different services sit next to each other in the index (that's the whole point), but it also means anyone who sees one of your ids can read its creation time to the millisecond and infer when neighbouring ids were created. That's a privacy regression if creation time is a side channel in your business (medical records, journalist sources, security incident IDs).

v4 still wins where index ordering is irrelevant or where timestamp leakage matters more than write performance. The right answer is "depends on the workload" — and the only way to decide is to look at the collision math, the database's btree behaviour with each version, and the privacy model of the system you're shipping.

What this article delivers

End-to-end walkthroughs of the three modes: generation with auto-decode annotations, forensic decoding (paste any UUID, get version + embedded timestamp + variant + clock sequence + node id), and collision analysis with a database migration helper. We cover the birthday-problem math for picking version + bit count, the index-strategy difference between v4 and v7 in PostgreSQL / MySQL / DynamoDB, and the cases where neither version is right and you should reach for ULID or KSUID instead.

2. Intent

What you will be able to do after reading.

By the end of this article you will be able to:

Generate batches of v4 or v7 UUIDs with optional monotonic-sequence guarantees within a batch
Decode any UUID (versions 1, 3, 4, 5, 6, 7, 8) to recover the embedded timestamp, variant, clock sequence, and node id where present
Calculate birthday-problem collision probabilities for a target dataset size and pick the right version
Read the database migration helper output that produces the exact column type, index strategy, and DDL for PostgreSQL / MySQL / CockroachDB / DynamoDB / SQLite
Identify the cases where neither v4 nor v7 fits and reach for ULID or KSUID instead

The Examples section walks through generation with annotations, decoding an unknown UUID found in a log file, and analysing collision risk for a 10-billion-row dataset.

3. Examples

Annotated code and worked scenarios.

Before / after: annotated v7 generation

You're seeding a test database and want a batch of 100 v7 UUIDs (the per-call maximum) with monotonic ordering so you can verify the index behaviour matches what production will see:

Before: call a v7 library in a tight loop, accept that two calls in the same millisecond might produce ids whose order depends on the library's internal counter rather than wall-clock order. Hope the docs were accurate.

After:

uuidGenerator({
  mode: 'generate',
  version: 7,
  count: 100,
  sequenceMode: true,
  annotate: true,
});

// uuids: [
//   '01979a3c-1f2d-7000-8000-000000000000',
//   '01979a3c-1f2d-7000-8000-000000000001',
//   '01979a3c-1f2d-7000-8000-000000000002',
//   ...
// ]
// annotations: [
//   { uuid: '01979a3c-1f2d-7...', timestamp: 1750639320877, sequence: 0, isoTime: '2025-06-23T00:42:00.877Z' },
//   { uuid: '01979a3c-1f2d-7...', timestamp: 1750639320877, sequence: 1, isoTime: '2025-06-23T00:42:00.877Z' },
//   ...
// ]

sequenceMode: true is the guarantee: even when two ids generate inside the same millisecond, the sequence counter increments to keep strict monotonic order in the index. Without it, batch insertion order is non-deterministic for within-ms pairs — which breaks tests that assert ordering.
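How a generator can keep that guarantee is worth seeing once. The sketch below is not the tool's implementation (the article does not document it); it assumes the fixed-length dedicated counter approach that RFC 9562 describes, storing a 12-bit counter in the bits right after the version nibble. The names createMonotonicV7, now, and randomBytes are hypothetical, and the clock and random source are injected so the behaviour is reproducible.

```typescript
// Hypothetical sketch: strictly increasing v7 ids via a 12-bit counter.
// `now` returns Unix milliseconds; `randomBytes` should wrap a CSPRNG in real use.
function createMonotonicV7(
  now: () => number,
  randomBytes: (n: number) => Uint8Array,
): () => string {
  let lastMs = -1;
  let counter = 0;

  return function next(): string {
    // Never step backwards, even if the wall clock does.
    let ms = Math.max(now(), lastMs);
    if (ms === lastMs) {
      counter += 1;
      if (counter > 0xfff) {
        // Counter exhausted: borrow the next millisecond rather than wrap.
        ms += 1;
        counter = 0;
      }
    } else {
      counter = 0;
    }
    lastMs = ms;

    const tsHex = ms.toString(16).padStart(12, "0");
    const counterHex = counter.toString(16).padStart(3, "0");
    const tail = Uint8Array.from(randomBytes(8));
    tail[0] = (tail[0] & 0x3f) | 0x80; // variant bits = 10
    const tailHex = Array.from(tail, (b) => b.toString(16).padStart(2, "0")).join("");
    return [
      tsHex.slice(0, 8),
      tsHex.slice(8),
      "7" + counterHex,
      tailHex.slice(0, 4),
      tailHex.slice(4),
    ].join("-");
  };
}
```

The trade-off in this design is that 12 of the 74 random bits become a counter, which leaves 62 random bits per id. Starting the counter at zero each millisecond keeps the sketch short; seeding it randomly would make ids harder to guess.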

Before / after: forensic decoding

A UUID appears in a log file. You don't know which service emitted it or when:

Before: assume it's v4 (the most common), accept that you can't recover the creation time, move on with less context.

After:

uuidGenerator({
  mode: 'decode',
  uuid: '01979a3c-1f2d-7a4f-89cc-3b7e8d2a1c0f',
});

// decoded: {
//   uuid:       '01979a3c-1f2d-7a4f-89cc-3b7e8d2a1c0f',
//   version:    7,
//   variant:    'RFC4122',
//   timestamp:  1750639320877,
//   isoTime:    '2025-06-23T00:42:00.877Z',
//   isNil:      false,
//   isMax:      false,
//   malformed:  [],
// }

The first 48 bits decode to a Unix timestamp of 1750639320877 ms, or 2025-06-23T00:42:00.877Z: that's when the id was created. If you're chasing a bug from a log entry, the embedded timestamp tells you which server log file to grep without needing the original created_at column.
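The decoding step itself is small enough to check by hand. This is a minimal sketch under the assumption that the input is already known to be a v7 id; decodeV7 and the DecodedV7 shape are hypothetical names, not the tool's API.

```typescript
interface DecodedV7 {
  version: number;
  variant: string;
  timestampMs: number;
  isoTime: string;
}

// Hypothetical helper: reads the 48-bit timestamp prefix of a v7 UUID.
function decodeV7(uuid: string): DecodedV7 {
  const hex = uuid.replace(/-/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) {
    throw new Error("not a UUID");
  }
  const version = parseInt(hex.charAt(12), 16);
  if (version !== 7) {
    throw new Error("expected version 7, got " + version);
  }
  // The top two bits of the 17th hex digit are 10 for RFC 4122 / RFC 9562 ids.
  const variantNibble = parseInt(hex.charAt(16), 16);
  const variant = (variantNibble & 0xc) === 0x8 ? "RFC4122" : "other";
  // 48 bits fit exactly in a JavaScript number (53-bit integer precision).
  const timestampMs = parseInt(hex.slice(0, 12), 16);
  return {
    version,
    variant,
    timestampMs,
    isoTime: new Date(timestampMs).toISOString(),
  };
}
```

Only the first twelve hex digits matter for the timestamp, so a truncated id copied from a log line is often still enough to recover the creation time by hand.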

A v1 UUID decode goes further:

uuidGenerator({
  mode: 'decode',
  uuid: '6ba7b810-9dad-11d1-80b4-00c04fd430c8',
});

// decoded: {
//   version:       1,
//   timestamp:     886630433151,  // Unix ms, converted from 100 ns ticks since 1582-10-15
//   isoTime:       '1998-02-04T22:13:53.151Z',
//   clockSequence: 0x00b4,  // 14 bits; the top two bits of 0x80b4 are the variant
//   node:          '00c04fd430c8',  // the MAC address (or random replacement) of the issuing machine
//   variant:       'RFC4122',
// }

The node field on a real v1 id is often a MAC address — useful for forensic work (which machine generated this?), problematic for privacy (your hardware identifier leaks).
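For readers who want to see where those v1 fields live, here is a minimal sketch of the extraction. The names decodeV1 and DecodedV1 are hypothetical. The v1 timestamp is a 60-bit count of 100 ns ticks since 1582-10-15, which exceeds what a JavaScript number holds exactly, so the sketch divides in two exact steps rather than assuming BigInt support.

```typescript
interface DecodedV1 {
  timestampMs: number;
  isoTime: string;
  clockSequence: number;
  node: string;
}

// Milliseconds between the Gregorian epoch (1582-10-15) and the Unix epoch.
const GREGORIAN_OFFSET_MS = 12219292800000;

// Hypothetical helper: pulls timestamp, clock sequence, and node out of a v1 UUID.
function decodeV1(uuid: string): DecodedV1 {
  const hex = uuid.replace(/-/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex) || hex.charAt(12) !== "1") {
    throw new Error("not a version 1 UUID");
  }
  // v1 stores the timestamp low bits first: time_low, time_mid, then time_hi.
  const timeLow = parseInt(hex.slice(0, 8), 16); // low 32 bits
  const timeHigh = parseInt(hex.slice(13, 16) + hex.slice(8, 12), 16); // high 28 bits
  // ticks = timeHigh * 2^32 + timeLow. Since 2^32 = 429496 * 10000 + 7296,
  // ticks / 10000 can be computed without any intermediate value above 2^53.
  const gregorianMs =
    timeHigh * 429496 + Math.floor((timeHigh * 7296 + timeLow) / 10000);
  const timestampMs = gregorianMs - GREGORIAN_OFFSET_MS;
  return {
    timestampMs,
    isoTime: new Date(timestampMs).toISOString(),
    // Mask off the two variant bits to leave the 14-bit clock sequence.
    clockSequence: parseInt(hex.slice(16, 20), 16) & 0x3fff,
    node: hex.slice(20),
  };
}
```

The result is truncated to millisecond precision; the sub-millisecond part of the 100 ns tick count is discarded.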

Before / after: collision analysis + DB migration

You're building a system that will ingest 10 billion events over its lifetime. Should you use v4 or v7?

uuidGenerator({
  mode: 'analyze-collisions',
  targetScale: 10_000_000_000,
  targetDatabase: 'postgresql',
  existingUuids: [], // optional — provide some to check for duplicates
});

collisionAnalysis: {
  existingCount:       0,
  duplicatesFound:     [],
  targetScale:         10000000000,
  birthdayProbability: '1 in 1.1×10^17',  // v4 birthday bound: n² / (2 × 2^122)
  recommendedVersion:  7,
  reason: 'At 10B inserts, v7\'s time-ordered prefix delivers 5-10× better
           btree insert performance than v4. Collision probability is
           negligible for either version at this scale.',
  dbMigration: {
    database:         'postgresql',
    columnType:       'UUID',
    indexStrategy:    'btree — v7 is monotonically prefix-increasing, so inserts always land at the right edge of the index. Page splits drop to near-zero.',
    createStatement: `CREATE TABLE events (
  id          UUID PRIMARY KEY DEFAULT uuidv7(),
  user_id     UUID NOT NULL REFERENCES users(id),
  payload     JSONB NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);`,
  },
}

The dbMigration.createStatement is paste-ready on PostgreSQL 18 or later, where uuidv7() is built in: paste it into a migration file and ship. The indexStrategy line is where the analysis tool earns its keep; the difference between v4 and v7 btree performance is documented but rarely tested before production volume reveals it.
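The birthday-problem arithmetic behind a figure like this is short. The sketch below uses the standard approximation (number of colliding pairs divided by the size of the random space), which holds while the probability is tiny. The function names are hypothetical, and the v7 estimate rests on two stated assumptions: ids are generated at a steady rate, and all 74 non-timestamp bits are random (no counter).

```typescript
// p ≈ pairs / 2^bits. v4 has 122 random bits and any two ids can collide.
function collisionProbabilityV4(totalIds: number): number {
  const pairs = (totalIds * (totalIds - 1)) / 2;
  return pairs / Math.pow(2, 122);
}

// v7 ids can only collide with ids minted in the same millisecond, so count
// pairs per millisecond and sum over the milliseconds in which ids were made.
// Assumes a steady rate of `idsPerMs` and 74 fully random bits per id.
function collisionProbabilityV7(totalIds: number, idsPerMs: number): number {
  const pairsPerMs = (idsPerMs * (idsPerMs - 1)) / 2;
  const busyMilliseconds = totalIds / idsPerMs;
  return (busyMilliseconds * pairsPerMs) / Math.pow(2, 74);
}

// Express a probability as the "1 in N" odds used in the tool output.
function oneIn(probability: number): string {
  return "1 in " + (1 / probability).toExponential(1);
}
```

For example, oneIn(collisionProbabilityV4(10_000_000_000)) gives the v4 odds for the 10-billion-row scenario, and collisionProbabilityV7 shows how the v7 odds depend on burst rate rather than on total volume alone. Both come out negligible for realistic workloads, which is why the version choice is driven by index behaviour and privacy rather than by collision risk.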

When humans use this

The decode mode is the highest-frequency use: someone debugging an issue pastes a UUID from a log and instantly knows which version it is, what time it was created, and whether it's malformed. The collision analyser is the lower-frequency but higher-leverage use — picking the right version at table-creation time avoids a multi-year tail of bad index decisions.

When agents use this

The agent-pipeline value is concentrated in two places:

Database scaffolding. An agent generating migrations for a new service uses the collision analyser to pick the version, generates the DDL from the migration helper, writes the column type into the schema. The decision is deterministic and reviewable rather than hand-waved.
Log analysis pipeline. An agent processing logs with embedded UUIDs decodes each one to recover the creation timestamp. With v7 ids that's free; with v4 ids the agent learns "no timestamp available" and falls back to the surrounding log timestamp. Either way the agent's pipeline gets richer metadata than the log alone provides.

Edge cases

Mixed-version datasets

A system that started with v4 and migrated to v7 has both versions in production. The decode mode handles them transparently; the collision analyser warns when given mixed-version input and recommends a migration strategy.

Nil and Max UUIDs

00000000-0000-0000-0000-000000000000 (nil) and ffffffff-ffff-ffff-ffff-ffffffffffff (max, defined in RFC 9562) decode to version: 0 and version: 15 respectively with isNil / isMax flags. Some libraries crash on these; this tool surfaces them as sentinel values.

Hyphen-stripping and case variants

'01979a3c1f2d7a4f89cc3b7e8d2a1c0f' (no hyphens), '01979A3C-1F2D-7A4F-89CC-3B7E8D2A1C0F' (uppercase), '{01979a3c-1f2d-7a4f-89cc-3b7e8d2a1c0f}' (braced) all decode correctly. The format parameter on generate mode chooses the output shape.
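A normalisation pass along these lines is what makes the variants equivalent. This is a minimal sketch with a hypothetical normalizeUuid name; accepting a urn:uuid: prefix on input is an assumption here, since the article only lists urn as an output format.

```typescript
// Hypothetical helper: reduces the accepted spellings to canonical
// lowercase 8-4-4-4-12 form, or throws if the input is not a UUID.
function normalizeUuid(input: string): string {
  let s = input.trim().toLowerCase();
  if (s.startsWith("urn:uuid:")) {
    s = s.slice("urn:uuid:".length); // assumption: URN input is accepted
  }
  if (s.startsWith("{") && s.endsWith("}")) {
    s = s.slice(1, -1);
  }
  // Accept either 32 bare hex digits or hyphens in the canonical positions;
  // hyphens anywhere else are treated as malformed rather than stripped.
  const bare = /^[0-9a-f]{32}$/;
  const dashed = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/;
  if (!bare.test(s) && !dashed.test(s)) {
    throw new Error("INPUT_MALFORMED");
  }
  const hex = s.replace(/-/g, "");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}
```

Normalising before comparing or storing also avoids duplicate rows that differ only in case or punctuation.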

Custom node IDs and namespaces (v8)

version: 8 is the vendor-defined layout: apart from the version and variant bits, the remaining 122 bits are yours to assign. The generator supports custom nodeId (48-bit hex) and namespace parameters; the decoder reports the version and variant but cannot infer semantics without the issuing system's specification.

4. Documentation

Reference signatures, edge cases, and lookup tables.

Input parameters

Field	Type	Required	Default	Description
`mode`	`'generate' \| 'decode' \| 'analyze-collisions'`	✓	—	Workflow selector
`version`	`7 \| 8`	for generate mode	`7`	UUID version. v4 is supported via the seed mode parameter on legacy systems.
`count`	`number`	✗	`1`	Generation batch size (1-100)
`format`	`'standard' \| 'no-dashes' \| 'braces' \| 'urn'`	✗	`'standard'`	Output formatting
`sequenceMode`	`boolean`	✗	`false`	Guarantee strict monotonic order within a batch
`annotate`	`boolean`	✗	`false`	Return decoded annotations (timestamp, sequence) alongside each generated UUID
`namespace`	`string`	for v8	—	Custom namespace for v8 layout
`nodeId`	`string`	for v8	—	48-bit hex node identifier for v8 layout
`uuid`	`string`	for decode mode	—	Any UUID — version is auto-detected
`existingUuids`	`string[]`	for analyze-collisions	`[]`	Existing UUIDs to check for duplicates
`targetScale`	`number`	for analyze-collisions	—	Expected total row count for birthday-problem probability
`targetDatabase`	`'postgresql' \| 'mysql' \| 'cockroachdb' \| 'dynamodb' \| 'sqlite'`	for analyze-collisions	—	Drives the migration helper output

Output shape (decode mode)

decoded: {
  uuid:           string;
  version:        number;     // 0 (nil), 1, 3, 4, 5, 6, 7, 8, 15 (max)
  variant:        'RFC4122' | 'NCS' | 'Microsoft' | 'Future';
  timestamp?:     number;     // ms precision, present for v1, v6, v7
  isoTime?:       string;     // ISO 8601, present for time-based versions
  clockSequence?: number;     // present for v1, v6
  node?:          string;     // hex, present for v1, v6 — MAC or random
  isNil:          boolean;
  isMax:          boolean;
  malformed:      string[];   // empty when valid; non-empty lists byte-level issues
}

UUID v4 vs v7: which version to use

The choice between v4 and v7 is the most consequential UUID decision in 2026. v4 was standardized in RFC 4122 (2005) and became the default soon after; v7 (RFC 9562 §5.7, 2024) was added to address v4's well-known database insert performance problem.

Property	v4	v7
RFC	9562 (formerly 4122)	9562 §5.7
Embedded timestamp	No	Yes (48-bit ms since Unix epoch)
Random bits	122	74
btree insert pattern	Random across pages	Right-edge sequential
Timestamp privacy	None — no time data leaks	Creation time recoverable from the id
Sortable by creation time	No	Yes (lexicographic = chronological)
Index bloat under random writes	High (page splits everywhere)	Low (writes concentrate on the rightmost leaf pages)
Database support	Universal	PostgreSQL 18+ (uuidv7()); elsewhere check release notes, use an extension, or generate in the application
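Collision probability at 1B rows	About 1 in 1.1×10^19	Depends on generation rate, since only ids sharing a millisecond can collide; about 1 in 3.8×10^10 at a steady 1,000 ids per ms (higher than v4, still negligible)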

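Pick v7 for any new system that writes UUIDs to a btree-indexed primary key that sorts ids bytewise, which covers PostgreSQL, MySQL, and most single-node relational engines. The sequential insert pattern eliminates the page-split storm v4 causes at high write volume. You also get time-range queries on the id itself (WHERE id > uuid_at('2026-01-01'), where uuid_at is a helper you would write to build the smallest v7 id for a timestamp) and natural chronological sort. Two exceptions: SQL Server's uniqueidentifier type compares bytes in its own order, so v7 values do not sort chronologically there without rearranging bytes, and distributed SQL engines that range-partition on the primary key, such as CockroachDB, can turn sequential keys into a write hotspot.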
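The uuid_at call in that query is not a built-in anywhere that this article relies on, so here is a minimal sketch of what such a helper could compute on the application side: the smallest possible v7 id for a given instant. The name uuidV7LowerBound is hypothetical.

```typescript
// Hypothetical helper: the smallest v7 UUID whose timestamp is `at`.
// Every v7 id created at or after that instant compares >= this value
// when ids are compared bytewise (or as lowercase canonical strings).
function uuidV7LowerBound(at: Date): string {
  const ms = at.getTime();
  if (!Number.isInteger(ms) || ms < 0 || ms > 0xffffffffffff) {
    throw new RangeError("date must be valid and fit in 48 bits of Unix ms");
  }
  const ts = ms.toString(16).padStart(12, "0");
  // Version nibble 7, variant bits 10, every other bit zero.
  return ts.slice(0, 8) + "-" + ts.slice(8) + "-7000-8000-000000000000";
}
```

Passing the result as a bound parameter (for example WHERE id >= $1) keeps the range scan on the primary-key index. This works only for ids that really are v7; rows with v4 ids from before a migration scatter across the whole range and need a separate created_at filter.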

Pick v4 when the id will be exposed to users and creation-time leakage matters (account ids, share tokens, anything where "when did this exist" is sensitive). v4 leaks no temporal information; v7's first 48 bits decode trivially back to a millisecond timestamp.

Pick v4 when your storage layer can't take advantage of monotonic ids. DynamoDB hashes the partition key to place items, so a time-ordered prefix buys no locality there, and stores that range-partition on the key can concentrate sequential writes on one partition. The decode mode of this tool shows exactly what each version exposes.

The other versions: v1 leaks the MAC address of the generating host (don't use). v6 is a backward-compatible re-ordering of v1 — better than v1, worse than v7, useful only if you must preserve v1-shaped layouts. v8 is the "custom" version — bring your own bit layout, used for ULIDs-as-UUIDs and similar. v3 and v5 are namespace-deterministic (hash-based) — use them when you need the same id from the same input (e.g. deterministic ids for content addressing).

Error codes

Code	When it fires	Recovery
`INPUT_EMPTY`	Decode mode with empty `uuid`, or generate with `count: 0`	Provide a non-empty value
`INPUT_MALFORMED`	UUID doesn't match the 8-4-4-4-12 hex pattern (after format-stripping)	Verify the input is a UUID
`INPUT_INVALID_TYPE`	Version field outside `{7, 8}` for generate mode	Use v7 or v8 explicitly
`PAYLOAD_LIMIT`	`count > 100`	Split into multiple calls
`UNSUPPORTED_FORMAT`	Decode of a UUID with reserved-future variant bits	The id is malformed at the variant byte; manual inspection required

When NOT to use this tool

If your scale and ordering requirements aren't met by v4 or v7, consider ULID (similar to v7 but Crockford-base32 encoded, 26 characters instead of 36) or KSUID (32-bit timestamp + 128-bit random payload: 160 bits, so larger than a UUID in binary, though its 27-character base62 text form is shorter than a UUID string). The tool doesn't generate these; use a dedicated library.

For cryptographic randomness in non-id contexts (session tokens, password reset tokens, OAuth state values), use the runtime's CSPRNG directly (crypto.randomBytes in Node, crypto.getRandomValues in browsers). UUIDs are unique identifiers, not security tokens; the formatting overhead is wasted for opaque secrets.

Performance notes

Generate-mode batch of 100: under 5ms. Decode-mode single UUID: under 1ms. Collision analysis with existingUuids array: O(n log n) for the duplicate scan; 50ms typical for arrays of 10K UUIDs. The tool is deterministic for decode and analyze-collisions modes. Generate mode is non-deterministic by definition; REST responses set Cache-Control: no-store.