
Pseudo-IDs

Every object generated by Pseudata has a unique, deterministic pseudo-id—a special UUID v8 that encodes three pieces of information: the world-seed, the type-sequence, and the object’s index in its PseudoArray.

[Figure: PseudoID bit layout]

Unlike random UUIDs, pseudo-ids are:

  • Deterministic - Same across all languages and executions
  • Sortable - Organized by world, then type, then creation order
  • Self-describing - Contains metadata about the object’s origin
  • RFC 9562 compliant - Standard UUID v8 format for vendor-specific data

Technical Limits:

  • World-Seeds: 64-bit (18.4 quintillion unique worlds)
  • Type-Sequences: 16-bit (65,536 different types)
  • Index Range: 40-bit (1.1 trillion objects per type)
  • Layout Versions: 2-bit (4 values: the current layout, 0, plus 3 reserved for future encoding schemes)

These limits are designed to handle virtually any real-world use case.
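The figures in the list above follow directly from the field widths. A minimal Go sketch of the arithmetic, where `capacity` is a hypothetical helper and not part of the Pseudata API:

```go
package main

import "fmt"

// capacity returns how many distinct values fit in a field of the given
// width. It is only meaningful for widths below 64, because a 64-bit
// shift overflows uint64.
func capacity(bits uint) uint64 {
	return uint64(1) << bits
}

func main() {
	fmt.Println("type-sequences:", capacity(16)) // 16-bit field
	fmt.Println("indices per type:", capacity(40)) // 40-bit field
	fmt.Println("largest index:", capacity(40)-1)
	fmt.Println("layout values:", capacity(2)) // 2-bit field
}
```

The 64-bit world-seed is left out because its capacity (2^64) does not fit in a `uint64`.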

The world-seed is the base seed you provide when creating a PseudoArray. It defines the entire “world” or “universe” of generated data.

// Different arrays, same world
users := pseudata.NewUserArray(42)
addresses := pseudata.NewAddressArray(42)
user := users.At(0)
addr := addresses.At(0)
// Both PseudoIDs contain worldSeed=42
// They belong to the same "world"

Use cases:

  • Separate production, staging, and test data
  • Isolate data by tenant or customer
  • Group related test scenarios

The type-sequence (typeSeq) identifies the type of object. Each complex type has a predefined sequence number:

Type          TypeSeq  Description
User          101      User objects (OIDC-compliant)
Address       110      Address objects (locale-aware)
Custom Types  1024+    Reserved for future types

Note: TypeSeq uses 16 bits (0-65,535), with built-in types using 0-1023 and custom types starting from 1024.
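The split between built-in and custom ranges can be checked with a one-line comparison. This is a small sketch based only on the ranges stated in the note; `isCustomTypeSeq` and `customTypeSeqStart` are hypothetical names, not part of the Pseudata API:

```go
package main

import "fmt"

// customTypeSeqStart is the first type-sequence outside the built-in
// range (built-in: 0-1023, custom: 1024-65535), as stated in the note.
const customTypeSeqStart uint16 = 1024

// isCustomTypeSeq reports whether a type-sequence falls in the custom range.
func isCustomTypeSeq(typeSeq uint16) bool {
	return typeSeq >= customTypeSeqStart
}

func main() {
	for _, t := range []uint16{101, 110, 1023, 1024, 10000} {
		fmt.Printf("typeSeq %d custom=%t\n", t, isCustomTypeSeq(t))
	}
}
```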

// Different types have different typeSeq values
users := pseudata.NewUserArray(42) // typeSeq = 101
addresses := pseudata.NewAddressArray(42) // typeSeq = 110
userID := users.At(100).ID() // Contains typeSeq=101
addressID := addresses.At(100).ID() // Contains typeSeq=110
// PseudoIDs are different even at same index!

This means you can identify an object’s type-sequence just by looking at its pseudo-id.

The index is the position of the object in its PseudoArray. It represents the “creation order” within that type.

users := pseudata.NewUserArray(42)
first := users.At(0) // index = 0
second := users.At(1) // index = 1
thousandth := users.At(1000) // index = 1000
id1 := first.ID() // Contains index=0
id2 := second.ID() // Contains index=1
id1000 := thousandth.ID() // Contains index=1000

The index can range from 0 to 1,099,511,627,775 (40 bits), supporting over 1.1 trillion unique objects per type.

Pseudo-ids use the standard RFC 9562 UUID v8 format but encode the three components into the 122 “random” bits:

UUID Format: xxxxxxxx-xxxx-8xxx-yxxx-xxxxxxxxxxxx
             └─────────── encoded data ──────────┘
             8 = version (v8)
             y = variant (8-b)
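The version and variant positions in the format above can be checked on any ID string without decoding it. The following Go sketch uses a hypothetical helper, `looksLikeUUIDv8`, that inspects only the hyphen positions and the two marker nibbles; it does not validate the remaining hex digits and is not part of the Pseudata API:

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeUUIDv8 checks the 8-4-4-4-12 shape, the version nibble,
// and the variant nibble of a UUID string.
func looksLikeUUIDv8(id string) bool {
	if len(id) != 36 || id[8] != '-' || id[13] != '-' || id[18] != '-' || id[23] != '-' {
		return false
	}
	// Version nibble: first hex digit of the third group must be 8.
	if id[14] != '8' {
		return false
	}
	// Variant nibble: first hex digit of the fourth group must be 8, 9, a, or b.
	return strings.ContainsRune("89abAB", rune(id[19]))
}

func main() {
	fmt.Println(looksLikeUUIDv8("00000000-0000-8002-a806-5000000003e8"))
}
```

A check like this is a cheap first filter before attempting a full decode.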

Bit Distribution:

  • 64 bits: World-seed (full uint64 range)
  • 2 bits: Layout bits (reserved for future layouts, currently always 0)
  • 16 bits: Type-sequence (0-65,535)
  • 40 bits: Index (0-1.1 trillion)

Visual Pattern:

SSSSSSSS-SSSS-8SSS-vSTT-TTIIIIIIIIII
S = WorldSeed bits
T = TypeSeq bits
I = Index bits
v = Variant nibble + Layout bits (reserved)
8 = Version (UUID v8)

Example:

Input:  worldSeed=42, typeSeq=101, index=1000
Output: 00000000-0000-8002-a806-5000000003e8
        └──── world ─────┘ v│└ type+index ─┘

PseudoIDs are designed to sort hierarchically:

Primary: World-seed (groups by world)
Secondary: Type-sequence (groups types within a world)
Tertiary: Index (creation order within type)

// Go - Demonstrating sort order
id1 := pseudata.EncodeID(42, 101, 1000) // World 42, Users, Index 1000
id2 := pseudata.EncodeID(42, 101, 1001) // World 42, Users, Index 1001
id3 := pseudata.EncodeID(42, 110, 1000) // World 42, Addresses, Index 1000
id4 := pseudata.EncodeID(43, 101, 1000) // World 43, Users, Index 1000
// Sort order (lexicographical):
// id1 < id2 (same world/type, index 1000 < 1001)
// id2 < id3 (same world, type 101 < 110)
// id3 < id4 (world 42 < 43)
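The hierarchy described above amounts to comparing (world-seed, type-sequence, index) tuples field by field. A minimal sketch of that ordering on decoded components, using a hypothetical `components` struct and `less` function rather than the library's own types:

```go
package main

import (
	"fmt"
	"sort"
)

// components mirrors the three decoded fields of a pseudo-id.
type components struct {
	WorldSeed uint64
	TypeSeq   uint16
	Index     uint64
}

// less orders by world-seed first, then type-sequence, then index.
func less(a, b components) bool {
	if a.WorldSeed != b.WorldSeed {
		return a.WorldSeed < b.WorldSeed
	}
	if a.TypeSeq != b.TypeSeq {
		return a.TypeSeq < b.TypeSeq
	}
	return a.Index < b.Index
}

func main() {
	ids := []components{
		{43, 101, 1000},
		{42, 110, 1000},
		{42, 101, 1001},
		{42, 101, 1000},
	}
	sort.Slice(ids, func(i, j int) bool { return less(ids[i], ids[j]) })
	for _, c := range ids {
		fmt.Println(c.WorldSeed, c.TypeSeq, c.Index)
	}
}
```

After sorting, the entries appear as world 42 users (index 1000, then 1001), the world 42 address, and finally the world 43 user.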

This makes PseudoIDs perfect for:

  • Database indexing - Natural clustering by world and type
  • Analytics - Group queries by world or type
  • Debugging - Find all objects from a specific world
  • Testing - Isolate test data by world seed

Every generated object exposes its pseudo-id through an ID accessor (ID() in Go, id() in the other languages):

import (
    "fmt"

    "github.com/pseudata/pseudata"
)

users := pseudata.NewUserArray(42)
user := users.At(1000)
pseudoID := user.ID()
fmt.Println(pseudoID) // "00000000-0000-8002-a806-5000000003e8"

You can generate PseudoIDs directly using utility functions:

import "github.com/pseudata/pseudata"
pseudoID := pseudata.EncodeID(
    42,   // worldSeed
    101,  // typeSeq (Users)
    1000, // index
)

Use cases for direct encoding:

  • Generate test fixtures
  • Create migration scripts
  • Replay specific scenarios
  • Generate IDs for external systems

Extract the components from any pseudo-id:

import (
    "fmt"

    "github.com/pseudata/pseudata"
)

components, err := pseudata.DecodeID("00000000-0000-8002-a806-5000000003e8")
if err == nil {
    fmt.Printf("World: %d\n", components.WorldSeed) // 42
    fmt.Printf("Type: %d\n", components.TypeSeq)    // 101
    fmt.Printf("Index: %d\n", components.Index)     // 1000
}

Use PseudoIDs as primary keys for deterministic, reproducible databases:

// TypeScript - Generate consistent primary keys
const users = new UserArray(productionSeed);
for (let i = 0; i < 10000; i++) {
  const user = users.at(i);
  await db.insert("users", {
    id: user.id(), // Deterministic PseudoID
    name: user.name(),
    email: user.email(),
  });
}

Benefits:

  • Same IDs across test runs
  • Easy to reference in test assertions
  • Natural clustering in database indexes

Generate matching test data across different services:

# Python backend service
from pseudata import UserArray
users = UserArray(42)
test_user = users[1000]
test_user_id = test_user.id()  # "00000000-0000-8002-a806-5000000003e8"

// TypeScript frontend test
import { UserArray } from "@pseudata/core";
const users = new UserArray(42n);
const testUser = users.at(1000);
const testUserId = testUser.id(); // "00000000-0000-8002-a806-5000000003e8"
// Same PseudoID! Can test cross-service interactions
await expect(page.locator(`[data-user-id="${testUserId}"]`)).toBeVisible();

Decode PseudoIDs in production logs to understand data origin:

// Go - Production debugging
suspiciousID := "00000000-0000-8002-a806-5000000003e8"
components, err := pseudata.DecodeID(suspiciousID)
if err != nil {
    log.Fatalf("not a valid PseudoID: %v", err)
}
log.Printf("Issue analysis:")
log.Printf(" Environment seed: %d", components.WorldSeed) // 42
log.Printf(" Object type: %d", components.TypeSeq)        // 101 (User)
log.Printf(" Array position: %d", components.Index)       // 1000
// Reproduce the exact object
users := pseudata.NewUserArray(components.WorldSeed)
problematicUser := users.At(int(components.Index))

Use different world seeds per tenant:

// Java
long tenant1Seed = 1001;
long tenant2Seed = 2002;
UserArray tenant1Users = new UserArray(tenant1Seed);
UserArray tenant2Users = new UserArray(tenant2Seed);
// PseudoIDs naturally contain tenant information
String t1UserId = tenant1Users.at(0).id(); // Contains seed=1001
String t2UserId = tenant2Users.at(0).id(); // Contains seed=2002
// Can identify tenant from any PseudoID
IDComponents c = IDUtils.decodeId(t1UserId);
long tenantSeed = c.worldSeed; // 1001

Query data by world or type:

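# Python - Analytics example
from collections import defaultdict

from pseudata.id_utils import decode_id

# Per-world counters; production_user_ids is assumed to be an iterable of ID strings
metrics = defaultdict(lambda: {"count": 0, "types": set()})

# Analyze production PseudoIDs
for user_id in production_user_ids:
    components = decode_id(user_id)
    if components:
        metrics[components.world_seed]["count"] += 1
        metrics[components.world_seed]["types"].add(components.type_seq)
# Report: "World 42 has 5000 Users, World 43 has 3000 Users"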

PseudoIDs are identical across all languages for the same inputs:

// Same world, type, and index → Same PseudoID everywhere
// TypeScript
new UserArray(42n).at(1000).id()
// → "00000000-0000-8002-a806-5000000003e8"
// Python
UserArray(42)[1000].id()
// → "00000000-0000-8002-a806-5000000003e8"
// Java
new UserArray(42L).at(1000).id()
// → "00000000-0000-8002-a806-5000000003e8"
// Go
NewUserArray(42).At(1000).ID()
// → "00000000-0000-8002-a806-5000000003e8"

This consistency is guaranteed by:

  • Shared test vectors (fixtures/id_test_vectors.json)
  • Identical bit-packing algorithms
  • Comprehensive cross-language test suites

Choose world seeds that reflect your environments:

const DEV_SEED = 1n;
const STAGING_SEED = 2n;
const PRODUCTION_SEED = 42n;
const TEST_SEED = 9999n;
// Clear separation of data
const devUsers = new UserArray(DEV_SEED);
const stagingUsers = new UserArray(STAGING_SEED);

Maintain a registry of your type sequences:

type_sequences.go
const (
    TypeSeqUsers     = 101
    TypeSeqAddresses = 110
    TypeSeqOrders    = 201
    TypeSeqPayments  = 202
    // Custom types may use any value from 1024 upward; this registry starts at 10000
    TypeSeqMyCustom = 10000
)

Add PseudoID decoding to your logging:

# Python - Enhanced logging
import logging
from pseudata.id_utils import decode_id

def log_user_action(user_id: str, action: str):
    components = decode_id(user_id)
    if components:
        logging.info(
            f"Action: {action}, "
            f"World: {components.world_seed}, "
            f"Type: {components.type_seq}, "
            f"Index: {components.index}"
        )

Ensure PseudoIDs come from expected worlds:

// Java - Validation
public boolean isValidProductionUser(String userId) {
    IDComponents c = IDUtils.decodeId(userId);
    return c.worldSeed == PRODUCTION_SEED
        && c.typeSeq == TypeSeq.USERS;
}

Feature                Random UUID              PseudoID
Deterministic          ❌ Different each time    ✅ Same across languages
Cross-language         ❌ Random everywhere      ✅ Identical everywhere
Sortable               ❌ Random order           ✅ World → Type → Index
Decodable              ❌ No metadata            ✅ Extract components
Use with PseudoArrays  ❌ Not tied to data       ✅ Directly from objects
RFC compliant          ✅ Yes (RFC 4122)         ✅ Yes (RFC 9562)
Best for               Unique identifiers       Traceable, reproducible IDs

  • PseudoLink - Learn how to use the 40-bit index as a coordinate system with configurable bit layouts for stateless O(1) navigation between related indices