Pseudo-Arrays

A pseudo-array is an effectively infinite, deterministic array that generates elements on demand using pseudo-random algorithms. Each index always produces the same element for a given world-seed and type-sequence, with O(1) access and no per-element memory overhead.

The key insight: instead of storing billions of records in memory, a pseudo-array calculates each element when you access it. The same index always returns the same element because generation uses the deterministic PCG32 algorithm with hierarchical seeding: Generator(worldSeed, typeSeq).Advance(index).

In traditional systems, generating test data means creating and storing arrays in memory or databases. With pseudo-arrays, you can instantly access the billionth user or the trillionth address without iterating or storing any data. Each element is calculated on demand, and because the calculation is deterministic, the result is the same in every supported language.

PseudoArray is an abstract base class that concrete types (like UserArray, AddressArray) extend. Each subclass implements the generate() method to define how to create an element at a specific index.

The Generation Process:

  1. Create a pseudo-array with a world-seed and type-sequence
  2. Access any index using .at(index)
  3. Generate the element using Generator(worldSeed, typeSeq).Advance(index)
  4. Return the generated element (not stored, just calculated)
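The steps above can be sketched as a small base class. This is a minimal illustration, not the library's reference implementation: the names SketchPseudoArray and LabelArray, the type-sequence 999, and the toy integer mix inside generate() are all hypothetical stand-ins for the real PCG32-based generator.

```typescript
// Hypothetical sketch: names and the toy mixing step are illustrative, not library code.
const MAX_INDEX = 1099511627775; // 2^40 - 1, the documented upper bound for indices

abstract class SketchPseudoArray<T> {
  constructor(
    protected readonly worldSeed: number,
    protected readonly typeSeq: number,
  ) {}

  // Steps 2-4: validate the index, compute the element, return it without caching.
  at(index: number): T {
    if (Math.floor(index) !== index || index < 0 || index > MAX_INDEX) {
      throw new RangeError(`index out of range: ${index}`);
    }
    return this.generate(this.worldSeed, this.typeSeq, index);
  }

  protected abstract generate(worldSeed: number, typeSeq: number, index: number): T;
}

// Toy subclass: derives a label from the three inputs with a simple integer mix.
// A real implementation would draw from Generator(worldSeed, typeSeq).Advance(index).
class LabelArray extends SketchPseudoArray<string> {
  constructor(worldSeed: number) {
    super(worldSeed, 999); // 999 is an arbitrary, made-up type-sequence
  }

  protected generate(worldSeed: number, typeSeq: number, index: number): string {
    const mixed =
      (Math.imul(worldSeed ^ typeSeq, 0x9e3779b1) ^ Math.imul(index, 0x85ebca6b)) >>> 0;
    return "label-" + ("00000000" + mixed.toString(16)).slice(-8);
  }
}
```

Because at() holds no cache, two instances built with the same world-seed return equal elements for equal indices, which is the property the rest of this page relies on.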

Key Properties:

  • Deterministic: Same worldSeed, typeSeq, and index always produce the same element
  • Effectively infinite: Access any index from 0 to 2^40-1 (over 1 trillion elements)
  • Zero memory: Each element is calculated on-demand, not stored
  • O(1) access: Direct index access without iteration
  • Cross-language consistent: Same element in Go, Java, Python, TypeScript and more

The interface below shows the PseudoArray base class API. All concrete array types extend this class.

/**
 * PseudoArray<T>
 * Infinite deterministic array of items of type T.
 * Provides constant-time random access to deterministically generated items
 * without storing them in memory.
 */
declare abstract class PseudoArray<T> {
  /**
   * @param worldSeed - Base seed for deterministic generation.
   * @param typeSeq - Type sequence identifier (e.g., 101 for Users, 110 for Addresses).
   */
  constructor(worldSeed: number | bigint, typeSeq: number | bigint);

  /**
   * Returns the item at the specified index.
   * The same index always produces the same item.
   * @param index - Array index (0 to 2^40-1).
   * @returns Generated item at the specified index.
   */
  public at(index: number | bigint): T;

  /**
   * Subclasses must implement this to define item generation logic.
   * @param worldSeed - Base seed for the array.
   * @param typeSeq - Type sequence identifier.
   * @param index - Array index position.
   * @returns Generated item of type T.
   */
  protected abstract generate(
    worldSeed: number | bigint,
    typeSeq: number | bigint,
    index: number
  ): T;
}

PseudoArray is abstract, so you work with concrete implementations such as UserArray and AddressArray.

UserArray generates OIDC-compliant user objects with names, emails, avatars, and temporal data.

import (
	"fmt"

	"github.com/pseudata/pseudata"
)

users := pseudata.NewUserArray(42) // worldSeed = 42, typeSeq = 101
user := users.At(1000)
fmt.Println(user.Name)  // "John Smith"
fmt.Println(user.Email) // "john.smith@example.com"
fmt.Println(user.ID())  // PseudoID: "0000002a-0000-8000-a065-00000003e8"

// Access any index instantly
billionthUser := users.At(1_000_000_000) // O(1)

AddressArray generates locale-aware address objects with street, city, state, and postal code.

addresses := pseudata.NewAddressArray(42)
address := addresses.At(500)
fmt.Println(address.StreetAddress) // "123 Main St"
fmt.Println(address.City) // "Springfield"
fmt.Println(address.PostalCode) // "12345"

Access any index directly with O(1) complexity:

const users = new UserArray(42n);
const user1 = users.at(0); // First user
const user2 = users.at(999); // 1000th user
const user3 = users.at(1_000_000); // Millionth user - instant access!

Same data across all languages:

// Frontend (TypeScript)
const users = new UserArray(42n);
const testUser = users.at(1000);
// testUser.id() = "0000002a-0000-8000-a065-00000003e8"

# Backend (Python)
users = UserArray(42)
test_user = users[1000]
# test_user.id() = "0000002a-0000-8000-a065-00000003e8"
# Same ID, same data!

Each worker accesses its own index range without coordination:

// Worker 1: indices 0-999
for (let i = 0; i < 1000; i++) {
  const user = users.at(i);
  // Process user...
}

// Worker 2: indices 1000-1999
for (let i = 1000; i < 2000; i++) {
  const user = users.at(i);
  // Process user...
}
// No conflicts, no coordination needed
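
The fixed ranges in the worker example generalize to any worker count. The helper below is a hypothetical sketch (partitionIndices and IndexRange are not part of the library) that splits a span of indices into contiguous, non-overlapping ranges, so each worker can be handed its own range up front.

```typescript
// Hypothetical helper: splits [0, total) into contiguous half-open ranges, one per worker.
interface IndexRange {
  start: number; // inclusive
  end: number; // exclusive
}

function partitionIndices(total: number, workers: number): IndexRange[] {
  if (workers < 1 || Math.floor(workers) !== workers) {
    throw new RangeError("workers must be a positive integer");
  }
  const base = Math.floor(total / workers);
  const remainder = total % workers;
  const ranges: IndexRange[] = [];
  let start = 0;
  for (let w = 0; w < workers; w++) {
    // The first `remainder` workers take one extra index so the split stays even.
    const size = base + (w < remainder ? 1 : 0);
    ranges.push({ start, end: start + size });
    start += size;
  }
  return ranges;
}

// Reproduces the two-worker split shown above: 0-999 and 1000-1999.
const twoWorkers = partitionIndices(2000, 2);
```

Since every element depends only on its index, the ranges can be processed in any order or on different machines without changing the results.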

Traditional array:

const users = [];
for (let i = 0; i < 1_000_000; i++) {
  users.push(generateUser(i)); // Stores in memory
}
// Memory usage: ~500MB for 1M users

PseudoArray:

const users = new UserArray(42n);
const user = users.at(1_000_000); // Calculates on demand
// Memory usage: ~100 bytes (just the worldSeed and typeSeq)

const users = new UserArray(42n);
// All of these work instantly:
users.at(1_000); // Thousandth user
users.at(1_000_000); // Millionth user
users.at(1_000_000_000); // Billionth user
users.at(1_099_511_627_775); // Maximum index (2^40-1)
// No pre-generation, no memory issues

const users1 = new UserArray(42n);
const users2 = new UserArray(42n);
users1.at(500).name === users2.at(500).name; // true
users1.at(500).email === users2.at(500).email; // true
users1.at(500).id() === users2.at(500).id(); // true
// Same seed = same data, always

// Go
NewUserArray(42).At(1000).Name
// "John Smith"

All languages produce identical data!

Pseudo-arrays are designed for test data, mock data, and development scenarios where deterministic, repeatable data is valuable.

Ideal for:

  • Unit testing: Consistent test data across test runs
  • Integration testing: Same data across frontend and backend tests
  • Load testing: Generate billions of entities without memory overhead
  • QA/Demos: Reproducible scenarios across environments
  • Development: Realistic data without database dependencies
  • Documentation: Consistent examples in code samples

Not suitable for:

  • Production databases: Use real data from your database
  • User-generated content: This is synthetic, not real user data
  • Unique constraints: Generated data may have collisions (e.g., email uniqueness not guaranteed)
  • Compliance requirements: Not suitable where real anonymized data is required
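
If a test depends on a field being unique, the collision caveat above can be checked before relying on a range of indices. The helper below is a hypothetical sketch: findCollisions is not a library function, and fakeUserAt is a stand-in generator that repeats emails on purpose so the behavior is visible without the real UserArray.

```typescript
// Hypothetical helper: groups indices by a key and reports keys seen more than once.
function findCollisions<T>(
  at: (index: number) => T,
  key: (item: T) => string,
  count: number,
): Map<string, number[]> {
  const seen = new Map<string, number[]>();
  for (let i = 0; i < count; i++) {
    const k = key(at(i));
    const indices = seen.get(k);
    if (indices) indices.push(i);
    else seen.set(k, [i]);
  }
  // Keep only the keys that were produced by more than one index.
  const collisions = new Map<string, number[]>();
  seen.forEach((indices, k) => {
    if (indices.length > 1) collisions.set(k, indices);
  });
  return collisions;
}

// Stand-in generator that deliberately repeats emails every three indices.
const fakeUserAt = (i: number) => ({ email: `user${i % 3}@example.com` });
const dupes = findCollisions(fakeUserAt, (u) => u.email, 4);
```

Unlike the pseudo-array itself, this check keeps every key it has seen, so its memory use grows with the size of the range being scanned.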

Key advantage: Deterministic, infinite-scale data with zero memory overhead - perfect for testing and development where consistency matters more than randomness.

Pseudo-arrays use the PCG32 algorithm with hierarchical seeding:

Formula: Generator(worldSeed, typeSeq).Advance(index)

Why this works:

  • PCG32 streams: Each typeSeq is a separate stream, ensuring no overlap
  • Advance operation: Jumps directly to the correct state for any index
  • Deterministic: Mathematical guarantee of same output for same inputs
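
To make the stream and advance ideas concrete, here is a minimal sketch of PCG32 (the XSH-RR variant) with jump-ahead, written with BigInt for 64-bit arithmetic. The class name Pcg32Sketch is hypothetical, and two points are assumptions rather than documented behavior: that worldSeed is used as the initial state and typeSeq as the stream selector, and that one generator step corresponds to one index.

```typescript
// Sketch of PCG32 (XSH-RR) with jump-ahead. Not the library's implementation.
const MASK64 = (1n << 64n) - 1n;
const MULT = 6364136223846793005n; // standard PCG32 LCG multiplier

class Pcg32Sketch {
  private state = 0n;
  private readonly inc: bigint;

  constructor(worldSeed: bigint, typeSeq: bigint) {
    // Assumed mapping: worldSeed -> initial state, typeSeq -> stream selector.
    this.inc = ((typeSeq << 1n) | 1n) & MASK64; // the increment must be odd
    this.step();
    this.state = (this.state + worldSeed) & MASK64;
    this.step();
  }

  private step(): void {
    this.state = (this.state * MULT + this.inc) & MASK64;
  }

  // Returns the next 32-bit output and moves the state forward by one step.
  next(): number {
    const old = this.state;
    this.step();
    const xorshifted = Number((((old >> 18n) ^ old) >> 27n) & 0xffffffffn);
    const rot = Number(old >> 59n);
    return ((xorshifted >>> rot) | (xorshifted << ((32 - rot) & 31))) >>> 0;
  }

  // Jumps forward by `delta` steps by repeatedly squaring the LCG transition.
  advance(delta: bigint): this {
    let accMult = 1n;
    let accPlus = 0n;
    let curMult = MULT;
    let curPlus = this.inc;
    while (delta > 0n) {
      if ((delta & 1n) === 1n) {
        accMult = (accMult * curMult) & MASK64;
        accPlus = (accPlus * curMult + curPlus) & MASK64;
      }
      curPlus = ((curMult + 1n) * curPlus) & MASK64;
      curMult = (curMult * curMult) & MASK64;
      delta >>= 1n;
    }
    this.state = (accMult * this.state + accPlus) & MASK64;
    return this;
  }
}

// Jumping 1000 steps lands on the same state as stepping 1000 times.
const jumped = new Pcg32Sketch(42n, 101n).advance(1000n);
```

The loop in advance() runs once per bit of the index, so at most 40 iterations for the documented index range. That fixed bound is why access cost does not grow with how far into the array you jump.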

Each concrete type has a unique type-sequence identifier:

| Type    | TypeSeq | Purpose                      |
|---------|---------|------------------------------|
| User    | 101     | OIDC-compliant user objects  |
| Address | 110     | Locale-aware address objects |
| Custom  | 1024+   | Reserved for future types    |

This ensures:

  • UserArray(42).at(100) and AddressArray(42).at(100) are independent
  • No overlap between types even with same worldSeed and index
  • Each type has its own “stream” of random numbers

Every element generated by a pseudo-array has a PseudoID that encodes its position:

const users = new UserArray(42n); // worldSeed = 42, typeSeq = 101
const user = users.at(1000); // index = 1000
user.id();
// "0000002a-0000-8000-a065-00000003e8"
//  └world─┘           └typeSeq+index┘
//     42                101     1000

The PseudoID encodes:

  • World-seed: 42
  • Type-sequence: 101 (User)
  • Index: 1000

This means you can decode a PseudoID to find the exact array position:

import { decodeId } from '@pseudata/core';
const components = decodeId(user.id());
// components.worldSeed = 42
// components.typeSeq = 101
// components.index = 1000
// Reconstruct the exact same user
const users = new UserArray(components.worldSeed);
const sameUser = users.at(components.index);
// sameUser.id() === user.id() // true
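
For intuition about what such a decoder might do, the sketch below parses the example ID by hand. It is hypothetical and inferred only from the single ID shown on this page: it assumes the first group holds the world-seed, the low 12 bits of the fourth group hold the type-sequence, and the last group holds the index. The real decodeId may use a different layout, so treat this as an illustration only.

```typescript
// Hypothetical decoder inferred from the single example ID in this section.
// Assumed layout: group 1 = worldSeed, low 12 bits of group 4 = typeSeq, group 5 = index.
interface PseudoIdParts {
  worldSeed: number;
  typeSeq: number;
  index: number;
}

function decodeIdSketch(id: string): PseudoIdParts {
  const groups = id.split("-");
  if (groups.length !== 5 || groups.some((g) => !/^[0-9a-f]+$/i.test(g))) {
    throw new Error(`unrecognized ID format: ${id}`);
  }
  return {
    worldSeed: parseInt(groups[0], 16),
    typeSeq: parseInt(groups[3], 16) & 0x0fff, // drop the leading marker nibble (assumed)
    index: parseInt(groups[4], 16), // 10 hex digits = 40 bits, safe as a Number
  };
}

const parts = decodeIdSketch("0000002a-0000-8000-a065-00000003e8");
```

Under these assumptions the example ID yields world-seed 42, type-sequence 101, and index 1000, matching the values listed above.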

You can use pseudo links with the PseudoLink class (see reference) to create indices with relational properties:

import { UserArray, PseudoLink } from '@pseudata/core';

const link = new PseudoLink(17, 3);
const users = new UserArray(42n);

// Create user at specific coordinates
const userIndex = link.spawn(
  1, // island: 1
  1000, // neighborhood: 1000
  0 // connector: 0
);
const user = users.at(userIndex);

// Find related users in same neighborhood
for (let slot = 0; slot <= link.maxConnectors(); slot++) {
  const relatedIndex = link.resolve(userIndex, slot);
  const relatedUser = users.at(relatedIndex);
  // Related users share neighborhood and island bits
}

This combines the infinite scale of pseudo-arrays with the relational capabilities of pseudo-links.

  • Pseudo-IDs - Learn about the deterministic IDs that identify each element
  • Pseudo-Links - Use coordinate-based indices for relational test data
  • Models - See all available types and their properties
  • Primitives - Understand the generator functions used to create element properties