
Add a New SDK

This guide walks through adding support for a new programming language (e.g., Rust, C#, PHP, Ruby) to Pseudata. Use the existing implementations (Go, Java, Python, TypeScript) as reference.

Much of the code is generated automatically from TypeSpec definitions. Your main task is to write the foundation code (Generator, SeedFrom, ID Utils) and implement the custom TypeSpec emitters to generate code for your target language.

What You’ll Implement:

  1. Generator - PCG32 random number generator
  2. SeedFrom - String to seed conversion
  3. TypeSpec Emitters - Code generation for your language
  4. PseudoArray - Generic array base class
  5. ID Utils - UUID encoding/decoding
  6. Primitives - ~75 data generation methods
  7. Models - Auto-generated via quicktype

Each component has fixture-based tests to ensure cross-language consistency.

Pseudata uses a multi-stage code generation pipeline:

Code Generation Flow

Stage 1: TypeSpec → JSON Schema

tsp compile src # Generates JSON Schema in tsp-output/

Output: tsp-output/@typespec/json-schema/*.json

  • User.json - User model schema
  • Address.json - Address model schema
  • Resources.json - Resources model schema

Stage 2: JSON Schema → Models (via quicktype)

quicktype --src-lang schema \
tsp-output/@typespec/json-schema/*.json \
-o ../sdks/rust/src/models.rs \
--lang rust \
--visibility public \
--derive-debug

Output: models.{go,java,py,ts,rs} - Data structures only

Stage 3: TypeSpec → Code (via custom emitters)

# Automatically runs during tsp compile (configured in tspconfig.yaml)

Custom Emitters (in typespec/lib/):

  1. resource-emitter.js

    • Reads files from typespec/resources/ directory
    • Generates: resources.{go,java,py,ts} with embedded data
    • Includes AVAILABLE_LOCALES and RESOURCES map
  2. interface-emitter.js (called by array-emitter.js)

    • Reads primitives.tsp interface definition
    • Generates: primitives.{go,java,py,ts} interfaces
  3. array-emitter.js

    • Reads @array decorators on models
    • Generates: arrays.{go,java,py,ts} with generator functions
    • Example: UserArray, AddressArray, generateUser(), generateAddress()

Configuration (tspconfig.yaml):

emit:
  - "@typespec/json-schema" # Stage 1: JSON Schema
  - "../lib/emitter.js" # Stage 3: Custom emitters

Running the full pipeline:

cd typespec
npm run generate

This runs:

  1. tsp compile src - TypeSpec → JSON Schema + custom emitters
  2. quicktype - JSON Schema → Models for all languages
  3. Post-processing (e.g., fix-resources-visibility.js for Java)

Directory Structure:

sdks/
  rust/                              # Example: Adding Rust
    src/
      generator.rs                   # PCG32 implementation
      seed_from.rs                   # String → seed conversion
      id_utils.rs                    # PseudoID encode/decode
      primitives.rs                  # Generated interface
      primitives_impl.rs             # Implementation
      models.rs                      # Generated models
      arrays.rs                      # Generated arrays
      pseudo_array.rs                # Generic array base
      resources.rs                   # Generated resources
      lib.rs                         # Public exports
    tests/
      generator_test.rs              # Fixture tests
      seed_test.rs                   # Fixture tests
      id_test.rs                     # Fixture tests
      primitives_vectors_test.rs     # Fixture tests
      array_user_vectors_test.rs     # Fixture tests
      array_address_vectors_test.rs  # Fixture tests
    Cargo.toml
    README.md

Components should be implemented in order, as later components depend on earlier ones:

1. Generator (PCG32) - Core random number generator
2. SeedFrom - String to seed conversion
3. TypeSpec Emitters - Code generation for your language
├─ resource-emitter.js (embeds locale data)
├─ interface-emitter.js (generates Primitives interface)
└─ array-emitter.js (generates array classes)
4. PseudoArray - Generic array base class
5. ID Utils - UUID encoding/decoding
6. Primitives Implementation - Implement generated interface
7. Models (via quicktype) - Auto-generated data structures
8. Array Fixture Tests - End-to-end verification

Step 1: Generator (PCG32)

What: Deterministic random number generator using the PCG32 algorithm.

Reference:

  • Go: generator.go
  • Specification: fixtures/pcg32_test_vectors.json

Key Methods:

// Example signatures (adjust to language idioms)
struct Generator {
state: u64,
inc: u64,
}
impl Generator {
fn new(seed: u64, seq: u64) -> Self;
fn advance(&mut self, delta: u64);
fn next_int(&mut self) -> u32; // Core PCG32 output: uint32
fn intn(&mut self, n: u32) -> u32; // Uses u32 like Go
fn probability(&mut self, p: f32) -> bool;
fn next_float(&mut self) -> f32;
fn next_bool(&mut self) -> bool;
fn int_range(&mut self, min: u32, max: u32) -> u32; // Uses u32 like Go
fn float_range(&mut self, min: f32, max: f32) -> f32;
fn uuid(&mut self) -> String;
}

Implementation Requirements:

  • PCG32 state: 64-bit state + 64-bit increment (must be odd)
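  • Advance: Jump to an arbitrary position without stepping through intermediate values (PCG's jump-ahead takes O(log delta) time, at most 64 iterations for a 64-bit delta)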
  • Bounded random: intn(n) must use unbiased method (rejection sampling)
  • UUID v4: Generate RFC 4122 compliant UUIDs
  • Unsigned semantics: PCG32 produces uint32 values. Languages without unsigned types (e.g., Java) should use wider types (long) to represent the full uint32 range
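
The requirements above can be sketched in Rust as follows. This is a minimal illustration based on the public PCG32 reference algorithm (XSH-RR output, jump-ahead by repeated squaring, threshold-based rejection sampling), not Pseudata's actual code. The type name `Pcg32` and the method names `next_u32` and `bounded` are placeholders, and whether Pseudata seeds its generator exactly this way is an assumption that should be checked against generator.go and the fixtures.

```rust
/// Multiplier of the 64-bit LCG used by the PCG32 reference implementation.
const MULTIPLIER: u64 = 6364136223846793005;

/// Illustrative PCG32 (XSH-RR); names are placeholders, not Pseudata's API.
pub struct Pcg32 {
    state: u64,
    inc: u64, // always odd
}

impl Pcg32 {
    /// Reference-style seeding: the stream id is shifted left and made odd.
    pub fn new(seed: u64, seq: u64) -> Self {
        let mut g = Pcg32 { state: 0, inc: (seq << 1) | 1 };
        g.next_u32();
        g.state = g.state.wrapping_add(seed);
        g.next_u32();
        g
    }

    /// One LCG step followed by the XSH-RR output permutation.
    pub fn next_u32(&mut self) -> u32 {
        let old = self.state;
        self.state = old.wrapping_mul(MULTIPLIER).wrapping_add(self.inc);
        let xorshifted = (((old >> 18) ^ old) >> 27) as u32;
        xorshifted.rotate_right((old >> 59) as u32)
    }

    /// Jump ahead by `delta` steps in O(log delta) using repeated squaring.
    pub fn advance(&mut self, mut delta: u64) {
        let (mut cur_mult, mut cur_plus) = (MULTIPLIER, self.inc);
        let (mut acc_mult, mut acc_plus) = (1u64, 0u64);
        while delta > 0 {
            if delta & 1 == 1 {
                acc_mult = acc_mult.wrapping_mul(cur_mult);
                acc_plus = acc_plus.wrapping_mul(cur_mult).wrapping_add(cur_plus);
            }
            cur_plus = cur_mult.wrapping_add(1).wrapping_mul(cur_plus);
            cur_mult = cur_mult.wrapping_mul(cur_mult);
            delta >>= 1;
        }
        self.state = acc_mult.wrapping_mul(self.state).wrapping_add(acc_plus);
    }

    /// Unbiased value in [0, n): rejects draws below 2^32 mod n. Panics if n == 0.
    pub fn bounded(&mut self, n: u32) -> u32 {
        let threshold = n.wrapping_neg() % n;
        loop {
            let r = self.next_u32();
            if r >= threshold {
                return r % n;
            }
        }
    }
}
```

With the reference seeding shown here, `Pcg32::new(42, 54)` produces 0xa15c02b7 as its first output, the value printed by the PCG reference demo. Calling `advance(k)` leaves the generator in the same state as calling `next_u32()` k times, which is what makes index-based access cheap.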

Test: fixtures/pcg32_test_vectors.json

  • 12 test vectors covering various seeds, sequences, and advance operations
  • Must match exact output sequences across all languages

Example Test Pattern:

#[test]
fn test_generator_with_vectors() {
let vectors: Vec<TestCase> = load_json("../../fixtures/pcg32_test_vectors.json");
for tc in vectors {
let mut gen = Generator::new(tc.inputs.seed, tc.inputs.seq);
gen.advance(tc.inputs.advance);
for (i, expected) in tc.expected.outputs.iter().enumerate() {
assert_eq!(gen.next_int(), *expected, "Output {} mismatch", i);
}
}
}

Step 2: SeedFrom (String → Seed Conversion)


What: Deterministic conversion from strings to 64-bit seeds.

Reference:

  • Go: seed_from.go
  • Algorithm: FNV-1a hash

Key Function:

fn seed_from(s: &str) -> u64 {
const OFFSET_BASIS: u64 = 14695981039346656037;
const FNV_PRIME: u64 = 1099511628211;
let mut hash = OFFSET_BASIS;
for byte in s.bytes() {
hash ^= byte as u64;
hash = hash.wrapping_mul(FNV_PRIME);
}
hash
}

Test: fixtures/seed_test_vectors.json

  • 18 test cases including empty strings, ASCII, Unicode, long strings
  • Must produce identical seeds across all languages

Critical: Handle UTF-8 correctly. The input is UTF-8 bytes, not characters.
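
Before wiring up the fixtures, a port can be sanity-checked against the published FNV-1a 64-bit vectors. The sketch below restates the hash as a fold over UTF-8 bytes; the name `fnv1a_64` is a placeholder, and it assumes that Pseudata's seed_from is plain FNV-1a with no extra mixing, as the reference above states.

```rust
/// FNV-1a (64-bit) over the UTF-8 bytes of `s`.
/// Placeholder name; assumed to mirror seed_from with no extra mixing.
pub fn fnv1a_64(s: &str) -> u64 {
    const OFFSET_BASIS: u64 = 14695981039346656037; // 0xcbf29ce484222325
    const FNV_PRIME: u64 = 1099511628211; // 0x100000001b3
    // `bytes()` yields UTF-8 bytes, so "é" contributes two rounds (0xC3, 0xA9),
    // not one round per character or per UTF-16 code unit.
    s.bytes().fold(OFFSET_BASIS, |hash, byte| {
        (hash ^ byte as u64).wrapping_mul(FNV_PRIME)
    })
}
```

The empty string hashes to the offset basis (0xcbf29ce484222325) and "a" hashes to 0xaf63dc4c8601ec8c. Languages whose native strings are UTF-16 (Java, C#, JavaScript) need to encode to UTF-8 bytes first, otherwise non-ASCII inputs will diverge from the fixtures.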


Step 3: TypeSpec Emitters

TypeSpec emitters are JavaScript modules that generate code for your language. They read TypeSpec definitions and generate the Primitives interface, arrays, and resources.

Why Custom Emitters?

  • Primitives: Interface only (implementation is hand-written)
  • Arrays: Need generator functions with @generator decorator logic
  • Resources: Embed file data from typespec/resources/ directory
  • Quicktype only generates data models, not these specialized components

Entry Point:

// typespec/lib/emitter.js - Entry point called by tsp compile
export async function $onEmit(context) {
  await emitVirtualArrays(context); // → Primitives interface + Arrays
  await emitResourceData(context); // → Resources
}

Location: typespec/lib/

Three files to modify:

  1. resource-emitter.js - Embed locale data
  2. interface-emitter.js - Generate Primitives trait/interface
  3. array-emitter.js - Generate array classes + generator functions

Configuration (tspconfig.yaml):

emit:
  - "@typespec/json-schema" # Built-in: Models → JSON Schema
  - "../lib/emitter.js" # Custom: Everything else

resource-emitter.js

Purpose: Read files from typespec/resources/ and embed them as code.

This generates the largest file: hundreds of names, cities, etc. per locale.

Input: Directory structure

typespec/resources/
  global/
    email_domains.txt          → ["gmail.com", "yahoo.com", ...]
  en/
    nouns.txt                  → ["book", "table", ...]
    adjectives.txt
    verbs.txt
    months.txt                 → ["January", "February", "March", ...]
    weekdays.txt               → ["Monday", "Tuesday", "Wednesday", ...]
  en_US/
    given_male_names.txt       → ["James", "John", "Robert", ...]
    given_female_names.txt     → ["Mary", "Patricia", ...]
    family_names.txt
    cities.txt
    streets.txt
    zipcodes.txt
  US/
    address_format.txt         → "{street_address}, {locality}, {region} {postal_code}"
    street_format.txt
    timezones.txt              → ["America/New_York", "America/Los_Angeles", ...]
    phone_number_patterns.txt  → ["(###) ###-####", "###-###-####"]

Your Task: Add Rust resource embedding

Add to resource-emitter.js:

// 1. Add Rust resource generation
function emitRustResources(locales, program) {
let rustCode = "use std::collections::HashMap;\n";
rustCode += "use once_cell::sync::Lazy;\n";
rustCode += "use crate::models::Resources;\n\n";
// Available locales constant
rustCode += "pub static AVAILABLE_LOCALES: &[&str] = &[\n";
rustCode += locales.map((l) => ` "${l}"`).join(",\n") + ",\n";
rustCode += "];\n\n";
// Resources HashMap
rustCode += "pub static RESOURCES: Lazy<HashMap<&'static str, Resources>> = Lazy::new(|| {\n";
rustCode += " let mut m = HashMap::new();\n\n";
for (const locale of locales) {
const resources = loadResourcesForLocale(locale);
rustCode += ` m.insert("${locale}", Resources {\n`;
rustCode += ` email_domains: vec![${formatRustStringArray(resources.emailDomains)}],\n`;
rustCode += ` male_given_names: vec![${formatRustStringArray(resources.maleGivenNames)}],\n`;
rustCode += ` female_given_names: vec![${formatRustStringArray(resources.femaleGivenNames)}],\n`;
rustCode += ` other_given_names: vec![${formatRustStringArray(resources.otherGivenNames)}],\n`;
rustCode += ` family_names: vec![${formatRustStringArray(resources.familyNames)}],\n`;
rustCode += ` cities: vec![${formatRustStringArray(resources.cities)}],\n`;
rustCode += ` streets: vec![${formatRustStringArray(resources.streets)}],\n`;
rustCode += ` nouns: vec![${formatRustStringArray(resources.nouns)}],\n`;
rustCode += ` adjectives: vec![${formatRustStringArray(resources.adjectives)}],\n`;
rustCode += ` verbs: vec![${formatRustStringArray(resources.verbs)}],\n`;
rustCode += ` zipcodes: vec![${formatRustStringArray(resources.zipcodes)}],\n`;
rustCode += ` states: vec![${formatRustStringArray(resources.states)}],\n`;
rustCode += ` address_format: "${resources.addressFormat}".to_string(),\n`;
rustCode += ` street_format: "${resources.streetFormat}".to_string(),\n`;
rustCode += ` timezones: vec![${formatRustStringArray(resources.timezones)}],\n`;
rustCode += ` phone_number_patterns: vec![${formatRustStringArray(resources.phoneNumberPatterns)}],\n`;
rustCode += ` });\n\n`;
}
rustCode += " m\n";
rustCode += "});\n";
return rustCode;
}
function formatRustStringArray(arr) {
return arr.map((s) => `"${s.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}".to_string()`).join(", ");
}
// 2. Register Rust in emitResourceData()
export async function emitResourceData(context) {
  const program = context.program; // the emitter context carries the compiled program
  const locales = ["ar_SA", "de_DE", "en_CA", "en_GB", "en_US" /* ... */];
  // Add Rust
  const rustCode = emitRustResources(locales, program);
  await program.host.writeFile("sdks/rust/src/resources.rs", rustCode);
  // ... existing Go, Java, Python, TypeScript
}

Example Output (Rust, truncated):

// sdks/rust/src/resources.rs (auto-generated, ~3000 lines)
use std::collections::HashMap;
use once_cell::sync::Lazy;
use crate::models::Resources;
pub static AVAILABLE_LOCALES: &[&str] = &[
"ar_SA",
"de_DE",
"en_CA",
"en_GB",
"en_US",
"es_MX",
"fr_CA",
"fr_FR",
"hu_HU",
"ja_JP",
"pt_BR",
"tr_TR",
"vi_VN",
"zh_CN",
];
pub static RESOURCES: Lazy<HashMap<&'static str, Resources>> = Lazy::new(|| {
let mut m = HashMap::new();
m.insert("en_US", Resources {
email_domains: vec!["gmail.com".to_string(), "yahoo.com".to_string(), /* ... */],
male_given_names: vec!["James".to_string(), "Robert".to_string(), /* ... */],
female_given_names: vec!["Mary".to_string(), "Patricia".to_string(), /* ... */],
// ... all other fields
});
m.insert("ar_SA", Resources {
email_domains: vec!["gmail.com".to_string(), /* ... */],
male_given_names: vec!["محمد".to_string(), "أحمد".to_string(), /* ... */],
// ... UTF-8 encoded Arabic names
});
// ... remaining locales (14 total)
m
});

Important:

  • UTF-8 encoding: Emit non-ASCII text (Arabic, Chinese, etc.) as UTF-8; escape backslashes and double quotes inside string literals
  • String ownership: .to_string() converts each literal into an owned String
  • Lazy initialization: Use once_cell::sync::Lazy for static initialization
  • Large file: ~3000 lines, 200KB+, generated from ~250 resource files

Resource File Resolution:

// resource-emitter.js logic (simplified; exists() and read() stand in for file-system helpers)
function getResourceFile(locale, country, language, filename) {
  // Try locale-specific: ar_SA/given_male_names.txt
  if (exists(`resources/${locale}/${filename}`)) return read(`resources/${locale}/${filename}`);
  // Try language-level: ar/nouns.txt
  if (exists(`resources/${language}/${filename}`)) return read(`resources/${language}/${filename}`);
  // Try country-level: SA/timezones.txt
  if (exists(`resources/${country}/${filename}`)) return read(`resources/${country}/${filename}`);
  // Try global: global/email_domains.txt
  if (exists(`resources/global/${filename}`)) return read(`resources/global/${filename}`);
  // Fallback: empty or error
  return [];
}
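
The lookup order can also be written as a pure function, which makes the fallback chain easy to unit-test in isolation. The sketch below is in Rust for consistency with the other examples; the helper name `candidate_paths` is hypothetical, and it assumes locale codes always have the form language_COUNTRY (for example "ar_SA").

```rust
/// Candidate resource paths for `locale` (e.g. "ar_SA"), most specific first.
/// Hypothetical helper mirroring the fallback order described above.
pub fn candidate_paths(locale: &str, filename: &str) -> Vec<String> {
    let mut dirs: Vec<&str> = vec![locale];
    // "ar_SA" splits into language "ar" and country "SA".
    if let Some((language, country)) = locale.split_once('_') {
        dirs.push(language);
        dirs.push(country);
    }
    dirs.push("global");
    dirs.into_iter()
        .map(|dir| format!("resources/{dir}/{filename}"))
        .collect()
}
```

The first path in the returned list that exists on disk is the one to read; if none exists, the emitter falls back to an empty list or an error.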

interface-emitter.js

Purpose: Generate Primitives trait/interface from TypeSpec definition

Called by: array-emitter.js (automatically)

Input: primitives.tsp interface

interface Primitives {
id(): string;
gender(): string;
locale(): string;
element(items: string[]): string;
digit(length: int32): string;
probability(p: float32): boolean;
// ... ~75 methods total
}

Your Task: Add Rust type mapping and code generation

Add to interface-emitter.js:

// 1. Add Rust type mapping
function mapRustType(tspType) {
if (tspType.endsWith("[]")) {
const elementType = tspType.slice(0, -2);
return `Vec<${mapRustType(elementType)}>`;
}
const typeMap = {
string: "String",
int32: "i32",
int64: "i64",
uint32: "u32",
uint64: "u64",
float32: "f32",
float64: "f64",
boolean: "bool",
numeric: "f64",
};
return typeMap[tspType] || "String";
}
// 2. Add Rust interface generation
function emitRustInterface(interfaceName, methods) {
const methodDecls = methods
.map((m) => {
const params = m.parameters.map((p) => `${snakeCase(p.name)}: ${mapRustType(p.type)}`).join(", ");
const returnType = mapRustType(m.returnType);
return ` fn ${snakeCase(m.name)}(&self${params ? ", " + params : ""}) -> ${returnType};`;
})
.join("\n");
return `pub trait ${interfaceName} {\n${methodDecls}\n}`;
}
// 3. Register Rust in emitAllInterfaces()
async function emitAllInterfaces(program, interfaces) {
// ... existing Go, Java, Python, TypeScript
// Add Rust
const rustCode = interfaces.map((i) => emitRustInterface(i.name, i.methods)).join("\n\n");
await program.host.writeFile("sdks/rust/src/primitives.rs", rustCode);
}

Example Output (Rust):

// sdks/rust/src/primitives.rs (auto-generated)
pub trait Primitives {
fn id(&self) -> String;
fn uuid(&self) -> String;
fn gender(&self) -> String;
fn locale(&self) -> String;
fn element(&self, items: Vec<String>) -> String;
fn digit(&self, length: i32) -> String;
fn letter(&self, length: i32) -> String;
fn alnum(&self, length: i32) -> String;
fn probability(&self, p: f32) -> bool;
fn next_int(&self) -> i64;
fn intn(&self, n: i32) -> i32;
fn next_float(&self) -> f32;
fn next_boolean(&self) -> bool;
fn int_range(&self, min: i32, max: i32) -> i32;
fn float_range(&self, min: f32, max: f32) -> f32;
// ... remaining methods (~75 total)
}

Type Mapping Table:

TypeSpec   Go        Java      Python     TypeScript  Rust (example)
string     string    String    str        string      String
string[]   []string  String[]  list[str]  string[]    Vec<String>
int32      int64     int       int        number      i32
int64      int64     long      int        number      i64
float32    float32   float     float      number      f32
boolean    bool      boolean   bool       boolean     bool

Testing:

cd typespec
npm run generate
# Check: sdks/rust/src/primitives.rs should be created

array-emitter.js

Purpose: Generate array classes and generator functions from @array decorators

This emitter parses @generator decorators and generates field assignment code.

Input: TypeSpec model with decorators

// pseudata.tsp
@array(101)
model User {
@generator("id") id: string;
@generator("id") sub: string;
@generator("compositeUserName") name: string;
@generator("genderedGivenName") given_name: string;
@generator("familyName") family_name: string;
@generator("middleName") middle_name: string;
@generator("nickname") nickname: string;
@generator("preferredUsername") preferred_username: string;
@generator("email") email: string;
@generator("gender") gender: string;
@generator("locale") locale: string;
@generator("avatarUrl") picture: string;
@generator("profileUrl") profile: string;
@generator("websiteUrl") website: string;
@generator("probability", 0.85) email_verified: boolean;
@generator("birthdateStr", 18, 65, Default.RefDate) birthdate: string;
@generator("zoneinfo") zoneinfo: string;
@generator("phoneNumber") phone_number: string;
@generator("probability", 0.75) phone_number_verified: boolean;
@generator("dateTimePast", Default.RefDate, Default.Days) updated_at: safeint;
}

Your Task: Add Rust array generation

Add to array-emitter.js:

// 1. Add Rust argument formatting
function formatRustArg(arg, paramType) {
if (typeof arg === "string") {
return `"${arg.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
} else if (typeof arg === "boolean") {
return arg ? "true" : "false";
} else if (typeof arg === "number") {
// Rust: f32 suffix for floats
return Number.isInteger(arg) ? arg.toString() : `${arg}_f32`;
}
return String(arg);
}
// 2. Add Rust array class generation
function emitRustArrays(models, program) {
let rustCode = "use crate::{Primitives, PrimitivesImpl, PseudoArray, models::*};\n\n"; // the trait must be in scope to call its methods
for (const model of models) {
const typeSeq = model.decorators.array;
const className = `${model.name}Array`;
// Array struct
rustCode += `pub struct ${className} {\n`;
rustCode += ` inner: PseudoArray<${model.name}>,\n`;
rustCode += `}\n\n`;
// Constructor
rustCode += `impl ${className} {\n`;
rustCode += ` pub fn new(seed: u64) -> Self {\n`;
rustCode += ` Self {\n`;
rustCode += ` inner: PseudoArray::new(seed, ${typeSeq}, generate_${snakeCase(model.name)}),\n`;
rustCode += ` }\n`;
rustCode += ` }\n\n`;
rustCode += ` pub fn at(&self, index: i32) -> ${model.name} {\n`;
rustCode += ` self.inner.at(index)\n`;
rustCode += ` }\n`;
rustCode += `}\n\n`;
// Generator function
rustCode += `fn generate_${snakeCase(model.name)}(world_seed: u64, type_seq: u64, index: i32) -> ${model.name} {\n`;
rustCode += ` let p = PrimitivesImpl::new(world_seed, type_seq, index);\n`;
rustCode += ` ${model.name} {\n`;
for (const field of model.fields) {
const gen = field.decorators.generator;
if (!gen) continue;
const methodName = snakeCase(gen.methodName);
const args = gen.args.map((a) => formatRustArg(a)).join(", ");
const call = args ? `p.${methodName}(${args})` : `p.${methodName}()`;
rustCode += ` ${snakeCase(field.name)}: ${call},\n`;
}
rustCode += ` }\n`;
rustCode += `}\n\n`;
}
return rustCode;
}
// 3. Register Rust in the array emitter's entry function (called from emitter.js)
export async function emitVirtualArrays(context) {
  const program = context.program;
  // ... parse models with @array decorator into arrayModels
  // Add Rust generation
  const rustCode = emitRustArrays(arrayModels, program);
  await program.host.writeFile("sdks/rust/src/arrays.rs", rustCode);
  // ... existing Go, Java, Python, TypeScript
}

Example Output (Rust):

// sdks/rust/src/arrays.rs (auto-generated)
use crate::{Primitives, PrimitivesImpl, PseudoArray, models::*};
pub struct UserArray {
inner: PseudoArray<User>,
}
impl UserArray {
pub fn new(seed: u64) -> Self {
Self {
inner: PseudoArray::new(seed, 101, generate_user),
}
}
pub fn at(&self, index: i32) -> User {
self.inner.at(index)
}
}
fn generate_user(world_seed: u64, type_seq: u64, index: i32) -> User {
let p = PrimitivesImpl::new(world_seed, type_seq, index);
User {
id: p.id(),
sub: p.id(),
name: p.composite_user_name(),
given_name: p.gendered_given_name(),
family_name: p.family_name(),
middle_name: p.middle_name(),
nickname: p.nickname(),
preferred_username: p.preferred_username(),
email: p.email(),
gender: p.gender(),
locale: p.locale(),
picture: p.avatar_url(),
profile: p.profile_url(),
website: p.website_url(),
email_verified: p.probability(0.85_f32),
birthdate: p.birthdate_str(18, 65, 1735689600),
zoneinfo: p.zoneinfo(),
phone_number: p.phone_number(),
phone_number_verified: p.probability(0.75_f32),
updated_at: p.date_time_past(1735689600, 30),
}
}
pub struct AddressArray {
inner: PseudoArray<Address>,
}
impl AddressArray {
pub fn new(seed: u64) -> Self {
Self {
inner: PseudoArray::new(seed, 110, generate_address),
}
}
pub fn at(&self, index: i32) -> Address {
self.inner.at(index)
}
}
fn generate_address(world_seed: u64, type_seq: u64, index: i32) -> Address {
let p = PrimitivesImpl::new(world_seed, type_seq, index);
Address {
id: p.id(),
formatted: p.formatted_address(),
street_address: p.street_address(),
locality: p.locality(),
region: p.region(),
postal_code: p.postal_code(),
country: p.country(),
locale: p.locale(),
}
}

Key Points:

  • Float literals: Rust needs _f32 suffix (Java needs f, Python/TS don’t)
  • Snake case: Convert TypeSpec camelCase to Rust snake_case
  • Type mapping: Use your mapRustType() function
  • Error handling: Validate decorator arguments

After implementing all three emitters:

# 1. Generate code
cd typespec
npm run generate
# 2. Check generated files exist
ls -lh ../sdks/rust/src/
# Should see: primitives.rs, arrays.rs, resources.rs
# 3. Verify compilation (Rust example)
cd ../sdks/rust
cargo check
# Should compile without errors (models.rs from quicktype required too)

Common Issues:

  • Missing type mapping: Add to mapRustType()
  • Float literal format: Check formatRustArg()
  • String escaping: Handle quotes and backslashes
  • UTF-8: Test with Arabic/Chinese resource files

Step 4: PseudoArray (PseudoArray Base Class)


What: Generic PseudoArray that generates elements on-demand via a generator function.

Reference:

  • Go: pseudo_array.go
  • TypeScript: sdks/typescript/src/pseudo-array.ts

Core Concept: O(1) random access without storing elements.

Example Implementation (Rust):

pub struct PseudoArray<T> {
world_seed: u64,
type_seq: u64,
generator_fn: fn(u64, u64, i32) -> T,
}
impl<T> PseudoArray<T> {
pub fn new(world_seed: u64, type_seq: u64, generator_fn: fn(u64, u64, i32) -> T) -> Self {
Self {
world_seed,
type_seq,
generator_fn,
}
}
pub fn at(&self, index: i32) -> T {
(self.generator_fn)(self.world_seed, self.type_seq, index)
}
}

Used by generated arrays:

// Generated by array-emitter.js
pub struct UserArray {
inner: PseudoArray<User>,
}
impl UserArray {
pub fn new(seed: u64) -> Self {
Self {
inner: PseudoArray::new(seed, 101, generate_user),
}
}
pub fn at(&self, index: i32) -> User {
self.inner.at(index)
}
}

No tests needed - tested via array fixture tests.
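
To make the "no storage" property concrete, the sketch below pairs the same PseudoArray shape with a toy generator function. The generator `toy_element` and the `demo` function are hypothetical and exist only for illustration; real generator functions build models through PrimitivesImpl.

```rust
/// Same shape as the PseudoArray above, repeated so the example is self-contained.
pub struct PseudoArray<T> {
    world_seed: u64,
    type_seq: u64,
    generator_fn: fn(u64, u64, i32) -> T,
}

impl<T> PseudoArray<T> {
    pub fn new(world_seed: u64, type_seq: u64, generator_fn: fn(u64, u64, i32) -> T) -> Self {
        Self { world_seed, type_seq, generator_fn }
    }

    /// Recomputes the element on every call; nothing is cached.
    pub fn at(&self, index: i32) -> T {
        (self.generator_fn)(self.world_seed, self.type_seq, index)
    }
}

/// Toy generator (hypothetical): the value depends only on its three inputs.
fn toy_element(world_seed: u64, type_seq: u64, index: i32) -> u64 {
    world_seed
        .wrapping_mul(31)
        .wrapping_add(type_seq)
        .wrapping_mul(31)
        .wrapping_add(index as u64)
}

/// Reads a far-away element, then the first, then the far-away one again.
pub fn demo() -> (u64, u64, u64) {
    let arr = PseudoArray::new(42, 101, toy_element);
    (arr.at(1_000_000), arr.at(0), arr.at(1_000_000))
}
```

Because each element is a pure function of (world_seed, type_seq, index), the first and third values returned by `demo` are identical regardless of access order, and reading index 1,000,000 costs the same as reading index 0.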


Step 5: ID Utils (PseudoID Encoding/Decoding)


What: Encode/decode deterministic UUIDs (UUID v8) with embedded metadata.

Reference:

  • Go: id_utils.go
  • Documentation: PseudoID

Bit Layout (128-bit UUID v8):

Field       Width     Notes
worldSeed   64 bits   worldSeed[63:0]
version     4 bits    fixed 0x8 (UUID v8)
variant     2 bits    fixed 0b10
skip        2 bits    reserved, must be 0
typeSeq     16 bits   typeSeq[15:0]
index       40 bits   index[39:0]
Total       128 bits
Detailed Format:

UUID v8 Format:
SSSSSSSS-SSSS-8SSS-vvTT-IIIIIIIIII
S = worldSeed (64 bits)
8 = version nibble (0x8)
v = variant bits (0b10)
-- = skip bits (2 bits, must be 0)
T = typeSeq (16 bits)
I = index (40 bits)
Visual pattern:
worldSeed[63:32] worldSeed[31:16] 8 + worldSeed[15:3] variant + skip + typeSeq[15:14] typeSeq[13:0] + index[39:32] index[31:0]

Functions to implement:

fn encode_id(world_seed: u64, type_seq: u16, index: u64) -> String;
fn decode_id(uuid: &str) -> Option<(u64, u16, u64)>; // (worldSeed, typeSeq, index)

Validation: Decode must fail if skip bits != 0 (forward compatibility).

Test: fixtures/id_test_vectors.json

  • 13 test vectors covering edge cases, bit boundaries, validation
  • Test vector #13 has should_decode: false (invalid skip bits)
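
A decoder typically starts by parsing the string and checking the two fields whose positions are fixed by the UUID standard: the version nibble and the variant bits. The sketch below covers only that part. The function names are placeholders, and the positions of worldSeed, skip, typeSeq, and index are deliberately left out; take those from id_utils.go and the fixtures rather than from this example.

```rust
/// Parse a canonical 8-4-4-4-12 UUID string into its 128-bit value.
pub fn parse_uuid(s: &str) -> Option<u128> {
    let bytes = s.as_bytes();
    if bytes.len() != 36 {
        return None;
    }
    let mut value: u128 = 0;
    for (i, &b) in bytes.iter().enumerate() {
        // Hyphens sit at fixed offsets in the canonical form.
        if matches!(i, 8 | 13 | 18 | 23) {
            if b != b'-' {
                return None;
            }
            continue;
        }
        let digit = (b as char).to_digit(16)? as u128;
        value = (value << 4) | digit;
    }
    Some(value)
}

/// Version nibble: the high four bits of byte 6 (bits 76..=79 of the value).
pub fn version(uuid: u128) -> u8 {
    ((uuid >> 76) & 0xF) as u8
}

/// True when the two variant bits (high bits of byte 8) are 0b10.
pub fn has_rfc_variant(uuid: u128) -> bool {
    ((uuid >> 62) & 0b11) == 0b10
}
```

A PseudoID decoder would reject the input unless `version` returns 8 and `has_rfc_variant` is true, and only then extract the Pseudata-specific fields and verify that the skip bits are zero.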

Step 6: Primitives Implementation

What: Data generation methods using Generator and Resources.

Files:

  • Interface (generated): primitives.rs from TypeSpec
  • Implementation: primitives_impl.rs (hand-written)
  • Resources: resources.rs (generated from typespec/resources/)

Reference:

  • Go: primitives_impl.go
  • TypeSpec: typespec/src/primitives.tsp

Structure:

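struct PrimitivesImpl {
    world_seed: u64,
    type_seq: u64,
    index: i32,
}

impl PrimitivesImpl {
    // Constructor used by the generated generate_*() functions
    fn new(world_seed: u64, type_seq: u64, index: i32) -> Self {
        Self { world_seed, type_seq, index }
    }

    // Helper (not part of the trait): a fresh generator positioned at this element
    fn rng(&self) -> Generator {
        let mut rng = Generator::new(self.world_seed, self.type_seq);
        rng.advance(self.index as u64); // advance() mutates in place and returns nothing
        rng
    }
}

impl Primitives for PrimitivesImpl {
    fn id(&self) -> String {
        encode_id(self.world_seed, self.type_seq as u16, self.index as u64)
    }

    fn locale(&self) -> String {
        let mut rng = self.rng();
        let i = rng.intn(AVAILABLE_LOCALES.len() as u32) as usize;
        AVAILABLE_LOCALES[i].to_string()
    }

    // ... ~75 more primitive methods
}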

Critical Pattern:

  • Store rng() in local variable if used more than once in a function
  • Each rng() call creates a new generator at the same position
  • Multiple calls without storing = same value repeated!
// ❌ WRONG - repeats same character
fn digit(&self, length: i32) -> String {
(0..length).map(|_| self.rng().intn(10).to_string()).collect()
}
// ✅ Correct - advances through sequence
fn digit(&self, length: i32) -> String {
let mut rng = self.rng(); // Store once
(0..length).map(|_| rng.intn(10).to_string()).collect()
}
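
The difference between the two variants is easy to observe with a stand-in generator. The sketch below uses a toy counter-based sequence instead of PCG32 so the output can be followed by hand; `ToyRng` and `Toy` are hypothetical types for illustration only.

```rust
/// Toy stand-in for Generator (hypothetical): a simple counter-based sequence.
struct ToyRng {
    state: u32,
}

impl ToyRng {
    fn intn(&mut self, n: u32) -> u32 {
        self.state = self.state.wrapping_add(7);
        self.state % n
    }
}

/// Toy stand-in for PrimitivesImpl (hypothetical).
struct Toy {
    start: u32,
}

impl Toy {
    /// Like rng() above: always returns a generator at the same position.
    fn rng(&self) -> ToyRng {
        ToyRng { state: self.start }
    }

    /// Wrong: a new generator per digit, so every digit is the first output.
    fn digits_wrong(&self, length: usize) -> String {
        (0..length).map(|_| self.rng().intn(10).to_string()).collect()
    }

    /// Right: one generator, advanced once per digit.
    fn digits_right(&self, length: usize) -> String {
        let mut rng = self.rng();
        (0..length).map(|_| rng.intn(10).to_string()).collect()
    }
}
```

With `start: 0`, the wrong variant yields "7777" while the correct one yields "7418". The same failure mode with the real generator produces fixture mismatches such as repeated digits in phone numbers or zip codes.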

Test: fixtures/primitives_test_vectors.json

  • 12 test cases covering all ~75 primitive methods
  • Generated with Go: go test -run TestPrimitivesWithVectors -update

Resource Access:

  • Resources are locale-dependent (e.g., RESOURCES["en_US"])
  • Field naming: TypeSpec uses camelCase, adjust for language conventions
  • Python uses dict access: resources["maleGivenNames"]
  • Java/TS use property access: resources.maleGivenNames

Step 7: Models (Auto-generated via quicktype)


What: Data structures generated from TypeSpec models via quicktype.

Reference:

  • TypeSpec: typespec/src/pseudata.tsp
  • Generation: typespec/package.json scripts

Generation Flow:

# 1. TypeSpec compiles to JSON Schema
tsp compile src
# Output: tsp-output/@typespec/json-schema/User.json, Address.json
# 2. Quicktype generates models from JSON Schema
quicktype --src-lang schema \
tsp-output/@typespec/json-schema/*.json \
-o ../sdks/rust/src/models.rs \
--lang rust \
--visibility public \
--derive-debug

Quicktype Options by Language:

# Go
quicktype --lang go --package pseudata --just-types-and-package
# Java
quicktype --lang java --package dev.pseudata --just-types
# Python
quicktype --lang python --just-types
# TypeScript
quicktype --lang typescript --just-types
# Rust (example)
quicktype --lang rust --visibility public --derive-debug

What quicktype generates:

  • Data structures (User, Address, Resources)
  • JSON serialization/deserialization
  • Type annotations

What quicktype does NOT generate:

  • Primitives interface (custom emitter)
  • Array implementations (custom emitter)
  • Resource data (custom emitter)
  • Generator functions (custom emitter)

Example Output:

#[derive(Serialize, Deserialize, Debug)]
pub struct User {
pub id: String,
pub sub: String,
pub name: String,
pub given_name: String,
pub family_name: String,
pub middle_name: String,
pub nickname: String,
pub preferred_username: String,
pub email: String,
pub gender: String,
pub locale: String,
pub picture: String,
pub profile: String,
pub website: String,
pub email_verified: bool,
pub birthdate: String,
pub zoneinfo: String,
pub phone_number: String,
pub phone_number_verified: bool,
pub updated_at: i64,
}

Step 8: Array Fixture Tests

What: End-to-end verification using fixture-based testing.

Test:

  • fixtures/array_user_test_vectors.json
  • fixtures/array_address_test_vectors.json

TypeSeq Values:

  • 101 - UserArray
  • 110 - AddressArray
  • Custom arrays start from 1024

Array Test Pattern:

// Generated from TypeSpec @array decorator
struct UserArray {
inner: PseudoArray<User>,
}
impl UserArray {
fn new(seed: u64) -> Self {
Self {
inner: PseudoArray::new(seed, 101, generate_user),
}
}
fn at(&self, index: i32) -> User {
self.inner.at(index)
}
}
fn generate_user(world_seed: u64, type_seq: u64, index: i32) -> User {
let p = PrimitivesImpl::new(world_seed, type_seq, index);
User {
id: p.id(),
sub: p.id(),
name: p.composite_user_name(),
given_name: p.gendered_given_name(),
// ... populate all 20 fields
}
}

Pseudata’s core value proposition is deterministic cross-language consistency: the same seed must produce identical data in every language. Fixture-based testing is the only way to guarantee this.

Without fixtures, you could have:

  • Go generating "John Smith" for seed 42
  • Java generating "John Smyth" due to a resource loading bug
  • Each language passing its own tests, but cross-language compatibility broken

Fixtures provide a single source of truth that all languages must match exactly. This catches:

  • RNG state bugs (calling rng() multiple times incorrectly)
  • UTF-8 handling issues (byte vs. rune slicing)
  • Type mapping errors (signed vs. unsigned integers)
  • Resource access bugs (wrong locale fallback)
  • Formatting inconsistencies (date/phone number patterns)

All tests follow the same pattern:

  1. Go generates fixtures with -update flag
  2. All languages read and validate against fixtures
  3. Fixtures are committed to the repository

Test Execution:

# Generate fixtures (from Go)
cd /path/to/pseudata
go test -run TestGeneratorWithVectors -update # pcg32_test_vectors.json
go test -run TestSeedFromVectors -update # seed_test_vectors.json
go test -run TestIDUtilsWithVectors -update # id_test_vectors.json
go test -run TestPrimitivesWithVectors -update # primitives_test_vectors.json
go test -run TestUserArrayWithVectors -update # array_user_test_vectors.json
go test -run TestAddressArrayWithVectors -update # array_address_test_vectors.json
# Run tests (Rust example)
cargo test

Fixture Test Template (Rust):

use serde::Deserialize;
use std::fs;
#[derive(Deserialize)]
struct TestCase {
id: i32,
description: String,
inputs: TestInputs,
expected: TestExpected,
}
#[test]
fn test_component_with_vectors() {
let fixture = fs::read_to_string("../../fixtures/component_test_vectors.json")
.expect("Run: cd ../../ && go test -update");
let test_cases: Vec<TestCase> = serde_json::from_str(&fixture).unwrap();
for tc in test_cases {
// ... run test
assert_eq!(actual, tc.expected.value, "{} failed", tc.description);
}
}

Create package.json / Cargo.toml / setup.py:

# Cargo.toml example
[package]
name = "pseudata"
version = "0.0.1"
edition = "2021"
[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
uuid = "1.0"
once_cell = "1" # used by the generated resources.rs
[dev-dependencies]
# Test dependencies

Create README.md in SDK directory:

# Pseudata - Rust SDK
Deterministic mock data generation for testing.
## Installation
```toml
[dependencies]
pseudata = "0.0.1"
```
## Quick Start
```rust
use pseudata::{Generator, UserArray, seed_from};
// Generate deterministic users
let seed = seed_from("test-scenario-1");
let users = UserArray::new(seed);
let user = users.at(0);
println!("Email: {}", user.email);
println!("Name: {}", user.name);
```
## Running Tests
```bash
cargo test
```

Once your SDK is implemented and tested locally, integrate it into the automated CI/CD pipeline.

Add a Taskfile.yml in your SDK directory (sdks/your-language/Taskfile.yml):

version: "3"
tasks:
  setup:
    desc: Install dependencies for your language
    cmds:
      - # your dependency installation command
      # Example: cargo fetch (Rust)
      # Example: dotnet restore (C#)
  test:
    desc: Run tests
    cmds:
      - # your test command
      # Example: cargo test
      # Example: dotnet test
  lint:
    desc: Lint code
    cmds:
      - # your linter command
      # Example: cargo clippy
      # Example: dotnet format --verify-no-changes
  format:
    desc: Format code
    cmds:
      - # your formatter command
      # Example: cargo fmt
      # Example: dotnet format
  clean:
    desc: Clean build artifacts
    cmds:
      - # your clean command
      # Example: cargo clean
      # Example: dotnet clean

Add your SDK to the root Taskfile.yml:

includes:
  # ... existing includes ...
  rust: # your language name
    taskfile: ./sdks/rust/Taskfile.yml
    dir: ./sdks/rust
tasks:
  # Update aggregate commands to include your SDK:
  setup:
    desc: Install dependencies for ALL languages
    deps: [go:setup, python:setup, typescript:setup, java:setup, typespec:setup, rust:setup]
  test:
    desc: Run tests for ALL languages
    deps: [go:test, python:test, typescript:test, java:test, rust:test]
  lint:
    desc: Run linters for ALL languages
    deps: [go:lint, python:lint, typescript:lint, java:lint, rust:lint]
  format:
    desc: Format ALL code
    deps: [go:format, python:format, typescript:format, java:format, typespec:format, rust:format]
  clean:
    desc: Clean ALL artifacts
    deps: [go:clean, python:clean, typescript:clean, java:clean, rust:clean]

The setup-env composite action (.github/actions/setup-env/action.yml) centralizes all toolchain setup. Add your language here:

# Add input parameter (after existing setup-node)
inputs:
  setup-rust:
    description: "Setup Rust toolchain (only used when all=false)"
    required: false
    default: "false"

# Add setup step
# Note: `shell` applies only to `run` steps, so `uses` steps omit it.
runs:
  using: "composite"
  steps:
    # ... existing steps ...
    - name: Setup Rust
      if: inputs.all == 'true' || inputs.setup-rust == 'true'
      uses: actions-rust-lang/setup-rust-toolchain@1fbea72663f6d4c03efaab13560c8a24cfd2a7cc # v1.0.0
      with:
        toolchain: stable
        cache: true

    # For C#:
    - name: Setup .NET
      if: inputs.all == 'true' || inputs.setup-dotnet == 'true'
      uses: actions/setup-dotnet@6bd8b7f7774af54e05809fcc5431931b3eb1ddee # v4.0.1
      with:
        dotnet-version: "9.0"
        cache: true
        cache-dependency-path: "sdks/csharp/YourProject.csproj"

Why Setup-Env Action?

  • Single source of truth: Language versions defined once
  • Consistency: Same setup across CI, validation, and release workflows
  • Maintainability: Update Go version once, applies everywhere
  • Automatic caching: Dependency caching configured per language

Security: All actions MUST be pinned to commit SHAs (not tags):

# ❌ Wrong (tag-based, mutable)
uses: actions-rust-lang/setup-rust-toolchain@v1
# ✅ Correct (SHA-pinned, immutable)
uses: actions-rust-lang/setup-rust-toolchain@1fbea72663f6d4c03efaab13560c8a24cfd2a7cc # v1.0.0
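
The pinning rule can be checked mechanically. The following is a minimal sketch of such a check; `is_sha_pinned` is a hypothetical helper, not part of the Pseudata repository, and it assumes that local actions (paths starting with `./`) are exempt because they carry no ref.

```rust
/// Returns true when a `uses:` value is pinned to a full 40-character commit SHA.
/// Hypothetical helper for illustration; not part of the Pseudata tooling.
fn is_sha_pinned(uses: &str) -> bool {
    // Drop a trailing "# v1.0.0" style comment, if present.
    let reference = uses.split('#').next().unwrap_or("").trim();

    // Assumption: local actions such as ./.github/actions/setup-env need no ref.
    if reference.starts_with("./") {
        return true;
    }

    // Everything after the last '@' is the git ref (tag, branch, or SHA).
    match reference.rsplit_once('@') {
        Some((_, git_ref)) => {
            git_ref.len() == 40 && git_ref.chars().all(|c| c.is_ascii_hexdigit())
        }
        None => false,
    }
}
```

A tag such as `@v1` fails the check because it is shorter than 40 characters, which matches the "Wrong" case above.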

Update .github/workflows/reusable-ci.yml to pass your language flag to setup-env:

steps:
  - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
  - name: Setup Development Environment
    uses: ./.github/actions/setup-env
    with:
      all: "false"
      setup-go: ${{ inputs.sdk == 'go' }}
      setup-python: ${{ inputs.sdk == 'python' }}
      setup-java: ${{ inputs.sdk == 'java' }}
      setup-node: ${{ inputs.sdk == 'typescript' || inputs.sdk == 'typespec' }}
      setup-rust: ${{ inputs.sdk == 'rust' }} # Add your language

Create .github/workflows/ci-rust.yml (or your language):

# CI (Rust) Workflow
#
# Triggers the shared CI workflow for the Rust SDK.
# This workflow ensures that the Rust codebase (located in sdks/rust)
# passes all standard checks (setup, lint, test) across supported operating systems.
#
# Matrix Optimization & Triggers:
# - on: pull_request: Runs only on [ubuntu-latest] for fast feedback loops.
# - on: push: Expands matrix to [ubuntu-latest, windows-latest, macos-latest]
#   to ensure cross-platform compatibility after merging.
name: CI (Rust)

on:
  push:
    branches: ["main"]
    paths: ["sdks/rust/**", "Taskfile.yml", ".github/**"]
  pull_request:
    branches: ["main"]
    paths: ["sdks/rust/**", "Taskfile.yml", ".github/**"]

jobs:
  check:
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest]
        include:
          - os: ${{ github.event_name == 'push' && 'windows-latest' || '' }}
          - os: ${{ github.event_name == 'push' && 'macos-latest' || '' }}
        exclude:
          - os: ""
    uses: ./.github/workflows/reusable-ci.yml
    with:
      sdk: rust
      path: sdks/rust
      os: ${{ matrix.os }}

Key Points:

  • Name Pattern: Must be CI (YourLanguage) - the aggregator discovers workflows matching ^CI\\(.+\\)
  • Path Filters: Include your SDK directory, root Taskfile, and .github/** (covers all workflow/action changes)
  • Matrix Strategy: Ubuntu-only on PRs, full matrix on push to main
  • fail-fast: false: Tests all platforms even if one fails

Add publishing job to .github/workflows/release.yml:

  # Add after existing publish jobs:
  # ============================================================================
  # JOB: PUBLISH RUST (crates.io)
  # Runs ONLY after validation passes.
  # ============================================================================
  publish-rust:
    needs: [release-please, validate] # Must wait for validation!
    if: ${{ needs.release-please.outputs.releases_created }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
      - name: Setup Development Environment
        uses: ./.github/actions/setup-env
        with:
          all: "false"
          setup-rust: "true"
      - name: Publish
        env:
          CARGO_REGISTRY_TOKEN: ${{ secrets.CARGO_TOKEN }}
        run: task rust:publish

Key Changes from Old Pattern:

  • Uses setup-env action instead of manual Task + toolchain setup
  • Depends on validate job - ensures quality gate passes first
  • Consistent with other publish jobs - same pattern across all languages

For Other Languages:

  # C# / NuGet
  publish-csharp:
    needs: [release-please, validate]
    if: ${{ needs.release-please.outputs.releases_created }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
      - name: Setup Development Environment
        uses: ./.github/actions/setup-env
        with:
          all: "false"
          setup-dotnet: "true"
      - name: Publish
        env:
          NUGET_API_KEY: ${{ secrets.NUGET_API_KEY }}
        run: task csharp:publish

Add required secrets in GitHub repository settings (Settings → Secrets and variables → Actions):

Common Secrets (already configured):

  • PYPI_TOKEN - Python package publishing
  • NPM_TOKEN - TypeScript package publishing
  • MAVEN_CENTRAL_USERNAME, MAVEN_CENTRAL_TOKEN - Java Maven Central (via Central Publishing Maven Plugin)
  • MAVEN_GPG_PRIVATE_KEY, MAVEN_GPG_PASSPHRASE - Java artifact signing

Add for Your Language:

  • Rust: CARGO_TOKEN (from crates.io)
  • C#: NUGET_API_KEY (from nuget.org)
  • Ruby: RUBYGEMS_API_KEY (from rubygems.org)
  • PHP: No secret needed (Packagist auto-syncs from Git tags)

  1. Push to feature branch:

    git checkout -b test/add-rust-sdk
    git add sdks/rust/ .github/
    git commit -m "feat: add Rust SDK with CI"
    git push origin test/add-rust-sdk
  2. Create Pull Request:

    • CI should trigger automatically
    • Check that CI (Rust) workflow runs
    • Verify tests pass on ubuntu-latest
    • Check that CI aggregator discovers and includes your workflow
  3. After PR approval, merge to main:

    • Full matrix (ubuntu, windows, macos) runs
    • Verify all platforms pass
  4. Test validation workflow (optional):

    # Manually trigger validation for your branch
    gh workflow run validate.yml --ref test/add-rust-sdk
  5. Test release workflow (after merge):

    • Wait for release-please to create release PR
    • Merge release PR
    • Verify validation runs and passes
    • Check publishing job runs correctly
    • Verify package appears in registry

Update these files to include your language:

Required:

  • Main README.md - Add SDK to language list
  • sdks/README.md - Add SDK documentation link
  • Root Taskfile.yml - Add to aggregate commands (already done in 7.2)
  • .github/actions/setup-env/action.yml - Add toolchain setup (already done in 7.3)

If applicable:

  • typespec/docs/decorators/resources.md - If new resource types added
  • typespec/src/resources.tsp - If new resources defined
  • Cross-language compatibility matrix

  • Generator: PCG32 implementation with fixture tests
  • SeedFrom: FNV-1a hash with fixture tests
  • ID Utils: Encode/decode with fixture tests
  • PseudoArray: Generic PseudoArray base class
  • Primitives: Interface + implementation + fixture tests
  • Models: User, Address data structures (auto-generated via quicktype)
  • Arrays: UserArray + AddressArray + fixture tests
  • Resource Emitter: Embed locale data from typespec/resources/
  • Interface Emitter: Generate Primitives interface
  • Array Emitter: Generate array classes + generator functions
  • Taskfile: Created sdks/your-language/Taskfile.yml with setup/test/lint/format/clean tasks
  • Root Taskfile: Updated to include your language in aggregate commands
  • Setup-Env Action: Added toolchain setup to .github/actions/setup-env/action.yml
  • Reusable CI: Updated to pass language flag to setup-env action
  • CI Workflow: Created .github/workflows/ci-yourlanguage.yml with matrix strategy
  • Release Workflow: Added publish job to .github/workflows/release.yml (depends on validate)
  • Secrets: Configured package registry API tokens in GitHub settings
  • CI Testing: Verified workflow runs on draft PR and passes on all platforms
  • Validation: Manually triggered validate.yml workflow for your branch
  • Package: Published to language package registry
  • SDK README: Installation and usage examples
  • Main README: Updated language listing
  • sdks/README: Updated SDK table
  • Website Docs: Updated language support pages

Wrong: Each rng() call returns a fresh generator in the same initial state, so repeated calls yield the same value

fn bad(&self) -> String {
    format!("{}{}", self.rng().intn(10), self.rng().intn(10)) // Same digit twice!
}

Correct: Store the generator in a local variable and reuse it

fn good(&self) -> String {
    let mut rng = self.rng();
    format!("{}{}", rng.intn(10), rng.intn(10)) // Second call advances the state
}
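
The difference between the two versions can be reproduced with a self-contained sketch. The `Pcg32` struct below follows the published PCG32 (XSH-RR) reference algorithm, but the `Primitives` struct, the behaviour of `rng()`, the stream value `0`, and the modulo-based `intn` are assumptions made for illustration. They are not Pseudata's actual API and are not expected to reproduce its fixtures.

```rust
/// Minimal PCG32 (XSH-RR) generator, included only to demonstrate the pitfall.
struct Pcg32 {
    state: u64,
    inc: u64,
}

impl Pcg32 {
    const MULTIPLIER: u64 = 6364136223846793005;

    /// Seeds the generator the way the PCG reference implementation does.
    fn new(seed: u64, stream: u64) -> Self {
        let mut g = Pcg32 { state: 0, inc: (stream << 1) | 1 };
        g.next_u32();
        g.state = g.state.wrapping_add(seed);
        g.next_u32();
        g
    }

    /// Advances the state and returns the next 32-bit output.
    fn next_u32(&mut self) -> u32 {
        let old = self.state;
        self.state = old.wrapping_mul(Self::MULTIPLIER).wrapping_add(self.inc);
        let xorshifted = (((old >> 18) ^ old) >> 27) as u32;
        let rot = (old >> 59) as u32;
        xorshifted.rotate_right(rot)
    }

    /// Assumption: plain modulo reduction. A real SDK must use whatever
    /// bounded-integer algorithm the reference implementations use.
    fn intn(&mut self, n: u32) -> u32 {
        self.next_u32() % n
    }
}

/// Hypothetical stand-in for a primitives object bound to one seed.
struct Primitives {
    seed: u64,
}

impl Primitives {
    /// Assumption: every call builds a new generator from the same seed.
    fn rng(&self) -> Pcg32 {
        Pcg32::new(self.seed, 0)
    }

    /// Two fresh generators: both draws come from the same initial state.
    fn bad(&self) -> String {
        format!("{}{}", self.rng().intn(10), self.rng().intn(10))
    }

    /// One generator reused: the second draw sees the advanced state.
    fn good(&self) -> String {
        let mut rng = self.rng();
        format!("{}{}", rng.intn(10), rng.intn(10))
    }
}
```

Under these assumptions, `bad()` always returns the same digit twice, while `good()` stays deterministic across calls but takes its second digit from the next generator state.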

Use byte-level operations in seed_from and rune/character-level operations in text primitives, so that multi-byte characters hash and slice identically in every language.
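
A minimal sketch of the byte/character distinction follows. It assumes the 64-bit FNV-1a variant with the standard offset basis and prime; which width Pseudata's seed_from actually uses is not stated here, so check the reference SDKs and fixtures. `fnv1a_64` and `first_chars` are illustrative names, not the SDK's API.

```rust
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a over raw bytes (assumed variant, for illustration only).
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &byte in data {
        // FNV-1a: XOR the byte in first, then multiply by the prime.
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Takes the first `n` characters, never splitting a multi-byte character.
fn first_chars(text: &str, n: usize) -> String {
    text.chars().take(n).collect()
}
```

Hashing should go through `as_bytes()`, while truncation and indexing of text should go through `chars()`. Slicing a `&str` by byte offsets can panic in Rust when the offset falls inside a multi-byte character.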

JavaScript/TypeScript need json-bigint for 64-bit integers, because the built-in number type cannot represent integers above 2^53 - 1 exactly.
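
The precision limit is a property of 64-bit floating point in general, so it can be illustrated in any language. The sketch below uses a hypothetical helper, `survives_f64`, to test whether a u64 survives a round trip through a double, which is what a JavaScript number is.

```rust
/// Returns true when `value` is exactly representable as an f64.
/// The comparison is done in u128 because casting an out-of-range
/// float back to u64 would saturate and hide the rounding.
fn survives_f64(value: u64) -> bool {
    (value as f64) as u128 == u128::from(value)
}
```

Values up to 2^53 pass, while 2^53 + 1 and `u64::MAX` do not, which is why 64-bit seeds and fixture values need a big-integer-aware JSON parser on the JavaScript side.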

Java requires an f suffix on float literals in generated code (e.g., 1.5f); without it, the literal is a double.

Match TypeSpec's camelCase in runtime code: maleGivenNames, not male_given_names.


  • Reference Implementations: Check Go, Java, Python, TypeScript in sdks/
  • Test Vectors: All fixtures in fixtures/ directory
  • TypeSpec: Definitions in typespec/src/
  • GitHub Issues: Report problems or ask questions

Once implemented and tested:

  1. Create Pull Request with SDK in sdks/your-language/
  2. Include CI/CD Integration:
    • Taskfile with all standard tasks
    • CI workflow file
    • Release workflow updates
    • Updated root Taskfile
  3. Update Documentation:
    • Main README.md with language listing
    • sdks/README.md with SDK table entry
    • Cross-language compatibility matrix
    • Website language support pages
  4. Test Thoroughly:
    • All fixture tests pass locally
    • CI passes on all platforms (ubuntu, windows, macos)
    • Release workflow tested on staging branch
  5. Provide Examples:
    • SDK README with installation instructions
    • Usage examples for all major features
    • Package registry publication proof

Review Process:

  • Maintainers will verify cross-language consistency
  • Check CI/CD integration completeness
  • Review code quality and documentation
  • Test package installation from registry

Last Updated: 2025-01-07